unicode characters

Hi guys,

first of all: don’t consider it custom code, it is not.

The following code is just an example, so please don’t blame me or move the topic.

I’m interested in reading “special” characters from a std::string variable.

Let’s look at this example:

What if I’d like to transform all characters with an acute, caron or ring above (all diacritic characters) into non-diacritic characters?

… if the string “ábcd” is in the msg variable, I’d like to find it by comparing it to the string “abcd”.



void WorldSession::LookForNondiacriticWord(char* foundword, std::string msg) // variable msg taken from the message opcode
{
	std::locale loc;
	unsigned int i;

	for(i=0; i < msg.length(); i++){
		switch(msg[i]){
			case 'á':
			case 0x00E1: msg[i] = 'a'; break;
			case 'é':
			case 0x00E9: msg[i] = 'e'; break;
			case 'í':
			case 0x00ED: msg[i] = 'i'; break;
			case 'ó':
			case 0x00F3: msg[i] = 'o'; break;
			case 'ú':
			case 0x00FA: msg[i] = 'u'; break;
			case 'ý':
			case 0x00FD: msg[i] = 'y'; break;
		}
		msg[i] = tolower(msg[i],loc);
	}

	// let's have a list of non-diacritic words i'm looking for ( NWlist )
	for( std::list<std::string>::const_iterator it = NWlist.begin(); it != NWlist.end(); ++it){
		if(msg.find((*it)) != std::string::npos){
			strcpy(foundword, (*it).c_str() );        // first found word from NWlist in msg is written into foundword variable
			return;                                   // stop at the first match
		}
	}
}



It doesn’t work at all…

I’ve tried a lot of ideas connected with ctype.h and locale.h, but nothing works…

I’m a bit desperate already, but still interested in it.

… the funny thing is that if I add the string “ábcd” to the list, it will find it.
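
That last observation fits with std::string::find comparing raw char values, with no notion of accents. A minimal sketch, assuming msg arrives UTF-8 encoded (so “á” is the two bytes 0xC3 0xA1); containsBytes is a hypothetical helper name, not part of the original code:

```cpp
#include <string>

// Hypothetical helper: true when needle occurs in haystack.
// std::string::find compares char by char, so only an identical byte
// sequence matches; an accented letter never equals its plain form.
bool containsBytes(const std::string& haystack, const std::string& needle) {
    return haystack.find(needle) != std::string::npos;
}
```

With a message containing the bytes of “ábcd”, containsBytes(msg, "\xC3\xA1" "bcd") is true while containsBytes(msg, "abcd") is false. The literal is split in two because a \x escape would otherwise swallow the following hex-looking letter.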

Any help of yours would be welcome

Thank you


I’m sure that the problem is here:


case 'á':

case 0x00E1: msg[i] = 'a'; break;

… the condition if( msg[i] == ‘á’ ) will never be true,

and neither will the one with 0x00E1, but I don’t know why

I got it.

I know exactly how it works.

… so let me explain:

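std::string msg; // there is the string “déd” in the msg variable

→ the length of this string is… 4 !!!

→ there are 4 chars (single bytes, not wide characters), which is what you’d expect if msg is UTF-8 encoded: é takes two bytes

→ printed as sign-extended 16-bit integers: character d (value 100), a “special acute” (65475, i.e. byte 0xC3), a “special letter e” (65449, i.e. byte 0xA9), letter d again (100)

→ besides the special acute, there are 2 “special carons” (65476 and 65477, i.e. bytes 0xC4 and 0xC5)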
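
To check those numbers yourself, here is a small sketch. It assumes a UTF-8 encoded string and a 16-bit unsigned short; byteValues and widenedValue are made-up helper names for illustration only:

```cpp
#include <string>
#include <vector>

// Returns every char of s as an unsigned byte value (0..255).
std::vector<unsigned int> byteValues(const std::string& s) {
    std::vector<unsigned int> out;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        out.push_back(static_cast<unsigned char>(s[i]));
    }
    return out;
}

// Widens a byte the way a signed char is widened to a 16-bit unsigned
// value. Bytes above 0x7F come out as 0xFF00 + byte (assuming a 16-bit
// unsigned short), which is where values such as 65475 come from.
unsigned short widenedValue(char c) {
    return static_cast<unsigned short>(static_cast<signed char>(c));
}
```

For the string "d\xC3\xA9" "d" (the UTF-8 bytes of “déd”), size() is 4, byteValues gives 100, 195, 169, 100, and widenedValue of the two middle bytes gives 65475 and 65449.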

so if you want to transform é into e, you have to read the acute first,

remember it in some variable, and after reading that special e you

can push the normal letter e (101) to another string, so you will get

a new non-diacritic string
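
The steps above could be sketched roughly like this. It is only an illustration, assuming UTF-8 input; baseLetter and stripDiacritics are hypothetical names, and the table covers just a handful of lowercase acute, caron and ring letters, not every diacritic:

```cpp
#include <cctype>
#include <string>

// Maps a two-byte UTF-8 pair (lead, next) to a plain ASCII letter.
// Returns 0 when the pair is not in this small illustrative table.
char baseLetter(unsigned char lead, unsigned char next) {
    if (lead == 0xC3) {            // the "special acute" byte
        switch (next) {
            case 0xA1: return 'a'; // á
            case 0xA9: return 'e'; // é
            case 0xAD: return 'i'; // í
            case 0xB3: return 'o'; // ó
            case 0xBA: return 'u'; // ú
            case 0xBD: return 'y'; // ý
        }
    } else if (lead == 0xC4) {     // first "special caron" byte
        switch (next) {
            case 0x8D: return 'c'; // č
            case 0x8F: return 'd'; // ď
            case 0x9B: return 'e'; // ě
        }
    } else if (lead == 0xC5) {     // second "special caron" byte
        switch (next) {
            case 0x88: return 'n'; // ň
            case 0x99: return 'r'; // ř
            case 0xA1: return 's'; // š
            case 0xA5: return 't'; // ť
            case 0xAF: return 'u'; // ů
            case 0xBE: return 'z'; // ž
        }
    }
    return 0;
}

// Builds a new lowercase string without the diacritics listed above.
std::string stripDiacritics(const std::string& msg) {
    std::string out;
    for (std::string::size_type i = 0; i < msg.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(msg[i]);
        if (c >= 0xC3 && c <= 0xC5 && i + 1 < msg.size()) {
            // Remember the lead byte, then look at the byte after it.
            char plain = baseLetter(c, static_cast<unsigned char>(msg[i + 1]));
            if (plain != 0) {
                out.push_back(plain);
                ++i;               // skip the second byte of the pair
                continue;
            }
        }
        if (c < 0x80) {
            out.push_back(static_cast<char>(std::tolower(c)));
        } else {
            out.push_back(msg[i]); // unknown byte: copy unchanged
        }
    }
    return out;
}
```

After this, stripDiacritics(msg).find("abcd") can be used for the lookup in the word list. Returning a new std::string instead of editing msg in place avoids the problem that one accented letter shrinks from two bytes to one.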

… btw forget and <locale.h> you need <ctype.h> only