Unicode characters

I’m interested in reading “special” characters from a std::string variable.

Let’s look at this example:

What if I’d like to transform all characters with an acute, caron, or ring above into their plain, non-diacritic letters?

…if the msg variable contains the string “ábcd”, I’d like to find it by comparing against the string “abcd”.



void WorldSession::LookForNondiacriticWord(char* foundword, std::string msg) // variable msg taken from the message opcode
{
	std::locale loc;
	unsigned int i;

	for(i=0; i < msg.length(); i++){
		// try to replace accented letters with plain ones
		switch(msg[i]){
		case 'á':
		case 0x00E1: msg[i] = 'a'; break;
		case 'é':
		case 0x00E9: msg[i] = 'e'; break;
		case 'í':
		case 0x00ED: msg[i] = 'i'; break;
		case 'ó':
		case 0x00F3: msg[i] = 'o'; break;
		case 'ú':
		case 0x00FA: msg[i] = 'u'; break;
		case 'ý':
		case 0x00FD: msg[i] = 'y'; break;
		}
		msg[i] = tolower(msg[i],loc);
	}

	// let's have a list of non-diacritic words i'm looking for ( NWlist )
	for( std::list<std::string>::const_iterator it = NWlist.begin(); it != NWlist.end(); ++it){
		if(msg.find((*it)) != std::string::npos){
			strcpy(foundword, (*it).c_str() );        // first found word from NWlist in msg is written into foundword variable
			break;                                    // stop at the first match
		}
	}
}


It doesn’t work at all…

I’ve tried a lot of ideas connected with ctype.h and locale.h, but nothing works…

I’m sure that the problem is here:


case 'á':

case 0x00E1: msg[i] = 'a'; break;

…the condition if( msg[i] == 'á' ) will never be true,

and neither will the one with 0x00E1, but at first I didn’t know why

…so let me explain what I found:

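std::string msg; // msg contains the string “déd”

→ the length of this string is… 4!

→ a std::string holds char elements (single bytes), not wint_t characters, and the message is UTF-8, where é (U+00E9) takes two bytes: 0xC3 0xA9

→ so the bytes are: d (value 100), 0xC3 (the byte I first took for a “special acute”, printed as 65475 because the negative char gets widened to 16 bits), 0xA9 (the “special letter e”, printed as 65449), and d again (100)

→ 0xC3 is really the lead byte for everything in U+00C0 to U+00FF, not only acutes; the two “special carons” (65476 and 65477) are the lead bytes 0xC4 and 0xC5, which cover U+0100 to U+017F, where letters such as č, š and ž live

→ that is why neither case ever matches: no single char in msg equals 0x00E1, because the letter is spread over two bytes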
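The numbers in this list can be reproduced with a small sketch. It assumes msg holds UTF-8, and it spells “déd” with explicit byte escapes so the result does not depend on how the source file is saved. The helper names byte_values and widened_16bit are made up for illustration; they are not part of the original code.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Returns every byte of s as an unsigned value in the range 0..255.
std::vector<unsigned> byte_values(const std::string& s) {
    std::vector<unsigned> out;
    for (char c : s) {
        out.push_back(static_cast<unsigned char>(c));
    }
    return out;
}

// Reproduces the large numbers from the post: the byte is treated as a
// signed 8-bit value and then widened to an unsigned 16-bit integer.
unsigned widened_16bit(char c) {
    return static_cast<std::uint16_t>(static_cast<signed char>(c));
}

// "déd" written as raw UTF-8 bytes. The literal is split in two so that
// the trailing 'd' is not swallowed by the \xA9 hex escape.
const std::string kSample = "d\xC3\xA9" "d";
```

With these helpers, kSample.length() is 4, byte_values(kSample) gives 100, 195, 169, 100, and widened_16bit applied to the two middle bytes gives 65475 and 65449.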

So if you want to transform é into e, you have to read the first byte of the two-byte sequence (0xC3, shown as 65475), remember it in a variable, and after reading the second byte (0xA9, shown as 65449) you can push the plain letter e (101) to another string. That gives you a new, non-diacritic string.
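A minimal sketch of that two-byte approach, assuming msg is UTF-8 and that only a handful of lowercase letters with an acute, caron, or ring above need folding. The names fold_pair and strip_diacritics are hypothetical, and the table is deliberately incomplete (no uppercase letters, nothing outside U+00C0 to U+017F); treat it as an illustration rather than a full solution.

```cpp
#include <cstddef>
#include <string>

// Maps a two-byte UTF-8 sequence (lead, cont) to a plain ASCII letter.
// Returns 0 when the pair is not in this small, illustrative table.
char fold_pair(unsigned char lead, unsigned char cont) {
    if (lead == 0xC3) {            // lead byte for U+00C0..U+00FF
        switch (cont) {
            case 0xA1: return 'a'; // á U+00E1
            case 0xA9: return 'e'; // é U+00E9
            case 0xAD: return 'i'; // í U+00ED
            case 0xB3: return 'o'; // ó U+00F3
            case 0xBA: return 'u'; // ú U+00FA
            case 0xBD: return 'y'; // ý U+00FD
        }
    } else if (lead == 0xC4) {     // lead byte for U+0100..U+013F
        switch (cont) {
            case 0x8D: return 'c'; // č U+010D
            case 0x8F: return 'd'; // ď U+010F
            case 0x9B: return 'e'; // ě U+011B
        }
    } else if (lead == 0xC5) {     // lead byte for U+0140..U+017F
        switch (cont) {
            case 0x88: return 'n'; // ň U+0148
            case 0x99: return 'r'; // ř U+0159
            case 0xA1: return 's'; // š U+0161
            case 0xA5: return 't'; // ť U+0165
            case 0xAF: return 'u'; // ů U+016F
            case 0xBE: return 'z'; // ž U+017E
        }
    }
    return 0;
}

// Builds a new string in which every known two-byte letter is replaced
// by its plain ASCII counterpart; all other bytes are copied unchanged.
std::string strip_diacritics(const std::string& msg) {
    std::string out;
    out.reserve(msg.size());
    for (std::size_t i = 0; i < msg.size(); ++i) {
        const unsigned char lead = static_cast<unsigned char>(msg[i]);
        if (i + 1 < msg.size()) {
            const unsigned char cont = static_cast<unsigned char>(msg[i + 1]);
            const char plain = fold_pair(lead, cont);
            if (plain != 0) {
                out.push_back(plain);
                ++i;               // skip the second byte of the pair
                continue;
            }
        }
        out.push_back(msg[i]);
    }
    return out;
}
```

Calling strip_diacritics on the UTF-8 bytes of “déd” yields “ded”, so the result could be lowercased and passed to find() as in the NWlist loop. Building a second string instead of editing msg in place avoids shifting the remaining bytes every time a pair shrinks to one character.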

