[Proj] Unicode

Gerald I. Evenden geraldi.evenden at gmail.com
Mon Jun 8 14:31:18 EST 2009


On Monday 08 June 2009 2:52:02 pm Thomas Knudsen wrote:
> 2009/6/8 Gerald I. Evenden <geraldi.evenden at gmail.com>
>
	...
>     printf("%ls\n", L"Schöne Grüße");

For my edification I grabbed a portion of the above string and:

gie at charon:~$ echo 'L"Schöne Grüße");' >foo
gie at charon:~$ m foo
L"Schöne Grüße");
gie at charon:~$ hd foo
00000000  4c 22 53 63 68 c3 b6 6e  65 20 47 72 c3 bc c3 9f  |L"Sch..ne Gr....|
00000010  65 22 29 3b 0a                                    |e");.|
00000015
gie at charon:~$

I see that the "normal" text takes up 1 byte per character, and each funky 
character is escaped with c3 followed by a second byte.  So it seems that as 
long as everything is ASCII we are in normal byte mode, and when an extended 
character comes along it is handled with a two-byte sequence (at least for 
the three characters in this string).
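
The byte pattern in the dump can be checked mechanically.  What follows is 
only a minimal sketch, assuming the file really is UTF-8 (which the c3 xx 
pairs suggest); the function names utf8_seq_len and utf8_count are made up 
for illustration and do not come from any library.  It classifies a byte by 
its high bits and counts characters by skipping continuation bytes, with no 
attempt to reject malformed or overlong input.

```c
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence introduced by lead byte b,
 * or 0 if b is a continuation byte or not a valid lead byte.
 * Hypothetical helper: overlong forms are not rejected. */
int utf8_seq_len(unsigned char b)
{
    if (b < 0x80)           return 1;  /* 0xxxxxxx: plain ASCII       */
    if ((b & 0xE0) == 0xC0) return 2;  /* 110xxxxx: e.g. c3           */
    if ((b & 0xF0) == 0xE0) return 3;  /* 1110xxxx                    */
    if ((b & 0xF8) == 0xF0) return 4;  /* 11110xxx                    */
    return 0;                          /* 10xxxxxx or invalid         */
}

/* Count characters (code points) in a NUL-terminated string by
 * counting every byte that is not a continuation byte (10xxxxxx).
 * Assumes s is well-formed UTF-8. */
size_t utf8_count(const char *s)
{
    size_t n = 0;
    for (; *s; ++s)
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    return n;
}
```

Fed the bytes from the dump, utf8_seq_len(0x4c) gives 1 and 
utf8_seq_len(0xc3) gives 2, while the b6, bc and 9f bytes are continuation 
bytes.  The 15 bytes between the quotes therefore hold 12 characters.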

Fair enough.  This *is not* the impression I got from various previous 
descriptions, as the 16-bit aspect kept coming up and made one think that the 
whole damn string was in 16-bit code.
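
For comparison, here is a rough sketch of what the "16-bit" form of the same 
text would look like, assuming the file is UTF-8 and assuming only 
characters up to U+FFFF (so no surrogate pairs).  The name utf8_to_u16 is 
hypothetical, and validation is minimal; this is an illustration, not a 
replacement for a real conversion routine such as iconv.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode NUL-terminated UTF-8 into 16-bit units, one unit per character.
 * Handles 1-, 2- and 3-byte sequences only (code points up to U+FFFF).
 * Returns the number of units written, or (size_t)-1 if the input is
 * unsupported/malformed or the output buffer (cap units) is too small. */
size_t utf8_to_u16(const char *s, uint16_t *out, size_t cap)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;

    while (*p) {
        uint16_t cp;
        if (p[0] < 0x80) {                       /* ASCII, 1 byte */
            cp = p[0];
            p += 1;
        } else if ((p[0] & 0xE0) == 0xC0 &&
                   (p[1] & 0xC0) == 0x80) {      /* 2 bytes, e.g. c3 b6 */
            cp = (uint16_t)(((p[0] & 0x1F) << 6) | (p[1] & 0x3F));
            p += 2;
        } else if ((p[0] & 0xF0) == 0xE0 &&
                   (p[1] & 0xC0) == 0x80 &&
                   (p[2] & 0xC0) == 0x80) {      /* 3 bytes */
            cp = (uint16_t)(((p[0] & 0x0F) << 12) |
                            ((p[1] & 0x3F) << 6) | (p[2] & 0x3F));
            p += 3;
        } else {
            return (size_t)-1;                   /* 4-byte or malformed */
        }
        if (n >= cap)
            return (size_t)-1;
        out[n++] = cp;
    }
    return n;
}
```

With this, c3 b6 decodes to 0x00F6, c3 bc to 0x00FC and c3 9f to 0x00DF.  
All three fall in the range U+00C0 to U+00FF, which is why the lead byte is 
c3 every time.  In the 16-bit form each of the 12 characters between the 
quotes occupies one 2-byte unit, 24 bytes in all, against 15 bytes in the 
file as dumped.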

As an aside, I dropped the string into vim and it displayed it properly.  
Alas, how does one enter this stuff without dropping into a character map 
display and wearing out the mouse with drag-and-drop?

-- 
The whole religious complexion of the modern world is due
to the absence from Jerusalem of a lunatic asylum.
-- Havelock Ellis (1859-1939) British psychologist

