[Proj] Unicode
Gerald I. Evenden
geraldi.evenden at gmail.com
Mon Jun 8 14:31:18 EST 2009
On Monday 08 June 2009 2:52:02 pm Thomas Knudsen wrote:
> 2009/6/8 Gerald I. Evenden <geraldi.evenden at gmail.com>
>
...
> printf("%ls\n", L"Schöne Grüße");
For my edification I grabbed a portion of the above string and dumped it:
gie at charon:~$ echo 'L"Schöne Grüße");' >foo
gie at charon:~$ m foo
L"Schöne Grüße");
gie at charon:~$ hd foo
00000000 4c 22 53 63 68 c3 b6 6e 65 20 47 72 c3 bc c3 9f |L"Sch..ne Gr....|
00000010 65 22 29 3b 0a |e");.|
00000015
gie at charon:~$
I see that the "normal" text takes up 1 byte per character, and a funky
character is escaped with c3 followed by a code byte. So it seems that as long
as everything is ASCII we are in normal byte mode, and when an extended
character comes along it is handled with a two-byte sequence.
Fair enough. This *is not* the impression I got from various previous
descriptions, as the 16-bit aspect kept coming up and made one think that the
whole damn string was in 16-bit code.
As an aside, I dropped the string into vim and it displayed it properly.
Alas, how does one enter this stuff without dropping into a character map
display and wearing out the mouse with drag-and-drop?
--
The whole religious complexion of the modern world is due
to the absence from Jerusalem of a lunatic asylum.
-- Havelock Ellis (1859-1939) British psychologist