[Proj] Unicode
Glynn Clements
glynn at gclements.plus.com
Wed Jun 10 09:21:33 EST 2009
Gerald I. Evenden wrote:
> How about considering it this way with proj:
>
> Data and keyword entries associated symbols and numerics will be basic ASCII
> thus the simple caseless comparison can be safely made. The only exception
> to ASCII control input would be non-format control characters in format
> statements---thus degree marks. However, this will probably be a problem on
> input data scanning.
Recognising degree symbols on input is relatively straightforward.
You can use the ANSI mbstowcs() function to convert to wide character
representation; if __STDC_ISO_10646__ is defined, this will be
Unicode.
Alternatively, you can use nl_langinfo() to obtain the locale's
encoding, and the iconv() library to convert from this to e.g.
ISO-8859-1.
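Roughly, the nl_langinfo()/iconv() route might look like the sketch below. The names to_latin1() and has_degree_locale() are made up for illustration, the buffer size is arbitrary, and it assumes that the iconv implementation accepts the codeset name which nl_langinfo(CODESET) reports:

```c
#include <iconv.h>
#include <langinfo.h>
#include <string.h>

/* Hypothetical helper: converts the NUL-terminated string `in` from
 * the encoding `from_code` to ISO-8859-1, storing a NUL-terminated
 * result in `out`. Returns the number of bytes written (excluding the
 * NUL), or -1 on any failure. */
long to_latin1(const char *from_code, const char *in,
               char *out, size_t outsize)
{
    iconv_t cd;
    char *inp = (char *)in;     /* iconv() does not take const */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft;
    size_t rc;

    if (outsize == 0)
        return -1;
    outleft = outsize - 1;      /* reserve room for the NUL */

    cd = iconv_open("ISO-8859-1", from_code);
    if (cd == (iconv_t)-1)
        return -1;

    rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;

    *outp = '\0';
    return (long)(outp - out);
}

/* Hypothetical helper: 1 if s contains a degree sign in the current
 * locale's encoding, 0 if not, -1 if the conversion failed. */
int has_degree_locale(const char *s)
{
    char buf[256];

    if (to_latin1(nl_langinfo(CODESET), s, buf, sizeof buf) < 0)
        return -1;
    return strchr(buf, '\xb0') != NULL;
}
```

As with mbstowcs(), the caller is assumed to have called setlocale(LC_CTYPE, "") first; otherwise nl_langinfo(CODESET) describes the "C" locale rather than the user's.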
Or, given the constraints of the input format, you could just forget
about encodings altogether and treat the byte sequences \xb0
(ISO-8859-1 [1]) or \xc2\xb0 (UTF-8) as degree symbols.
[1] Actually, it's the same for all of the ISO-8859-* encodings except
5 (Cyrillic), 6 (Arabic), 11 (Thai), 14 (Celtic) and 12 (doesn't
exist; it was supposed to be Devanagari, but was abandoned).
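The encoding-agnostic option is small enough to sketch in full. The name degree_len() is hypothetical; the idea is that a scanner would call it at the point where it expects a degree mark and advance by the returned length:

```c
#include <stddef.h>

/* Hypothetical helper: if a degree symbol starts at s, return its
 * length in bytes (2 for the UTF-8 form 0xC2 0xB0, 1 for the
 * single-byte form 0xB0); otherwise return 0.
 * The UTF-8 form is checked first. Note that the same two bytes would
 * be "A-circumflex, degree" in ISO-8859-1, which is assumed not to
 * occur in coordinate input. */
size_t degree_len(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    /* p[1] is only read when p[0] is non-NUL, so this stays inside
     * the string. */
    if (p[0] == 0xC2 && p[1] == 0xB0)
        return 2;
    if (p[0] == 0xB0)
        return 1;
    return 0;
}
```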
> However, comments and descriptive material may be UTF-8. That is, in the long
> descriptive output, Putnins may appear with full and proper accents.
The problem with output is handling the case where the user's locale
doesn't support the characters. iconv() will terminate a conversion at
the first character which cannot be represented in the output
encoding.
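One way to cope with that on output is to resume the conversion after each failure, substituting a placeholder. The sketch below is only an illustration: the name utf8_to_locale_lossy() and the choice of '?' are mine, it assumes the source text is UTF-8 and the target encoding is stateless, and it relies on iconv() failing with EILSEQ and leaving the input pointer at the offending character (some implementations substitute a character of their own instead, in which case the loop simply never needs to intervene):

```c
#include <errno.h>
#include <iconv.h>
#include <string.h>

/* Hypothetical helper: converts UTF-8 text to the encoding `to_code`,
 * writing '?' in place of each character which iconv() refuses to
 * convert. Returns 0 on success, -1 on any other error (including
 * running out of output space). */
int utf8_to_locale_lossy(const char *to_code, const char *in,
                         char *out, size_t outsize)
{
    iconv_t cd;
    char *inp = (char *)in;     /* iconv() does not take const */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft;
    int status = 0;

    if (outsize == 0)
        return -1;
    outleft = outsize - 1;      /* reserve room for the NUL */

    cd = iconv_open(to_code, "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    while (inleft > 0) {
        if (iconv(cd, &inp, &inleft, &outp, &outleft) != (size_t)-1)
            break;              /* all remaining input converted */
        if (errno != EILSEQ || outleft == 0) {
            status = -1;        /* E2BIG, EINVAL, or no room left */
            break;
        }
        *outp++ = '?';
        outleft--;
        /* Skip the lead byte and any UTF-8 continuation bytes. */
        do {
            inp++;
            inleft--;
        } while (inleft > 0 && ((unsigned char)*inp & 0xC0) == 0x80);
    }

    iconv_close(cd);
    *outp = '\0';
    return status;
}
```

In a real program, to_code would presumably be the result of nl_langinfo(CODESET) rather than a fixed name.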
--
Glynn Clements <glynn at gclements.plus.com>