[Proj] Unicode

Glynn Clements glynn at gclements.plus.com
Wed Jun 10 09:21:33 EST 2009

Gerald I. Evenden wrote:

> How about considering it this way with proj:
> Data and keyword entries associated symbols and numerics will be basic ASCII 
> thus the simple caseless comparison can be safely made.  The only exception 
> to ASCII control input would be non-format control characters in format 
> statements---thus degree marks.  However, this will probably be a problem on 
> input data scanning.

Recognising degree symbols on input is relatively straightforward.

You can use the ANSI mbstowcs() function to convert to wide character
representation; if __STDC_ISO_10646__ is defined, this will be Unicode.
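
A minimal sketch of the mbstowcs() route, assuming wchar_t holds ISO 10646
code points (which is what __STDC_ISO_10646__ indicates). The helper names
find_degree_wide() and find_degree_mb() and the 256-element buffer are
illustrative choices, not part of proj:

```c
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Return the index of the first degree sign (U+00B0) in a wide string,
 * or -1 if there is none.  Comparing against 0x00B0 is only meaningful
 * when wchar_t values are ISO 10646 code points. */
static long find_degree_wide(const wchar_t *ws)
{
    long i;

    for (i = 0; ws[i] != L'\0'; i++)
        if (ws[i] == (wchar_t)0x00B0)
            return i;
    return -1;
}

/* Convert a multibyte string from the current locale's encoding and
 * look for a degree sign.  Returns -2 if the bytes are not valid in
 * that encoding.  Input longer than 255 characters is truncated
 * (hypothetical limit chosen for this sketch). */
static long find_degree_mb(const char *s)
{
    wchar_t buf[256];
    size_t n = mbstowcs(buf, s, 255);

    if (n == (size_t)-1)
        return -2;
    buf[n] = L'\0';     /* mbstowcs() does not terminate when it fills the buffer */
    return find_degree_wide(buf);
}
```

The caller would need setlocale(LC_CTYPE, "") at startup; otherwise the
program stays in the "C" locale and non-ASCII input may be rejected.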

Alternatively, you can use nl_langinfo() to obtain the locale's
encoding, and the iconv() library to convert from this to e.g. UTF-8.
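
A rough sketch of the nl_langinfo() plus iconv() route. UTF-8 as the target
encoding is an assumption made for illustration, and the names to_utf8(),
locale_to_utf8() and has_degree_utf8() are hypothetical. It also assumes a
stateless source encoding, so no final flush call is made:

```c
#include <iconv.h>
#include <langinfo.h>
#include <string.h>

/* Convert `in` from the encoding named `from` to UTF-8.  Returns the
 * number of bytes written (excluding the terminator), or -1 on any
 * failure, including an output buffer that is too small. */
static long to_utf8(const char *from, const char *in,
                    char *out, size_t outsize)
{
    char *inp = (char *)in;     /* iconv() takes non-const pointers */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft, rc;
    iconv_t cd;

    if (outsize == 0)
        return -1;
    outleft = outsize - 1;      /* keep room for the terminator */
    cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return -1;
    rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;
    *outp = '\0';
    return (long)(outp - out);
}

/* Same, taking the source encoding from the current locale. */
static long locale_to_utf8(const char *in, char *out, size_t outsize)
{
    return to_utf8(nl_langinfo(CODESET), in, out, outsize);
}

/* Once the text is UTF-8, a degree sign is always the bytes C2 B0. */
static int has_degree_utf8(const char *s)
{
    return strstr(s, "\xc2\xb0") != NULL;
}
```

As with mbstowcs(), nl_langinfo(CODESET) only reflects the user's settings
after setlocale(LC_CTYPE, "") has been called.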

Or, given the constraints of the input format, you could just forget
about encodings altogether and treat the byte sequences \xb0
(ISO-8859-1 [1]) or \xc2\xb0 (UTF-8) as degree symbols.
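
A minimal sketch of that byte-level check; the function name degree_len()
is hypothetical. It reports how many bytes to skip, so a scanner can step
over the symbol without knowing the encoding:

```c
#include <stddef.h>

/* If `s` begins with a degree sign encoded as UTF-8 (C2 B0) or as
 * ISO-8859-1 (B0), return its length in bytes; otherwise return 0. */
static size_t degree_len(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    /* Test the two-byte form first.  p[1] is safe to read because
     * p[0] is non-zero here, so the terminator is at p[1] or later. */
    if (p[0] == 0xC2 && p[1] == 0xB0)
        return 2;
    if (p[0] == 0xB0)
        return 1;
    return 0;
}
```

One consequence of ignoring the encoding: ISO-8859-1 text containing the
two characters \xc2 \xb0 would be read as a single degree sign, which seems
an acceptable trade-off for coordinate input.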

[1] Actually, it's the same for all of the ISO-8859-* encodings except
5 (Cyrillic), 6 (Arabic), 11 (Thai), 14 (Celtic) and 12 (doesn't
exist; it was supposed to be Devanagari, but was abandoned).

> However comments and descriptive material may be UTF-8.  That is, in the long 
> descriptive output of Putnins may be with full and proper accents.

The problem with output is handling the case where the user's locale
doesn't support the characters. iconv() will terminate a conversion at
the first character which cannot be represented in the output encoding.
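
A sketch of how a caller might detect where iconv() stopped, so it can fall
back to a substitute such as "deg". The function name utf8_to_target() and
its return convention are hypothetical, and the behaviour shown is that of
implementations which report EILSEQ for unrepresentable characters; some
iconv() implementations substitute a placeholder character instead:

```c
#include <errno.h>
#include <iconv.h>
#include <string.h>

/* Convert UTF-8 `in` to the encoding named `to`.  `out` is always
 * terminated and holds whatever was converted before any failure.
 * Returns -1 if everything was converted, -2 on other errors, and
 * otherwise the byte offset in `in` at which conversion stopped.
 * Note that EILSEQ is also reported for malformed input. */
static long utf8_to_target(const char *to, const char *in,
                           char *out, size_t outsize)
{
    char *inp = (char *)in;     /* iconv() takes non-const pointers */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft;
    long result = -1;
    iconv_t cd;

    if (outsize == 0)
        return -2;
    outleft = outsize - 1;      /* keep room for the terminator */
    out[0] = '\0';
    cd = iconv_open(to, "UTF-8");
    if (cd == (iconv_t)-1)
        return -2;
    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1)
        result = (errno == EILSEQ) ? (long)(inp - in) : -2;
    iconv_close(cd);
    *outp = '\0';
    return result;
}
```

In real use the target name would come from nl_langinfo(CODESET), and the
caller could resume after the offending character once it has written its
own replacement text.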

Glynn Clements <glynn at gclements.plus.com>

