[Proj] Unicode

Glynn Clements glynn at gclements.plus.com
Wed Jun 10 09:21:33 EST 2009


Gerald I. Evenden wrote:

> How about considering it this way with proj:
> 
> Data and keyword entries, associated symbols, and numerics will be basic ASCII, 
> so a simple caseless comparison can safely be made.  The only exception 
> to ASCII control input would be non-format control characters in format 
> statements---thus degree marks.  However, this will probably be a problem on 
> input data scanning.

Recognising degree symbols on input is relatively straightforward.

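You can use the ANSI mbstowcs() function to convert from the current
locale's encoding (as selected with setlocale()) to wide character
representation; if __STDC_ISO_10646__ is defined, the wchar_t values
will be Unicode code points, so the degree symbol is U+00B0.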

Alternatively, you can use nl_langinfo() to obtain the locale's
encoding, and the iconv() library to convert from this to e.g. 
ISO-8859-1.
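
Something along these lines, as a sketch only. The names to_latin1()
and locale_to_latin1() are made up for this example, and it assumes
the iconv implementation knows the encoding name "ISO-8859-1":

```c
#include <iconv.h>
#include <langinfo.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated string `in` from
 * `codeset` to ISO-8859-1. Returns the number of bytes written
 * (excluding the terminator), or -1 on any failure. */
long to_latin1(const char *codeset, const char *in, char *out, size_t outsz)
{
    if (outsz == 0)
        return -1;

    iconv_t cd = iconv_open("ISO-8859-1", codeset);
    if (cd == (iconv_t)-1)
        return -1;              /* conversion not supported */

    char *src = (char *)in;     /* iconv() wants char **, it does not write here */
    size_t srcleft = strlen(in);
    char *dst = out;
    size_t dstleft = outsz - 1; /* keep one byte for the terminator */

    size_t rc = iconv(cd, &src, &srcleft, &dst, &dstleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;              /* unconvertible input or buffer too small */

    *dst = '\0';
    return (long)(dst - out);
}

/* Same conversion, taking the source encoding from the current locale. */
long locale_to_latin1(const char *in, char *out, size_t outsz)
{
    return to_latin1(nl_langinfo(CODESET), in, out, outsz);
}
```

After the conversion the degree symbol is always the single byte
\xb0, whatever the locale's encoding was. As with mbstowcs(),
nl_langinfo(CODESET) only reflects the user's settings once
setlocale(LC_CTYPE, "") has been called.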

Or, given the constraints of the input format, you could just forget
about encodings altogether and treat the byte sequences \xb0
(ISO-8859-1 [1]) or \xc2\xb0 (UTF-8) as degree symbols.

[1] Actually, it's the same for all of the ISO-8859-* encodings except
5 (Cyrillic), 6 (Arabic), 11 (Thai), 14 (Celtic) and 12 (doesn't
exist; it was supposed to be Devanagari, but was abandoned).
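
The byte-matching option is the shortest to write. A sketch, with
degree_len() being a name invented here rather than anything in proj:

```c
#include <stddef.h>

/* Hypothetical helper: if a degree symbol starts at p, return its
 * length in bytes (2 for UTF-8, 1 for ISO-8859-1); otherwise 0.
 * p must point into a NUL-terminated string. */
size_t degree_len(const char *p)
{
    const unsigned char *u = (const unsigned char *)p;

    /* Check UTF-8 first: reading u[1] is safe because u[0] is not NUL. */
    if (u[0] == 0xC2 && u[1] == 0xB0)
        return 2;
    if (u[0] == 0xB0)
        return 1;
    return 0;
}
```

A scanner would call this at the point where it expects the degree
separator and advance by the returned length. The one ambiguity is
that \xc2\xb0 is also valid ISO-8859-1 (A-circumflex followed by a
degree symbol); the sketch always reads it as a single UTF-8 degree
symbol, which seems the only sensible reading in a coordinate.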

> However comments and descriptive material may be UTF-8.  That is, the long 
> descriptive output for Putnins may carry full and proper accents.

The problem with output is handling the case where the user's locale
doesn't support the characters. iconv() will typically stop a
conversion at the first character which cannot be represented in the
output encoding (glibc fails with EILSEQ), so the caller has to
decide what to substitute.
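
One way to cope is to restart the conversion after each failure and
emit a placeholder. This is only a sketch: utf8_to_encoding_lossy()
is an invented name, the '?' placeholder is an arbitrary choice, and
it assumes a stateless single-byte target encoding. Some iconv
implementations substitute a character of their own instead of
failing, in which case the loop body never runs.

```c
#include <errno.h>
#include <iconv.h>
#include <string.h>

/* Hypothetical helper: convert UTF-8 text to the encoding `to`,
 * writing '?' for each character iconv() refuses to convert.
 * Returns the number of bytes written, or -1 on failure. */
long utf8_to_encoding_lossy(const char *to, const char *in,
                            char *out, size_t outsz)
{
    if (outsz == 0)
        return -1;

    iconv_t cd = iconv_open(to, "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    char *src = (char *)in;     /* iconv() does not modify the input bytes */
    size_t srcleft = strlen(in);
    char *dst = out;
    size_t dstleft = outsz - 1; /* keep one byte for the terminator */

    while (srcleft > 0) {
        if (iconv(cd, &src, &srcleft, &dst, &dstleft) != (size_t)-1)
            break;              /* the rest converted cleanly */

        if (errno != EILSEQ || dstleft == 0) {
            iconv_close(cd);
            return -1;          /* truncated input or output buffer full */
        }

        *dst++ = '?';           /* placeholder for the offending character */
        dstleft--;

        /* Skip the lead byte and any UTF-8 continuation bytes (10xxxxxx). */
        do {
            src++;
            srcleft--;
        } while (srcleft > 0 && ((unsigned char)*src & 0xC0) == 0x80);
    }

    iconv_close(cd);
    *dst = '\0';
    return (long)(dst - out);
}
```

glibc and GNU libiconv also accept a "//TRANSLIT" suffix on the
target encoding name, which asks the library to pick an approximation
itself; that is an extension rather than something POSIX requires.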

-- 
Glynn Clements <glynn at gclements.plus.com>

