[Shapelib] Re: shapelib improvements

Bram de Greve bram.degreve at bramz.net
Sat Dec 29 13:59:52 EST 2007


Mateusz Loskot wrote:
> As you say, UTF-16 is used *internally*. Actually, Java's Unicode support
> is a *mess*: it exposes Unicode in 3 or 4 different ways, including its
> own modified version of the UTF-8 encoding (brrr!).
> So, actually, different components of Java use different standards,
> for example Data{Input|Output}Stream uses modified UTF-8,
> OutputStreamWriter and InputStreamReader can use *any* encoding,
> String can use *any* encoding, etc.
>
> For me, the Java and Windows arguments are irrelevant here because Shapelib
> does not use the system-specific API of any of the systems listed above.
> Shapelib is just a data storage/transfer layer and as such, the only
> portable and IMHO reasonable choice is UTF-8.
> UTF-16 and UTF-32 cause more trouble than they are worth.
> UTF-8 is the more natural choice because:
> - UTF-8 works well with legacy platforms and clients that only
>   support 8-bit characters
> - UTF-8 is compatible with ASCII
> - UTF-8 is more compact
> - UTF-8 is byte oriented instead of word oriented
> - UTF-8 is C-string friendly
> - UTF-8 is more efficient (depending on the range of content)
> - UTF-8 is compatible with all Unix systems and is recommended by
>   standards bodies such as W3C, IETF, IMC, etc.
>
> All of this suggests to me that UTF-8 support is easier to implement for
> highly portable data storage software like Shapelib.
>
I'm working on a UTF-8 version of the IO hooks for shapelib as part of my
work on pyshapelib, though it is somewhat delayed because other issues are
competing for my limited spare time.
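
Several of the properties in the quoted list are easy to check for yourself.
Below is a minimal Python sketch, not taken from shapelib or pyshapelib; the
sample strings and the sizes() helper are made up for illustration only.

```python
# Illustrative sample strings (assumptions, not from any shapefile).
ascii_text = "roads.shp"
mixed_text = "Zürich straße"
cjk_text = "東京都"

# ASCII compatibility: pure ASCII text has identical bytes in UTF-8.
assert ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# C-string friendliness: UTF-8 never produces a zero byte unless the text
# itself contains U+0000, whereas UTF-16 does so for every Latin-1 character.
assert 0 not in mixed_text.encode("utf-8")
assert 0 in mixed_text.encode("utf-16-le")

# Byte orientation: UTF-16 has two byte orders that yield different bytes;
# UTF-8 has a single serialization.
assert "é".encode("utf-16-le") != "é".encode("utf-16-be")


def sizes(text):
    """Return the encoded length in bytes for each encoding (hypothetical helper)."""
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


# Compactness depends on the range of content, as the quoted list notes.
print(sizes(mixed_text))  # {'utf-8': 15, 'utf-16-le': 26, 'utf-32-le': 52}
print(sizes(cjk_text))    # {'utf-8': 9, 'utf-16-le': 6, 'utf-32-le': 12}
```

For mostly Latin text UTF-8 is the smallest of the three, while for CJK text
UTF-16 comes out smaller, which is the "depends on the range of content"
caveat in practice.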

Cheers,
Bramz

