[Shapelib] Re: shapelib improvements

Bram de Greve bram.degreve at bramz.net
Sun Dec 9 14:28:50 EST 2007


Mateusz Loskot wrote:
> Bram de Greve wrote:
>   
>> It might solve your problems with unicode filenames, but how will you
>> cope with textual content
>>     
>
> Bram,
>
> We are discussing a solution for the encoding of file paths only.
> Certainly, it wouldn't solve problems with handling localized content
> (strings) but this is another subject.
>
>   
>> You will need to build in all your encodings
>> as internally all textual content is char* exclusively (with various
>> encodings).  Can that be done?
>>     
>
> Yes, it can.
> However, it is a complex task and would require the use of character
> encoding converters like iconv.
>
> This subject will probably be covered along with implementation of
> GDAL/OGR RFC 5 (http://trac.osgeo.org/gdal/wiki/rfc5_unicode).
>
>   
>> Also, I suggest to first consider if you can't solve your problems with
>> wrapper code.  Not wrapper code around the DLL, but building a unicode
>> DLL with wrapper code around the original.  That way, you don't have to
>> branch (or to fork), which might be beneficial to both ...
>>     
>
> This won't solve all possible problems with internationalized paths.
> IMHO, a simpler and cleaner solution is to replace the current I/O calls
> with Unicode-aware calls from the C/C++ libraries. The main disadvantage
> is that it will introduce a new fork. However, Shapelib is not a big
> library, it's just 3 files of code, so forking does not sound like a
> problem for a possible merge in the future.
>
>   
I'm a bit confused here.  AFAIK Linux and Mac use UTF-8 for filenames
anyway, so on those platforms it shouldn't be any problem at all if you
encode your wchar_t* filenames to UTF-8 yourself, even with the current
implementation.  And UTF-8 is also the recommended encoding in the
GDAL/OGR RFC 5 you mentioned.  Only the Windows platform is somewhat the
slow learner of the class.  But using Frank's IO hooks proposal, it
shouldn't be any problem at all to write IO hooks that take the UTF-8
filename and feed it to either _wfopen or the Unicode flavour of
CreateFile, and plug them into shapelib.  And as Frank is doing exactly
this to add the CreateFile IO hooks, it should be ready-made for your
problem already.
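
To make the hook idea a bit more concrete, here is a minimal sketch in C
of an open function that takes a UTF-8 filename.  The name utf8_fopen is
made up for illustration; it is not part of shapelib or of Frank's
proposal, and how it would be plugged in depends on the final hook
interface.  On Windows it converts the name with MultiByteToWideChar and
calls _wfopen; elsewhere it assumes the file system takes the UTF-8
bytes as they are and simply calls fopen.

```c
#include <stdio.h>

#ifdef _WIN32
#include <stdlib.h>
#include <wchar.h>
#include <windows.h>
#endif

/* Hypothetical helper (not a shapelib function): open a file whose
 * name is given as a UTF-8 encoded string.  Returns NULL on failure. */
FILE *utf8_fopen(const char *utf8_path, const char *mode)
{
#ifdef _WIN32
    wchar_t wmode[16];
    wchar_t *wpath;
    FILE *fp;
    int n;

    /* The mode string is plain ASCII, so a small fixed buffer will do;
     * the call fails (returns 0) if the mode does not fit. */
    if (MultiByteToWideChar(CP_UTF8, 0, mode, -1, wmode, 16) == 0)
        return NULL;

    /* First call: ask how many wide characters are needed,
     * including the terminating zero. */
    n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                            utf8_path, -1, NULL, 0);
    if (n == 0)
        return NULL;

    wpath = (wchar_t *)malloc((size_t)n * sizeof(wchar_t));
    if (wpath == NULL)
        return NULL;

    /* Second call: do the actual UTF-8 to UTF-16 conversion. */
    if (MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                            utf8_path, -1, wpath, n) == 0) {
        free(wpath);
        return NULL;
    }

    fp = _wfopen(wpath, wmode);
    free(wpath);
    return fp;
#else
    /* Assumption: the platform accepts UTF-8 bytes as filenames. */
    return fopen(utf8_path, mode);
#endif
}
```

A caller would use it exactly like fopen, for example
utf8_fopen(name, "rb"), and close the result with fclose.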

An issue I can see is of course that if your DLL needs to accept wide
character strings, you would first need to encode them to UTF-8 only to
have them decoded back to wide characters by the IO hooks.  That, I find
a bit silly myself.  To get around that, you would need some parallel
code for wide character strings like I did for pyshapelib.  But that's
not strictly necessary and is perhaps only valuable if you need to open
lots and lots of shapefiles in a short period.  I'm interested to see
what Frank thinks of this.
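
For the first half of that round trip, encoding a wide character
filename to UTF-8 yourself, a rough sketch in C could look like the
following.  The function name wide_to_utf8 is hypothetical and this is
not the pyshapelib code; it only assumes that wchar_t holds either
UTF-16 code units (as on Windows) or full code points (as on most Unix
systems).

```c
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: return a malloc'ed UTF-8 copy of a wide string,
 * or NULL on invalid input or allocation failure.  The caller frees it. */
char *wide_to_utf8(const wchar_t *ws)
{
    size_t len = wcslen(ws);
    size_t i;
    /* No wide character needs more than 4 bytes in UTF-8. */
    char *out = (char *)malloc(len * 4 + 1);
    char *p = out;

    if (out == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        unsigned long cp = (unsigned long)ws[i];

        /* Combine a UTF-16 surrogate pair into one code point. */
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < len) {
            unsigned long lo = (unsigned long)ws[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                i++;
            }
        }

        /* Reject unpaired surrogates and out-of-range values. */
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            free(out);
            return NULL;
        }

        if (cp < 0x80) {
            *p++ = (char)cp;
        } else if (cp < 0x800) {
            *p++ = (char)(0xC0 | (cp >> 6));
            *p++ = (char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            *p++ = (char)(0xE0 | (cp >> 12));
            *p++ = (char)(0x80 | ((cp >> 6) & 0x3F));
            *p++ = (char)(0x80 | (cp & 0x3F));
        } else {
            *p++ = (char)(0xF0 | (cp >> 18));
            *p++ = (char)(0x80 | ((cp >> 12) & 0x3F));
            *p++ = (char)(0x80 | ((cp >> 6) & 0x3F));
            *p++ = (char)(0x80 | (cp & 0x3F));
        }
    }
    *p = '\0';
    return out;
}
```

The resulting string can be passed to any char* based open call.  The
decoding back to wide characters inside a Windows IO hook is the extra
work that the parallel wide character code would avoid.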

Cheers,
Bram

