[Shapelib] Re: shapelib improvements
Bram de Greve
bram.degreve at bramz.net
Sun Dec 9 14:28:50 EST 2007
Mateusz Loskot wrote:
> Bram de Greve wrote:
>
>> It might solve your problems with unicode filenames, but how will you
>> cope with textual content
>>
>
> Bram,
>
> We are discussing solution for encodings of file paths only.
> Certainly, it wouldn't solve problems with handling localized content
> (strings) but this is another subject.
>
>
>> You will need to build in all your encodings
>> as internally all textual content is char* exclusively (with various
>> encodings). That can be done?
>>
>
> Yes, it can.
> However, it is complex task and would require use of char codes
> encoders/decoders like iconv.
>
> This subject will probably be covered along with implementation of
> GDAL/OGR RFC 5 (http://trac.osgeo.org/gdal/wiki/rfc5_unicode).
>
>
>> Also, I suggest to first consider if you can't solve your problems with
>> wrapper code. Not wrapper code around the DLL, but building a unicode
>> DLL with wrapper code around the original. That way, you don't have to
>> branch (or to fork), which might be beneficial to both ...
>>
>
> This won't solve all possible problems with internationalized paths.
> IMHO, simpler and cleaner solution is to replace current I/O calls with
> Unicode-aware calls from C/C++ libraries. The main disadvantage is that
> it will introduce new fork. However, Shapelib is not a big library, it's
> just 3 files of code, so forking does not sound as a problem for
> possible merge in future.
>
>
I'm a bit confused here. AFAIK Linux and Mac use UTF-8 for filenames
anyway, so on those platforms it shouldn't be any problem at all if you
encode your wchar_t* filenames to UTF-8 yourself, even with the current
implementation. And UTF-8 is also the recommended encoding in the
GDAL/OGR RFC 5 you mentioned. Only the Windows platform is the slow
learner of the class. But using Frank's IO hooks proposal, it shouldn't
be any problem at all to write IO hooks that take the UTF-8 filename and
feed it either to _wfopen or to the Unicode flavour of CreateFile, and
plug them into shapelib. And as Frank is doing exactly this to add the
CreateFile IO hooks, it should be ready-made for your problem already.
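To make that concrete, here is a rough sketch of the file-opening part of
such a hook. It is only an illustration: utf8_fopen is a name I made up,
it is not part of shapelib or of Frank's proposal, and how it gets
plugged into the hooks depends on the final hook interface, which is not
shown here. On Windows it converts the UTF-8 name with
MultiByteToWideChar and calls _wfopen; elsewhere it assumes the
filesystem takes UTF-8 names as-is and calls plain fopen.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper (not a shapelib function): open a file whose
   name is given as a UTF-8 encoded string. Returns NULL on failure. */
FILE *utf8_fopen(const char *utf8_path, const char *mode)
{
#ifdef _WIN32
    wchar_t *wpath = NULL, *wmode = NULL;
    FILE *fp = NULL;
    int npath, nmode;

    assert(utf8_path != NULL && mode != NULL);

    /* First pass: ask how many wide characters are needed (incl. NUL). */
    npath = MultiByteToWideChar(CP_UTF8, 0, utf8_path, -1, NULL, 0);
    nmode = MultiByteToWideChar(CP_UTF8, 0, mode, -1, NULL, 0);
    if (npath <= 0 || nmode <= 0)
        return NULL;

    wpath = malloc((size_t)npath * sizeof(wchar_t));
    wmode = malloc((size_t)nmode * sizeof(wchar_t));

    /* Second pass: do the actual UTF-8 -> UTF-16 conversion. */
    if (wpath != NULL && wmode != NULL &&
        MultiByteToWideChar(CP_UTF8, 0, utf8_path, -1, wpath, npath) > 0 &&
        MultiByteToWideChar(CP_UTF8, 0, mode, -1, wmode, nmode) > 0)
        fp = _wfopen(wpath, wmode);

    free(wpath);
    free(wmode);
    return fp;
#else
    assert(utf8_path != NULL && mode != NULL);
    /* Assumption: the platform accepts UTF-8 file names directly. */
    return fopen(utf8_path, mode);
#endif
}
```

A CreateFile based hook would do the same conversion and hand the wide
string to CreateFileW instead of _wfopen.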
One issue I can see is of course that if your DLL needs to accept wide
character strings, you would first need to encode them to UTF-8, only to
have the IO hooks decode them back to wide characters. That, I find a
bit silly myself. To get around that, you would need some parallel code
for wide character strings, like I did for pyshapelib. But that's
strictly not necessary, and is perhaps only valuable if you need to open
lots and lots of shapefiles in a short period. I'm interested to see
what Frank thinks of this.
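For reference, the first half of that round trip (wide string to UTF-8)
is not much code either. The sketch below is my own illustration, and
wide_to_utf8 is a made-up name. It assumes wchar_t holds either UTF-16
code units (as on Windows) or whole code points (as on most Unix
systems), and it returns a malloc'ed buffer that the caller must free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: encode a NUL-terminated wide string as UTF-8.
   Returns a malloc'ed string, or NULL on invalid input or allocation
   failure. */
char *wide_to_utf8(const wchar_t *ws)
{
    size_t len, i;
    unsigned char *out, *p;

    assert(ws != NULL);
    len = wcslen(ws);

    /* Every wchar_t produces at most 4 bytes of UTF-8. */
    out = malloc(len * 4 + 1);
    if (out == NULL)
        return NULL;
    p = out;

    for (i = 0; i < len; ++i) {
        unsigned long cp = (unsigned long)ws[i];

        /* Combine a UTF-16 surrogate pair (the usual case where
           wchar_t is 16 bits, as on Windows). */
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < len) {
            unsigned long lo = (unsigned long)ws[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }

        /* Reject lone surrogates and values outside Unicode. */
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            free(out);
            return NULL;
        }

        if (cp < 0x80) {
            *p++ = (unsigned char)cp;
        } else if (cp < 0x800) {
            *p++ = (unsigned char)(0xC0 | (cp >> 6));
            *p++ = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            *p++ = (unsigned char)(0xE0 | (cp >> 12));
            *p++ = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            *p++ = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            *p++ = (unsigned char)(0xF0 | (cp >> 18));
            *p++ = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            *p++ = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            *p++ = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    *p = '\0';
    return (char *)out;
}
```

The IO hook then has to undo exactly this on Windows, which is the
duplicated work I mean.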
Cheers,
Bram