[Shapelib] UTF-8 filenames

Bram de Greve bram.degreve at bramz.net
Wed Jan 9 06:08:09 EST 2008


Hi all,

I've started to implement some basic support for UTF-8 filenames within
the context of Thuban.  Before I port it to the shapelib source tree,
I'd like to hear your comments.

You can find the results here:
https://wald.intevation.org/plugins/scmsvn/viewcvs.php/branches/WIP-pyshapelib-Unicode/thuban/libraries/shapelib/safileio.c?rev=2801&root=thuban&view=markup

What I've done is add a mechanism that detects whether we're compiling
on Windows and, if so, defines SA_UTF8_WINDOWS (only within safileio.c).
Of course, I've also included <windows.h> and linked against the
necessary libraries.

Then, if SA_UTF8_WINDOWS is defined, I define a new function
Utf8ToWideChar that converts the UTF-8 filename to a wide-character
string.  I use it in SAUtf8WFOpen and SAUtf8WRemove to pass the
wide-character filenames to _wfopen and _wremove respectively.

I've chosen not to use CreateFile and the like, as doing so makes it
harder to keep the behavior equivalent to SADFOpen.  For example, the
access string would have to be parsed and converted to CreateFile flags,
while making sure we keep exactly the same behavior.  I thought it was
more important to keep the behavior the same than to push the file size
limit beyond 2 GB.  File hooks using CreateFile for files larger than
2 GB are still interesting, but I believe that's a matter for another
discussion.

Independently of SA_UTF8_WINDOWS, the function SASetupUtf8Hooks() is
defined.  It hooks FOpen to either SAUtf8WFOpen or SADFOpen, depending
on SA_UTF8_WINDOWS (and similarly for Remove).  The declaration of
SASetupUtf8Hooks is always added to shapefil.h, regardless of
SA_UTF8_WINDOWS.

This way, SASetupUtf8Hooks will always be available, but it of course
blatantly assumes that it will work on all platforms.  By construction,
it will work on Windows.  On other platforms, SASetupUtf8Hooks is
equivalent to SASetupDefaultHooks, and the big question is whether the
default hooks can grok UTF-8 filenames in the first place.  I know
that's the case on the Apple platform, as it uses UTF-8 filenames
anyway.  But what about other platforms?  What about Linux?  Is there
any expert around?

If we cannot guarantee it, we shall either have to make the declaration
of SASetupUtf8Hooks conditional, which of course would limit its
usefulness in portable code, or drop it entirely.

Cheers,
Bram

