[Proj] Modern C functions

Fri Feb 27 17:04:31 EST 2009

Gerald I. Evenden wrote:

> > > Thus my question here is: do non-Gnu C compilers used by this audience
> > > recognize these functions?
> >
> > It's not an issue for the compiler, but for the standard library.
> 
> IMHO, when discussing in a general context, the term "compiler" consists of 
> the entire process of generating an executable module and all its constituent 
> pieces.  Including libraries.

But on many platforms, the two aren't necessarily correlated, i.e. you
have a choice of compilers and/or a choice of libc versions. It's the
choice of libc which affects whether e.g. strcasecmp() is available,
not the compiler.

> > They are specified by POSIX, but not by ANSI C (neither C89 nor C99).
> 
> I have not concerned myself with "ANSI" in 15+ years as it seems superfluous 
> with so many other "standards" around.
> 
> > As others have already pointed out, Windows doesn't provide them.
> > Also, gcc won't recognise them if you use -ansi and don't explicitly
> > add the POSIX feature macros, e.g. -D_POSIX_SOURCE. I don't know
> > whether uclibc has them.
> 
> The term "uclibc" does not make sense to me.  I am only used to the 
> terminology libxxxxx.  The "uc" prefix seems odd.

uclibc is a minimal implementation of the standard C library,
originally designed for embedded systems. The "uc" prefix is an
abbreviation for "microcontroller" (u = mu = micro), although the
library is also used on (usually Linux-based) systems with "real" CPUs
but with relatively little memory (PDAs, etc).

> As for Windoze, I understand that there are non-M$ compilers available for 
> Windoze and these may not be so self righteous.

It's not about the compiler, but the standard library. E.g. MinGW is
just gcc for Windows. The "libc" which provides the ANSI/POSIX
functions is the same MSVCRT (Microsoft Visual C Runtime) that would
be used by Microsoft's toolchain.
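A minimal portability sketch, assuming the usual split: POSIX declares strcasecmp() in <strings.h>, while MSVCRT exposes the same operation as _stricmp() in <string.h>. The macro portable_strcasecmp and the helper names_equal_nocase() are hypothetical names for illustration. The feature-test macro stands in for the -D_POSIX_SOURCE-style flag mentioned above and only takes effect if it precedes every system header.

```c
/* Request POSIX.1-2008 declarations; must come before any system header. */
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L
#endif

#if defined(_WIN32)
#include <string.h>                     /* MSVCRT declares _stricmp() here */
#define portable_strcasecmp _stricmp
#else
#include <strings.h>                    /* POSIX location of strcasecmp() */
#define portable_strcasecmp strcasecmp
#endif

/* Hypothetical helper: nonzero if the two names match ignoring case.
   Still locale-dependent, like the underlying library function. */
int names_equal_nocase(const char *a, const char *b)
{
    return portable_strcasecmp(a, b) == 0;
}
```

This only chooses the library's spelling of the function; it does nothing about the locale issues discussed below.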

> > Also, bear in mind that the behaviour depends upon the locale; using
> > them for strings containing anything other than alphanumeric
> > characters is problematic.
> 
> Seemingly the most common usage is for equality: strXcmp(a,b)==0.  I see no 
> hassle with testing anything within the character set as long as str a and 
> str b are of the  same character set type.  But yes, there may be some 
> problems with some non-Latin alphabetics.  However, in the context of a given 
> program and its control of the alphabet---like geodesic, proj, etc.---then we 
> are only testing against an in-house alphabet.  If someone wants to convert a 
> program into using a non-latin alphabet, then the problem is transferred to 
> their domain.

Using it just for projection names won't be a problem.

But it isn't just non-Latin alphabets which are problematic, but
anything beyond US-ASCII. Many of the ISO-646 character sets replace
less-used symbols with additional characters; e.g. ISO-646-NO
(Norwegian) replaces [\]{|} with ÆØÅæøå, so strcasecmp("[\\]", "{|}")
will return zero in such a locale.

I don't know how widespread such locales are (most Latin-based
locales now use ISO-8859-1 or UTF-8), but they were common enough for
the C standard to add support for digraphs (via the 1995 amendment;
C99 section 6.4.6 retains them):

       [#3] In all aspects of the language, these six tokens

               <:  :>  <%  %>  %:  %:%:

       behave, respectively, the same as these six tokens

               [   ]   {   }   #   ##

so that C code can be written in environments where these characters
aren't available.
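As a small illustration (not from the thread), the hypothetical function below is written using only digraphs in place of [ ] { } # ##. It assumes a compiler in C95-or-later mode; strict C89 modes may not accept digraphs. Unlike trigraphs, digraphs are tokens, so they are not translated inside string literals.

```c
%:include <stddef.h>

/* %:%: is the ## token-pasting operator, so CAT(sum, 3) becomes sum3. */
%:define CAT(a, b) a %:%: b

/* <: :> stand for [ ], and <% %> stand for { }. */
int CAT(sum, 3)(const int v<:3:>)
<%
    int total = 0;
    size_t i;
    for (i = 0; i < 3; i++)
        total += v<:i:>;
    return total;
%>
```

After preprocessing this is an ordinary function int sum3(const int v[3]).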

FWIW, GRASS' solution is to provide its own function, G_strcasecmp(),
which specifically only affects A-Z:

	if (xx >= 'A' && xx <= 'Z')
	    xx = xx + 'a' - 'A';
	if (yy >= 'A' && yy <= 'Z')
	    yy = yy + 'a' - 'A';

rather than using tolower(), which is affected by the locale.
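A minimal sketch of the same idea as a complete function, assuming an ASCII execution character set (the 'A'..'Z' range test is not valid for EBCDIC). The name ascii_strcasecmp is hypothetical; this is not GRASS's actual G_strcasecmp() source, only modelled on the fragment above.

```c
/* Fold only 'A'..'Z' to lower case; every other byte is left alone,
   so the result never depends on the current locale. */
static int ascii_fold(int c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Returns <0, 0 or >0 like strcmp(), ignoring ASCII letter case only. */
int ascii_strcasecmp(const char *x, const char *y)
{
    /* Compare as unsigned char, as strcmp() does. */
    const unsigned char *p = (const unsigned char *)x;
    const unsigned char *q = (const unsigned char *)y;
    int a, b;

    do {
        a = ascii_fold(*p++);
        b = ascii_fold(*q++);
    } while (a == b && a != '\0');

    return a - b;
}
```

For projection names this behaves like strcasecmp() in the "C" locale, e.g. ascii_strcasecmp("UTM", "utm") is 0, while bytes such as '[' and '{' always stay distinct whatever locale is active.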

-- 
Glynn Clements <glynn at gclements.plus.com>