[Proj] Experiment to speed up proj.4 by 2 or more

Tue Jun 23 12:57:00 EST 2015

On Tuesday 23 June 2015 19:26:45, Greg Troxel wrote:
> Even Rouault <even.rouault at spatialys.com> writes:
> > José Luis,
> > 
> >> 1. Introducing C++: From my point of view, one of the best things about
> >> PROJ is that it is written in pure C.
> > 
> > In what respect does the actual implementation of proj matter to you as
> > a user, if you can still use it as a C library? All current C
> > toolchains should be able to handle C++ as well, and here I'm only
> > using C++98 features (I'd even say C++83 ;-)), not C++14.
> 
> That's not the only question; there's also packaging and portability.
> It could be that people only run proj on desktop systems that already
> have C++.

Hmm, sorry, I don't understand this sentence.

> 
> With respect to packaging, are you proposing to detect the SSE
> instructions at compile time or at runtime,

You obviously need both aspects:
- compile time: you must detect whether the compiler supports generating SSE2
(or AVX, AVX2, etc.) code, to decide which source files to compile and with
which options.
- runtime: you must check the feature set supported by the running CPU and
select the appropriate entry points accordingly. For an x86_64 binary, you
know that SSE2 is supported. For AVX and later instruction sets, you must use
cpuid + xgetbv to check both CPU and OS support. (Likewise, on i386 you must
use cpuid to check for SSE2.)
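
As a rough illustration of the runtime side, here is a minimal C++ sketch of the cpuid + xgetbv check. It assumes GCC or Clang on x86, relying on their `<cpuid.h>` helpers (`__get_cpuid`, `__get_cpuid_max`, `__cpuid_count`) and on inline assembly for `xgetbv`; the function names are hypothetical, not proj API. AVX is only usable when CPUID reports both AVX and OSXSAVE and XCR0 shows that the OS saves the XMM and YMM state; AVX2 additionally requires CPUID leaf 7, EBX bit 5.

```cpp
// Hypothetical runtime CPU/OS feature checks; assumes GCC or Clang on x86.
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#endif

// SSE2: CPUID leaf 1, EDX bit 26 (always set on x86_64 CPUs).
static bool cpu_supports_sse2()
{
#ifdef HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    return (edx & (1u << 26)) != 0;
#else
    return false;
#endif
}

// AVX: the CPU must advertise AVX (leaf 1, ECX bit 28) and OSXSAVE
// (ECX bit 27), and the OS must have enabled XMM and YMM state saving
// (XCR0 bits 1 and 2), which is read with xgetbv.
static bool cpu_os_supports_avx()
{
#ifdef HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    const unsigned int osxsave_and_avx = (1u << 27) | (1u << 28);
    if ((ecx & osxsave_and_avx) != osxsave_and_avx)
        return false;
    unsigned int xcr0_lo, xcr0_hi;
    __asm__ __volatile__("xgetbv" : "=a"(xcr0_lo), "=d"(xcr0_hi) : "c"(0));
    (void)xcr0_hi;
    return (xcr0_lo & 0x6u) == 0x6u;
#else
    return false;
#endif
}

// AVX2: AVX must be usable, and CPUID leaf 7 (subleaf 0) must report
// AVX2 in EBX bit 5.
static bool cpu_os_supports_avx2()
{
#ifdef HAVE_X86_CPUID
    if (!cpu_os_supports_avx() || __get_cpuid_max(0, 0) < 7)
        return false;
    unsigned int eax, ebx, ecx, edx;
    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    return (ebx & (1u << 5)) != 0;
#else
    return false;
#endif
}
```

On non-x86 targets every check simply returns false, so only the portable code path would be selected.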

Examples:
- on x86_64, if the compiler supports AVX2, it would compile both the SSE2 and
AVX2 implementations, and proj would query the CPU and OS at runtime to see
whether AVX2 is supported.
- on i386, it would compile the non-SIMD, SSE2 and AVX2 implementations,
possibly with options to disable some of them at compile time.
- on obscure_architecture, just the non-SIMD implementation.
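
The compile-time plus runtime selection in these examples could look roughly like the following C++ sketch. It is not proj code: the kernel type `transform_fn` (here it just doubles the coordinates) and all function names are hypothetical. Instead of compiling a separate source file with `-mavx2`, it uses the GCC/Clang `target("avx2")` function attribute, and for the runtime test it uses the builtin `__builtin_cpu_supports`; whether that builtin also verifies OS support depends on the compiler runtime, so a production version might prefer an explicit cpuid + xgetbv check.

```cpp
#include <cstddef>

// Hypothetical kernel signature: transform n coordinate pairs in place.
typedef void (*transform_fn)(double* x, double* y, std::size_t n);

// Portable fallback, always compiled (the only one on obscure_architecture).
static void transform_generic(double* x, double* y, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        x[i] *= 2.0;
        y[i] *= 2.0;
    }
}

// Compile-time check: only build the AVX2 variant where the compiler can.
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_AVX2_IMPL 1
// Same loop, but the compiler may emit (and auto-vectorize with) AVX2
// instructions in this function only.
__attribute__((target("avx2")))
static void transform_avx2(double* x, double* y, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        x[i] *= 2.0;
        y[i] *= 2.0;
    }
}
#endif

// Runtime check: pick the best entry point the running CPU supports.
static transform_fn select_transform()
{
#ifdef HAVE_AVX2_IMPL
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return transform_avx2;
#endif
    return transform_generic;
}
```

A caller would typically resolve the function pointer once and reuse it, so the feature test is not repeated for every batch of coordinates.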

That requires care, but it's doable. I've done similar things in GDAL:
gdal_grid has optimized implementations of the invdist algorithm for SSE and
AVX, using the mix of compile-time and runtime checks described above.

On other platforms, the non-SIMD code would still be used (SLEEF also supports
ARM NEON, so this could possibly be ported to non-Intel platforms too). The
"Register" C++ class would then just encapsulate a single double value.

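To illustrate what such a "Register" abstraction might look like, here is a hypothetical C++ sketch (not the class from the POC). When `__SSE2__` is defined it wraps an `__m128d` holding two doubles, using the standard `<emmintrin.h>` intrinsics; otherwise it wraps a single `double`. The kernel `affine()` is an invented example written once against the class, with a scalar loop for any leftover elements.

```cpp
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#if defined(__SSE2__)
// SSE2 flavour: one XMM register holds two doubles.
class Register {
public:
    enum { kWidth = 2 };
    static Register load(const double* p) { Register r; r.v_ = _mm_loadu_pd(p); return r; }
    static Register broadcast(double d) { Register r; r.v_ = _mm_set1_pd(d); return r; }
    void store(double* p) const { _mm_storeu_pd(p, v_); }
    Register operator+(const Register& o) const { Register r; r.v_ = _mm_add_pd(v_, o.v_); return r; }
    Register operator*(const Register& o) const { Register r; r.v_ = _mm_mul_pd(v_, o.v_); return r; }
private:
    __m128d v_;
};
#else
// Non-SIMD flavour: the "register" is just a single double.
class Register {
public:
    enum { kWidth = 1 };
    static Register load(const double* p) { Register r; r.v_ = *p; return r; }
    static Register broadcast(double d) { Register r; r.v_ = d; return r; }
    void store(double* p) const { *p = v_; }
    Register operator+(const Register& o) const { Register r; r.v_ = v_ + o.v_; return r; }
    Register operator*(const Register& o) const { Register r; r.v_ = v_ * o.v_; return r; }
private:
    double v_;
};
#endif

// Hypothetical kernel written once against Register: y[i] = a * x[i] + b.
void affine(const double* x, double* y, std::size_t n, double a, double b)
{
    const Register ra = Register::broadcast(a);
    const Register rb = Register::broadcast(b);
    std::size_t i = 0;
    // Full registers first...
    for (; i + Register::kWidth <= n; i += Register::kWidth)
        (ra * Register::load(x + i) + rb).store(y + i);
    // ...then a scalar tail for any leftover elements.
    for (; i < n; ++i)
        y[i] = a * x[i] + b;
}
```

Because the kernel only sees `Register`, the same source compiles to SSE2 code on x86 and to plain scalar code elsewhere.
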
> and how does this interact
> with building packages on one system that will run on another?

That should work fine, even with cross-compilers, since the CPU feature checks
happen at runtime on the machine where proj actually runs.

> Or is this
> only for amd64?

I tried my POC with gcc -m32 -msse2 and it works fine. Surprisingly,
performance was only marginally below x86_64. I say "surprisingly" because in
32-bit mode only 8 XMM registers are available, versus 16 in 64-bit mode.

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com