[Proj] Experiment to speed up proj.4 by 2 or more
even.rouault at spatialys.com
Tue Jun 23 12:57:00 EST 2015
On Tuesday 23 June 2015 19:26:45, Greg Troxel wrote:
> Even Rouault <even.rouault at spatialys.com> writes:
> > José Luis,
> >> 1. Introducing C++: From my point of view, one of the best things
> >> about PROJ is that it is written in pure C.
> > In what respect does the actual implementation of proj matter to you as
> > a user, if you can still use it as a C library? All current C
> > compilers should be able to deal with C++ as well, and here I'm only
> > using C++98 features (I'd even say C++83 ;-)), not C++14.
> That's not the only question; there's also packaging and portability.
> It could be that people only run proj on desktop systems that already
> have C++.
Hmm, sorry, I don't understand this sentence.
> With respect to packaging, are you proposing to detect the SSE
> instructions at compile time, or at runtime,
You obviously need both aspects:
- compile time: you must detect if the compiler supports generating SSE2 (or
AVX, AVX2, etc...), to decide which source files to compile and with which
- runtime: you must check the feature set supported by the running CPU and
select the appropriate entry points accordingly. In the case of an x86_64
binary, you know that SSE2 is supported. For other instruction sets such as
AVX and later (or for SSE2 on i386), you must use cpuid, plus xgetbv to
check OS support for AVX.
- on x86_64, if the compiler supports AVX2, it would compile both the SSE2 and
AVX2 implementations. And proj would query the CPU&OS at runtime to see if
AVX2 is supported
- on i386, the non-SIMD, SSE2 and AVX2 implementations.
Possibly with options to disable some implementations at compile time.
- on obscure_architecture, just the non-SIMD implementation.
That requires care, but that's doable. I've done similar things in GDAL
(gdal_grid has some optimized implementations of the invdist algorithm for SSE
and AVX, with the above-described mix of compile+runtime checks).
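
To make the compile+runtime mix more concrete, here is a minimal sketch,
assuming GCC or Clang (it relies on <cpuid.h> and inline asm). It is not the
actual proj or GDAL code: detect_simd_level(), select_sum() and the
HAVE_SUM_SSE2 / HAVE_SUM_AVX2 macros are hypothetical names. The macros stand
for what a build system would define once it has checked that the compiler
accepts -msse2 / -mavx2 and has compiled the matching source files.

```cpp
#include <cassert>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>            // GCC/Clang wrappers around the cpuid instruction
#define HAVE_X86_CPUID 1
#endif

enum SimdLevel { SIMD_NONE = 0, SIMD_SSE2 = 1, SIMD_AVX2 = 2 };

// Hypothetical helper: best instruction set usable on the running CPU *and* OS.
static SimdLevel detect_simd_level()
{
#ifdef HAVE_X86_CPUID
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return SIMD_NONE;
    if (!(edx & (1u << 26)))                     // leaf 1, EDX bit 26: SSE2
        return SIMD_NONE;
    const bool osxsave = (ecx & (1u << 27)) != 0; // OS has enabled xgetbv
    const bool avx     = (ecx & (1u << 28)) != 0; // CPU implements AVX
    if (!osxsave || !avx)
        return SIMD_SSE2;
    // xgetbv with ECX=0 reads XCR0; bits 1 and 2 tell whether the OS
    // saves/restores the XMM and YMM registers on context switches.
    unsigned int xcr0_lo = 0, xcr0_hi = 0;
    __asm__ __volatile__("xgetbv" : "=a"(xcr0_lo), "=d"(xcr0_hi) : "c"(0));
    (void)xcr0_hi;
    if ((xcr0_lo & 0x6u) != 0x6u)
        return SIMD_SSE2;
    if (__get_cpuid_max(0, 0) < 7)
        return SIMD_SSE2;
    __cpuid_count(7, 0, eax, ebx, ecx, edx);     // leaf 7, EBX bit 5: AVX2
    return (ebx & (1u << 5)) ? SIMD_AVX2 : SIMD_SSE2;
#else
    return SIMD_NONE;                            // obscure_architecture
#endif
}

typedef double (*SumFn)(const double*, int);

// Non-SIMD implementation, always compiled.
static double sum_scalar(const double* v, int n)
{
    assert(n >= 0);
    double s = 0.0;
    for (int i = 0; i < n; ++i)
        s += v[i];
    return s;
}

// These would live in separate files built with -msse2 / -mavx2.
#ifdef HAVE_SUM_SSE2
double sum_sse2(const double* v, int n);
#endif
#ifdef HAVE_SUM_AVX2
double sum_avx2(const double* v, int n);
#endif

// Select the entry point once, from what was compiled in and what the CPU has.
static SumFn select_sum(SimdLevel level)
{
#ifdef HAVE_SUM_AVX2
    if (level >= SIMD_AVX2) return sum_avx2;
#endif
#ifdef HAVE_SUM_SSE2
    if (level >= SIMD_SSE2) return sum_sse2;
#endif
    (void)level;
    return sum_scalar;
}
```

A caller would do something like SumFn f = select_sum(detect_simd_level());
once at startup and then call f(values, n). With neither macro defined, as
here, it always falls back to the non-SIMD code.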
For other platforms, non-SIMD code would still be used (SLEEF also supports
ARM Neon, so that could possibly be ported to non-Intel platforms too). The
"Register" C++ class used would just encapsulate a single double value then.
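
As an illustration of that last point, here is a rough sketch of what such a
wrapper could look like. It is not the class from the POC: the interface
(load, store, LANES, the two operators) and the square_plus() function are
assumptions made up for the example. With SSE2 enabled at compile time it
wraps a __m128d (2 doubles), otherwise a single double, and the calling code
is the same in both cases.

```cpp
#include <cassert>
#ifdef __SSE2__
#include <emmintrin.h>        // SSE2 intrinsics
#endif

#ifdef __SSE2__
// SSE2 flavour: one register holds 2 doubles.
class Register {
    __m128d v_;
    explicit Register(__m128d v) : v_(v) {}
public:
    enum { LANES = 2 };
    static Register load(const double* p) { return Register(_mm_loadu_pd(p)); }
    void store(double* p) const { _mm_storeu_pd(p, v_); }
    friend Register operator+(const Register& a, const Register& b)
    { return Register(_mm_add_pd(a.v_, b.v_)); }
    friend Register operator*(const Register& a, const Register& b)
    { return Register(_mm_mul_pd(a.v_, b.v_)); }
};
#else
// Non-SIMD flavour: the "register" is just a single double.
class Register {
    double v_;
    explicit Register(double v) : v_(v) {}
public:
    enum { LANES = 1 };
    static Register load(const double* p) { return Register(*p); }
    void store(double* p) const { *p = v_; }
    friend Register operator+(const Register& a, const Register& b)
    { return Register(a.v_ + b.v_); }
    friend Register operator*(const Register& a, const Register& b)
    { return Register(a.v_ * b.v_); }
};
#endif

// Example kernel written once against Register: out[i] = a[i]*a[i] + b[i].
static void square_plus(const double* a, const double* b, double* out, int n)
{
    assert(n >= 0);
    int i = 0;
    // Main loop: process LANES values per iteration.
    for (; i + Register::LANES <= n; i += Register::LANES) {
        Register x = Register::load(a + i);
        (x * x + Register::load(b + i)).store(out + i);
    }
    // Remainder that does not fill a whole register.
    for (; i < n; ++i)
        out[i] = a[i] * a[i] + b[i];
}
```

The point of the design is that the math code is written only once, and the
choice of instruction set is confined to the Register definition.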
> and how does this interact
> with building packages on one system that run on another.
That should work fine, even with cross compilers.
> Or is this
> only for amd64?
I tried my POC with gcc -m32 -msse2 and that works fine. Surprisingly,
performance was only marginally below x86_64. I say "surprisingly"
since in 32-bit mode you have only 8 XMM registers available, whereas there
are 16 in 64-bit mode.
Spatialys - Geospatial professional services