[FWTools] gdaltindex - performance issue with lots of files?

Ed McNierney ed at topozone.com
Fri Feb 1 21:29:44 EST 2008

Paul -

You may see some benefit by defragmenting that disk drive.  If the drive was populated by copying those 33,672 files to it serially, the directory data may be scattered widely across the disk.  Since seeks are the slowest thing a disk does, GDAL's behavior may be forcing the drive to seek all over the disk just to read that directory.  This can produce quite non-linear symptoms: if your 2,000-file example fits in one directory allocation, it could be very fast; if the 2,001st file required a new directory block at the end of all those files, performance could suddenly get much worse.

A defragmentation run with directory consolidation, so the directory is a single, contiguous file, might be helpful.
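
One way to check whether reading the directory is really the slow part, before and after defragmenting, is to time a bare directory listing with no GDAL involved. The following is only a minimal sketch in Python; the function name time_listing and the repeat count are illustrative assumptions, not part of GDAL or FWTools.

```python
import os
import sys
import time


def time_listing(path, repeats=3):
    """List `path` several times; return (entry_count, seconds_per_pass).

    The first pass is the most likely to hit the disk; later passes are
    usually served from the operating system's cache and run faster.
    """
    timings = []
    count = 0
    for _ in range(repeats):
        start = time.perf_counter()
        with os.scandir(path) as entries:
            count = sum(1 for _ in entries)  # walk every directory entry
        timings.append(time.perf_counter() - start)
    return count, timings


if __name__ == "__main__":
    # Usage: python time_listing.py <directory>  (defaults to the current one)
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    n, secs = time_listing(target)
    print(f"{n} entries; passes: " + ", ".join(f"{s:.3f}s" for s in secs))
```

If the first pass over the 33,672-file directory takes much longer than later passes, or much longer per entry than the 2,000-file directory, that would be consistent with the scattered-directory explanation above.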

     - Ed

Ed McNierney
Chief Mapmaker
Demand Media / TopoZone.com
73 Princeton Street, Suite 305
North Chelmsford, MA  01863
Phone: 978-251-4242, Fax: 978-251-1396
ed at topozone.com

-----Original Message-----
From: fwtools-bounces at lists.maptools.org [mailto:fwtools-bounces at lists.maptools.org] On Behalf Of Paul McCullough
Sent: Friday, February 01, 2008 8:54 PM
To: fwtools at lists.maptools.org
Subject: Re: [FWTools] gdaltindex - performance issue with lots of files?


Many thanks for the rapid reply. The drive is a local drive. It is an
external drive on a Windows XP SP2 box.
I can live with this for now. Can you provide an estimate of when this will
be released in FWTools?


Frank Warmerdam-2 wrote:
> Paul McCullough wrote:
>> When I run 
>>     gdaltindex -tileindex location level0 *.tif
>> in a directory with 33672 tif files, the process takes about 24 hours.
>> While it does finish properly, 24 hours seems too long.
>> (windows xp - sp 4 - 2GB RAM; FWTools 2.0.2)
>> In other layers of my image pyramid, gdaltindex completes in what I
>> consider reasonable times.
>> For example, another layer with about 2000 files runs in about 90
>> seconds.
>> It appears that the increase in time with larger file counts is
>> non-linear.
>> I can collect more data if that would help, but I suspect there is a
>> Big-O kind of problem here.
>> Also, it seems quite reasonable to have 30000 tile polygons.
>> I have tried these forms of the cmd line on medium file counts and seen
>> little difference:
>>     gdaltindex -tileindex location test1 *.tif
>>     gdaltindex -tileindex location test2 --optfile files
>> I have not tried this unix cmd line form:
>>     find $PWD -name "*.tif" -exec gdaltindex srtm-index3 {} \;
> Paul,
> This is:
>    http://trac.osgeo.org/gdal/ticket/2158
> A workaround is available now in trunk and will appear in 1.5.1.
> Basically, for GDAL 1.5 changes were made to read the file list from the
> directory when a file is opened, and this turns out to be very slow in
> some circumstances when there are a lot of files.  Would the directory in
> question happen to be on a network drive?
> If you need to work around this, you can use an FWTools from early in 2007
> or older.
> Best regards,
> -- 
> ---------------------------------------+--------------------------------------
> I set the clouds in motion - turn up   | Frank Warmerdam, warmerdam at pobox.com
> light and sound - activate the windows | http://pobox.com/~warmerdam
> and watch the world go round - Rush    | President OSGeo, http://osgeo.org
> _______________________________________________
> FWTools mailing list
> FWTools at lists.maptools.org
> http://lists.maptools.org/mailman/listinfo/fwtools
> http://fwtools.maptools.org/
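
The behavior Frank describes (the directory's file list being read whenever a file is opened) suggests a simple cost model: opening each of N files in a directory of N entries touches roughly N * N directory entries in total. The sketch below, in Python, extrapolates from the two timings reported in this thread under that assumed model. The function names are illustrative, and the model itself is an assumption rather than a measurement of GDAL.

```python
import math


def entries_scanned(n_files):
    """Directory entries read if every open re-lists a directory of n_files."""
    return n_files * n_files


def extrapolate(t_small, n_small, n_large, exponent):
    """Scale a measured time from n_small to n_large files, assuming the
    run time grows as n ** exponent."""
    return t_small * (n_large / n_small) ** exponent


def implied_exponent(t_small, n_small, t_large, n_large):
    """Growth exponent implied by two (file count, run time) measurements."""
    return math.log(t_large / t_small) / math.log(n_large / n_small)


if __name__ == "__main__":
    # Figures reported in the thread: ~2000 files in ~90 s,
    # 33672 files in ~24 hours.
    n_small, t_small = 2000, 90.0
    n_large, t_large = 33672, 24 * 3600.0

    for label, exp in (("linear", 1), ("quadratic", 2)):
        hours = extrapolate(t_small, n_small, n_large, exp) / 3600
        print(f"{label} model predicts {hours:.1f} h")
    print(f"observed: {t_large / 3600:.0f} h, implied exponent "
          f"{implied_exponent(t_small, n_small, t_large, n_large):.2f}")
```

With these numbers a linear model predicts well under an hour and a purely quadratic one about 7 hours, so the observed 24 hours is somewhat worse than quadratic. That gap would be consistent with an additional per-listing penalty such as the disk seeks Ed describes, though two data points cannot confirm it.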

