@@ -23,18 +23,19 @@ requests** before a pixel moves.
 
 Rasteret parses those headers **once**, caches them in Parquet, and its
 own reader fetches pixels concurrently with no GDAL in the path.
-**Over 20x faster** on cold starts.
+**Up to 20x faster** on cold starts.
 
-- **Easy** - three lines from STAC search or Parquet file to TorchGeo dataset
+- **Easy** - three lines from STAC search or Parquet file to a TorchGeo-compatible dataset
 - **Zero downloads** - work with terabytes of imagery while storing only megabytes of metadata
 - **No STAC at training time** - query once at setup; zero API calls during training
 - **Reproducible** - same Parquet index = same records = same results
-- **Native dtypes** - uint16 stays uint16 in TorchGeo tensors; xarray promotes only when NaN fill requires it
+- **Native dtypes** - uint16 stays uint16 in tensors; xarray promotes only when NaN fill requires it
 - **Shareable cache** - a 5 MB index captures scene selection, band metadata, and split assignments
 
-Rasteret is an **opt-in accelerator**. Your TorchGeo samplers, DataLoader,
-xarray workflows, and analysis tools stay the same - Rasteret handles the
-async tile I/O underneath.
+Rasteret is an **opt-in accelerator** that integrates with TorchGeo by
+returning a standard `GeoDataset`. Your samplers, DataLoader, xarray
+workflows, and analysis tools stay the same - Rasteret handles the async
+tile I/O underneath.
 
 ---
 
@@ -186,28 +187,33 @@ Processing pipeline: Filter 450,000 scenes -> 22 matches -> Read 44 COG files
 
 ![Single request performance](./assets/single_timeseries_request.png)
 
-### TorchGeo comparison (cold start)
+### Cold-start comparison with TorchGeo
 
-Apples-to-apples: same AOIs, same scenes, same sampler, same DataLoader.
-Both paths output identical `[batch, T, C, H, W]` tensors.
-Cold-start numbers: no HTTP cache, no OS page cache, no pre-opened file handles.
+Same AOIs, same scenes, same sampler, same DataLoader. Both paths output
+identical `[batch, T, C, H, W]` tensors. TorchGeo runs with its
+recommended GDAL settings for best-case remote COG performance.
 
-| Scenario | TorchGeo | Rasteret | Speedup |
+| Scenario | rasterio/GDAL path | Rasteret path | Ratio |
 |---|---|---|---|
-| Single AOI, 15 scenes | 9.08 s | 1.14 s | **8.0x** |
-| Multi-AOI, 30 scenes | 42.05 s | 2.25 s | **18.7x** |
-| Cross-CRS boundary, 12 scenes | 12.47 s | 0.59 s | **21.3x** |
+| Single AOI, 15 scenes | 9.08 s | 1.14 s | **8x** |
+| Multi-AOI, 30 scenes | 42.05 s | 2.25 s | **19x** |
+| Cross-CRS boundary, 12 scenes | 12.47 s | 0.59 s | **21x** |
+
+The difference comes from how headers are accessed: the rasterio/GDAL
+path re-parses IFDs over HTTP on each cold start, while Rasteret reads
+them from a local Parquet cache. See
+[Benchmarks](https://terrafloww.github.io/rasteret/explanation/benchmark/)
+for full methodology.
 
 ![Processing time comparison](./assets/benchmark_results.png)
 ![Speedup breakdown](./assets/benchmark_breakdown.png)
 
-Full methodology: [Benchmarks](https://terrafloww.github.io/rasteret/explanation/benchmark/)
-· Notebook: [`05_torchgeo_comparison.ipynb`](docs/tutorials/05_torchgeo_comparison.ipynb)
-· Blog: [blog.terrafloww.com](https://blog.terrafloww.com/rasteret-a-library-for-faster-and-cheaper-open-satellite-data-access/)
+Notebook: [`05_torchgeo_comparison.ipynb`](docs/tutorials/05_torchgeo_comparison.ipynb)
 
-> [!IMPORTANT]
-> Measured on 12-30 Sentinel-2 scenes. The speedup grows with scene count.
-> If you run Rasteret on larger workloads, share your numbers on
+> [!NOTE]
+> Measured on 12-30 Sentinel-2 scenes on an EC2 instance in the same
+> region as the data (us-west-2). Results vary with network conditions.
+> If you run Rasteret on your own workloads, share your numbers on
 > [GitHub Discussions](https://github.com/terrafloww/rasteret/discussions/categories/show-and-tell)
 > or [Discord](https://discord.gg/V5vvuEBc).
 
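As a quick sanity check on the Ratio column in the revised table, the rounded values can be reproduced by dividing the rasterio/GDAL time by the Rasteret time for each scenario. The sketch below uses only the timings printed in the table; the `timings` dictionary and the `ratio` helper are illustrative names, not part of Rasteret.

```python
# Timings copied from the revised benchmark table, in seconds:
# (rasterio/GDAL path, Rasteret path). Names here are illustrative only.
timings = {
    "Single AOI, 15 scenes": (9.08, 1.14),
    "Multi-AOI, 30 scenes": (42.05, 2.25),
    "Cross-CRS boundary, 12 scenes": (12.47, 0.59),
}


def ratio(baseline_s: float, rasteret_s: float) -> float:
    """Return how many times longer the baseline path took."""
    return baseline_s / rasteret_s


ratios = {name: ratio(*pair) for name, pair in timings.items()}

for name, value in ratios.items():
    # Show the rounded figure as it appears in the table.
    print(f"{name}: {round(value)}x")
```

Rounding to whole numbers, as the revised table does, avoids implying more precision than a handful of cold-start runs over a network can support.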