Commit 42736df

docs: correct messaging and some numbers
1 parent 5271c30 commit 42736df

3 files changed: 35 additions & 32 deletions

README.md

Lines changed: 26 additions & 20 deletions
@@ -23,18 +23,19 @@ requests** before a pixel moves.

 Rasteret parses those headers **once**, caches them in Parquet, and its
 own reader fetches pixels concurrently with no GDAL in the path.
-**Over 20x faster** on cold starts.
+**Up to 20x faster** on cold starts.

-- **Easy** - three lines from STAC search or Parquet file to TorchGeo dataset
+- **Easy** - three lines from STAC search or Parquet file to a TorchGeo-compatible dataset
 - **Zero downloads** - work with terabytes of imagery while storing only megabytes of metadata
 - **No STAC at training time** - query once at setup; zero API calls during training
 - **Reproducible** - same Parquet index = same records = same results
-- **Native dtypes** - uint16 stays uint16 in TorchGeo tensors; xarray promotes only when NaN fill requires it
+- **Native dtypes** - uint16 stays uint16 in tensors; xarray promotes only when NaN fill requires it
 - **Shareable cache** - a 5 MB index captures scene selection, band metadata, and split assignments

-Rasteret is an **opt-in accelerator**. Your TorchGeo samplers, DataLoader,
-xarray workflows, and analysis tools stay the same - Rasteret handles the
-async tile I/O underneath.
+Rasteret is an **opt-in accelerator** that integrates with TorchGeo by
+returning a standard `GeoDataset`. Your samplers, DataLoader, xarray
+workflows, and analysis tools stay the same - Rasteret handles the async
+tile I/O underneath.

 ---

@@ -186,28 +187,33 @@ Processing pipeline: Filter 450,000 scenes -> 22 matches -> Read 44 COG files

 ![Single request performance](./assets/single_timeseries_request.png)

-### TorchGeo comparison (cold start)
+### Cold-start comparison with TorchGeo

-Apples-to-apples: same AOIs, same scenes, same sampler, same DataLoader.
-Both paths output identical `[batch, T, C, H, W]` tensors.
-Cold-start numbers: no HTTP cache, no OS page cache, no pre-opened file handles.
+Same AOIs, same scenes, same sampler, same DataLoader. Both paths output
+identical `[batch, T, C, H, W]` tensors. TorchGeo runs with its
+recommended GDAL settings for best-case remote COG performance.

-| Scenario | TorchGeo | Rasteret | Speedup |
+| Scenario | rasterio/GDAL path | Rasteret path | Ratio |
 |---|---|---|---|
-| Single AOI, 15 scenes | 9.08 s | 1.14 s | **8.0x** |
-| Multi-AOI, 30 scenes | 42.05 s | 2.25 s | **18.7x** |
-| Cross-CRS boundary, 12 scenes | 12.47 s | 0.59 s | **21.3x** |
+| Single AOI, 15 scenes | 9.08 s | 1.14 s | **8x** |
+| Multi-AOI, 30 scenes | 42.05 s | 2.25 s | **19x** |
+| Cross-CRS boundary, 12 scenes | 12.47 s | 0.59 s | **21x** |
+
+The difference comes from how headers are accessed: the rasterio/GDAL
+path re-parses IFDs over HTTP on each cold start, while Rasteret reads
+them from a local Parquet cache. See
+[Benchmarks](https://terrafloww.github.io/rasteret/explanation/benchmark/)
+for full methodology.

 ![Processing time comparison](./assets/benchmark_results.png)
 ![Speedup breakdown](./assets/benchmark_breakdown.png)

-Full methodology: [Benchmarks](https://terrafloww.github.io/rasteret/explanation/benchmark/)
-· Notebook: [`05_torchgeo_comparison.ipynb`](docs/tutorials/05_torchgeo_comparison.ipynb)
-· Blog: [blog.terrafloww.com](https://blog.terrafloww.com/rasteret-a-library-for-faster-and-cheaper-open-satellite-data-access/)
+Notebook: [`05_torchgeo_comparison.ipynb`](docs/tutorials/05_torchgeo_comparison.ipynb)

-> [!IMPORTANT]
-> Measured on 12-30 Sentinel-2 scenes. The speedup grows with scene count.
-> If you run Rasteret on larger workloads, share your numbers on
+> [!NOTE]
+> Measured on 12-30 Sentinel-2 scenes on an EC2 instance in the same
+> region as the data (us-west-2). Results vary with network conditions.
+> If you run Rasteret on your own workloads, share your numbers on
 > [GitHub Discussions](https://github.com/terrafloww/rasteret/discussions/categories/show-and-tell)
 > or [Discord](https://discord.gg/V5vvuEBc).
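
A quick way to sanity-check the Ratio column in the README table above is to divide the two timing columns. The sketch below does that using the two-decimal timings shown in the diff; the dictionary and function names are illustrative only and are not part of Rasteret.

```python
# Timings copied from the README table: (rasterio/GDAL path, Rasteret path), in seconds.
timings_s = {
    "Single AOI, 15 scenes": (9.08, 1.14),
    "Multi-AOI, 30 scenes": (42.05, 2.25),
    "Cross-CRS boundary, 12 scenes": (12.47, 0.59),
}


def speed_ratio(baseline_s: float, candidate_s: float) -> float:
    """Return how many times longer the baseline took than the candidate."""
    return baseline_s / candidate_s


# Hypothetical helper mapping: scenario name -> unrounded ratio.
ratios = {name: speed_ratio(*pair) for name, pair in timings_s.items()}

for name, ratio in ratios.items():
    print(f"{name}: {round(ratio)}x")
```

Rounded to whole numbers, the three ratios come out to 8x, 19x, and 21x, which matches the updated column.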

docs/explanation/benchmark.md

Lines changed: 7 additions & 10 deletions
@@ -62,22 +62,19 @@ before pixel reads begin.

 ## Key observations

-1. **Rasteret is 8-21x faster than TorchGeo** for remote COG
-time-series reads (speedup grows with scene count).
-2. **The speedup grows with scene count**: more scenes = more benefit.
-TorchGeo's overhead scales linearly (sequential HTTP per file)
-while Rasteret's concurrent reads are bandwidth-bound.
-3. **Cross-CRS adds reprojection overhead** (~0.1-0.3 s) to Rasteret, but
-it is still much faster than TorchGeo's WarpedVRT approach.
+1. The difference grows with scene count: the rasterio/GDAL path
+re-parses headers over HTTP per file (sequential), while Rasteret
+reads cached headers from disk and fetches pixels concurrently.
+2. Cross-CRS adds reprojection overhead (~0.1-0.3 s) to both paths.

-## Why Rasteret is faster
+## Where the difference comes from

-| | TorchGeo | Rasteret |
+| | rasterio/GDAL path | Rasteret path |
 |-|----------|---------|
 | **Index** | `rasterio.open()` per COG over HTTP | Pre-built GeoParquet (disk read) |
 | **Time-series read** | Sequential `rasterio.merge()` per timestep | All T timesteps via `asyncio.gather` |
 | **HTTP per timestep** | HEAD + IFD + pixel ranges | Pixel ranges only (headers cached) |
-| **Concurrency** | None (GDAL reads are serial) | `asyncio.gather` across T x C reads |
+| **Concurrency** | Sequential | `asyncio.gather` across T x C reads |

 ## Reproducibility
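
The **Concurrency** row above contrasts sequential reads with `asyncio.gather` across T x C reads. The following is a minimal sketch of that contrast, assuming a hypothetical `read_tile` coroutine that stands in for one remote tile read. The latency is simulated with `asyncio.sleep`; this is not Rasteret's actual reader, and all names here are illustrative.

```python
import asyncio
import time


async def read_tile(timestep: int, band: int, latency_s: float = 0.02) -> tuple:
    """Hypothetical stand-in for one remote tile read (simulated latency)."""
    await asyncio.sleep(latency_s)
    return (timestep, band)


async def read_sequential(n_timesteps: int, n_bands: int) -> list:
    """Await each read before starting the next one."""
    results = []
    for t in range(n_timesteps):
        for c in range(n_bands):
            results.append(await read_tile(t, c))
    return results


async def read_concurrent(n_timesteps: int, n_bands: int) -> list:
    """Start all T x C reads at once; gather preserves input order."""
    reads = [read_tile(t, c) for t in range(n_timesteps) for c in range(n_bands)]
    return list(await asyncio.gather(*reads))


def timed(coro):
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


seq_result, seq_s = timed(read_sequential(4, 3))
con_result, con_s = timed(read_concurrent(4, 3))
print(f"sequential: {seq_s:.2f} s, concurrent: {con_s:.2f} s")
```

In this toy setup the sequential version pays the simulated latency once per read, while the concurrent version overlaps all twelve waits. Real reads are also limited by bandwidth and server behaviour, so the gap in practice depends on the workload.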

docs/index.md

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 # Rasteret

-**Made to beat cold starts.** Index-first access to cloud-native GeoTIFF collections for ML and analysis.
+**Made to beat cold starts.** Index-first access to cloud-native GeoTIFF collections for ML and geospatial analysis.

 ---

@@ -17,7 +17,7 @@
 !!! success "What Rasteret does"

 Parse headers **once**, cache in Parquet, read pixels concurrently
-with no GDAL in the path. **Over 20x faster** on cold starts.
+with no GDAL in the path.

 ```text
 STAC API / GeoParquet --> Parquet Index --> Tile-level byte reads
