docs: clarify Rasteret's custom IO layer and obstore's transport role

obstore is the HTTP transport for multi-cloud URL routing (S3/GCS/Azure),
not the source of read performance. Performance comes from the index-first
approach: pre-cached tile offsets in Parquet, no header round-trips, and
asyncio concurrency across scenes and bands.

Updated: design-decisions, architecture, benchmark, custom-cloud-provider,
changelog, notebooks/README, COGReader docstring.

Signed-off-by: print-sid8 <sidsub94@gmail.com>
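
For readers new to the pattern, the index-first read path described in the commit message can be sketched roughly as follows. This is not Rasteret's code: `BLOB`, `TILE_INDEX`, `fetch_range`, and `read_tiles` are hypothetical names, a plain dict stands in for the Parquet index, and an in-memory byte string stands in for a remote COG, so no network or transport library is involved.

```python
import asyncio

# Hypothetical stand-in for a remote COG object: a 16-byte "header"
# followed by 4 tiles of 8 bytes each. The reader never fetches the header.
HEADER = b"H" * 16
TILES = [bytes([i]) * 8 for i in range(4)]
BLOB = HEADER + b"".join(TILES)

# Hypothetical stand-in for the cached index: tile id -> (offset, byte count).
# The commit message says these offsets are pre-cached in Parquet; a dict
# keeps this sketch stdlib-only.
TILE_INDEX = {i: (len(HEADER) + 8 * i, 8) for i in range(4)}


async def fetch_range(offset: int, length: int) -> bytes:
    """Simulate one ranged GET against the object (no real I/O here)."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return BLOB[offset:offset + length]


async def read_tiles(tile_ids: list[int]) -> list[bytes]:
    """Look up byte ranges in the index, then issue all reads concurrently."""
    ranges = [TILE_INDEX[t] for t in tile_ids]
    return await asyncio.gather(*(fetch_range(o, n) for o, n in ranges))


# Results come back in the order the tiles were requested.
tiles = asyncio.run(read_tiles([2, 0, 3]))
```

Because the byte ranges are known before any request is made, each tile costs a single ranged read and the header is never touched, which is the "no header round-trips" point above. The transport that actually performs the reads is interchangeable in this sketch.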

README.md: 10 additions, 8 deletions
@@ -25,19 +25,21 @@ Rasteret parses those headers **once**, caches them in Parquet, and its
 own reader fetches pixels concurrently with no GDAL in the path.
 **Up to 20x faster** on cold starts.

-We call this pattern **index-first geospatial image retrieval**:
+Rasteret calls this pattern **index-first geospatial retrieval**:

 - **Control plane**: a queryable Parquet index (scene metadata, COG header metadata, user columns like splits/labels)
 - **Data plane**: on-demand tile reads from the original GeoTIFF/COG objects

 This keeps metadata and experiment logic in tables while leaving imagery bytes in source COGs.

+Key Features -
 - **Easy** - three lines from STAC search or Parquet file to a TorchGeo-compatible dataset
-- **Zero downloads** - work with terabytes of imagery while storing only megabytes of metadata
-- **No STAC at training time** - query once at setup; zero API calls during training
-- **Reproducible** - same Parquet index = same records = same results
-- **Native dtypes** - uint16 stays uint16 in tensors; xarray promotes only when NaN fill requires it
-- **Shareable cache** - a few MB index can capture scene selection, band metadata, and split assignments
+- **20x faster, saves cloud LISTs and GETs** - Our custom IO gets chunks of images fast, and costs no overhead a Collection is built
+- **Zero data downloads** - work with terabytes of imagery while storing only megabytes of metadata.
+- **No STAC at training time** - query once at setup; zero API calls during training with Collection you can extend.
+- **Reproducible** - same Parquet index = same records = same results
+- **Native dtypes** - In our IO image chunks of uint16 stays uint16 in tensors; only xarray conversion promotes to float32 to fill NaNs
+- **Shareable cache** - enrich our Collection with your ML splits, patch geometries, custom data points for ML, and share it, don't write folders of image chips!

 Rasteret is an **opt-in accelerator** that integrates with TorchGeo by
 returning a standard `GeoDataset`. Your samplers, DataLoader, xarray
@@ -85,6 +87,7 @@ Rasteret ships with a growing catalog of datasets. Pick an ID and go:
 ```
 $ rasteret datasets list
 ID                          Name                                       Coverage  License            Auth
+aef/v1-annual               AlphaEarth Foundation Embeddings (Annual)  global    CC-BY-4.0          none
 earthsearch/sentinel-2-l2a  Sentinel-2 Level-2A                        global    proprietary(free)  none
 earthsearch/landsat-c2-l2   Landsat Collection 2 Level-2               global    proprietary(free)  required