docs: clarify Rasteret's custom IO layer and obstore's transport role
obstore is the HTTP transport for multi-cloud URL routing (S3/GCS/Azure),
not the source of read performance. Performance comes from the index-first
approach: pre-cached tile offsets in Parquet, no header round-trips, and
asyncio concurrency across scenes and bands.
Updated: design-decisions, architecture, benchmark, custom-cloud-provider,
changelog, notebooks/README, COGReader docstring.
Signed-off-by: print-sid8 <sidsub94@gmail.com>
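The index-first read path described in the commit message can be sketched in plain Python. This is not Rasteret's implementation. `TileIndexRow`, `FAKE_STORE` and `fetch_range` are hypothetical stand-ins for a Parquet row of cached tile offsets, remote COGs, and an HTTP range request (the transport role obstore plays). Because each tile's offset and byte count are already in the index, every tile costs one range read and no TIFF header is fetched; `asyncio.gather` overlaps those reads across scenes and bands.

```python
import asyncio
from dataclasses import dataclass

# Hypothetical stand-in for object storage: URL -> file bytes.
# A real system would reach remote COGs through HTTP range requests instead.
FAKE_STORE: dict[str, bytes] = {
    "s3://bucket/scene1_B04.tif": bytes(range(100)),
    "s3://bucket/scene2_B08.tif": bytes(range(100, 200)),
}


@dataclass(frozen=True)
class TileIndexRow:
    """One row of a hypothetical Parquet index: where a tile's bytes live."""
    url: str
    band: str
    tile_offset: int      # byte offset of the tile, cached when the index was built
    tile_byte_count: int  # size of the (compressed) tile, also cached


async def fetch_range(url: str, start: int, length: int) -> bytes:
    """Simulated range GET; a real transport would send `Range: bytes=start-end`."""
    await asyncio.sleep(0)  # yield to the event loop, as a network await would
    return FAKE_STORE[url][start:start + length]


async def read_tiles(rows: list[TileIndexRow]) -> dict[tuple[str, str], bytes]:
    """Fetch all tiles concurrently using only offsets from the index (no header reads).

    Decompression and decoding of the returned tile bytes are omitted here.
    """
    payloads = await asyncio.gather(
        *(fetch_range(r.url, r.tile_offset, r.tile_byte_count) for r in rows)
    )
    # gather() returns results in input order, so zip pairs rows with their bytes.
    return {(r.url, r.band): p for r, p in zip(rows, payloads)}


if __name__ == "__main__":
    index = [
        TileIndexRow("s3://bucket/scene1_B04.tif", "B04", tile_offset=10, tile_byte_count=4),
        TileIndexRow("s3://bucket/scene2_B08.tif", "B08", tile_offset=0, tile_byte_count=3),
    ]
    print(asyncio.run(read_tiles(index)))
```

The point of the design is where the cost moves: header parsing happens once, when the index is built, so the hot path is only byte-range GETs that can run concurrently.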
This keeps metadata and experiment logic in tables while leaving imagery bytes in source COGs.

## Key Features

- **Easy** - three lines from a STAC search or Parquet file to a TorchGeo-compatible dataset.
- **20x faster, fewer cloud LISTs and GETs** - Rasteret's custom IO layer reads image tiles using byte offsets pre-cached in the Parquet index, so once a Collection is built there are no extra header round-trips.
- **Zero data downloads** - work with terabytes of imagery while storing only megabytes of metadata.
- **No STAC at training time** - query once at setup, then train with zero API calls from a Collection you can extend.
- **Reproducible** - same Parquet index, same records, same results.
- **Native dtypes** - image chunks read by our IO keep their source dtype (uint16 stays uint16 in tensors); only the xarray conversion promotes to float32, so that missing values can be filled with NaN.
- **Shareable cache** - enrich a Collection with your ML splits, patch geometries, and other custom data points, then share it instead of writing folders of image chips.
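The float32 promotion in the native-dtypes point follows from how integer arrays work: they have no NaN value, so marking missing pixels as NaN requires a floating-point dtype. A minimal NumPy sketch, where the tile values and the nodata value of 0 are illustrative assumptions (real datasets declare their own nodata):

```python
import numpy as np

# An illustrative uint16 tile; 0 is a hypothetical nodata value for this sketch.
NODATA = 0
tile = np.array([[0, 1200], [3400, 0]], dtype=np.uint16)

# Tensor path: the array keeps its native dtype, so no extra memory is spent.
print(tile.dtype, tile.nbytes)          # uint16 8

# xarray-style path: integers have no NaN, so promote to float32 first.
# float32 represents every uint16 value exactly (all are below 2**24).
as_float = tile.astype(np.float32)
as_float[tile == NODATA] = np.nan
print(as_float.dtype, as_float.nbytes)  # float32 16
print(int(np.isnan(as_float).sum()))    # 2
```

The promotion is lossless but doubles memory per pixel, which is why keeping uint16 on the tensor path matters for large training batches.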
## Built-in datasets

Rasteret ships with a growing catalog of datasets. Each entry includes license metadata and a `commercial_use` flag for quick filtering.

Pick an ID, pass it to `build()`, and go:

```
$ rasteret datasets list
ID                 Name                      Coverage  License            Auth
...
pc/esa-worldcover  ESA WorldCover            global    ...
pc/usda-cdl        USDA Cropland Data Layer  conus     proprietary(free)  required
```
## Use your own datasets

- Use `build_from_stac()` for any STAC API.
- Use `build_from_table()` for Parquet files that contain TIFF URLs (e.g., the SourceCoop AlphaEarth index Parquet).

You can also build collections from the command line with `rasteret collections build`; see the [collection management guide](https://terrafloww.github.io/rasteret/how-to/collection-management/) for details.

Want a dataset in the built-in catalog? Follow the [guide to adding a dataset to Rasteret's catalog](https://terrafloww.github.io/rasteret/how-to/dataset-catalog/#add-your-own-catalog-entries-advanced) so everyone benefits. The catalog is open for anyone to edit and will be community-driven.

Each new dataset entry is about 20 lines of Python pointing to a STAC API or a GeoParquet file. One PR adds a dataset, and every Rasteret user sees it in `rasteret datasets list` on the next release.
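To give a feel for what a small entry pointing at either a STAC API or a GeoParquet file, plus a `commercial_use` flag, might amount to, here is a hypothetical sketch. `CatalogEntry`, its fields, `commercially_usable`, and the example IDs and URLs are all invented for illustration and are not Rasteret's actual catalog format; follow the linked guide for the real one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical shape of a dataset catalog entry (not Rasteret's real class)."""
    id: str
    name: str
    coverage: str
    license: str
    commercial_use: bool
    auth_required: bool
    stac_api: Optional[str] = None    # set this for STAC-backed datasets...
    geoparquet: Optional[str] = None  # ...or this for GeoParquet-backed ones

    def __post_init__(self) -> None:
        # Exactly one data source must be given.
        if (self.stac_api is None) == (self.geoparquet is None):
            raise ValueError(f"{self.id}: set exactly one of stac_api or geoparquet")


CATALOG = [
    CatalogEntry("example/stac-data", "Example STAC dataset", "global", "CC-BY-4.0",
                 commercial_use=True, auth_required=False,
                 stac_api="https://example.com/stac/v1"),
    CatalogEntry("example/parquet-data", "Example GeoParquet dataset", "conus", "proprietary",
                 commercial_use=False, auth_required=True,
                 geoparquet="https://example.com/index.parquet"),
]


def commercially_usable(entries: list[CatalogEntry]) -> list[str]:
    """Quick license filter: IDs of entries flagged for commercial use."""
    return [e.id for e in entries if e.commercial_use]


print(commercially_usable(CATALOG))  # ['example/stac-data']
```

Keeping each entry declarative like this is what makes a one-PR contribution small: the entry only records where the data lives and how it is licensed.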