Commit 859d44e

docs: add missing information to readme and changelog and mkdocs index
1 parent 85096c8 commit 859d44e

3 files changed: 33 additions, 14 deletions
README.md

Lines changed: 17 additions & 2 deletions
@@ -30,7 +30,7 @@ own reader fetches pixels concurrently with no GDAL in the path.
 - **No STAC at training time** - query once at setup; zero API calls during training
 - **Reproducible** - same Parquet index = same records = same results
 - **Native dtypes** - uint16 stays uint16; no silent float32 promotion in the read path
-- **Shareable** - a 5 MB index captures scene selection, band metadata, and split assignments
+- **Shareable cache** - a 5 MB index captures scene selection, band metadata, and split assignments

 Rasteret is an **opt-in accelerator**. Your TorchGeo samplers, DataLoader,
 xarray workflows, and analysis tools stay the same - Rasteret handles the
@@ -121,6 +121,21 @@ collection = rasteret.build(
 COG headers, and caches everything as Parquet. The next run loads in
 milliseconds.

+### Inspect and filter
+
+```python
+collection # Collection('s2_training', source='sentinel-2-l2a', bands=13, records=47, crs=32643)
+collection.bands # ['B01', 'B02', ..., 'B12', 'SCL']
+len(collection) # 47
+
+
+# Filter in memory — no network calls
+filtered = collection.subset(cloud_cover_lt=15, date_range=("2024-03-01", "2024-06-01"))
+```
+
+`subset()` accepts `cloud_cover_lt`, `date_range`, `bbox`, `geometries`, and
+`split`. For raw Arrow expressions, use `collection.where(expr)`.
+
 ### ML training (TorchGeo)

 ```python
@@ -157,7 +172,7 @@ ndvi = (ds.B08 - ds.B04) / (ds.B08 + ds.B04)
 | Multi-band COGs (AEF embeddings, etc.) | [AEF Embeddings guide](https://terrafloww.github.io/rasteret/how-to/aef-embeddings/) |
 | Authenticated sources (PC, requester-pays, Earthdata, etc.) | [Custom Cloud Provider](https://terrafloww.github.io/rasteret/how-to/custom-cloud-provider/) |
 | Share a Collection | `collection.export("path/")` then `rasteret.load("path/")` |
-| Filter by cloud cover, date, bbox | [`collection.subset()`](https://terrafloww.github.io/rasteret/tutorials/) |
+| Filter by cloud cover, date, bbox | [`collection.subset()`](https://terrafloww.github.io/rasteret/how-to/collection-management/) |

 </details>
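
As a rough illustration of what the `subset()` call added to the README is described as doing (filtering already-loaded metadata with no network calls), the sketch below applies a cloud-cover threshold and a date range to a plain list of records. `Record`, `subset_records`, and the field names are hypothetical stand-ins, not Rasteret's API or implementation; treating the date range as inclusive at both ends is also an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for one row of scene metadata."""
    scene_id: str
    acquired: date
    cloud_cover: float  # percent


def subset_records(records, cloud_cover_lt=None, date_range=None):
    """Filter records in memory; every criterion that is given must hold."""
    start = end = None
    if date_range is not None:
        # ISO date strings, assumed inclusive at both ends.
        start, end = (date.fromisoformat(d) for d in date_range)

    kept = []
    for rec in records:
        if cloud_cover_lt is not None and not rec.cloud_cover < cloud_cover_lt:
            continue
        if start is not None and not (start <= rec.acquired <= end):
            continue
        kept.append(rec)
    return kept


records = [
    Record("a", date(2024, 2, 10), 5.0),   # before the start date
    Record("b", date(2024, 3, 15), 12.0),  # passes both filters
    Record("c", date(2024, 4, 20), 40.0),  # too cloudy
    Record("d", date(2024, 6, 1), 14.9),   # on the end date, kept because inclusive
]

filtered = subset_records(
    records, cloud_cover_lt=15, date_range=("2024-03-01", "2024-06-01")
)
print([rec.scene_id for rec in filtered])
```

Because the filter only touches metadata that is already in memory, it can be re-run with different thresholds without rebuilding the index.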

docs/changelog.md

Lines changed: 15 additions & 12 deletions
@@ -16,12 +16,13 @@
 - **Local catalog persistence**: `register_local()` persists to
 `~/.rasteret/datasets.local.json`; `export_local_descriptor()` for
 sharing catalog entries alongside Collections.
+- **Torchgeo GeoDataset**: Adapter created that use rasteret's own I/O parts to create a Torchgeo
+GeoDataset.
 - **Native dtype preservation**: COG tiles return in their source dtype (uint16, int8,
 float32, etc.). No forced float32 conversion.
 - **Rasterio-aligned masking defaults**: AOI reads now default to `all_touched=False`
 and fill masked/outside-coverage pixels with `nodata` when present, otherwise `0`.
-The primary read API (`read_cog`) returns a `valid_mask` so ML pipelines can avoid
-learning from filled pixels.
+The primary read API (`read_cog`) returns a `valid_mask`.
 - **rioxarray removed**: CRS encoding uses pyproj CF conventions directly (WKT2, PROJJSON,
 GeoTransform). The `xarray` extra no longer pulls rioxarray.
 - **Extended TIFF header parsing**: nodata, SamplesPerPixel, PlanarConfiguration,
@@ -32,14 +33,24 @@
 reproject to the most common CRS. Cross-CRS reprojection uses GDAL's
 `calculate_default_transform` for correct resolution handling.

+
+### Collection API
+
+- **Collection inspection**: `.bands`, `.bounds`, `.epsg`, `len()`, `__repr__()`,
+`.describe()`, `.compare_to_catalog()` for quick metadata access without
+materializing the full table.
+- **Filtering**: `collection.subset(cloud_cover_lt=..., date_range=..., bbox=...,
+geometries=..., split=...)` for friendly filtering; `collection.where(expr)` for
+raw Arrow dataset expressions. `select_split()` convenience wrapper.
+- **Sharing**: `collection.export("path/")` writes a portable copy;
+`rasteret.load("path/")` reloads it.
+
 ### Other changes

 - Arrow-native geometry internals (GeoArrow replaces Shapely in hot paths).
 - obstore as base dependency for Rust-native HTTP backend.
 - CLI: `rasteret collections build|list|info|delete|import`, `rasteret build` shortcut.
 - CLI: `rasteret datasets list|info|build|register-local|export-local|unregister-local`.
-- Polished documentation, tutorials, and example scripts.
-- CI workflow fixes and public repo cleanup.

 ### Tested

@@ -49,14 +60,6 @@
 as the oracle, matching TorchGeo's own read semantics. See
 `test_dataset_pixel_comparison.py` and `test_network_smoke.py`.

-### Stability
-
-- STAC + COG scene workflows: **stable**
-- Multi-cloud (S3, Azure Blob, GCS): stable
-- Dataset catalog: stable
-- TorchGeo adapter: **stable** (upgraded from experimental)
-- Non-STAC / record-table ingestion (build_from_table): stable
-
 ### Breaking changes

 - `get_xarray()` returns data in native COG dtype instead of always float32. Code that
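
The masking behaviour described in the changelog hunk above (fill masked or outside-coverage pixels with `nodata` when present, otherwise `0`, and return a `valid_mask`) can be pictured with plain Python lists. This is only a sketch under stated assumptions: `fill_masked` and its arguments are hypothetical names for illustration, not the signature of `read_cog` or Rasteret's implementation.

```python
def fill_masked(pixels, inside, nodata=None):
    """Return (filled, valid_mask) for a 2D grid.

    pixels: rows of pixel values.
    inside: rows of booleans, True where a pixel is inside the AOI and covered by data.
    nodata: fill value for excluded pixels; falls back to 0 when None.
    """
    fill_value = 0 if nodata is None else nodata
    filled = [
        [value if keep else fill_value for value, keep in zip(row, keep_row)]
        for row, keep_row in zip(pixels, inside)
    ]
    # The mask records which outputs are real observations rather than fill.
    valid_mask = [list(keep_row) for keep_row in inside]
    return filled, valid_mask


pixels = [[120, 340], [560, 780]]
inside = [[True, False], [True, True]]

filled, valid_mask = fill_masked(pixels, inside, nodata=65535)
filled_default, _ = fill_masked(pixels, inside)  # no nodata given, so fill with 0

# Use the mask to average only real observations, ignoring fill values.
valid_values = [
    value
    for row, mask_row in zip(filled, valid_mask)
    for value, keep in zip(row, mask_row)
    if keep
]
mean_valid = sum(valid_values) / len(valid_values)
```

Downstream code, such as a training loss, can use the mask in the same way so that filled pixels do not contribute.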

docs/index.md

Lines changed: 1 addition & 0 deletions
@@ -99,6 +99,7 @@ import rasteret

 # 1. Build index (one-time, cached)
 collection = rasteret.build("earthsearch/sentinel-2-l2a", name="s2", bbox=(...), date_range=(...))
+collection.bands # ['B01', 'B02', ..., 'B12', 'SCL']

 # 2. Filter metadata (in-memory, instant)
 sub = collection.subset(cloud_cover_lt=20, date_range=("2024-03-01", "2024-06-01"))
