Commit b4ea956

updated change log
1 parent f87128a commit b4ea956

1 file changed

+54
-8
lines changed

docs/changelog.md

Lines changed: 54 additions & 8 deletions
@@ -1,21 +1,30 @@
 # Changelog

-## v0.3.0
+## v0.3.0 (unreleased)

 ### Highlights

 - License changed from AGPL-3.0-only to **Apache-2.0**.
-- **Dataset catalog**: `build()` with 13 pre-registered datasets across
+- **Dataset catalog**: `build()` with 12 pre-registered datasets across
 Earth Search, Planetary Computer, and AlphaEarth Foundation.
+Catalog entries can point to a STAC API or a GeoParquet file.
 `register_local()` for adding your own.
+- **`build_from_stac()`** and **`build_from_table()`**: build a Collection
+from any STAC API or any Parquet/GeoParquet file with COG URLs (Source
+Cooperative exports, STAC GeoParquet, custom catalogs). No STAC API
+required for the table path. Optional `enrich_cog=True` parses COG
+headers for accelerated reads.
 - **Multi-cloud obstore backend**: S3, Azure Blob, and GCS routing via URL
 auto-detection, with automatic fallback to anonymous access.
 - **`create_backend()`** for authenticated reads with obstore credential
 providers (e.g., Planetary Computer SAS tokens).
 - **TorchGeo adapter**: `collection.to_torchgeo_dataset()` returns a
 `GeoDataset` backed by Rasteret's async COG reader. Supports
-`time_series=True` (`[T, C, H, W]` output), multi-CRS reprojection,
-and works with all TorchGeo samplers and collation helpers.
+`time_series=True` (`[T, C, H, W]` output), `label_field` for
+per-sample labels, `target_crs` for cross-CRS reprojection,
+`allow_resample=True` for mixed-resolution bands, and `is_image=False`
+for mask-style datasets. Works with all TorchGeo samplers, collation
+helpers, and dataset composition (`IntersectionDataset`, `UnionDataset`).
 - **Native dtype preservation**: COG tiles return in their source dtype
 (uint16, int8, float32, etc.) instead of forcing float32.
 - **Rasterio-aligned masking**: AOI reads default to `all_touched=False`
@@ -29,16 +38,41 @@
 - **Multi-CRS auto-reprojection**: queries spanning multiple UTM zones
 reproject to the most common CRS using GDAL's
 `calculate_default_transform`.
+- **`get_numpy()`**: lightweight NumPy output path returning `[N, H, W]`
+(single band) or `[N, C, H, W]` (multi-band) arrays. No extra
+dependencies beyond NumPy. Accepts bbox tuples, Arrow arrays, Shapely
+objects, or raw WKB.
+- **`get_gdf()`**: GeoDataFrame output path for analysis workflows.
+- **Enriched Parquet workflows**: append arbitrary columns (splits, labels,
+AOI polygons, model scores) to a Collection's Parquet, query with
+DuckDB/PyArrow, and fetch pixels for matching rows on demand. See
+[Enriched Parquet Workflows](how-to/enriched-parquet-workflows.md).
+- **Major TOM on-the-fly**: example workflow rebuilding Major TOM-style
+patch-grid semantics from source Sentinel-2 COGs instead of
+payload-in-Parquet. Benchmarked 3.9-6.5x faster than HF `datasets`
+Parquet-filter reads.
+- **`earthdata` optional extra**: `pip install rasteret[earthdata]` for
+NASA Earthdata auto-credential detection.

 ### Collection API

+- **Output paths**: `get_xarray()`, `get_numpy()`, `get_gdf()`,
+`to_torchgeo_dataset()`. All share the same async tile I/O underneath.
 - **Inspection**: `.bands`, `.bounds`, `.epsg`, `len()`, `__repr__()`,
 `.describe()`, `.compare_to_catalog()`.
 - **Filtering**: `collection.subset(cloud_cover_lt=..., date_range=...,
 bbox=..., geometries=..., split=...)` and `collection.where(expr)` for
 raw Arrow expressions.
 - **Sharing**: `collection.export("path/")` writes a portable copy;
-`rasteret.load("path/")` reloads it.
+`rasteret.load("path/")` reloads it. `list_collections()` discovers
+cached collections in the workspace.
+- **Three-tier schema**: required columns (`id`, `datetime`, `geometry`,
+`assets`), COG acceleration columns (per-band tile offsets and metadata),
+and user-extensible columns (`split`, `label`, `cloud_cover`, custom
+metadata). See [Schema Contract](explanation/schema-contract.md).
+- **Public exports**: `Collection`, `CloudConfig`, `BandRegistry`,
+`DatasetDescriptor`, `DatasetRegistry` are all importable from
+`rasteret`.

 ### Other changes

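Note on the hunk above: the three-tier schema entry names four required columns (`id`, `datetime`, `geometry`, `assets`) and treats everything else as either COG acceleration or user-extensible data. A minimal sketch of checking a table's column names against that required tier, in plain Python, might look like the following. The function name `check_required_columns` is hypothetical and not part of Rasteret's API; the sketch also does not try to tell acceleration columns apart from user columns, since the changelog does not give their names.

```python
# Required tier as listed in the changelog entry; nothing else is assumed.
REQUIRED_COLUMNS = ("id", "datetime", "geometry", "assets")


def check_required_columns(column_names):
    """Return (missing, other) for a list of column names.

    missing: required columns that are absent, in changelog order.
    other:   columns outside the required tier (acceleration or user data).
    Hypothetical helper for illustration only.
    """
    present = set(column_names)
    missing = [name for name in REQUIRED_COLUMNS if name not in present]
    other = [name for name in column_names if name not in REQUIRED_COLUMNS]
    return missing, other


# Example: a table enriched with user columns passes the required-tier check.
missing, other = check_required_columns(
    ["id", "datetime", "geometry", "assets", "split", "cloud_cover"]
)
```

In this example `missing` is empty and `other` holds the two user columns; a table lacking `datetime` or `assets` would list them in `missing`.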
@@ -49,12 +83,24 @@
 - TorchGeo `time_series=True` uses spatial-only intersection, matching
 TorchGeo's own `RasterDataset` behaviour where all spatially overlapping
 records are stacked regardless of the sampler's time slice.
+- Cloud workspace URIs (e.g. `s3://bucket/path`) are preserved correctly
+in `CollectionBuilder` base class.

 ### Tested

-- All three output paths (xarray, GeoDataFrame, TorchGeo) tested against
-direct rasterio reads across 12 datasets (Sentinel-2, Landsat, NAIP,
-Copernicus DEM, ESA WorldCover, AEF, and more).
+- All four output paths (xarray, GeoDataFrame, NumPy, TorchGeo) tested
+against direct rasterio reads across 12 datasets (Sentinel-2, Landsat,
+NAIP, Copernicus DEM, ESA WorldCover, AEF, and more).
+- TorchGeo adapter verified against the full GeoDataset contract:
+`IntervalIndex`, samplers, collation, dataset composition, cross-CRS
+reprojection, and export/reload roundtrips.
+
+### Requirements
+
+- **Python 3.12+** required.
+- `rasterio>=1.4.3,<1.5.0` is a core dependency (used for geometry
+masking, CRS reprojection, and TorchGeo query-grid placement; not in
+the tile-read path).

 ### Breaking changes

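Note on the new Requirements section: the pin `rasterio>=1.4.3,<1.5.0` admits any 1.4.x release from 1.4.3 onward and excludes 1.5.0 and later. The sketch below illustrates that range with standard-library tuple comparison. It is a simplification under stated assumptions: the helper names are hypothetical, only plain dotted-integer versions are handled, and real installers resolve specifiers with full PEP 440 rules (pre-releases, post-releases, and so on).

```python
import sys


def parse_version(text):
    """Turn '1.4.3' into (1, 4, 3). Only plain dotted integers are handled."""
    return tuple(int(part) for part in text.split("."))


def satisfies_rasterio_pin(version_text):
    """True when the version falls in >=1.4.3,<1.5.0 (range from the changelog)."""
    return (1, 4, 3) <= parse_version(version_text) < (1, 5, 0)


def python_is_supported(version_info=sys.version_info):
    """True for Python 3.12 or newer, matching the 'Python 3.12+' entry."""
    return tuple(version_info[:2]) >= (3, 12)


# A few versions checked against the pinned range (hypothetical inputs).
checks = {v: satisfies_rasterio_pin(v) for v in ("1.4.2", "1.4.3", "1.4.10", "1.5.0")}
```

Comparing integer tuples rather than strings matters here: as strings, `"1.4.10"` would sort before `"1.4.3"`.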