|
1 | 1 | # Changelog |
2 | 2 |
|
3 | | -## v0.3.0 |
| 3 | +## v0.3.0 (unreleased) |
4 | 4 |
|
5 | 5 | ### Highlights |
6 | 6 |
|
7 | 7 | - License changed from AGPL-3.0-only to **Apache-2.0**. |
8 | | -- **Dataset catalog**: `build()` with 13 pre-registered datasets across |
| 8 | +- **Dataset catalog**: `build()` with 12 pre-registered datasets across |
9 | 9 | Earth Search, Planetary Computer, and AlphaEarth Foundation. |
| 10 | + Catalog entries can point to a STAC API or a GeoParquet file. |
10 | 11 | `register_local()` for adding your own. |
| 12 | +- **`build_from_stac()`** and **`build_from_table()`**: build a Collection |
| 13 | + from any STAC API or any Parquet/GeoParquet file with COG URLs (Source |
| 14 | + Cooperative exports, STAC GeoParquet, custom catalogs). No STAC API |
| 15 | + required for the table path. Optional `enrich_cog=True` parses COG |
| 16 | + headers for accelerated reads. |
11 | 17 | - **Multi-cloud obstore backend**: S3, Azure Blob, and GCS routing via URL |
12 | 18 | auto-detection, with automatic fallback to anonymous access. |
13 | 19 | - **`create_backend()`** for authenticated reads with obstore credential |
14 | 20 | providers (e.g., Planetary Computer SAS tokens). |
15 | 21 | - **TorchGeo adapter**: `collection.to_torchgeo_dataset()` returns a |
16 | 22 | `GeoDataset` backed by Rasteret's async COG reader. Supports |
17 | | - `time_series=True` (`[T, C, H, W]` output), multi-CRS reprojection, |
18 | | - and works with all TorchGeo samplers and collation helpers. |
| 23 | + `time_series=True` (`[T, C, H, W]` output), `label_field` for |
| 24 | + per-sample labels, `target_crs` for cross-CRS reprojection, |
| 25 | + `allow_resample=True` for mixed-resolution bands, and `is_image=False` |
| 26 | + for mask-style datasets. Works with all TorchGeo samplers, collation |
| 27 | + helpers, and dataset composition (`IntersectionDataset`, `UnionDataset`). |
19 | 28 | - **Native dtype preservation**: COG tiles return in their source dtype |
20 | 29 | (uint16, int8, float32, etc.) instead of forcing float32. |
21 | 30 | - **Rasterio-aligned masking**: AOI reads default to `all_touched=False` |
|
29 | 38 | - **Multi-CRS auto-reprojection**: queries spanning multiple UTM zones |
30 | 39 | reproject to the most common CRS using GDAL's |
31 | 40 | `calculate_default_transform`. |
| 41 | +- **`get_numpy()`**: lightweight NumPy output path returning `[N, H, W]` |
| 42 | + (single band) or `[N, C, H, W]` (multi-band) arrays. No extra |
| 43 | + dependencies beyond NumPy. Accepts bbox tuples, Arrow arrays, Shapely |
| 44 | + objects, or raw WKB. |
| 45 | +- **`get_gdf()`**: GeoDataFrame output path for analysis workflows. |
| 46 | +- **Enriched Parquet workflows**: append arbitrary columns (splits, labels, |
| 47 | + AOI polygons, model scores) to a Collection's Parquet, query with |
| 48 | + DuckDB/PyArrow, and fetch pixels for matching rows on demand. See |
| 49 | + [Enriched Parquet Workflows](how-to/enriched-parquet-workflows.md). |
| 50 | +- **Major TOM on-the-fly**: example workflow rebuilding Major TOM-style |
| 51 | + patch-grid semantics from source Sentinel-2 COGs instead of |
| 52 | + payload-in-Parquet. Benchmarked 3.9-6.5x faster than HF `datasets` |
| 53 | + Parquet-filter reads. |
| 54 | +- **`earthdata` optional extra**: `pip install rasteret[earthdata]` for |
| 55 | + NASA Earthdata auto-credential detection. |
32 | 56 |
|
33 | 57 | ### Collection API |
34 | 58 |
|
| 59 | +- **Output paths**: `get_xarray()`, `get_numpy()`, `get_gdf()`, |
| 60 | + `to_torchgeo_dataset()`. All share the same async tile I/O underneath. |
35 | 61 | - **Inspection**: `.bands`, `.bounds`, `.epsg`, `len()`, `__repr__()`, |
36 | 62 | `.describe()`, `.compare_to_catalog()`. |
37 | 63 | - **Filtering**: `collection.subset(cloud_cover_lt=..., date_range=..., |
38 | 64 | bbox=..., geometries=..., split=...)` and `collection.where(expr)` for |
39 | 65 | raw Arrow expressions. |
40 | 66 | - **Sharing**: `collection.export("path/")` writes a portable copy; |
41 | | - `rasteret.load("path/")` reloads it. |
| 67 | + `rasteret.load("path/")` reloads it. `list_collections()` discovers |
| 68 | + cached collections in the workspace. |
| 69 | +- **Three-tier schema**: required columns (`id`, `datetime`, `geometry`, |
| 70 | + `assets`), COG acceleration columns (per-band tile offsets and metadata), |
| 71 | + and user-extensible columns (`split`, `label`, `cloud_cover`, custom |
| 72 | + metadata). See [Schema Contract](explanation/schema-contract.md). |
| 73 | +- **Public exports**: `Collection`, `CloudConfig`, `BandRegistry`, |
| 74 | + `DatasetDescriptor`, `DatasetRegistry` are all importable from |
| 75 | + `rasteret`. |
42 | 76 |
|
43 | 77 | ### Other changes |
44 | 78 |
|
|
49 | 83 | - TorchGeo `time_series=True` uses spatial-only intersection, matching |
50 | 84 | TorchGeo's own `RasterDataset` behaviour where all spatially overlapping |
51 | 85 | records are stacked regardless of the sampler's time slice. |
| 86 | +- Cloud workspace URIs (e.g. `s3://bucket/path`) are preserved correctly |
| 87 | + in the `CollectionBuilder` base class. |
52 | 88 |
|
53 | 89 | ### Tested |
54 | 90 |
|
55 | | -- All three output paths (xarray, GeoDataFrame, TorchGeo) tested against |
56 | | - direct rasterio reads across 12 datasets (Sentinel-2, Landsat, NAIP, |
57 | | - Copernicus DEM, ESA WorldCover, AEF, and more). |
| 91 | +- All four output paths (xarray, GeoDataFrame, NumPy, TorchGeo) tested |
| 92 | + against direct rasterio reads across 12 datasets (Sentinel-2, Landsat, |
| 93 | + NAIP, Copernicus DEM, ESA WorldCover, AEF, and more). |
| 94 | +- TorchGeo adapter verified against the full GeoDataset contract: |
| 95 | + `IntervalIndex`, samplers, collation, dataset composition, cross-CRS |
| 96 | + reprojection, and export/reload roundtrips. |
| 97 | + |
| 98 | +### Requirements |
| 99 | + |
| 100 | +- **Python 3.12+** required. |
| 101 | +- `rasterio>=1.4.3,<1.5.0` is a core dependency (used for geometry |
| 102 | + masking, CRS reprojection, and TorchGeo query-grid placement; not in |
| 103 | + the tile-read path). |
58 | 104 |
|
59 | 105 | ### Breaking changes |
60 | 106 |
|
|