docs: improve

print-sid8 · print-sid8 · commit 25d778b9b4cc · 2026-02-27T08:50:46.000Z
diff --git a/README.md b/README.md
@@ -25,17 +25,22 @@ Rasteret parses those headers **once**, caches them in Parquet, and its
 own reader fetches pixels concurrently with no GDAL in the path.
 **Up to 20x faster** on cold starts.
 
+Because the index is Parquet, it's not just a cache - it's a table you
+work with. Filter by cloud cover or date range, join with your own labels
+or AOI polygons, add train/val/test splits as columns, query with DuckDB
+or PyArrow. When you need pixels, Rasteret fetches them on demand from
+the COGs the table references.
+
 - **Easy** - three lines from STAC search or Parquet file to a TorchGeo-compatible dataset
 - **Zero downloads** - work with terabytes of imagery while storing only megabytes of metadata
 - **No STAC at training time** - query once at setup; zero API calls during training
 - **Reproducible** - same Parquet index = same records = same results
 - **Native dtypes** - uint16 stays uint16 in tensors; xarray promotes only when NaN fill requires it
-- **Shareable cache** - a few MB index can capture scene selection, band metadata, and split assignments
+- **Your dataset is a table** - filter, enrich, version, and share a Parquet file of a few MB. The selection logic lives next to the data references.
 
-Rasteret is an **opt-in accelerator** that integrates with TorchGeo by
-returning a standard `GeoDataset`. Your samplers, DataLoader, xarray
-workflows, and analysis tools stay the same - Rasteret handles the async
-tile I/O underneath.
+Rasteret integrates with TorchGeo by returning a standard `GeoDataset`.
+Your samplers, DataLoader, xarray workflows, and analysis tools stay the
+same - Rasteret handles the async tile I/O underneath.
 
 ---
 
diff --git a/docs/index.md b/docs/index.md
@@ -17,11 +17,13 @@
 !!! success "What Rasteret does"
 
     Parse headers **once**, cache in Parquet, read pixels concurrently
-    with no GDAL in the path.
+    with no GDAL in the path. Because the index is Parquet, it's also
+    the table you work with - filter, join, enrich, and query with
+    standard tools before you ever fetch a pixel.
 
     ```text
-    STAC API / GeoParquet  -->  Parquet Index  -->  Tile-level byte reads
-           (once)                 (queryable)          (no GDAL, no headers)
+    STAC API / GeoParquet  -->  Collection (Parquet)  -->  Tile-level byte reads
+           (once)              (queryable, enrichable)       (no GDAL, no headers)
     ```
 
 ---
@@ -56,6 +58,21 @@
     Same Parquet index = same records = same results.
     Share a few MB file and collaborators skip re-indexing.
 
+-   :material-table-edit:{ .lg .middle } **Your dataset is a table**
+
+    ---
+
+    Filter, join, and enrich the index with DuckDB or PyArrow. Add splits,
+    labels, and quality flags as columns. The index is the dataset.
+
+-   :material-swap-horizontal:{ .lg .middle } **Any Parquet with COG URLs**
+
+    ---
+
+    `build_from_table()` turns existing GeoParquet into a
+    Collection. Source Cooperative exports, STAC GeoParquet,
+    custom catalogs - if it has COG URLs, Rasteret can read it.
+
 </div>
 
 ---
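Storing train/val/test splits as columns only helps reproducibility if each split is a pure function of its record, so the same index always produces the same assignment. Below is a minimal sketch in plain Python. It assumes each row has a unique `id` and an `eo:cloud_cover` value. `assign_split`, those column names, and the sample rows are hypothetical and are not part of Rasteret's API. In practice you would compute the same column with PyArrow or DuckDB directly on the Parquet index.

```python
import hashlib


def assign_split(record_id: str, val_pct: int = 10, test_pct: int = 10) -> str:
    """Deterministically map a record ID to 'train', 'val', or 'test'.

    Hypothetical helper (not a Rasteret API). The bucket depends only on the
    ID, so re-running on the same index reproduces the same splits.
    """
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable value in 0..99
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + val_pct:
        return "val"
    return "train"


# Hypothetical rows standing in for a scene index; column names are assumptions.
rows = [
    {"id": "scene-a", "eo:cloud_cover": 3.1},
    {"id": "scene-b", "eo:cloud_cover": 64.0},
    {"id": "scene-c", "eo:cloud_cover": 12.5},
]

# Filter by cloud cover, then attach the split as a new column on each row.
clear = [
    dict(r, split=assign_split(r["id"]))
    for r in rows
    if r["eo:cloud_cover"] < 20
]
```

The sketch hashes the ID with SHA-256 instead of Python's built-in `hash()` because string hashing is salted per process. Using `hash()` would reshuffle the splits on every run.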