
Commit 1a7ed44

chore: replace non-ASCII symbols with plain ASCII equivalents
Replace em dashes, Unicode arrows, multiplication signs, and degree symbols across docs and source with contextually appropriate ASCII: colons for definition lists, commas/periods for clause breaks, -> for arrows, x for multiplication, deg for degrees. Box-drawing chars in display.py and docs example output are intentional and kept as-is.
1 parent cf9c7d5 commit 1a7ed44
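A tree can be audited for the symbols this commit targets with a short script. The replacement table below mirrors the commit message; the helper functions themselves are illustrative, not part of the repo (and the real fixes were applied contextually, e.g. em dashes became colons, commas, or periods depending on the sentence):

```python
# Non-ASCII symbols removed by this commit -> their ASCII stand-ins.
# Em-dash replacement was context-dependent (colon, comma, or period);
# a comma is used here as a naive default.
REPLACEMENTS = {
    "\u2014": ",",    # em dash
    "\u2192": "->",   # rightwards arrow
    "\u00d7": "x",    # multiplication sign
    "\u00b0": "deg",  # degree sign
}

def find_offenders(text: str) -> list[str]:
    """Return which targeted symbols appear in *text*."""
    return [sym for sym in REPLACEMENTS if sym in text]

def asciify(text: str) -> str:
    """Apply the replacement table naively (the actual commit was contextual)."""
    for sym, ascii_form in REPLACEMENTS.items():
        text = text.replace(sym, ascii_form)
    return text

line = "band code \u2192 STAC asset key"
print(find_offenders(line))  # the arrow is flagged
print(asciify(line))         # band code -> STAC asset key
```

Box-drawing characters (intentionally kept in `display.py`) would simply be absent from the table.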

17 files changed (+91, -79 lines)


docs/explanation/architecture.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -39,7 +39,7 @@ obstore (auto-routes to S3Store / AzureStore / GCSStore / HTTPStore)
 to declare:
 
 - how to find the data (STAC API / static catalog / GeoParquet URI)
-- how to resolve bands (band code → STAC asset key, optional `band_index_map`)
+- how to resolve bands (band code -> asset key, optional `band_index_map`)
 - how to access it (auth/requester-pays, URL signing/rewrites)
 
 At runtime, Rasteret stores band-resolution and cloud-access settings in two
```

docs/explanation/schema-contract.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -54,7 +54,7 @@ byte-range tile reads, rather than storing raster payloads in Parquet.
 
 Rasteret uses `DatasetDescriptor` objects to describe how a dataset is discovered
 (STAC API vs GeoParquet), accessed (cloud auth / URL rewriting), and mapped
-(band codes → STAC assets + optional `band_index` for multi-sample GeoTIFFs).
+(band codes -> asset keys + optional `band_index` for multi-sample GeoTIFFs).
 
 See:
 
```

docs/getting-started/index.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -40,7 +40,7 @@ python -c "import rasteret; print(rasteret.version())"
 
 === "Jupyter / JupyterLab"
 
-    Jupyter runs each notebook in a **kernel** — a separate Python process.
+    Jupyter runs each notebook in a **kernel**, a separate Python process.
     To use your Rasteret environment as a kernel:
 
     ```bash
@@ -52,7 +52,7 @@ python -c "import rasteret; print(rasteret.version())"
 
 === "marimo"
 
-    [marimo](https://marimo.io) manages dependencies inline — no kernel
+    [marimo](https://marimo.io) manages dependencies inline, no kernel
     registration needed. Just `uv pip install marimo` alongside Rasteret
    and run `marimo edit notebook.py`.
 
````

docs/reference/integrations/torchgeo.md

Lines changed: 27 additions & 15 deletions
````diff
@@ -2,15 +2,12 @@
 
 TorchGeo `GeoDataset` adapter for Rasteret collections.
 
-`RasteretGeoDataset` wraps a Rasteret `Collection` as a standard TorchGeo
-`GeoDataset`. It fetches COG tiles on-the-fly via async HTTP range reads and
-returns samples as `{"image": Tensor, "bounds": Tensor, "transform": Tensor}`.
-Compatible with all
-TorchGeo samplers (`RandomGeoSampler`, `GridGeoSampler`, etc.), collation
-helpers, and transforms.
-
-This adapter provides **pipeline-level interop** (a TorchGeo dataset object).
-It does not replace TorchGeo's rasterio/GDAL-backed `RasterDataset` backend.
+`RasteretGeoDataset` is a standard TorchGeo `GeoDataset` subclass. It
+replaces the I/O backend (async obstore instead of rasterio/GDAL) while
+honoring the full GeoDataset contract: `index`, `crs`, `res`,
+`__getitem__(GeoSlice) -> Sample`. Compatible with all TorchGeo samplers,
+collation helpers (`stack_samples`, `concat_samples`), transforms, and
+dataset composition (`IntersectionDataset`, `UnionDataset`).
 
 ## Typical usage
 
@@ -24,13 +21,28 @@ sampler = RandomGeoSampler(dataset, size=256, length=100)
 loader = DataLoader(dataset, sampler=sampler, collate_fn=stack_samples)
 ```
 
-## Output contract
+## GeoDataset contract (what TorchGeo requires)
+
+Rasteret honors all of these:
+
+- **`__getitem__(GeoSlice) -> Sample`**: returns a `dict[str, Any]`
+- **`index`**: GeoPandas GeoDataFrame with `IntervalIndex` named `"datetime"` and Shapely footprint geometry
+- **`crs`**: set from the collection's EPSG code
+- **`res`**: derived from the first record's COG metadata transform
+- **Dataset composition**: `IntersectionDataset(rasteret_ds, other_ds)` and `UnionDataset` work correctly
+
+## Sample dict keys
+
+**Standard keys** (always present):
+
+- `bounds`: `Tensor` of spatial bounds
+- `transform`: `Tensor` of affine transform coefficients
+- `image`: `Tensor` with shape `[C, H, W]` (or `[T, C, H, W]` when `time_series=True`), when `is_image=True`
+- `mask`: `Tensor` with shape `[H, W]` (or `[T, H, W]`), when `is_image=False` (channel dim squeezed when `C == 1`, matching TorchGeo `RasterDataset` conventions)
+
+**Rasteret additions** (optional, do not break interop):
 
-- Keys always include `bounds` and `transform`.
-- If `is_image=True` (default), samples include `image: Tensor` with shape `[C, H, W]` (or `[T, C, H, W]` when `time_series=True`).
-- If `is_image=False`, samples include `mask: Tensor` and follow TorchGeo `RasterDataset` conventions:
-  - Single-scene: `[H, W]` when `C == 1` (channel dimension squeezed).
-  - Time series: `[T, H, W]` when `C == 1`.
+- `label`: scalar or tensor label from a metadata column, when `label_field` is set. TorchGeo's collate functions handle arbitrary keys, so this passes through `stack_samples` and `concat_samples` without issue.
 
 Rasteret's low-level read APIs return a `valid_mask` for ML-safe workflows, but it
 is intentionally **not** included in TorchGeo samples by default to preserve
````
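The mask shape convention documented in this file (channel dim squeezed when `C == 1`) can be illustrated with NumPy arrays standing in for Tensors; this is a sketch of the rule, not the adapter's actual code:

```python
import numpy as np

def to_mask(data: np.ndarray) -> np.ndarray:
    """Apply the RasterDataset mask convention: squeeze channel dim when C == 1.

    Single-scene [C, H, W] -> [H, W]; time series [T, C, H, W] -> [T, H, W].
    """
    # Axis -3 is the channel axis in both the 3-D and 4-D layouts.
    if data.shape[-3] == 1:
        return data.squeeze(-3)
    return data

single = np.zeros((1, 256, 256))      # one scene, one band
series = np.zeros((4, 1, 256, 256))   # four timesteps, one band
print(to_mask(single).shape)  # (256, 256)
print(to_mask(series).shape)  # (4, 256, 256)
```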

src/rasteret/constants.py

Lines changed: 2 additions & 2 deletions
```diff
@@ -42,12 +42,12 @@
 
 
 # ---------------------------------------------------------------------------
-# BandRegistry: extensible mapping of collection id → band names
+# BandRegistry: extensible mapping of collection id -> band names
 # ---------------------------------------------------------------------------
 
 
 class BandRegistry:
-    """Registry of collection → band-name mappings.
+    """Registry of collection -> band-name mappings.
 
     Built-in collections (Sentinel-2, Landsat) are pre-registered.
     Users can register custom collections at any time::
```
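The docstring describes the registry's role; a toy sketch of such a collection-to-band-name mapping follows. The class name, method names, and band codes here are illustrative only, not Rasteret's actual `BandRegistry` API:

```python
class MiniBandRegistry:
    """Toy registry of collection -> band-name mappings (illustrative)."""

    def __init__(self) -> None:
        # Pre-registered "built-in" collections; ids and codes are made up
        # to resemble common STAC collections, not copied from Rasteret.
        self._bands: dict[str, dict[str, str]] = {
            "sentinel-2-l2a": {"red": "B04", "nir": "B08"},
            "landsat-c2l2-sr": {"red": "B4", "nir": "B5"},
        }

    def register(self, collection: str, bands: dict[str, str]) -> None:
        """Users can register custom collections at any time."""
        self._bands[collection] = bands

    def resolve(self, collection: str, band_code: str) -> str:
        """Map a logical band code to the collection's asset band name."""
        return self._bands[collection][band_code]

reg = MiniBandRegistry()
reg.register("my-drone-data", {"red": "band_1"})
print(reg.resolve("sentinel-2-l2a", "red"))  # B04
```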

src/rasteret/core/collection.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -66,7 +66,7 @@ def _stem_from_path(path_str: str) -> str:
     return Path(tail).stem if tail else ""
 
 
-# WKB geometry type id → GeoParquet type name (OGC Simple Features).
+# WKB geometry type id -> GeoParquet type name (OGC Simple Features).
 _WKB_TYPE_NAMES: dict[int, str] = {
     1: "Point",
     2: "LineString",
```

src/rasteret/core/raster_accessor.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -225,7 +225,7 @@ async def _load_single_band(
225225
# TODO: Apply radiometric correction (scale/offset from STAC
226226
# raster:bands) when opted in. See _get_band_radiometric_params().
227227
# Needs: opt-in flag (apply_scale_offset=False default),
228-
# nodata masking, and dtype promotion (uint16 float32).
228+
# nodata masking, and dtype promotion (uint16 -> float32).
229229

230230
return {"data": result.data, "transform": result.transform, "band": band_code}
231231
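The TODO in this hunk names the pieces of a radiometric-correction step. A hedged NumPy sketch of what "nodata masking and dtype promotion (uint16 -> float32)" would mean in practice; the scale, offset, and nodata values below are illustrative (they resemble Landsat Collection 2 surface-reflectance constants), not values read from STAC metadata:

```python
import numpy as np

def apply_scale_offset(data, scale=0.0000275, offset=-0.2, nodata=0):
    """Promote to float32, mask nodata as NaN, then apply scale/offset."""
    out = data.astype(np.float32)   # uint16 -> float32 promotion
    out[data == nodata] = np.nan    # nodata masking (NaN needs a float dtype)
    return out * scale + offset

dn = np.array([[0, 10000], [20000, 65535]], dtype=np.uint16)
refl = apply_scale_offset(dn)
print(refl.dtype)  # float32
```

The promotion has to come first: NaN cannot be stored in an integer array, which is exactly why the opt-in flag implies a dtype change.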

src/rasteret/core/utils.py

Lines changed: 3 additions & 3 deletions
```diff
@@ -226,7 +226,7 @@ def reproject_array(
     from rasterio.warp import Resampling, reproject
 
     # Always use a float dtype so NaN fill works correctly.
-    # Integer dtypes (uint16, int8, etc.) silently cast NaN → 0.
+    # Integer dtypes (uint16, int8, etc.) silently cast NaN -> 0.
     out_dtype = src_array.dtype
     if not np.issubdtype(out_dtype, np.floating):
         out_dtype = np.float32
@@ -253,7 +253,7 @@ def compute_dst_grid(
     The caller must supply *res* in the **destination CRS units**. When
     the source and destination CRS share the same linear unit (e.g. both
     UTM metres) the source resolution can be passed directly. For
-    cross-unit reprojection (e.g. UTM metres → EPSG:4326 degrees) use
+    cross-unit reprojection (e.g. UTM metres -> EPSG:4326 degrees) use
     :func:`compute_dst_grid_from_src` instead, which delegates to
     ``rasterio.warp.calculate_default_transform``.
 
@@ -286,7 +286,7 @@ def compute_dst_grid_from_src(
 
     Wraps ``rasterio.warp.calculate_default_transform`` which delegates to
     GDAL's ``GDALSuggestedWarpOutput2``. This correctly handles cross-unit
-    CRS conversions (e.g. UTM metres → EPSG:4326 degrees) by sampling the
+    CRS conversions (e.g. UTM metres -> EPSG:4326 degrees) by sampling the
     source grid, transforming points, and computing an optimal destination
     pixel size that preserves spatial-information density.
```
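The comment touched by the first hunk makes a verifiable claim: NaN does not survive a cast to an integer dtype, which is why `reproject_array` promotes non-float inputs to float32 before filling. A minimal NumPy demonstration:

```python
import numpy as np

src = np.array([1.5, np.nan, 3.0], dtype=np.float32)

# Keeping a float dtype preserves NaN fill values.
as_float = src.astype(np.float32)
print(np.isnan(as_float[1]))  # True

# An integer dtype cannot represent NaN; the cast produces an arbitrary
# integer with at most a RuntimeWarning, never an error.
with np.errstate(invalid="ignore"):
    as_int = src.astype(np.uint16)
print(np.isnan(as_int).any())  # False: the NaN is silently gone
```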

src/rasteret/fetch/cog.py

Lines changed: 10 additions & 10 deletions
```diff
@@ -190,11 +190,11 @@ class _AutoObstoreBackend:
 
     Routes URLs to the appropriate native store:
 
-    - ``s3://`` and ``*.s3.*.amazonaws.com`` → ``S3Store``
-    - ``gs://`` and ``storage.googleapis.com`` → ``GCSStore``
-    - ``*.blob.core.windows.net`` → ``AzureStore``
-    - Pre-signed / SAS-signed URLs (query params) → ``HTTPStore``
-    - Other HTTPS → ``HTTPStore``
+    - ``s3://`` and ``*.s3.*.amazonaws.com`` -> ``S3Store``
+    - ``gs://`` and ``storage.googleapis.com`` -> ``GCSStore``
+    - ``*.blob.core.windows.net`` -> ``AzureStore``
+    - Pre-signed / SAS-signed URLs (query params) -> ``HTTPStore``
+    - Other HTTPS -> ``HTTPStore``
 
     Each store holds a Rust ``reqwest`` connection pool, so
     one-per-origin is the correct granularity.
@@ -249,13 +249,13 @@ def _store_for(self, url: str) -> tuple[object, str]:
                 break
         parsed = urlparse(url)
 
-        # --- s3:// scheme → S3Store ---
+        # --- s3:// scheme -> S3Store ---
         if parsed.scheme == "s3":
             bucket = parsed.netloc
             path = parsed.path.lstrip("/")
             return self._get_s3_store(bucket), path
 
-        # --- gs:// scheme → GCSStore ---
+        # --- gs:// scheme -> GCSStore ---
         if parsed.scheme == "gs":
             bucket = parsed.netloc
             return self._get_gcs_store(bucket), parsed.path.lstrip("/")
@@ -274,12 +274,12 @@ def _store_for(self, url: str) -> tuple[object, str]:
             self._stores[url] = store
             return store, ""
 
-        # --- S3 virtual-hosted HTTPS → S3Store ---
+        # --- S3 virtual-hosted HTTPS -> S3Store ---
         bucket = _extract_s3_bucket(parsed.netloc)
         if bucket:
             return self._get_s3_store(bucket), parsed.path.lstrip("/")
 
-        # --- Azure Blob HTTPS → AzureStore ---
+        # --- Azure Blob HTTPS -> AzureStore ---
         azure_account = _extract_azure_account(parsed.netloc)
         if azure_account:
             parts = parsed.path.lstrip("/").split("/", 1)
@@ -295,7 +295,7 @@ def _store_for(self, url: str) -> tuple[object, str]:
             path = path[len(store_prefix) :].lstrip("/")
             return store, path
 
-        # --- GCS HTTPS → GCSStore ---
+        # --- GCS HTTPS -> GCSStore ---
        if parsed.netloc == "storage.googleapis.com":
             parts = parsed.path.lstrip("/").split("/", 1)
             bucket = parts[0]
```
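The routing table in the `_AutoObstoreBackend` docstring can be sketched as a pure classifier. This is a simplified stand-in for `_store_for`: it only names the store kind, while the real method also caches store instances, strips bucket/container prefixes from paths, and special-cases pre-signed URLs:

```python
from urllib.parse import urlparse

def classify(url: str) -> str:
    """Map a URL to an obstore store kind, per the docstring's routing table."""
    parsed = urlparse(url)
    if parsed.scheme == "s3":
        return "S3Store"
    if parsed.scheme == "gs":
        return "GCSStore"
    host = parsed.netloc
    if ".s3." in host and host.endswith(".amazonaws.com"):
        return "S3Store"   # virtual-hosted-style S3 HTTPS
    if host.endswith(".blob.core.windows.net"):
        return "AzureStore"
    if host == "storage.googleapis.com":
        return "GCSStore"
    return "HTTPStore"     # pre-signed / SAS-signed URLs and other HTTPS

print(classify("s3://bucket/key.tif"))                          # S3Store
print(classify("https://acct.blob.core.windows.net/c/b.tif"))   # AzureStore
```

Note that in the real backend pre-signed URLs are routed to `HTTPStore` *before* the host checks, since their embedded credentials must be sent verbatim; the sketch omits that ordering.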

src/rasteret/ingest/parquet_record_table.py

Lines changed: 7 additions & 7 deletions
```diff
@@ -39,7 +39,7 @@
 
 
 def _rewrite_url_simple(url: str, patterns: dict[str, str]) -> str:
-    """Apply URL rewrite patterns (e.g. S3 → HTTPS)."""
+    """Apply URL rewrite patterns (e.g. S3 -> HTTPS)."""
     for src_prefix, dst_prefix in patterns.items():
         if url.startswith(src_prefix):
             return url.replace(src_prefix, dst_prefix, 1)
@@ -61,23 +61,23 @@ def prepare_record_table(
 
     Steps:
 
-    1. Auto-coerce ``id``: integer → string.
-    2. Auto-coerce ``datetime``: integer year → timestamp.
+    1. Auto-coerce ``id``: integer -> string.
+    2. Auto-coerce ``datetime``: integer year -> timestamp.
     3. Construct ``assets`` from *href_column* + *band_index_map*.
     4. Derive ``proj:epsg`` from a ``crs`` column when present.
     """
     names = set(table.schema.names)
     rewrites = url_rewrite_patterns or {}
 
-    # --- id: int → string ---
+    # --- id: int -> string ---
     if "id" in names and pa.types.is_integer(table.schema.field("id").type):
         table = table.set_column(
             table.schema.get_field_index("id"),
             "id",
             pc.cast(table.column("id"), pa.string()),
         )
 
-    # --- datetime: int year → timestamp ---
+    # --- datetime: int year -> timestamp ---
     if "datetime" in names and pa.types.is_integer(table.schema.field("datetime").type):
         years = table.column("datetime").to_pylist()
         timestamps = pa.array(
@@ -178,7 +178,7 @@ class RecordTableBuilder(CollectionBuilder):
        ``href_column`` to build per-band asset references.
    url_rewrite_patterns : dict, optional
        ``{source_prefix: target_prefix}`` patterns applied to URLs
-       during assets construction (e.g. S3 → HTTPS rewriting).
+       during assets construction (e.g. S3 -> HTTPS rewriting).
    filesystem : pyarrow.fs.FileSystem, optional
        PyArrow filesystem for reading remote URIs (e.g.
        ``S3FileSystem(anonymous=True)``).
@@ -273,7 +273,7 @@ def _prepare_table(self, table: pa.Table) -> pa.Table:
    def build(self, **kwargs: Any) -> "Collection":
        """Read the record table and return a normalized Collection.
 
-        Pipeline: read → alias → prepare → enrich → normalize.
+        Pipeline: read -> alias -> prepare -> enrich -> normalize.
 
        Parameters
        ----------
```