
Commit 1a7ed44

chore: replace non-ASCII symbols with plain ASCII equivalents
Replace em dashes, Unicode arrows, multiplication signs, and degree symbols across docs and source with contextually appropriate ASCII: colons for definition lists, commas/periods for clause breaks, -> for arrows, x for multiplication, deg for degrees. Box-drawing chars in display.py and docs example output are intentional and kept as-is.
1 parent cf9c7d5 commit 1a7ed44
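A tree can be audited for the symbols this commit targets with a short script. The replacement table below mirrors the commit message; the helper functions themselves are illustrative, not part of the repo (and the real fixes were applied contextually, e.g. em dashes became colons, commas, or periods depending on the sentence):

```python
# Non-ASCII symbols removed by this commit -> their ASCII stand-ins.
# Em-dash replacement was context-dependent (colon, comma, or period);
# a comma is used here as a naive default.
REPLACEMENTS = {
    "\u2014": ",",    # em dash
    "\u2192": "->",   # rightwards arrow
    "\u00d7": "x",    # multiplication sign
    "\u00b0": "deg",  # degree sign
}

def find_offenders(text: str) -> list[str]:
    """Return which targeted symbols appear in *text*."""
    return [sym for sym in REPLACEMENTS if sym in text]

def asciify(text: str) -> str:
    """Apply the replacement table naively (the actual commit was contextual)."""
    for sym, ascii_form in REPLACEMENTS.items():
        text = text.replace(sym, ascii_form)
    return text

line = "band code \u2192 STAC asset key"
print(find_offenders(line))  # the arrow is flagged
print(asciify(line))         # band code -> STAC asset key
```

Box-drawing characters (intentionally kept in `display.py`) would simply be absent from the table.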

17 files changed (+91, -79 lines)


docs/explanation/architecture.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -39,7 +39,7 @@ obstore (auto-routes to S3Store / AzureStore / GCSStore / HTTPStore)
 to declare:
 
 - how to find the data (STAC API / static catalog / GeoParquet URI)
-- how to resolve bands (band code → STAC asset key, optional `band_index_map`)
+- how to resolve bands (band code -> asset key, optional `band_index_map`)
 - how to access it (auth/requester-pays, URL signing/rewrites)
 
 At runtime, Rasteret stores band-resolution and cloud-access settings in two
```

docs/explanation/schema-contract.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -54,7 +54,7 @@ byte-range tile reads, rather than storing raster payloads in Parquet.
 
 Rasteret uses `DatasetDescriptor` objects to describe how a dataset is discovered
 (STAC API vs GeoParquet), accessed (cloud auth / URL rewriting), and mapped
-(band codes → STAC assets + optional `band_index` for multi-sample GeoTIFFs).
+(band codes -> asset keys + optional `band_index` for multi-sample GeoTIFFs).
 
 See:
 
```

docs/getting-started/index.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -40,7 +40,7 @@ python -c "import rasteret; print(rasteret.version())"
 
 === "Jupyter / JupyterLab"
 
-    Jupyter runs each notebook in a **kernel** — a separate Python process.
+    Jupyter runs each notebook in a **kernel**, a separate Python process.
     To use your Rasteret environment as a kernel:
 
     ```bash
@@ -52,7 +52,7 @@ python -c "import rasteret; print(rasteret.version())"
 
 === "marimo"
 
-    [marimo](https://marimo.io) manages dependencies inline — no kernel
+    [marimo](https://marimo.io) manages dependencies inline, no kernel
     registration needed. Just `uv pip install marimo` alongside Rasteret
    and run `marimo edit notebook.py`.
 
````

docs/reference/integrations/torchgeo.md

Lines changed: 27 additions & 15 deletions
````diff
@@ -2,15 +2,12 @@
 
 TorchGeo `GeoDataset` adapter for Rasteret collections.
 
-`RasteretGeoDataset` wraps a Rasteret `Collection` as a standard TorchGeo
-`GeoDataset`. It fetches COG tiles on-the-fly via async HTTP range reads and
-returns samples as `{"image": Tensor, "bounds": Tensor, "transform": Tensor}`.
-Compatible with all
-TorchGeo samplers (`RandomGeoSampler`, `GridGeoSampler`, etc.), collation
-helpers, and transforms.
-
-This adapter provides **pipeline-level interop** (a TorchGeo dataset object).
-It does not replace TorchGeo's rasterio/GDAL-backed `RasterDataset` backend.
+`RasteretGeoDataset` is a standard TorchGeo `GeoDataset` subclass. It
+replaces the I/O backend (async obstore instead of rasterio/GDAL) while
+honoring the full GeoDataset contract: `index`, `crs`, `res`,
+`__getitem__(GeoSlice) -> Sample`. Compatible with all TorchGeo samplers,
+collation helpers (`stack_samples`, `concat_samples`), transforms, and
+dataset composition (`IntersectionDataset`, `UnionDataset`).
 
 ## Typical usage
 
@@ -24,13 +21,28 @@ sampler = RandomGeoSampler(dataset, size=256, length=100)
 loader = DataLoader(dataset, sampler=sampler, collate_fn=stack_samples)
 ```
 
-## Output contract
+## GeoDataset contract (what TorchGeo requires)
+
+Rasteret honors all of these:
+
+- **`__getitem__(GeoSlice) -> Sample`**: returns a `dict[str, Any]`
+- **`index`**: GeoPandas GeoDataFrame with `IntervalIndex` named `"datetime"` and Shapely footprint geometry
+- **`crs`**: set from the collection's EPSG code
+- **`res`**: derived from the first record's COG metadata transform
+- **Dataset composition**: `IntersectionDataset(rasteret_ds, other_ds)` and `UnionDataset` work correctly
+
+## Sample dict keys
+
+**Standard keys** (always present):
+
+- `bounds`: `Tensor` of spatial bounds
+- `transform`: `Tensor` of affine transform coefficients
+- `image`: `Tensor` with shape `[C, H, W]` (or `[T, C, H, W]` when `time_series=True`), when `is_image=True`
+- `mask`: `Tensor` with shape `[H, W]` (or `[T, H, W]`), when `is_image=False` (channel dim squeezed when `C == 1`, matching TorchGeo `RasterDataset` conventions)
+
+**Rasteret additions** (optional, do not break interop):
 
-- Keys always include `bounds` and `transform`.
-- If `is_image=True` (default), samples include `image: Tensor` with shape `[C, H, W]` (or `[T, C, H, W]` when `time_series=True`).
-- If `is_image=False`, samples include `mask: Tensor` and follow TorchGeo `RasterDataset` conventions:
-  - Single-scene: `[H, W]` when `C == 1` (channel dimension squeezed).
-  - Time series: `[T, H, W]` when `C == 1`.
+- `label`: scalar or tensor label from a metadata column, when `label_field` is set. TorchGeo's collate functions handle arbitrary keys, so this passes through `stack_samples` and `concat_samples` without issue.
 
 Rasteret's low-level read APIs return a `valid_mask` for ML-safe workflows, but it
 is intentionally **not** included in TorchGeo samples by default to preserve
````
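The mask shape convention documented in this file (channel dim squeezed when `C == 1`) can be illustrated with NumPy arrays standing in for Tensors; this is a sketch of the rule, not the adapter's actual code:

```python
import numpy as np

def to_mask(data: np.ndarray) -> np.ndarray:
    """Apply the RasterDataset mask convention: squeeze channel dim when C == 1.

    Single-scene [C, H, W] -> [H, W]; time series [T, C, H, W] -> [T, H, W].
    """
    # Axis -3 is the channel axis in both the 3-D and 4-D layouts.
    if data.shape[-3] == 1:
        return data.squeeze(-3)
    return data

single = np.zeros((1, 256, 256))      # one scene, one band
series = np.zeros((4, 1, 256, 256))   # four timesteps, one band
print(to_mask(single).shape)  # (256, 256)
print(to_mask(series).shape)  # (4, 256, 256)
```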

src/rasteret/constants.py

Lines changed: 2 additions & 2 deletions
```diff
@@ -42,12 +42,12 @@
 
 
 # ---------------------------------------------------------------------------
-# BandRegistry: extensible mapping of collection id → band names
+# BandRegistry: extensible mapping of collection id -> band names
 # ---------------------------------------------------------------------------
 
 
 class BandRegistry:
-    """Registry of collection → band-name mappings.
+    """Registry of collection -> band-name mappings.
 
     Built-in collections (Sentinel-2, Landsat) are pre-registered.
     Users can register custom collections at any time::
```
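The docstring describes the registry's role; a toy sketch of such a collection-to-band-name mapping follows. The class name, method names, and band codes here are illustrative only, not Rasteret's actual `BandRegistry` API:

```python
class MiniBandRegistry:
    """Toy registry of collection -> band-name mappings (illustrative)."""

    def __init__(self) -> None:
        # Pre-registered "built-in" collections; ids and codes are made up
        # to resemble common STAC collections, not copied from Rasteret.
        self._bands: dict[str, dict[str, str]] = {
            "sentinel-2-l2a": {"red": "B04", "nir": "B08"},
            "landsat-c2l2-sr": {"red": "B4", "nir": "B5"},
        }

    def register(self, collection: str, bands: dict[str, str]) -> None:
        """Users can register custom collections at any time."""
        self._bands[collection] = bands

    def resolve(self, collection: str, band_code: str) -> str:
        """Map a logical band code to the collection's asset band name."""
        return self._bands[collection][band_code]

reg = MiniBandRegistry()
reg.register("my-drone-data", {"red": "band_1"})
print(reg.resolve("sentinel-2-l2a", "red"))  # B04
```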

src/rasteret/core/collection.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -66,7 +66,7 @@ def _stem_from_path(path_str: str) -> str:
     return Path(tail).stem if tail else ""
 
 
-# WKB geometry type id → GeoParquet type name (OGC Simple Features).
+# WKB geometry type id -> GeoParquet type name (OGC Simple Features).
 _WKB_TYPE_NAMES: dict[int, str] = {
     1: "Point",
     2: "LineString",
```

src/rasteret/core/raster_accessor.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -225,7 +225,7 @@ async def _load_single_band(
225225
# TODO: Apply radiometric correction (scale/offset from STAC
226226
# raster:bands) when opted in. See _get_band_radiometric_params().
227227
# Needs: opt-in flag (apply_scale_offset=False default),
228-
# nodata masking, and dtype promotion (uint16 float32).
228+
# nodata masking, and dtype promotion (uint16 -> float32).
229229

230230
return {"data": result.data, "transform": result.transform, "band": band_code}
231231
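The TODO in this hunk names the pieces of a radiometric-correction step. A hedged NumPy sketch of what "nodata masking and dtype promotion (uint16 -> float32)" would mean in practice; the scale, offset, and nodata values below are illustrative (they resemble Landsat Collection 2 surface-reflectance constants), not values read from STAC metadata:

```python
import numpy as np

def apply_scale_offset(data, scale=0.0000275, offset=-0.2, nodata=0):
    """Promote to float32, mask nodata as NaN, then apply scale/offset."""
    out = data.astype(np.float32)   # uint16 -> float32 promotion
    out[data == nodata] = np.nan    # nodata masking (NaN needs a float dtype)
    return out * scale + offset

dn = np.array([[0, 10000], [20000, 65535]], dtype=np.uint16)
refl = apply_scale_offset(dn)
print(refl.dtype)  # float32
```

The promotion has to come first: NaN cannot be stored in an integer array, which is exactly why the opt-in flag implies a dtype change.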

src/rasteret/core/utils.py

Lines changed: 3 additions & 3 deletions
```diff
@@ -226,7 +226,7 @@ def reproject_array(
     from rasterio.warp import Resampling, reproject
 
     # Always use a float dtype so NaN fill works correctly.
-    # Integer dtypes (uint16, int8, etc.) silently cast NaN → 0.
+    # Integer dtypes (uint16, int8, etc.) silently cast NaN -> 0.
     out_dtype = src_array.dtype
     if not np.issubdtype(out_dtype, np.floating):
         out_dtype = np.float32
@@ -253,7 +253,7 @@ def compute_dst_grid(
     The caller must supply *res* in the **destination CRS units**. When
     the source and destination CRS share the same linear unit (e.g. both
     UTM metres) the source resolution can be passed directly. For
-    cross-unit reprojection (e.g. UTM metres → EPSG:4326 degrees) use
+    cross-unit reprojection (e.g. UTM metres -> EPSG:4326 degrees) use
     :func:`compute_dst_grid_from_src` instead, which delegates to
     ``rasterio.warp.calculate_default_transform``.
 
@@ -286,7 +286,7 @@ def compute_dst_grid_from_src(
 
     Wraps ``rasterio.warp.calculate_default_transform`` which delegates to
     GDAL's ``GDALSuggestedWarpOutput2``. This correctly handles cross-unit
-    CRS conversions (e.g. UTM metres → EPSG:4326 degrees) by sampling the
+    CRS conversions (e.g. UTM metres -> EPSG:4326 degrees) by sampling the
     source grid, transforming points, and computing an optimal destination
     pixel size that preserves spatial-information density.
```
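The comment touched by the first hunk makes a verifiable claim: NaN does not survive a cast to an integer dtype, which is why `reproject_array` promotes non-float inputs to float32 before filling. A minimal NumPy demonstration:

```python
import numpy as np

src = np.array([1.5, np.nan, 3.0], dtype=np.float32)

# Keeping a float dtype preserves NaN fill values.
as_float = src.astype(np.float32)
print(np.isnan(as_float[1]))  # True

# An integer dtype cannot represent NaN; the cast produces an arbitrary
# integer with at most a RuntimeWarning, never an error.
with np.errstate(invalid="ignore"):
    as_int = src.astype(np.uint16)
print(np.isnan(as_int).any())  # False: the NaN is silently gone
```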

src/rasteret/fetch/cog.py

Lines changed: 10 additions & 10 deletions
```diff
@@ -190,11 +190,11 @@ class _AutoObstoreBackend:
 
     Routes URLs to the appropriate native store:
 
-    - ``s3://`` and ``*.s3.*.amazonaws.com`` → ``S3Store``
-    - ``gs://`` and ``storage.googleapis.com`` → ``GCSStore``
-    - ``*.blob.core.windows.net`` → ``AzureStore``
-    - Pre-signed / SAS-signed URLs (query params) → ``HTTPStore``
-    - Other HTTPS → ``HTTPStore``
+    - ``s3://`` and ``*.s3.*.amazonaws.com`` -> ``S3Store``
+    - ``gs://`` and ``storage.googleapis.com`` -> ``GCSStore``
+    - ``*.blob.core.windows.net`` -> ``AzureStore``
+    - Pre-signed / SAS-signed URLs (query params) -> ``HTTPStore``
+    - Other HTTPS -> ``HTTPStore``
 
     Each store holds a Rust ``reqwest`` connection pool, so
     one-per-origin is the correct granularity.
@@ -249,13 +249,13 @@ def _store_for(self, url: str) -> tuple[object, str]:
                 break
         parsed = urlparse(url)
 
-        # --- s3:// scheme → S3Store ---
+        # --- s3:// scheme -> S3Store ---
         if parsed.scheme == "s3":
             bucket = parsed.netloc
             path = parsed.path.lstrip("/")
             return self._get_s3_store(bucket), path
 
-        # --- gs:// scheme → GCSStore ---
+        # --- gs:// scheme -> GCSStore ---
         if parsed.scheme == "gs":
             bucket = parsed.netloc
             return self._get_gcs_store(bucket), parsed.path.lstrip("/")
@@ -274,12 +274,12 @@ def _store_for(self, url: str) -> tuple[object, str]:
             self._stores[url] = store
             return store, ""
 
-        # --- S3 virtual-hosted HTTPS → S3Store ---
+        # --- S3 virtual-hosted HTTPS -> S3Store ---
         bucket = _extract_s3_bucket(parsed.netloc)
         if bucket:
             return self._get_s3_store(bucket), parsed.path.lstrip("/")
 
-        # --- Azure Blob HTTPS → AzureStore ---
+        # --- Azure Blob HTTPS -> AzureStore ---
         azure_account = _extract_azure_account(parsed.netloc)
         if azure_account:
             parts = parsed.path.lstrip("/").split("/", 1)
@@ -295,7 +295,7 @@ def _store_for(self, url: str) -> tuple[object, str]:
             path = path[len(store_prefix) :].lstrip("/")
             return store, path
 
-        # --- GCS HTTPS → GCSStore ---
+        # --- GCS HTTPS -> GCSStore ---
        if parsed.netloc == "storage.googleapis.com":
             parts = parsed.path.lstrip("/").split("/", 1)
             bucket = parts[0]
```
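The routing table in the `_AutoObstoreBackend` docstring can be sketched as a pure classifier. This is a simplified stand-in for `_store_for`: it only names the store kind, while the real method also caches store instances, strips bucket/container prefixes from paths, and special-cases pre-signed URLs:

```python
from urllib.parse import urlparse

def classify(url: str) -> str:
    """Map a URL to an obstore store kind, per the docstring's routing table."""
    parsed = urlparse(url)
    if parsed.scheme == "s3":
        return "S3Store"
    if parsed.scheme == "gs":
        return "GCSStore"
    host = parsed.netloc
    if ".s3." in host and host.endswith(".amazonaws.com"):
        return "S3Store"   # virtual-hosted-style S3 HTTPS
    if host.endswith(".blob.core.windows.net"):
        return "AzureStore"
    if host == "storage.googleapis.com":
        return "GCSStore"
    return "HTTPStore"     # pre-signed / SAS-signed URLs and other HTTPS

print(classify("s3://bucket/key.tif"))                          # S3Store
print(classify("https://acct.blob.core.windows.net/c/b.tif"))   # AzureStore
```

Note that in the real backend pre-signed URLs are routed to `HTTPStore` *before* the host checks, since their embedded credentials must be sent verbatim; the sketch omits that ordering.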

src/rasteret/ingest/parquet_record_table.py

Lines changed: 7 additions & 7 deletions
```diff
@@ -39,7 +39,7 @@
 
 
 def _rewrite_url_simple(url: str, patterns: dict[str, str]) -> str:
-    """Apply URL rewrite patterns (e.g. S3 → HTTPS)."""
+    """Apply URL rewrite patterns (e.g. S3 -> HTTPS)."""
     for src_prefix, dst_prefix in patterns.items():
         if url.startswith(src_prefix):
             return url.replace(src_prefix, dst_prefix, 1)
@@ -61,23 +61,23 @@ def prepare_record_table(
 
     Steps:
 
-    1. Auto-coerce ``id``: integer → string.
-    2. Auto-coerce ``datetime``: integer year → timestamp.
+    1. Auto-coerce ``id``: integer -> string.
+    2. Auto-coerce ``datetime``: integer year -> timestamp.
     3. Construct ``assets`` from *href_column* + *band_index_map*.
     4. Derive ``proj:epsg`` from a ``crs`` column when present.
     """
     names = set(table.schema.names)
     rewrites = url_rewrite_patterns or {}
 
-    # --- id: int → string ---
+    # --- id: int -> string ---
     if "id" in names and pa.types.is_integer(table.schema.field("id").type):
         table = table.set_column(
             table.schema.get_field_index("id"),
             "id",
             pc.cast(table.column("id"), pa.string()),
         )
 
-    # --- datetime: int year → timestamp ---
+    # --- datetime: int year -> timestamp ---
     if "datetime" in names and pa.types.is_integer(table.schema.field("datetime").type):
         years = table.column("datetime").to_pylist()
         timestamps = pa.array(
@@ -178,7 +178,7 @@ class RecordTableBuilder(CollectionBuilder):
        ``href_column`` to build per-band asset references.
    url_rewrite_patterns : dict, optional
        ``{source_prefix: target_prefix}`` patterns applied to URLs
-       during assets construction (e.g. S3 → HTTPS rewriting).
+       during assets construction (e.g. S3 -> HTTPS rewriting).
    filesystem : pyarrow.fs.FileSystem, optional
        PyArrow filesystem for reading remote URIs (e.g.
        ``S3FileSystem(anonymous=True)``).
@@ -273,7 +273,7 @@ def _prepare_table(self, table: pa.Table) -> pa.Table:
    def build(self, **kwargs: Any) -> "Collection":
        """Read the record table and return a normalized Collection.
 
-        Pipeline: read → alias → prepare → enrich → normalize.
+        Pipeline: read -> alias -> prepare -> enrich -> normalize.
 
        Parameters
        ----------
```