This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Geospatial AI Map Reader for UCSB DreamLab. Transforms scans of paper maps into structured, queryable data using machine vision and computational cartography. Built as interactive Marimo notebooks for exploring, auditing, and visualizing GeoTIFF collections.
It serves as a new implementation of both MapReader and MapKurator.
# Run the main interactive notebook
marimo edit MapReader.py
# Run the simple file explorer
python main.py

Dependencies are auto-installed at runtime in MapReader.py via pip fallback. Core deps: numpy, rasterio, pillow, polars, leafmap, localtileserver, easyocr, opencv-python, plotly.
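A pip fallback of this kind can be sketched as below. This is an illustration only, not the code in MapReader.py: the helper name `ensure_package` is hypothetical, and the assumption is that a failed import triggers a single `pip install` followed by a retry.

```python
import importlib
import subprocess
import sys


def ensure_package(module_name, pip_name=None):
    """Import module_name, installing it with pip first if the import fails.

    Hypothetical helper: pip_name covers packages whose import name differs
    from the name on PyPI (for example cv2 vs. opencv-python).
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Install into the interpreter that is running the notebook.
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", pip_name or module_name]
        )
        # Let the import system see the freshly installed package.
        importlib.invalidate_caches()
        return importlib.import_module(module_name)


# Example calls (commented out so that nothing is installed on import):
# cv2 = ensure_package("cv2", "opencv-python")
# Image = ensure_package("PIL.Image", "pillow")
```

Using `sys.executable -m pip` rather than a bare `pip` command keeps the install in the same environment that Marimo is running in.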
MapReader.py: Main Marimo notebook (reactive cell-based execution). Each @app.cell is an independent unit:
- Dependency management: Auto-installs missing packages, imports core libraries
- Random sampling & preview: get_random_sample_tiffs() walks geotiffs/ dirs, downsample_tiff() reduces resolution for display (handles multi-band, dtype normalization to uint8)
- Visual previews: Renders downsampled images from each category using mo.hstack
- Geospatial audit: audit_geotiff_collection() extracts metadata (dims, bands, CRS, dtype) into a Polars DataFrame
- Interactive mapping: create_individual_maps_with_images() transforms CRS to WGS84 (EPSG:4326), overlays rasters on Leafmap basemaps via base64-encoded PNG
- Plotly gallery: Subplot grid of samples with pan/zoom
- Graticule extraction: Canny edge detection + HoughLinesP + histogram peak detection to find grid lines (max 100 divisions)
- OCR text detection: EasyOCR for English and Simplified Chinese, classifies detected text blocks by type (character/number/symbol/mixed)
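The histogram peak step of graticule extraction can be illustrated without OpenCV. The sketch below is not the notebook's code: the function name `grid_line_positions`, the 10-pixel bin size, the 10% alignment tolerance, and the minimum summed length are all assumptions chosen for illustration. It expects segments as `(x1, y1, x2, y2)` tuples, which is the row layout `cv2.HoughLinesP` produces once its `(N, 1, 4)` result is reshaped to `(N, 4)`.

```python
from collections import defaultdict


def grid_line_positions(segments, vertical=True, bin_size=10,
                        min_total_length=200.0, max_divisions=100):
    """Estimate grid-line positions from line segments by histogram peaks.

    Segments are binned by their mid position across the requested axis and
    weighted by length, so many short collinear pieces add up to one line.
    All thresholds are illustrative assumptions, not tuned values.
    """
    votes = defaultdict(float)     # bin index -> summed segment length
    weighted = defaultdict(float)  # bin index -> sum of position * length
    for x1, y1, x2, y2 in segments:
        dx, dy = abs(x2 - x1), abs(y2 - y1)
        along, across = (dy, dx) if vertical else (dx, dy)
        if along == 0 or across > 0.1 * along:
            continue  # not aligned with the requested axis
        pos = (x1 + x2) / 2 if vertical else (y1 + y2) / 2
        b = int(pos // bin_size)
        votes[b] += along
        weighted[b] += pos * along

    # A peak is a bin with enough support that is a local maximum.
    peaks = [
        b for b, v in votes.items()
        if v >= min_total_length
        and v >= votes.get(b - 1, 0.0)
        and v > votes.get(b + 1, 0.0)
    ]
    # Keep at most max_divisions of the strongest peaks, then sort by position.
    strongest = sorted(peaks, key=lambda b: votes[b], reverse=True)[:max_divisions]
    return sorted(weighted[b] / votes[b] for b in strongest)


segments = [
    (100, 0, 101, 300), (102, 310, 102, 600), (98, 620, 99, 900),  # near x = 100
    (300, 0, 300, 400), (301, 420, 301, 800),                      # near x = 300
    (0, 50, 500, 52),                                              # horizontal
    (205, 0, 205, 50),                                             # short stray
]
vertical_lines = grid_line_positions(segments, vertical=True)
horizontal_lines = grid_line_positions(segments, vertical=False)
```

Weighting by segment length rather than counting segments makes a long unbroken grid line count as much as the same line detected in several fragments.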
main.py: Lightweight Marimo script that lists GeoTIFF files by directory category.
GeoTIFF data lives in geotiffs/ (git-ignored). Three collections:
- 7900/: Spanish topographic maps (collarless), with .tif, .tfw, .prj files
- 8450/: Mixed GeoTIFF assortment
- ru_cn_topos/: Russian language topographic maps of China, includes .ovr and .aux.xml metadata
dumbtiffs/ (also git-ignored) holds non-georeferenced TIFFs for comparison.
- Image processing pipeline: load via rasterio → downsample → extract RGB bands → normalize to uint8 → convert to PIL Image
- CRS handling: always check for missing CRS before geospatial operations; use rasterio.warp.transform_bounds for WGS84 conversion
- Marimo cells return variables via tuple to make them available to other cells
- Error handling uses try-except with silent fallback for optional dependencies and per-file errors during batch operations
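The CRS convention above might look like the following sketch. The helper name `wgs84_bounds` is hypothetical, and returning `None` for a raster without a CRS is an assumption about how callers want to handle that case. The `dataset` argument is anything with `crs` and `bounds` attributes, such as an open rasterio dataset.

```python
def wgs84_bounds(dataset):
    """Return (west, south, east, north) in EPSG:4326, or None if no CRS.

    Hypothetical helper. The missing-CRS check comes first so that
    non-georeferenced TIFFs are skipped instead of raising during reprojection.
    """
    if dataset.crs is None:
        return None
    # Imported here so the missing-CRS path works even without rasterio.
    from rasterio.warp import transform_bounds

    # dataset.bounds unpacks as left, bottom, right, top.
    return transform_bounds(dataset.crs, "EPSG:4326", *dataset.bounds)


# Typical use (requires rasterio and a real file, so left commented out):
# import rasterio
# with rasterio.open("geotiffs/7900/example.tif") as src:  # hypothetical path
#     bounds = wgs84_bounds(src)
```

Callers can treat a `None` result as a signal to skip the map overlay for that file while still including it in the audit table.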
- Preprocessing (radiometric correction, COG tiling)
- Detection (deep learning text detection → binary masks)
- Subtraction (bitwise ops → "Text-Only" and "Features-Only" rasters)
- Reconstruction (neural inpainting for contour/road continuity)
- Vectorization (cleaned rasters → GeoJSON or H3 indexes)
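As a rough illustration of the Subtraction stage, the sketch below splits a raster into "Text-Only" and "Features-Only" layers with bitwise operations. It is a sketch under assumptions, not a planned implementation: the function name `split_text_and_features` is hypothetical, the input is assumed to be a uint8 NumPy array, and the mask is assumed to be nonzero wherever the detection stage found text. Masked-out pixels become 0.

```python
import numpy as np


def split_text_and_features(image, text_mask):
    """Split a uint8 image into text-only and features-only rasters.

    image: (H, W) or (H, W, C) uint8 array.
    text_mask: (H, W) array, nonzero where text was detected (assumption).
    """
    # Normalize the mask to 0 / 255 so bitwise AND keeps or clears whole bytes.
    mask = np.where(text_mask > 0, 255, 0).astype(np.uint8)
    if image.ndim == 3:
        mask = mask[..., np.newaxis]  # broadcast across the band axis
    text_only = np.bitwise_and(image, mask)
    features_only = np.bitwise_and(image, np.bitwise_not(mask))
    return text_only, features_only


image = np.array([[10, 20], [30, 40]], dtype=np.uint8)
text_mask = np.array([[255, 0], [0, 255]], dtype=np.uint8)
text_only, features_only = split_text_and_features(image, text_mask)
```

Because the two masks are complements, a bitwise OR of the two outputs reproduces the original image, which gives a cheap sanity check before the Reconstruction stage fills the gaps.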