TerraFlow v0.2.0 is a reproducible, open-source geospatial workflow framework for agricultural modeling. It provides:
- Geospatial preprocessing (rasters, vectors, ROI clipping)
- Spatially-aware climate data (per-cell spatial interpolation with fallback strategies) - NEW in v0.2.0
- Config-driven model execution with Pydantic v2 validation
- Python package with CLI interface (terraflow run)
- Docker workflow support
- JOSS-compatible research workflow and manuscript
- Comprehensive test suite (33+ tests) with 100% pass rate
- Interactive Jupyter notebook for testing and visualization
- Architecture Decision Records (ADRs) for design documentation
Use TerraFlow to build, test, and publish reproducible agricultural analytics pipelines.
Core Capabilities:
- Modern Python package (pyproject.toml, PEP 621 compliant)
- Fully uv-installable (uv pip install terraflow-agro)
- Reproducible CLI interface (terraflow run --config <file>)
- Pydantic v2 configuration models with geographic coordinate validation - enhanced in v0.2.0
- Spatial interpolation using scipy.interpolate.griddata - new in v0.2.0
- Extensible workflow architecture with clean separation of concerns
Development & Testing:
- Comprehensive test suite with pytest (33+ tests across 10 test files)
- Linting with ruff and black
- Makefile automation for dev/test/build/release workflows
- Interactive Jupyter notebook for comprehensive testing
- Example data and demo configurations
CI/CD & Documentation:
- GitHub Actions for CI testing and linting
- Automated PyPI publishing on version tags
- MkDocs-based documentation with GitHub Pages deployment
- JOSS manuscript build automation
- Docker support for containerized workflows
Architecture & Design:
- Architecture Decision Records (ADRs) documenting key design choices
- Clean module separation (cli, config, climate, geo, ingest, model, pipeline, stats, viz)
- Comprehensive error handling and resource management
- Production-ready code quality
uv pip install terraflow-agro
Verify installation:
import terraflow
print(terraflow.__version__)
Clone the repo:
git clone https://github.com/gmarupilla/AgroTerraFlow.git
cd AgroTerraFlow
make dev
This runs:
uv venv .venv
uv pip install --python .venv/bin/python -e ".[dev]"
(Using only pyproject.toml, no requirements.txt)
make run-demo
which is equivalent to:
terraflow --config examples/demo_config.yml
After pip install terraflow-agro, TerraFlow exposes a terraflow command:
terraflow --config config.yml
Relative paths inside the config file resolve relative to the config file's own directory, so configs are portable regardless of your working directory.
Example:
terraflow --config examples/demo_config.yml
Your results will appear in:
outputs/
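The path rule described above (relative paths resolve against the config file's directory, not the working directory) can be sketched with pathlib. This is an illustration only: resolve_config_paths and its path_keys argument are hypothetical names, not part of TerraFlow's API.

```python
from pathlib import Path


def resolve_config_paths(config_file, raw, path_keys):
    """Return a copy of `raw` with the given keys anchored at the config's directory."""
    base = Path(config_file).resolve().parent
    resolved = dict(raw)
    for key in path_keys:
        value = Path(raw[key])
        # Absolute paths are kept as-is; relative ones are joined to the config directory.
        resolved[key] = str(value if value.is_absolute() else base / value)
    return resolved


# Hypothetical config location and values, for illustration only.
cfg = resolve_config_paths(
    "examples/demo_config.yml",
    {"input_raster": "sample_data/soil.tif", "output_dir": "outputs"},
    path_keys=("input_raster", "output_dir"),
)
print(cfg["input_raster"])
```

Because the base directory comes from the config file itself, the same config yields the same resolved paths no matter where the command is launched from.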
Each pipeline execution is identified by a deterministic run_fingerprint derived from:
- Canonicalized YAML configuration
- ROI geometry hash
- Input file fingerprints (sha256, size, mtime)
Identical inputs always produce the same fingerprint across machines. This enables immutable run directories like:
runs/<fingerprint>/...
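As a rough sketch of how such a fingerprint could be derived, the snippet below hashes a canonical JSON encoding of the config, an ROI geometry hash, and per-input digests. It is not TerraFlow's implementation: the function names are hypothetical, JSON with sorted keys stands in for YAML canonicalization, inputs are passed as in-memory bytes, and mtime is left out to keep the example independent of the filesystem.

```python
import hashlib
import json


def canonical_json(obj) -> str:
    """Stable text form: sorted keys and fixed separators, so key order does not matter."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))


def file_fingerprint(data: bytes) -> dict:
    """Content digest and size for one input (assumption: mtime omitted in this sketch)."""
    return {"sha256": hashlib.sha256(data).hexdigest(), "size": len(data)}


def run_fingerprint(config: dict, roi_geometry: dict, inputs: dict) -> str:
    """Combine config, ROI hash, and input fingerprints into one sha256 hex digest."""
    payload = {
        "config": config,
        "roi": hashlib.sha256(canonical_json(roi_geometry).encode("utf-8")).hexdigest(),
        "inputs": {name: file_fingerprint(data) for name, data in inputs.items()},
    }
    return hashlib.sha256(canonical_json(payload).encode("utf-8")).hexdigest()


# Illustrative values only.
roi = {"type": "Point", "coordinates": [-118.24, 34.05]}
fp_a = run_fingerprint({"strategy": "spatial", "output_dir": "outputs"}, roi, {"soil.tif": b"abc"})
fp_b = run_fingerprint({"output_dir": "outputs", "strategy": "spatial"}, roi, {"soil.tif": b"abc"})
fp_c = run_fingerprint({"strategy": "spatial", "output_dir": "outputs"}, roi, {"soil.tif": b"abd"})
print(fp_a == fp_b, fp_a == fp_c)
```

Reordering config keys leaves the digest unchanged, while any change to an input's bytes produces a different run directory name.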
TerraFlow now supports per-cell climate data with two interpolation strategies:
For climate data with geographic coordinates (weather stations, satellite grids):
climate:
  strategy: spatial # Interpolate using scipy.interpolate.griddata
  fallback_to_mean: true # Use global mean for extrapolated cells
Benefits:
- Works with arbitrary observation locations
- Smooth spatial gradients across your ROI
- Graceful handling of sparse data
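The spatial strategy can be illustrated with scipy.interpolate.griddata directly. This is a minimal sketch under stated assumptions, not TerraFlow's code: interpolate_to_cells is a hypothetical helper, coordinates are given as (lon, lat) pairs, linear interpolation is used, and the station values are made up.

```python
import numpy as np
from scipy.interpolate import griddata


def interpolate_to_cells(station_xy, station_values, cell_xy, fallback_to_mean=True):
    """Interpolate station observations onto cell centres given as (lon, lat) pairs."""
    station_xy = np.asarray(station_xy, dtype=float)
    station_values = np.asarray(station_values, dtype=float)
    cell_xy = np.asarray(cell_xy, dtype=float)
    # Linear interpolation yields NaN for cells outside the convex hull of the stations.
    values = griddata(station_xy, station_values, cell_xy, method="linear")
    if fallback_to_mean:
        # Replace those NaNs with the global mean of the observations.
        values = np.where(np.isnan(values), station_values.mean(), values)
    return values


# Four hypothetical stations and two cells: one inside the hull, one far outside it.
stations = [(-118.3, 34.0), (-118.1, 34.0), (-118.3, 34.2), (-118.1, 34.2)]
temps = [20.0, 22.0, 24.0, 26.0]
cells = [(-118.25, 34.05), (-118.2, 35.0)]
filled = interpolate_to_cells(stations, temps, cells)
unfilled = interpolate_to_cells(stations, temps, cells, fallback_to_mean=False)
print(filled, unfilled)
```

With the fallback disabled, the cell outside the station hull stays NaN, which makes the effect of fallback_to_mean easy to see.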
For pre-aligned climate data (one row per cell):
climate:
  strategy: index # Direct row-to-cell matching
  fallback_to_mean: true # Use mean for mismatched counts
Climate CSV Format:
Your climate CSV must have lat, lon, and climate variables:
lat,lon,mean_temp,total_rain
34.05,-118.24,22.5,250.0
34.10,-118.19,23.1,260.0
See Climate Configuration and ADR-003 for details.
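For the index strategy, one plausible reading of "direct row-to-cell matching" with a mean fallback is sketched below, using the CSV layout shown above. The helper names are hypothetical, and raising an error when the fallback is disabled is an assumption rather than documented behavior.

```python
import csv
import io
from statistics import mean

# The two sample rows from the CSV format above.
CSV_TEXT = """lat,lon,mean_temp,total_rain
34.05,-118.24,22.5,250.0
34.10,-118.19,23.1,260.0
"""


def load_climate_rows(text):
    """Parse the climate CSV into a list of {column: float} dicts."""
    return [
        {key: float(value) for key, value in row.items()}
        for row in csv.DictReader(io.StringIO(text))
    ]


def assign_by_index(rows, n_cells, variable, fallback_to_mean=True):
    """Map row i to cell i; on a count mismatch, use the mean or raise."""
    values = [row[variable] for row in rows]
    if len(values) == n_cells:
        return values
    if fallback_to_mean:
        # Assumption: every cell receives the mean of all rows when counts differ.
        return [mean(values)] * n_cells
    raise ValueError(f"expected {n_cells} rows, got {len(values)}")


rows = load_climate_rows(CSV_TEXT)
matched = assign_by_index(rows, n_cells=2, variable="mean_temp")
averaged = assign_by_index(rows, n_cells=3, variable="mean_temp")
print(matched, averaged)
```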
Install the docs dependencies and serve the site:
uv pip install -r docs/requirements.txt
mkdocs serve
Documentation is built and published automatically via GitHub Pages on every push to main.
make dev
make test
make run-demo
make lint
make lint runs ruff and black for code formatting and style checks.
TerraFlow includes a comprehensive test suite with 33+ tests covering all core functionality.
make test
The test suite covers:
- CLI argument parsing and error handling
- Climate data loading and interpolation (spatial and index-based)
- Configuration validation with Pydantic v2
- Geospatial operations (ROI clipping, masking, band selection)
- Data ingestion and preprocessing
- Model execution
- Pipeline integration
- Statistical analysis
- Visualization generation
Use the comprehensive Jupyter notebook for interactive testing and exploration:
jupyter notebook notebooks/terraflow_v0.2.0_comprehensive_test.ipynb
make docker-build
make docker-run
Equivalent to:
docker run --rm \
  -v "$(pwd)":/app \
  terraflow:latest \
  --config examples/demo_config.yml
The main CI pipeline runs on every push and pull request to main/master:
- Sets up Python 3.10 and uv package manager
- Creates virtual environment and installs dependencies
- Runs full test suite with pytest
- Runs linting checks with ruff and black
Automatically builds and deploys documentation to GitHub Pages on every push to main:
- Builds MkDocs site with strict mode
- Deploys to GitHub Pages
Triggered on version tags (v*..):
- Builds Python wheel and source distribution
- Publishes to PyPI automatically
- No manual intervention required
Builds the JOSS paper PDF on version tags or manual trigger:
- Generates publication-ready manuscript
- Uploads as GitHub artifact
Publishing is fully automated via GitHub Actions and publish-pypi.yml.
make release version=0.1.X
This:
- updates pyproject.toml
- updates terraflow/__init__.py
- commits version bump
- tags release
- pushes tag → triggers PyPI publish
Published artifacts:
- wheel (.whl)
- source distribution (.tar.gz)
No manual PyPI login required.
TerraFlow uses Pydantic v2 for typed config:
from pydantic import BaseModel

class WorkflowConfig(BaseModel):
    input_raster: str
    roi_path: str
    climate_source: str
    output_dir: str = "outputs"

    model_config = {
        "extra": "forbid",
        "validate_default": True
    }
A typical YAML config:
input_raster: "examples/sample_data/soil.tif"
roi_path: "examples/sample_data/roi.geojson"
climate_source: "era5"
output_dir: "outputs"
TerraFlow follows clean architecture principles with clear separation of concerns:
- cli.py: Command-line interface with argument parsing and error handling
- config.py: Pydantic v2 models for configuration validation
- climate.py: Climate data interpolation with spatial and index-based strategies
- geo.py: Geospatial operations (raster I/O, ROI clipping, coordinate validation)
- ingest.py: Data ingestion and preprocessing
- model.py: Core modeling logic
- pipeline.py: Workflow orchestration and execution
- stats.py: Statistical analysis and aggregation
- viz.py: Visualization generation with Plotly
- utils.py: Utility functions and helpers
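To make the role of config.py concrete, here is a small sketch of what validation with "extra": "forbid" does to a mistyped key. It reuses the WorkflowConfig fields shown earlier; try_load is a hypothetical helper, and a plain dict stands in for a parsed YAML file.

```python
from pydantic import BaseModel, ValidationError


class WorkflowConfig(BaseModel):
    # Same settings as the model shown earlier: unknown keys are rejected.
    model_config = {"extra": "forbid", "validate_default": True}

    input_raster: str
    roi_path: str
    climate_source: str
    output_dir: str = "outputs"


def try_load(raw: dict):
    """Return (config, None) on success or (None, list of error types) on failure."""
    try:
        return WorkflowConfig(**raw), None
    except ValidationError as exc:
        return None, sorted(err["type"] for err in exc.errors())


good = {
    "input_raster": "examples/sample_data/soil.tif",
    "roi_path": "examples/sample_data/roi.geojson",
    "climate_source": "era5",
}
# "roi_pth" is a deliberate typo: it is reported as an extra key, and roi_path as missing.
bad = {**{k: v for k, v in good.items() if k != "roi_path"}, "roi_pth": "roi.geojson"}

config, _ = try_load(good)
_, problems = try_load(bad)
print(config.output_dir, problems)
```

Rejecting unknown keys means a typo in a config file fails loudly at load time instead of silently falling back to a default.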
Key design decisions are documented in ADRs:
- ADR-001: Band selection strategy for multi-band rasters
- ADR-002: Bounding box vs polygon ROI support
- ADR-003: Climate interpolation strategies (spatial vs index-based)
See docs/architecture/ for detailed ADRs.
See docs/ROADMAP.md for detailed feature planning.
Planned enhancements:
- Multiple crop models support
- Calibration and uncertainty quantification modules
- Enhanced geospatial visualization
- Improved CLI templates and pipeline configurability
- Performance optimization for large-scale rasters
- Additional interpolation methods
Contributions are welcome! See docs/contributing.md for guidelines.
If you use TerraFlow in your research, please cite our JOSS paper (manuscript in preparation).
MIT License — free for academic, commercial, and open-source use.