Data quality framework for the eGon energy system data pipeline
SQL-first validation framework for PostgreSQL/PostGIS databases. Execute validation rules directly in the database, generate interactive reports, and integrate with Airflow workflows.
- SQL-First Execution - Push validation logic to the database
- PostGIS Support - Geometry and SRID validation
- Extensible Rules - Built-in + custom rules
- HTML Reports - Interactive reports with filtering
- Airflow Ready - Pipeline integration
- Parallel Processing - Multi-threaded execution
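To illustrate what "SQL-first" means in practice, here is a minimal sketch of a not-null rule that counts violations inside the database and returns only the aggregate to Python. It is not the framework's actual rule API: `RuleResult`, `check_not_null`, `demo`, and the `buses` table are hypothetical names, and the standard-library `sqlite3` module stands in for PostgreSQL so the example runs without a server.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class RuleResult:
    """Hypothetical result record; not the framework's real class."""
    rule_id: str
    table: str
    passed: bool
    observed: int
    expected: int


def check_not_null(conn, table, column):
    """Count NULLs in the database and return only the aggregate."""
    # Identifiers cannot be bound as query parameters, so this sketch
    # assumes table and column names come from trusted configuration.
    sql = f'SELECT COUNT(*) FROM "{table}" WHERE "{column}" IS NULL'
    (null_count,) = conn.execute(sql).fetchone()
    return RuleResult(
        rule_id=f"not_null.{column}",
        table=table,
        passed=null_count == 0,
        observed=null_count,
        expected=0,
    )


def demo():
    """Build a tiny in-memory table and run the rule on two columns."""
    conn = sqlite3.connect(":memory:")  # stand-in for a PostgreSQL connection
    conn.execute("CREATE TABLE buses (bus_id INTEGER, v_nom REAL)")
    conn.executemany(
        "INSERT INTO buses VALUES (?, ?)",
        [(1, 110.0), (2, None), (3, 380.0)],
    )
    results = [
        check_not_null(conn, "buses", "bus_id"),
        check_not_null(conn, "buses", "v_nom"),
    ]
    conn.close()
    return results
```

Because only a single count crosses the connection, the cost of the check stays in the database regardless of table size, which is the main argument for pushing validation logic down.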
pip install -e .
export DB_URL="postgresql://user:password@host:port/database"
egon-validation run-task --run-id my-run --task validation-test
egon-validation final-report --run-id my-run
Output: validation_runs/my-run/final/report.html
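The same two commands could also be scripted from Python, for instance inside a pipeline task. The sketch below uses only the subcommands and flags shown above. The helper names (`build_commands`, `report_path`, `run_validation`) are hypothetical, and the assumption that the report always lands under `validation_runs/<run-id>/final/report.html` is generalized from the single example path.

```python
import os
import subprocess
from pathlib import Path


def build_commands(run_id, task):
    """Return the two CLI invocations as argument lists (no shell quoting needed)."""
    return [
        ["egon-validation", "run-task", "--run-id", run_id, "--task", task],
        ["egon-validation", "final-report", "--run-id", run_id],
    ]


def report_path(run_id, base_dir="validation_runs"):
    """Assumed report location, generalized from the example output path."""
    return Path(base_dir) / run_id / "final" / "report.html"


def run_validation(run_id, task, db_url):
    """Run the task, then build the final report; raise if either step fails."""
    # Pass DB_URL through the environment, as in the export line above.
    env = {**os.environ, "DB_URL": db_url}
    for cmd in build_commands(run_id, task):
        subprocess.run(cmd, env=env, check=True)
    return report_path(run_id)
```

With `check=True`, a non-zero exit status from either command raises `subprocess.CalledProcessError`, so a calling workflow step fails visibly instead of continuing with a missing report.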
See docs/ for full documentation.
egon_validation/
├── cli.py # Command-line interface
├── config.py # Configuration management
├── db.py # Database connections
├── rules/
│ ├── base.py # Base rule classes
│ ├── formal/ # Built-in rules
│ └── custom/ # Domain-specific rules
├── runner/
│ └── execute.py # Task execution
└── report/
└── generate.py # HTML report generation
pytest # Run tests
pytest --cov=egon_validation # With coverage
black egon_validation/ # Format
flake8 egon_validation/ # Lint
AGPL-3.0 - see LICENSE