v0.2.0 - Major Pipeline Optimization and Infrastructure Improvements
Major Changes
Performance Optimizations
Migrate from NetworkX to rustworkx for 5-10x faster graph operations
Replace shapefiles/GeoJSON with GeoParquet for 2-5x faster I/O
Replace CSV with Parquet for tabular data storage
Implement a single Dijkstra pass per origin (replaces multiple ego_graph calls)
Add a distance-bounded Dijkstra algorithm for efficient walk time computation
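To illustrate the bounded Dijkstra idea, here is a minimal, library-independent sketch using only the standard library. It is not the project's rustworkx-based implementation; the function name `bounded_dijkstra`, the adjacency-list layout, and the toy network are assumptions made for illustration. The search drops any path whose cost exceeds the bound, so one pass per origin yields every node reachable within the walk time limit.

```python
import heapq


def bounded_dijkstra(adjacency, source, max_cost):
    """Return shortest-path costs from source to nodes within max_cost.

    adjacency maps a node to a list of (neighbor, weight) pairs
    (hypothetical layout chosen for this sketch).
    """
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry; a cheaper path was already found
        for neighbor, weight in adjacency.get(node, ()):
            new_cost = cost + weight
            if new_cost > max_cost:
                continue  # the bound: never expand beyond the cutoff
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return best


# Hypothetical toy network; edge weights are walk times in minutes.
toy_graph = {
    "a": [("b", 4.0), ("c", 9.0)],
    "b": [("a", 4.0), ("c", 3.0), ("d", 10.0)],
    "c": [("a", 9.0), ("b", 3.0), ("d", 2.0)],
    "d": [("b", 10.0), ("c", 2.0)],
}
walk_times = bounded_dijkstra(toy_graph, "a", max_cost=8.0)
```

With an 8-minute bound, node "d" is excluded because its cheapest path costs 9 minutes, while "c" is reached in 7 minutes through "b" rather than over the direct 9-minute edge.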
Infrastructure Improvements
DVC to Git LFS migration and automated data update system
Automated data source updates with version checking and metadata tracking
Data validation scripts for schema and data quality checks
Graph conversion caching for improved performance
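As a rough sketch of what a schema check might look like, the example below validates in-memory records against expected column types. It is an assumption about the general approach, not the project's validation scripts; `validate_rows`, the `SCHEMA` mapping, and the sample records are hypothetical.

```python
def validate_rows(rows, schema):
    """Return a list of problems found; an empty list means the rows pass."""
    problems = []
    for index, row in enumerate(rows):
        for column, expected_type in schema.items():
            if column not in row:
                problems.append(f"row {index}: missing column {column!r}")
            elif not isinstance(row[column], expected_type):
                problems.append(
                    f"row {index}: {column!r} should be {expected_type.__name__}"
                )
    return problems


# Hypothetical schema and records, for illustration only.
SCHEMA = {"tract_id": str, "walk_time_min": float}
rows = [
    {"tract_id": "A1", "walk_time_min": 12.5},
    {"tract_id": "A2"},                       # missing walk_time_min
    {"tract_id": 3, "walk_time_min": 4.0},    # tract_id has the wrong type
]
problems = validate_rows(rows, SCHEMA)
```

Collecting every problem before reporting, rather than stopping at the first, tends to make a validation run more useful when a data source update breaks several columns at once.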
Testing & Development
Comprehensive testing framework with pytest and pytest-cov
Expanded project dependencies, including statsmodels for statistical analysis
Parallel processing support for walk time calculations
Migration scripts for converting existing data files to new formats
Documentation
Data dictionary (DATA_DICTIONARY.md) with comprehensive details on data files and workflows
Project backlog (BACKLOG.md) for tracking technical debt and feature requests
CEJST workflow documentation (README_CEJST.md)
H3 implementation details and Census API key setup instructions
Enhanced README with updated guidance
Pipeline Enhancements
Enhanced pipeline scripts (run_pipeline.sh, run_pipeline.py) with logging and error handling
Jupyter notebooks for walk times and merging analysis
Updated validation to support Parquet files
Improved code organization and consistency across modules
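The logging and error handling mentioned for the pipeline scripts could take a shape like the sketch below. It is not the contents of run_pipeline.py; the `run_steps` helper and the step names are hypothetical. Each step is logged when it starts and finishes, and the first failure is logged with its traceback and stops the run.

```python
import logging

logger = logging.getLogger("pipeline")


def run_steps(steps):
    """Run (name, callable) pairs in order.

    Returns (completed_names, failed_name); failed_name is None on success.
    """
    completed = []
    for name, step in steps:
        logger.info("starting step %s", name)
        try:
            step()
        except Exception:
            # logger.exception records the traceback alongside the message
            logger.exception("step %s failed", name)
            return completed, name
        completed.append(name)
        logger.info("finished step %s", name)
    return completed, None


# Hypothetical steps, for illustration only.
def load_data():
    pass


def compute_walk_times():
    raise ValueError("simulated failure")


def merge_results():
    pass


completed, failed = run_steps(
    [
        ("load_data", load_data),
        ("compute_walk_times", compute_walk_times),
        ("merge_results", merge_results),
    ]
)
```

Returning the completed and failed step names, instead of raising, lets a caller decide whether to exit with a non-zero status or resume from the failed step.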