Environmental ML Platform

A comprehensive machine learning platform for environmental data analysis, monitoring, and prediction. This platform provides end-to-end capabilities for processing environmental datasets, training ML models, and generating actionable insights through automated reporting and interactive dashboards.

Features

Machine Learning Pipeline

  • Complete ML pipeline implementation with data preprocessing, training, and inference
  • Multiple model architectures (CNN, ResNet, Multi-scale CNN) for environmental data
  • Automated hyperparameter tuning and model selection
  • MLflow integration for experiment tracking and model versioning

Data Processing Capabilities

  • Geospatial data preprocessing (raster, vector, point cloud)
  • Time series analysis and forecasting
  • Real-time data ingestion and processing
  • Advanced feature engineering and selection
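The time-series capability above can be illustrated with a minimal moving-average forecaster. This is a stdlib-only sketch; the function name, window, and horizon parameters are illustrative, not the platform's actual API.

```python
# Minimal sketch of a moving-average forecaster for environmental time series.
# The function name and parameters are illustrative, not the platform's API.

def moving_average_forecast(series, window=3, horizon=2):
    """Forecast the next `horizon` points as the mean of the last `window` values."""
    history = list(series)
    forecasts = []
    for _ in range(horizon):
        mean = sum(history[-window:]) / window
        forecasts.append(mean)
        history.append(mean)  # roll each forecast forward as pseudo-history
    return forecasts

# Example: four daily temperature readings, two-step-ahead forecast
print(moving_average_forecast([20.0, 21.0, 23.0, 22.0], window=3, horizon=2))
```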

Infrastructure & Operations

  • FastAPI-based inference service with auto-scaling
  • Redis caching for improved performance
  • Prometheus monitoring and metrics collection
  • Comprehensive testing framework with 95%+ coverage
  • CI/CD pipeline with automated testing and deployment
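The Redis caching mentioned above follows the usual memoization pattern; as a local stand-in (no Redis required), the stdlib `functools.lru_cache` shows the same cold-vs-warm behavior. The function and its cost are placeholders, not platform code.

```python
# Local stand-in for the Redis cache described above, using stdlib memoization.
# In the real service the cache layer would read/write Redis; this is in-process.
from functools import lru_cache
import time

@lru_cache(maxsize=128)
def expensive_feature(station_id: str) -> float:
    time.sleep(0.01)  # placeholder for a slow computation or database lookup
    return hash(station_id) % 100 / 10.0

t0 = time.perf_counter()
expensive_feature("station-42")   # cold call: pays the full cost
cold = time.perf_counter() - t0

t1 = time.perf_counter()
expensive_feature("station-42")   # warm call: served from the cache
warm = time.perf_counter() - t1
print(f"cold={cold:.4f}s warm={warm:.4f}s")
```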

Analysis & Reporting

  • Automated report generation with visualizations
  • Interactive dashboards for real-time monitoring
  • Environmental impact assessment and trend analysis
  • Alert system for threshold exceedances
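The threshold-exceedance alerts above reduce to a simple comparison per metric. A minimal sketch follows; the `Alert` shape, metric names, and threshold values are assumptions for illustration.

```python
# Sketch of a threshold-exceedance check like the alert system described above.
# The Alert shape, metric names, and thresholds are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Alert:
    metric: str
    value: float
    threshold: float

def check_thresholds(readings: dict, thresholds: dict) -> list:
    """Return an Alert for every reading that exceeds its configured threshold."""
    return [
        Alert(metric, value, thresholds[metric])
        for metric, value in readings.items()
        if metric in thresholds and value > thresholds[metric]
    ]

alerts = check_thresholds(
    {"pm2_5": 42.0, "no2": 18.0},
    {"pm2_5": 35.0, "no2": 25.0},
)
print(alerts)  # only pm2_5 exceeds its threshold
```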

Project Structure

environmental-ml-platform/
├── .github/workflows/          # CI/CD pipeline configurations
├── data/                       # Data storage directories
│   ├── external/              # External data sources
│   ├── processed/             # Processed datasets
│   └── raw/                   # Raw data files
├── docs/                      # Documentation
├── infrastructure/            # Infrastructure as Code
│   ├── azure_config/         # Azure deployment configs
│   ├── docker/               # Docker configurations
│   ├── kubernetes/           # Kubernetes manifests
│   ├── monitoring/           # Monitoring setup
│   └── sql/                  # Database schemas
├── ml_pipeline/              # Core ML pipeline
│   ├── config/               # Configuration files
│   ├── data/                 # Data processing modules
│   ├── inference/            # Inference services
│   ├── models/               # Model definitions
│   ├── training/             # Training pipelines
│   └── utils/                # Utility functions
├── models/                   # Model storage
│   ├── inference/            # Inference models
│   ├── saved_models/         # Trained models
│   └── training/             # Training artifacts
├── reports/                  # Analysis reports
│   ├── data/                 # Report data
│   ├── figures/              # Generated visualizations
│   ├── master_report/        # Comprehensive reports
│   ├── metrics/              # Performance metrics
│   └── models/               # Report model files
├── tests/                    # Test suite
├── main.py                   # Main application entry
├── docker-compose.yml        # Docker Compose configuration
├── pyproject.toml           # Python project configuration
├── pytest.ini              # Pytest configuration
└── requirements-test.txt    # Testing dependencies

Quick Start

Prerequisites

  • Python 3.9+
  • UV package manager
  • Docker (optional)
  • Git

Installation

  1. Clone the repository:

    git clone <repository-url>
    cd environmental-ml-platform
  2. Set up the environment with UV:

    uv venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    uv pip install -e .
  3. Install testing dependencies:

    uv pip install -r requirements-test.txt
  4. Set up environment variables:

    cp .env.example .env
    # Edit .env with your configuration

Running the Platform

  1. Start the main application:

    python main.py
  2. Generate analysis reports:

    python reports/generate_all_reports.py
  3. Run tests:

    pytest
  4. Start with Docker:

    docker-compose up

Analysis Reports

The platform automatically generates comprehensive analysis reports with visualizations:

Data Analysis Report

(Report figures: Data Overview, Missing Data Analysis, Feature Analysis, Temporal Analysis)

Comprehensive analysis of environmental datasets including:

  • Statistical summaries and distributions
  • Temporal analysis and trends
  • Feature correlations and relationships
  • Data quality assessment
  • Missing data patterns and handling strategies
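The missing-data profiling above can be sketched per column on rows of dicts, stdlib only. Column names are illustrative and `None` marks a missing observation.

```python
# Sketch of per-column missing-data profiling, as described above.
# Column names are illustrative; None marks a missing observation.

def missing_fraction(rows):
    """Fraction of missing (None) values per column across all rows."""
    columns = {key for row in rows for key in row}
    counts = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            if row.get(col) is None:
                counts[col] += 1
    return {col: counts[col] / len(rows) for col in columns}

rows = [
    {"temp": 21.5, "humidity": 0.61},
    {"temp": None, "humidity": 0.58},
    {"temp": 22.1, "humidity": None},
    {"temp": 20.9, "humidity": 0.64},
]
print(missing_fraction(rows))
```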

Model Performance Report

(Report figures: Classification Performance, Regression Performance, Confusion Matrices)

Detailed evaluation of machine learning models:

  • Classification and regression performance metrics
  • Cross-validation results
  • Model comparison and selection
  • Training time analysis
  • Confusion matrices for classification tasks
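The cross-validation step above comes down to partitioning sample indices into disjoint folds; a minimal k-fold index split is sketched below. A real run would hand these indices to the training pipeline.

```python
# Minimal k-fold index split, illustrating the cross-validation step above.
# A real run would pass these indices to the training pipeline.

def kfold_indices(n_samples, k=5):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    indices = list(range(n_samples))
    fold_size = n_samples // k
    for fold in range(k):
        start = fold * fold_size
        stop = start + fold_size if fold < k - 1 else n_samples
        val = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, val

for train, val in kfold_indices(10, k=5):
    assert set(train).isdisjoint(val)  # no leakage between splits
```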

Environmental Impact Report

Environmental trend analysis and impact assessment:

  • Trend analysis over time
  • Impact severity assessment
  • Risk evaluation and alerts
  • Predictive insights
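At its simplest, the trend analysis above is a least-squares slope over time-ordered values. The stdlib sketch below shows the idea; the platform's own implementation is likely richer.

```python
# Least-squares trend slope, a sketch of the trend analysis described above.
# Pure stdlib; the platform's own trend analysis is likely richer.

def trend_slope(values):
    """Slope of the best-fit line through (0, v0), (1, v1), ... via least squares."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

print(trend_slope([10.0, 11.0, 12.0, 13.0]))  # a perfectly linear series
```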

Interactive Dashboard

Access the interactive dashboard at reports/master_report/index.html for:

  • Real-time data exploration
  • Interactive visualizations
  • Performance monitoring
  • Report navigation

Testing

The platform includes a comprehensive testing framework:

# Run all tests
pytest

# Run with coverage
pytest --cov=ml_pipeline --cov-report=html

# Run specific test categories
pytest -m "not slow"  # Skip slow tests
pytest -m integration  # Run integration tests only

Test Categories

  • Unit Tests: Fast, isolated component tests
  • Integration Tests: End-to-end pipeline tests
  • Slow Tests: Long-running performance tests
  • GPU Tests: GPU-accelerated model tests
  • MLflow Tests: Experiment tracking tests
  • Redis Tests: Caching system tests
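The categories above map onto pytest markers (as the `-m` examples earlier suggest). A plausible marker registration is sketched below; the names are inferred from the category list and the repository's actual pytest.ini may differ.

```ini
# Hypothetical marker registration matching the categories above;
# the repository's actual pytest.ini may differ.
[pytest]
markers =
    slow: long-running performance tests
    integration: end-to-end pipeline tests
    gpu: tests requiring GPU acceleration
    mlflow: experiment tracking tests
    redis: caching system tests
```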

Configuration

Environment Variables

Key configuration options in .env:

# Database
DATABASE_URL=postgresql://user:pass@localhost/envml

# MLflow
MLFLOW_TRACKING_URI=http://localhost:5000
MLFLOW_EXPERIMENT_NAME=environmental-ml

# Redis
REDIS_URL=redis://localhost:6379

# API
API_HOST=0.0.0.0
API_PORT=8000

# Monitoring
PROMETHEUS_PORT=9090
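Once those variables are set, they can be read with stdlib `os.environ`. The loader below mirrors the `.env` example; the function name and defaults are illustrative, not the platform's actual config module.

```python
# Sketch of loading the settings above from the environment, stdlib only.
# Variable names mirror the .env example; function name and defaults are illustrative.
import os

def load_config():
    return {
        "database_url": os.environ.get("DATABASE_URL", "postgresql://localhost/envml"),
        "mlflow_uri": os.environ.get("MLFLOW_TRACKING_URI", "http://localhost:5000"),
        "redis_url": os.environ.get("REDIS_URL", "redis://localhost:6379"),
        "api_host": os.environ.get("API_HOST", "0.0.0.0"),
        "api_port": int(os.environ.get("API_PORT", "8000")),
    }

os.environ["API_PORT"] = "9001"   # simulate a value set in .env
config = load_config()
print(config["api_port"])
```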

Model Configuration

Configure models in ml_pipeline/config/:

  • model_config.yaml: Model architectures and hyperparameters
  • training_config.yaml: Training pipeline settings
  • inference_config.yaml: Inference service configuration
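A plausible shape for `model_config.yaml`, inferred from the architectures listed under Features; every key and value here is an assumption for illustration, not the repository's actual schema.

```yaml
# Hypothetical model_config.yaml shape; the real schema may differ.
models:
  cnn_baseline:
    architecture: cnn
    channels: [32, 64, 128]
    dropout: 0.2
  resnet_env:
    architecture: resnet
    depth: 18
    pretrained: false
  multiscale_cnn:
    architecture: multiscale_cnn
    scales: [1, 2, 4]
```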

Deployment

Docker Deployment

# Build and run with Docker Compose
docker-compose up --build

# Scale services
docker-compose up --scale inference=3

Kubernetes Deployment

# Apply Kubernetes manifests
kubectl apply -f infrastructure/kubernetes/

# Check deployment status
kubectl get pods -n environmental-ml

Azure Deployment

Use the provided Azure Resource Manager templates:

# Deploy infrastructure
az deployment group create \
  --resource-group environmental-ml \
  --template-file infrastructure/azure_config/main.json

Monitoring

Prometheus Metrics

The platform exposes metrics for monitoring:

  • Model prediction latency
  • Data processing throughput
  • Error rates and success rates
  • Resource utilization
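The prediction-latency metric above can be sketched in-process with a timing decorator; a real deployment would export these samples through a Prometheus client histogram instead of a list. The `predict` stand-in is illustrative.

```python
# In-process sketch of the prediction-latency metric listed above.
# A real deployment would export samples via a Prometheus client histogram.
import time
from functools import wraps

latencies = []

def record_latency(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            latencies.append(time.perf_counter() - start)
    return wrapper

@record_latency
def predict(features):
    return sum(features) / len(features)  # stand-in for model inference

predict([0.1, 0.4, 0.5])
print(f"recorded {len(latencies)} latency sample(s)")
```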

Grafana Dashboards

Pre-configured dashboards available in infrastructure/monitoring/:

  • System performance dashboard
  • ML model performance dashboard
  • Data quality monitoring dashboard

API Documentation

Once the application is running, access the API documentation at:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc

Key Endpoints

  • POST /predict: Make predictions with trained models
  • GET /models: List available models
  • POST /data/upload: Upload new environmental data
  • GET /reports: Access generated reports
  • GET /health: Health check endpoint
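A `POST /predict` call carries a JSON body; the sketch below only builds such a body, since the field names are assumptions (the actual request schema is defined by the FastAPI service and visible in its Swagger UI).

```python
# Sketch of a /predict request body; the field names are assumptions,
# since the actual request schema is defined by the FastAPI service.
import json

payload = {
    "model": "multiscale_cnn",          # hypothetical model identifier
    "features": [21.5, 0.61, 1013.2],   # hypothetical feature vector
}
body = json.dumps(payload).encode("utf-8")
# With the service running, this body would be POSTed to
# http://localhost:8000/predict with Content-Type: application/json.
print(len(body), "bytes")
```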

Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Make your changes and add tests
  4. Run the test suite: pytest
  5. Commit your changes: git commit -m "Add feature"
  6. Push to the branch: git push origin feature-name
  7. Submit a pull request

Development Guidelines

  • Follow PEP 8 style guidelines
  • Add tests for new functionality
  • Update documentation as needed
  • Use meaningful commit messages
  • No emojis in commits or code

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

For support and questions:

  • Create an issue on GitHub
  • Check the documentation in the docs/ directory
  • Review the analysis reports for insights

Acknowledgments

  • Built with modern ML frameworks and best practices
  • Designed for scalability and production deployment
  • Comprehensive testing and monitoring included
  • Professional reporting and visualization capabilities

Environmental ML Platform - Empowering environmental decision-making through advanced machine learning and data analysis.
