A comprehensive machine learning platform for environmental data analysis, monitoring, and prediction. This platform provides end-to-end capabilities for processing environmental datasets, training ML models, and generating actionable insights through automated reporting and interactive dashboards.
- Complete ML pipeline implementation with data preprocessing, training, and inference
- Multiple model architectures (CNN, ResNet, Multi-scale CNN) for environmental data
- Automated hyperparameter tuning and model selection
- MLflow integration for experiment tracking and model versioning (see the sketch after this list)
- Geospatial data preprocessing (raster, vector, point cloud)
- Time series analysis and forecasting
- Real-time data ingestion and processing
- Advanced feature engineering and selection
- FastAPI-based inference service with auto-scaling
- Redis caching for improved performance
- Prometheus monitoring and metrics collection
- Comprehensive testing framework with 95%+ coverage
- CI/CD pipeline with automated testing and deployment
- Automated report generation with visualizations
- Interactive dashboards for real-time monitoring
- Environmental impact assessment and trend analysis
- Alert system for threshold exceedances
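
To illustrate the MLflow integration noted above, here is a minimal sketch of a tracked training run. The tracking URI and experiment name mirror the `.env` example later in this README; the run name, parameters, and metric value are illustrative placeholders.

```python
# Minimal sketch of a tracked training run; parameter names and the
# metric value are placeholders, not the platform's actual schema.
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("environmental-ml")

with mlflow.start_run(run_name="baseline-cnn"):
    mlflow.log_param("architecture", "cnn")
    mlflow.log_param("learning_rate", 1e-3)
    # ... train the model here ...
    mlflow.log_metric("val_rmse", 0.42)
```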
```
environmental-ml-platform/
├── .github/workflows/        # CI/CD pipeline configurations
├── data/                     # Data storage directories
│   ├── external/             # External data sources
│   ├── processed/            # Processed datasets
│   └── raw/                  # Raw data files
├── docs/                     # Documentation
├── infrastructure/           # Infrastructure as Code
│   ├── azure_config/         # Azure deployment configs
│   ├── docker/               # Docker configurations
│   ├── kubernetes/           # Kubernetes manifests
│   ├── monitoring/           # Monitoring setup
│   └── sql/                  # Database schemas
├── ml_pipeline/              # Core ML pipeline
│   ├── config/               # Configuration files
│   ├── data/                 # Data processing modules
│   ├── inference/            # Inference services
│   ├── models/               # Model definitions
│   ├── training/             # Training pipelines
│   └── utils/                # Utility functions
├── models/                   # Model storage
│   ├── inference/            # Inference models
│   ├── saved_models/         # Trained models
│   └── training/             # Training artifacts
├── reports/                  # Analysis reports
│   ├── data/                 # Report data
│   ├── figures/              # Generated visualizations
│   ├── master_report/        # Comprehensive reports
│   ├── metrics/              # Performance metrics
│   └── models/               # Report model files
├── tests/                    # Test suite
├── main.py                   # Main application entry point
├── docker-compose.yml        # Docker Compose configuration
├── pyproject.toml            # Python project configuration
├── pytest.ini                # Pytest configuration
└── requirements-test.txt     # Testing dependencies
```
- Python 3.9+
- UV package manager
- Docker (optional)
- Git
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd environmental-ml-platform
  ```

- Set up the environment with UV:

  ```bash
  uv venv
  source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  uv pip install -e .
  ```

- Install testing dependencies:

  ```bash
  uv pip install -r requirements-test.txt
  ```

- Set up environment variables:

  ```bash
  cp .env.example .env
  # Edit .env with your configuration
  ```

- Start the main application:

  ```bash
  python main.py
  ```

- Generate analysis reports:

  ```bash
  python reports/generate_all_reports.py
  ```

- Run tests:

  ```bash
  pytest
  ```

- Start with Docker:

  ```bash
  docker-compose up
  ```
The platform automatically generates comprehensive analysis reports with visualizations:
Data analysis reports provide a comprehensive view of the environmental datasets, including:
- Statistical summaries and distributions
- Temporal analysis and trends
- Feature correlations and relationships
- Data quality assessment
- Missing data patterns and handling strategies
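
As a rough sketch of what such a data-quality pass computes, assuming pandas is available; the dataset path and column names below are hypothetical:

```python
# Illustrative data-quality pass; file path and columns are hypothetical.
import pandas as pd

df = pd.read_csv("data/processed/air_quality.csv", parse_dates=["timestamp"])

print(df.describe())                   # statistical summaries and distributions
print(df.isna().mean().sort_values())  # fraction of missing values per column
print(df.corr(numeric_only=True))      # feature correlations

# Temporal view: monthly means of a hypothetical pollutant column
monthly = df.set_index("timestamp")["pm25"].resample("MS").mean()
print(monthly.tail())
```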
Model performance reports provide a detailed evaluation of the machine learning models:
- Classification and regression performance metrics
- Cross-validation results
- Model comparison and selection
- Training time analysis
- Confusion matrices for classification tasks
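
A minimal scikit-learn sketch of the cross-validation and confusion-matrix steps; the model choice and dummy data stand in for the platform's own pipelines:

```python
# Cross-validation and confusion matrix on placeholder data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(0)
X, y = rng.random((200, 8)), rng.integers(0, 2, 200)  # dummy data
model = RandomForestClassifier(random_state=0)

scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model.fit(X_tr, y_tr)
print(confusion_matrix(y_te, model.predict(X_te)))
```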
Environmental insight reports cover trend analysis and impact assessment:
- Trend analysis over time
- Impact severity assessment
- Risk evaluation and alerts
- Predictive insights
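
The core trend-and-alert logic can be pictured with a short sketch; the series values and the threshold below are made up for illustration:

```python
# Linear trend over annual readings plus a simple threshold alert;
# values and threshold are illustrative.
import numpy as np

years = np.arange(2015, 2025)
values = np.array([3.1, 3.3, 3.2, 3.6, 3.8, 3.7, 4.0, 4.2, 4.1, 4.4])

slope, _ = np.polyfit(years, values, deg=1)
print(f"Trend: {slope:+.3f} units/year")

THRESHOLD = 4.3
if values[-1] > THRESHOLD:
    print("Alert: latest reading exceeds the configured threshold")
```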
Access the interactive dashboard at `reports/master_report/index.html` for:
- Real-time data exploration
- Interactive visualizations
- Performance monitoring
- Report navigation
The platform includes a comprehensive testing framework:
```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=ml_pipeline --cov-report=html

# Run specific test categories
pytest -m "not slow"     # Skip slow tests
pytest -m integration    # Run integration tests only
```
- Unit Tests: Fast, isolated component tests
- Integration Tests: End-to-end pipeline tests
- Slow Tests: Long-running performance tests
- GPU Tests: GPU-accelerated model tests
- MLflow Tests: Experiment tracking tests
- Redis Tests: Caching system tests
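
These categories map onto pytest markers. A sketch of how a test module might use them, assuming the marker names are registered in `pytest.ini`:

```python
# Marker usage corresponding to the categories above; marker names are
# assumed to be registered in pytest.ini.
import pytest

def test_unit_example():
    # Plain unit test: fast, isolated, always runs.
    assert 1 + 1 == 2

@pytest.mark.integration
def test_pipeline_end_to_end():
    # Selected with: pytest -m integration
    ...

@pytest.mark.slow
def test_long_training_run():
    # Skipped with: pytest -m "not slow"
    ...
```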
Key configuration options in `.env`:

```bash
# Database
DATABASE_URL=postgresql://user:pass@localhost/envml

# MLflow
MLFLOW_TRACKING_URI=http://localhost:5000
MLFLOW_EXPERIMENT_NAME=environmental-ml

# Redis
REDIS_URL=redis://localhost:6379

# API
API_HOST=0.0.0.0
API_PORT=8000

# Monitoring
PROMETHEUS_PORT=9090
```
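
One way these settings might be read at startup, assuming `python-dotenv` is available; the platform's actual loader may differ:

```python
# Sketch: load .env into the environment and read the settings above.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root

DATABASE_URL = os.getenv("DATABASE_URL")
MLFLOW_TRACKING_URI = os.getenv("MLFLOW_TRACKING_URI", "http://localhost:5000")
API_HOST = os.getenv("API_HOST", "0.0.0.0")
API_PORT = int(os.getenv("API_PORT", "8000"))
```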
Configure models in `ml_pipeline/config/`:

- `model_config.yaml`: Model architectures and hyperparameters
- `training_config.yaml`: Training pipeline settings
- `inference_config.yaml`: Inference service configuration
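
A hedged sketch of how a training script might consume these files, assuming PyYAML; the key layout shown in the comment is hypothetical:

```python
# Load a model configuration; actual keys depend on the platform's schema.
import yaml

with open("ml_pipeline/config/model_config.yaml") as f:
    cfg = yaml.safe_load(f)

# e.g. cfg == {"architecture": "resnet", "hyperparameters": {"lr": 0.001}}
print(cfg)
```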
```bash
# Build and run with Docker Compose
docker-compose up --build

# Scale services
docker-compose up --scale inference=3
```
```bash
# Apply Kubernetes manifests
kubectl apply -f infrastructure/kubernetes/

# Check deployment status
kubectl get pods -n environmental-ml
```
Use the provided Azure Resource Manager templates:

```bash
# Deploy infrastructure
az deployment group create \
  --resource-group environmental-ml \
  --template-file infrastructure/azure_config/main.json
```
The platform exposes metrics for monitoring:
- Model prediction latency
- Data processing throughput
- Error rates and success rates
- Resource utilization
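
For illustration, exposing prediction latency and error counts with `prometheus_client` might look like the following sketch; the metric names and the port choice are assumptions:

```python
# Illustrative latency and error metrics; names and port are assumptions.
from prometheus_client import Counter, Histogram, start_http_server

PREDICT_LATENCY = Histogram("predict_latency_seconds", "Model prediction latency")
PREDICT_ERRORS = Counter("predict_errors_total", "Failed prediction requests")

@PREDICT_LATENCY.time()
def predict(payload):
    try:
        ...  # run model inference here
    except Exception:
        PREDICT_ERRORS.inc()
        raise

start_http_server(9090)  # PROMETHEUS_PORT from the configuration above
```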
Pre-configured dashboards are available in `infrastructure/monitoring/`:
- System performance dashboard
- ML model performance dashboard
- Data quality monitoring dashboard
Once the application is running, access the API documentation at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
- `POST /predict`: Make predictions with trained models
- `GET /models`: List available models
- `POST /data/upload`: Upload new environmental data
- `GET /reports`: Access generated reports
- `GET /health`: Health check endpoint
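
A quick sketch of calling the prediction endpoint with `requests`; the JSON payload is hypothetical, so check the actual schema at `/docs` first:

```python
# Call POST /predict; the request body below is a hypothetical schema.
import requests

resp = requests.post(
    "http://localhost:8000/predict",
    json={"model": "cnn", "features": [[0.1, 0.2, 0.3]]},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())
```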
- Fork the repository
- Create a feature branch: `git checkout -b feature-name`
- Make your changes and add tests
- Run the test suite: `pytest`
- Commit your changes: `git commit -m "Add feature"`
- Push to the branch: `git push origin feature-name`
- Submit a pull request
- Follow PEP 8 style guidelines
- Add tests for new functionality
- Update documentation as needed
- Use meaningful commit messages
- No emojis in commits or code
This project is licensed under the MIT License - see the LICENSE file for details.
For support and questions:
- Create an issue on GitHub
- Check the documentation in the `docs/` directory
- Review the analysis reports for insights
- Built with modern ML frameworks and best practices
- Designed for scalability and production deployment
- Comprehensive testing and monitoring included
- Professional reporting and visualization capabilities
Environmental ML Platform - Empowering environmental decision-making through advanced machine learning and data analysis.