Skip to content

THAMIZH-ARASU/Solar-Panel-Efficiency-Prediction-MLOps

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

22 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

⚑ Solar Power System Analyzer - MLOps

A robust, modular, and production-ready platform for solar power system data analysis, machine learning, and prediction. Built with ZenML, Streamlit, MLflow, and a rich Python data science stack, this project enables end-to-end workflows from data ingestion and EDA to model training, deployment, and inferenceβ€”all with experiment tracking and a user-friendly web interface.


πŸš€ Features

1. End-to-End ML Pipelines with ZenML

  • Training Pipeline: Data ingestion, missing value handling, feature engineering, outlier detection, model training, evaluation, and model saving.
  • Deployment Pipeline: Loads trained models, processes new data, and generates predictions for deployment scenarios.
  • Inference Pipeline: Dynamically loads models and produces predictions on new data, supporting batch inference.

2. Modular ZenML Steps

  • Data Ingestion: Supports CSV and ZIP files, with extensible ingestion logic.
  • Missing Value Handling: Multiple strategies (drop, mean, median, mode, constant, KNN, CatBoost, categorical fill).
  • Feature Engineering: Categorical encoding, new feature creation (e.g., power, area).
  • Outlier Detection: Z-score and IQR-based methods.
  • Model Building: Supports Linear Regression, Random Forest, XGBoost, CatBoost.
  • Model Evaluation: Regression metrics with MLflow logging.
  • Model Saving/Loading: Robust serialization and MLflow model registry integration.
  • Prediction & Saving: Batch predictions and artifact logging.

3. Interactive Streamlit Web App

  • EDA Tab: Upload CSVs and perform:
    • Missing value analysis (counts, heatmaps)
    • Univariate, bivariate, and multivariate analysis (histograms, boxplots, scatterplots, correlation heatmaps, pairplots)
  • Prediction Tab: Upload data, run the full inference pipeline, and download predictions.
  • Home & About Tabs: Project overview and author information.
  • Production-Grade Logging: All user actions, errors, and pipeline events are logged for traceability.

4. Advanced EDA Utilities

  • Modular EDA code in Analysis/AnalyzeSrc/ for:
    • Univariate, bivariate, and multivariate analysis
    • Missing value visualization
    • Encoding strategies and data inspection

5. Experiment Tracking with MLflow

  • All model metrics, artifacts, and predictions are logged and tracked for reproducibility and comparison.

6. Centralized Configuration

  • All pipeline and model parameters are managed via a single config.py dataclass for easy customization.

7. Extensible and Production-Ready

  • Modular codebase with clear separation of concerns.
  • Logging in every process (steps, pipelines, app).
  • Ready for cloud or on-prem deployment.

πŸ› οΈ Frameworks & Libraries

  • ZenML: Orchestrates modular, reproducible ML pipelines.
  • Streamlit: Interactive web UI for EDA and prediction.
  • MLflow: Experiment tracking, model registry, and artifact logging.
  • scikit-learn, XGBoost, CatBoost, LightGBM: Model training and evaluation.
  • pandas, numpy, matplotlib, seaborn, plotly: Data manipulation and visualization.
  • joblib: Model serialization.
  • colorama: Terminal color support.
  • Other: Jupyter, Pillow, python-dateutil, threadpoolctl, etc.

See requirements.txt for the full list.


πŸ“ Project Structure

.
β”œβ”€β”€ App/                  # Streamlit web app (EDA, prediction, UI)
β”‚   β”œβ”€β”€ app.py
β”‚   β”œβ”€β”€ eda.py
β”‚   └── predict.py
β”œβ”€β”€ Steps/                # ZenML pipeline steps (modular ML logic)
β”‚   β”œβ”€β”€ DataIngestionStep.py
β”‚   β”œβ”€β”€ FeatureEngineeringStep.py
β”‚   β”œβ”€β”€ HandleMissingValueStep.py
β”‚   β”œβ”€β”€ OutlierDetectionStep.py
β”‚   β”œβ”€β”€ ModelBuildingStep.py
β”‚   β”œβ”€β”€ ModelEvaluationStep.py
β”‚   β”œβ”€β”€ ModelSaverStep.py
β”‚   β”œβ”€β”€ ModelLoaderStep.py
β”‚   β”œβ”€β”€ PredictionStep.py
β”‚   β”œβ”€β”€ PredictionsSaverStep.py
β”‚   β”œβ”€β”€ DynamicModelLoaderStep.py
β”‚   └── SplitFeaturesTargetStep.py
β”œβ”€β”€ Pipelines/            # ZenML pipeline orchestrations
β”‚   β”œβ”€β”€ TrainingPipeline.py
β”‚   β”œβ”€β”€ InferencePipeline.py
β”‚   └── DeploymentPipeline.py
β”œβ”€β”€ Src/                  # Core ML/data logic (feature engineering, ingestion, etc.)
β”œβ”€β”€ Analysis/AnalyzeSrc/  # Advanced EDA utilities
β”œβ”€β”€ config.py             # Centralized configuration (SystemConfig)
β”œβ”€β”€ run_training.py       # Script to run the training pipeline
β”œβ”€β”€ run_inference.py      # Script to run the inference pipeline
β”œβ”€β”€ run_deployment.py     # Script to run the deployment pipeline
β”œβ”€β”€ requirements.txt
└── README.md

⚑ Quickstart

1. Install Requirements

pip install -r requirements.txt

2. Train a Model

python run_training.py
  • Uses parameters from config.py.
  • Trains and saves a model to artifacts/model.joblib.

3. Run Inference

python run_inference.py --data_path path/to/input.csv --feature_columns col1,col2,...
  • Produces predictions in artifacts/predictions.csv.

4. Run Deployment Pipeline

python run_deployment.py
  • Loads a trained model and generates predictions on test data.

5. Launch the Streamlit App

cd App
streamlit run app.py
  • Explore EDA and make predictions via the web UI.

πŸ“Š Streamlit App Features

  • EDA Tab: Upload data, visualize missing values, distributions, relationships, and correlations.
  • Prediction Tab: Upload new data, run the full inference pipeline, and download results.
  • Logging: All actions and errors are logged to App/app.log for easy debugging.

🧩 Customization

  • Pipeline Parameters: Edit config.py to change data paths, model types, feature columns, and more.
  • Add Steps: Extend the Steps/ directory with new ZenML steps for custom logic.
  • EDA: Add or modify EDA modules in Analysis/AnalyzeSrc/.

πŸ“ˆ Experiment Tracking

  • All model metrics, artifacts, and predictions are logged with MLflow.
  • To view MLflow UI:
    mlflow ui
    Then open http://localhost:5000 in your browser.

πŸ‘€ Visualize and Manage Pipelines with ZenML Dashboard

ZenML provides a built-in dashboard to visualize, monitor, and manage your pipelines, steps, and artifacts.

To launch the ZenML dashboard, simply run:

zenml up

This will start the ZenML dashboard locally. Open your browser and go to http://localhost:8237 to:

  • View all pipeline runs and their statuses
  • Inspect step outputs, artifacts, and logs
  • Monitor experiment lineage and metadata
  • Manage stacks, orchestrators, and more

The dashboard is a powerful tool for tracking your ML workflow and debugging pipeline executions.


πŸ“ Logging

  • App logs: App/app.log
  • Pipeline logs: Pipelines/pipeline.log
  • Step logs: Steps/step.log
  • All major actions, transitions, and errors are traceable for robust monitoring and debugging.

πŸ—οΈ Design Patterns & Best Practices

Design Patterns Used

  • Strategy Pattern: Used extensively for modularizing logic in data ingestion, missing value handling, feature engineering, outlier detection, model building, and EDA. This allows easy swapping and extension of algorithms and behaviors at runtime.
  • Factory Pattern: Used for creating data ingestors based on file type, enabling scalable and maintainable data ingestion logic.
  • Modularization & Separation of Concerns: The codebase is organized into clear modules (Steps, Pipelines, Src, Analysis) to ensure each component has a single responsibility and can be developed, tested, and maintained independently.

Best Practices Followed

  • Production-Grade Logging: All major actions, errors, and pipeline events are logged for traceability and debugging.
  • Centralized Configuration: All parameters and settings are managed via a single config file for easy customization and reproducibility.
  • Experiment Tracking: All model metrics, artifacts, and predictions are logged with MLflow for reproducibility and comparison.
  • Extensibility: The use of abstract base classes and modular steps makes it easy to add new features or algorithms.
  • Reproducibility: Pipelines, experiment tracking, and configuration management ensure results can be reliably reproduced.
  • Clear Documentation: The project is well-documented with docstrings, comments, and this README.

πŸ–ΌοΈ Images

Below are key images illustrating the system's architecture, pipelines, and results:

deployment_pipeline Deployment Pipeline

EDA1 Exploratory Data Analysis 1

EDA2 Exploratory Data Analysis 2

EDA3 Exploratory Data Analysis 3

inference_pipeline Inference Pipeline

mlflow_experiment_tracking MLflow Experiment Tracking

predictions Predictions Output

running_inference Running Inference

train_pipeline Training Pipeline


🀝 Contributing

Contributions are welcome! Please open issues or pull requests for improvements, bug fixes, or new features.


πŸ‘€ Author

THAMIZHARASU SARAVANAN
GitHub Profile


πŸ“„ License

This project is licensed under MIT License


Enjoy robust, modular, and production-ready solar data analysis and ML!

About

A robust, modular, and production-ready platform for solar power system data analysis, machine learning, and prediction. Built with ZenML, Streamlit, MLflow, and a rich Python data science stack, this project enables end-to-end workflows from data ingestion and EDA to model training, deployment, inference and experiment tracking

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors