| 1 | +# 🌸 Iris Classification MLOps Pipeline with ZenML |
| 2 | + |
| 3 | +Welcome to the Iris Classification MLOps project! It demonstrates how to build a production-ready machine learning pipeline with ZenML, covering data preparation, model training, evaluation, explainability, and data drift detection. |
| 4 | + |
| 5 | +## 🌟 Features |
| 6 | + |
| 7 | +- Data loading and splitting using scikit-learn's iris dataset |
| 8 | +- SVM model training with hyperparameter configuration |
| 9 | +- Model evaluation with accuracy metrics |
| 10 | +- Model explainability using SHAP (SHapley Additive exPlanations) |
| 11 | +- Data drift detection between training and test sets |
| 12 | +- Artifact and metadata logging for enhanced traceability |
| 13 | + |
| 14 | +<div align="center"> |
| 15 | + <br/> |
| 16 | + <img alt="Iris Classification Pipeline" src=".assets/model.gif" width="70%"> |
| 17 | + <br/> |
| 18 | +</div> |
| 19 | + |
| 20 | +## 🏃 How to Run |
| 21 | + |
| 22 | +Before running the pipeline, set up your environment as follows: |
| 23 | + |
| 24 | +```bash |
| 25 | +# Set up a Python virtual environment |
| 26 | +python3 -m venv .venv |
| 27 | +source .venv/bin/activate |
| 28 | + |
| 29 | +# Install requirements |
| 30 | +pip install -r requirements.txt |
| 31 | +``` |
| 32 | + |
| 33 | +To run the Iris Classification pipeline: |
| 34 | + |
| 35 | +```bash |
| 36 | +python run.py |
| 37 | +``` |
| 38 | + |
| 39 | +## 🧩 Pipeline Steps |
| 40 | + |
| 41 | +1. **Load Data**: Loads the iris dataset and splits it into train and test sets. |
| 42 | +2. **Train Model**: Trains an SVM classifier on the training data. |
| 43 | +3. **Evaluate Model**: Generates predictions on the test set and evaluates the model's accuracy. |
| 44 | +4. **Explain Model**: Generates SHAP values for model explainability. |
| 45 | +5. **Detect Data Drift**: Detects potential data drift between training and test sets. |
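The README does not say which method the drift step uses, so the following is only a library-free sketch of one common approach: comparing each feature's training and test distributions with a two-sample Kolmogorov-Smirnov statistic. The function names `ks_statistic` and `detect_drift`, the `threshold` of 0.3, and the sample values are all assumptions made for illustration; they are not taken from `run.py`.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    max_gap = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for value in ref + cur:
        ref_cdf = bisect_right(ref, value) / len(ref)
        cur_cdf = bisect_right(cur, value) / len(cur)
        max_gap = max(max_gap, abs(ref_cdf - cur_cdf))
    return max_gap


def detect_drift(train_columns, test_columns, threshold=0.3):
    """Return {feature: (statistic, drifted)}; `threshold` is an arbitrary cut-off."""
    report = {}
    for name, train_values in train_columns.items():
        stat = ks_statistic(train_values, test_columns[name])
        report[name] = (stat, stat > threshold)
    return report


# Made-up measurements for illustration, not real iris rows.
train = {
    "sepal_length": [5.1, 4.9, 5.8, 6.3, 5.0, 6.1],
    "petal_width": [0.2, 0.2, 1.2, 1.8, 0.3, 1.4],
}
test = {
    "sepal_length": [5.0, 5.9, 6.2, 4.8],
    "petal_width": [2.3, 2.4, 2.5, 2.1],
}

drift_report = detect_drift(train, test)
for feature, (stat, drifted) in drift_report.items():
    print(f"{feature}: KS={stat:.2f} drifted={drifted}")
```

A fixed threshold keeps the sketch short; a real drift check would more likely rely on a p-value or a dedicated drift detection library.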
| 46 | + |
| 47 | +## 📊 Visualizations |
| 48 | + |
| 49 | +The pipeline generates a SHAP summary plot to explain feature importance: |
| 50 | + |
| 51 | +<div align="center"> |
| 52 | + <br/> |
| 53 | + <img alt="SHAP Summary Plot" src=".assets/shap_visualization.png" width="70%"> |
| 54 | + <br/> |
| 55 | +</div> |
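To make the numbers behind such a plot less opaque, here is a toy computation of exact Shapley values, the quantity that SHAP estimates. This is a sketch of the definition only: the `shap` library generally relies on approximations or model-specific algorithms rather than enumerating every feature ordering, and `toy_model`, the feature values, and the baseline below are hypothetical.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values of `predict` at `instance`, relative to `baseline`."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for ordering in orderings:
        current = dict(baseline)  # start with every feature at its baseline value
        previous = predict(current)
        for name in ordering:
            current[name] = instance[name]  # reveal one feature at a time
            score = predict(current)
            totals[name] += score - previous  # marginal contribution of `name`
            previous = score
    # Average the marginal contributions over all orderings.
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_model(x):
    # Hypothetical scoring function with an interaction term; ignores sepal_width.
    return 2.0 * x["petal_length"] + x["petal_length"] * x["petal_width"]


instance = {"petal_length": 5.0, "petal_width": 2.0, "sepal_width": 3.0}
baseline = {"petal_length": 1.0, "petal_width": 0.0, "sepal_width": 2.0}

contributions = shapley_values(toy_model, instance, baseline)
print(contributions)
```

The contributions add up to the difference between the model's output for the instance and for the baseline, and a feature the model ignores receives a contribution of zero.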
| 56 | + |
| 57 | +## 🛠️ Customization |
| 58 | + |
| 59 | +You can customize various aspects of the pipeline: |
| 60 | + |
| 61 | +- Adjust the `SVC` hyperparameters in the `train_model` step |
| 62 | +- Modify the train-test split ratio in the `load_data` step |
| 63 | +- Change which iris features are used, by dropping existing ones or adding engineered ones |
| 64 | +- Implement additional evaluation metrics in the `evaluate_model` step |
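As an example of the last customization, the sketch below computes per-class precision and recall alongside accuracy from a list of true labels and predictions. It is written without dependencies so that it stands alone; `per_class_metrics` and the sample labels are hypothetical, and how the result would be wired into the `evaluate_model` step depends on code in `run.py` that is not shown here.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return overall accuracy plus precision and recall for every class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    actual = Counter(y_true)      # how often each class really occurs
    predicted = Counter(y_pred)   # how often each class was predicted
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    metrics = {"accuracy": sum(true_pos.values()) / len(y_true), "classes": {}}
    for label in sorted(set(actual) | set(predicted)):
        precision = true_pos[label] / predicted[label] if predicted[label] else 0.0
        recall = true_pos[label] / actual[label] if actual[label] else 0.0
        metrics["classes"][label] = {"precision": precision, "recall": recall}
    return metrics


# Made-up labels using the three iris class indices (0, 1, 2).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]

metrics = per_class_metrics(y_true, y_pred)
print(metrics)
```

Per-class figures expose confusion between individual species that a single accuracy number can hide.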
| 65 | + |
| 66 | +## 📜 Project Structure |
| 67 | + |
| 68 | +``` |
| 69 | +. |
| 70 | +├── run.py # Main pipeline file |
| 71 | +├── requirements.txt # Python dependencies |
| 72 | +└── README.md # This file |
| 73 | +``` |
| 74 | + |
| 75 | +## 🤝 Contributing |
| 76 | + |
| 77 | +Contributions that improve the pipeline are welcome! Feel free to submit a pull request. |
| 78 | + |
| 79 | +## 📄 License |
| 80 | + |
| 81 | +This project is licensed under the Apache License 2.0. See the LICENSE file for details. |