Commit db7a36c

Merge pull request #126 from zenml-io/project/shap-values
Added new project for explainability
2 parents 7255650 + 5322762 commit db7a36c

8 files changed: 771 additions, 0 deletions

explainability-shap/.dockerignore

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
!/materializers/**
!/pipelines/**
!/steps/**
!/utils/**

explainability-shap/LICENSE

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
Apache Software License 2.0

Copyright (c) ZenML GmbH 2024. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

explainability-shap/README.md

Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
# 🌸 Iris Classification MLOps Pipeline with ZenML

Welcome to the Iris Classification MLOps project! This project demonstrates how to build a production-ready machine learning pipeline using ZenML. It showcases various MLOps practices including data preparation, model training, evaluation, explainability, and data drift detection.

## 🌟 Features

- Data loading and splitting using scikit-learn's iris dataset
- SVM model training with hyperparameter configuration
- Model evaluation with accuracy metrics
- Model explainability using SHAP (SHapley Additive exPlanations)
- Data drift detection between training and test sets
- Artifact and metadata logging for enhanced traceability

<div align="center">
<br/>
<img alt="Iris Classification Pipeline" src=".assets/model.gif" width="70%">
<br/>
</div>

## 🏃 How to Run

Before running the pipeline, set up your environment as follows:

```bash
# Set up a Python virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install requirements
pip install -r requirements.txt
```

To run the Iris Classification pipeline:

```shell
python run.py
```

## 🧩 Pipeline Steps

1. **Load Data**: Loads the iris dataset and splits it into train and test sets.
2. **Train Model**: Trains an SVM classifier on the training data.
3. **Evaluate Model**: Generates predictions on the test set and measures the model's accuracy.
4. **Explain Model**: Generates SHAP values for model explainability.
5. **Detect Data Drift**: Detects potential data drift between training and test sets.
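
The README does not say which test the drift step uses. As a hedged sketch, assuming a per-feature two-sample Kolmogorov-Smirnov (KS) comparison (`scipy`, which is in the requirements, provides one as `scipy.stats.ks_2samp`), the statistic is the largest vertical gap between the train and test empirical CDFs. `ks_statistic`, `detect_drift`, and the 0.3 threshold are hypothetical illustrations, not code from `run.py`.

```python
from bisect import bisect_right
from typing import Dict, Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    # Both empirical CDFs only change at observed values, so checking
    # every pooled observation is enough to find the supremum.
    for x in sa + sb:
        fa = bisect_right(sa, x) / len(sa)  # share of sample a that is <= x
        fb = bisect_right(sb, x) / len(sb)  # share of sample b that is <= x
        d = max(d, abs(fa - fb))
    return d


def detect_drift(
    train_cols: Dict[str, Sequence[float]],
    test_cols: Dict[str, Sequence[float]],
    threshold: float = 0.3,  # hypothetical cut-off, not taken from the project
) -> Dict[str, bool]:
    """Flag each feature whose KS statistic exceeds the threshold."""
    return {
        name: ks_statistic(train_cols[name], test_cols[name]) > threshold
        for name in train_cols
    }
```

A fixed threshold on the raw statistic ignores sample size; with a test split as small as iris's, the p-value returned by `ks_2samp` is arguably a sounder decision rule. Testing each feature separately also misses shifts that only appear in the correlations between features.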

## 📊 Visualizations

The pipeline generates a SHAP summary plot to explain feature importance:

<div align="center">
<br/>
<img alt="SHAP Summary Plot" src=".assets/shap_visualization.png" width="70%">
<br/>
</div>
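
SHAP values are Shapley values of a model's prediction. With only four iris features, they can be computed exactly by enumerating every feature coalition, which makes the definition concrete. This sketch assumes a simple interventional value function in which features outside a coalition are set to a single baseline vector (for example, the training-set means). The `shap` library instead averages over a background dataset or uses model-specific algorithms, so treat the hypothetical `shapley_values` helper as an illustration of the definition, not as what the explain step computes.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set


def shapley_values(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of f at x; absent features take baseline values."""
    n = len(x)
    if len(baseline) != n:
        raise ValueError("x and baseline must have the same length")

    def value(coalition: Set[int]) -> float:
        # Features in the coalition come from x, all others from the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1 (feature i excluded)
            # Shapley weight |S|! * (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

For a linear model the result reduces to `w_i * (x_i - baseline_i)`, and the values always sum to `f(x) - f(baseline)`. The number of coalitions doubles with each added feature, which is why SHAP relies on sampling or model-specific shortcuts once there are more than a handful of features.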

## 🛠️ Customization

You can customize various aspects of the pipeline:

- Adjust the `SVC` hyperparameters in the `train_model` step
- Modify the train-test split ratio in the `load_data` step
- Add or remove features from the iris dataset
- Implement additional evaluation metrics in the `evaluate_model` step
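
For the split ratio, scikit-learn's `train_test_split` exposes `test_size`, `random_state`, and `stratify`; whether the project's `load_data` step passes `stratify` is not visible in this commit view. The stdlib-only sketch below shows what a seeded, stratified split does, using a hypothetical `stratified_split` helper so the behavior can be checked without scikit-learn.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[int], test_size: float = 0.2, seed: int = 42
) -> Tuple[List[int], List[int]]:
    """Return (train_idx, test_idx), keeping each class's share roughly equal."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be strictly between 0 and 1")
    rng = random.Random(seed)  # seeded so the split is reproducible
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx: List[int] = []
    test_idx: List[int] = []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_size)  # per-class test count
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)
```

On iris (50 samples per class), `test_size=0.2` yields 10 test rows per class, 30 in total. Keeping class proportions fixed matters with a test set this small: an unstratified split can under-represent one class, which affects the accuracy figure and can also show up as spurious drift between the train and test sets.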

## 📜 Project Structure

```
.
├── run.py            # Main pipeline file
├── requirements.txt  # Python dependencies
└── README.md         # This file
```

## 🤝 Contributing

Contributions to improve the pipeline are welcome! Please feel free to submit a Pull Request.

## 📄 License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

explainability-shap/requirements.txt

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
scikit-learn
shap
matplotlib
scipy
zenml
pyarrow
fastparquet
