
Commit 316dc60

Finished the story

1 parent b837c6f commit 316dc60
12 files changed, +133 -37 lines

native-experiment-tracking/README.md

Lines changed: 84 additions & 10 deletions
@@ -28,15 +28,6 @@ By running this pipeline iteratively
 
 ## :running: Run locally
 
-
-#### Option 1 - Interactively explore the quickstart using Jupyter Notebook:
-```bash
-pip install notebook
-jupyter notebook
-# open quickstart.ipynb
-```
-
-#### Option 2 - Execute the whole ML pipeline from a Python script:
 ```bash
 # Pip install all requirements
 pip install -r requirements.txt
@@ -46,11 +37,94 @@ zenml integration install sklearn pandas -y
 
 # Initialize ZenML
 zenml init
+
+# Connect to your ZenML server
+zenml connect --url ...
+
+python run.py --parallel
+```
+
+This will run a grid search across the following parameter space:
+
+```python
+alpha_values = [0.0001, 0.001, 0.01]
+penalties = ["l2", "l1", "elasticnet"]
+losses = ["hinge", "squared_hinge", "modified_huber"]
+```
+
+If you include the `--parallel` flag, all training runs execute in parallel.
+Because ZenML caches across pipelines, and the feature pipeline has run
+ahead of the parallel training runs, all training pipelines should start at the
+`model_trainer` step.
+![Pipeline DAG with cached steps](./assets/pipeline_dag_caching.png)
+
+After running, you should have 27 training runs and 27
+produced model versions. If you are running with [ZenML Pro](https://docs.zenml.io/getting-started/zenml-pro),
+you can inspect these model versions in the dashboard:
+![Model Versions Page](./assets/model_versions.png)
+
+Additionally, if you ran with a remote [data backend](https://docs.zenml.io/stack-components/artifact-stores),
+you can inspect the confusion matrix for any specific training run directly in the
+frontend.
+![Confusion Matrix Visualization](./assets/cm_visualization.png)
+
+If you want to create your own visualization, check out the implementation
+at `native-experiment-tracking/steps/model_trainer.py:generate_cm`. In short: create a
+matplotlib plot, convert it into a `PIL.Image`, and return it from your
+step. Don't forget to annotate your [step output accordingly](https://docs.zenml.io/how-to/build-pipelines/step-output-typing-and-annotation).
+```python
+from typing import Tuple
+from typing_extensions import Annotated
+from PIL import Image
+from zenml import ArtifactConfig, step
+
+@step
+def func(...) -> Tuple[
+    Annotated[
+        ...
+    ],
+    Annotated[
+        Image.Image, "confusion_matrix"
+    ]
+]:
 ```
 
 ## 📈 Explore your experiments
 
-...
+Once all pipelines have run, it is time to analyze the experiment.
+For this we have written an `analyze.py` script.
+```commandline
+python analyze.py
+```
+This will generate two plots for you:
+
+**3D Plot**
+![3D Plot](./assets/3d_plot.png)
+
+**2D Plot**
+![2D Plot](./assets/2d_plot.png)
+
+Feel free to use this file as a starting point to write your very own
+analysis.
+
+## The moral of the story
+
+So what's the point? We at ZenML believe that any good experiment should be set up in a
+repeatable, scalable way while storing all the relevant metadata, so that the experiment
+can be analyzed after the fact. This project shows how you can do this with ZenML.
+
+Once you have accomplished this on a toy dataset with a tiny SGDClassifier, you can start
+scaling up in all dimensions: data, parameters, model, and more, all while staying infrastructure
+agnostic. When your experiment outgrows your local machine, you can simply move
+to the stack of your choice.
+
+## 🤝 Contributing
+
+Contributions to improve the pipeline are welcome! Please feel free to submit a Pull Request.
+
+## 📄 License
+
+This project is licensed under the Apache License 2.0. See the LICENSE file for details.
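For reference, the recipe the README describes (render the matrix with matplotlib, serialize the figure, hand it back as a `PIL.Image`) can be sketched as follows. This is a minimal, hypothetical helper, not the actual code in `native-experiment-tracking/steps/model_trainer.py`; the signature and plotting details are assumptions:

```python
import io

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix


def generate_cm(y_true: np.ndarray, y_pred: np.ndarray) -> Image.Image:
    """Render a confusion matrix with matplotlib and return it as a PIL image."""
    cm = confusion_matrix(y_true, y_pred)
    fig, ax = plt.subplots(figsize=(6, 6))
    ConfusionMatrixDisplay(confusion_matrix=cm).plot(ax=ax)

    # Serialize the figure to an in-memory PNG and load it back as a PIL.Image,
    # which the step can then return as its annotated "confusion_matrix" output.
    buf = io.BytesIO()
    fig.savefig(buf, format="png", bbox_inches="tight")
    plt.close(fig)
    buf.seek(0)
    return Image.open(buf)
```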
native-experiment-tracking/analyze.py

Lines changed: 1 addition & 3 deletions
@@ -9,7 +9,7 @@
 def main():
     client = Client()
 
-    model_versions = client.list_model_versions(model_name_or_id="breast_cancer_classifier", size=30, hydrate=True)
+    model_versions = client.list_model_versions(model_name_or_id="breast_cancer_classifier", size=27, hydrate=True)
 
     alpha_values = []
     losses = []
@@ -41,8 +41,6 @@ def generate_2d_plots(alpha_values, losses, penalties, test_accuracies):
 
     # Get unique values
     unique_penalties = df['Penalty'].unique()
-    unique_losses = df['Loss'].unique()
-    unique_alphas = sorted(df['Alpha'].unique())
 
     # Create a figure with subplots for each penalty
     fig, axes = plt.subplots(1, len(unique_penalties), figsize=(20, 6), sharey=True)
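For illustration, the 2D-plot logic this hunk touches can be sketched roughly as below. The `DataFrame` column names and the `plt.subplots(...)` call follow the diff; the loop body and styling are assumptions, not the repository's exact code:

```python
import matplotlib.pyplot as plt
import pandas as pd


def generate_2d_plots(alpha_values, losses, penalties, test_accuracies):
    """Draw one accuracy subplot per penalty: alpha on x, one line per loss."""
    df = pd.DataFrame({
        "Alpha": alpha_values,
        "Loss": losses,
        "Penalty": penalties,
        "Accuracy": test_accuracies,
    })

    # Get unique values
    unique_penalties = df["Penalty"].unique()

    # Create a figure with subplots for each penalty
    fig, axes = plt.subplots(1, len(unique_penalties), figsize=(20, 6), sharey=True)

    for ax, penalty in zip(axes, unique_penalties):
        subset = df[df["Penalty"] == penalty]
        for loss, group in subset.groupby("Loss"):
            group = group.sort_values("Alpha")
            ax.plot(group["Alpha"], group["Accuracy"], marker="o", label=loss)
        ax.set_xscale("log")  # alpha values span orders of magnitude
        ax.set_title(f"penalty={penalty}")
        ax.set_xlabel("alpha")
        ax.legend(title="loss")
    axes[0].set_ylabel("test accuracy")
    fig.savefig("2d_plot.png")
```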
5 binary asset files changed: 54.1 KB, 106 KB, 90.8 KB, 126 KB, 33.9 KB

native-experiment-tracking/requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -6,3 +6,4 @@ pandas
 pillow
 matplotlib
 numpy
+seaborn

native-experiment-tracking/run.py

Lines changed: 34 additions & 16 deletions
@@ -21,6 +21,7 @@
 from itertools import product
 
 import click
+from sklearn.utils._param_validation import InvalidParameterError
 from zenml import Model
 from zenml.client import Client
 from zenml.logger import get_logger
@@ -36,9 +37,22 @@
     default=False,
     help="Disable caching for the pipeline run.",
 )
+@click.option(
+    "--parallel",
+    is_flag=True,
+    default=False,
+    help="Run training across the complete parameter grid in parallel.",
+)
+@click.option(
+    "--single_run",
+    is_flag=True,
+    default=False,
+    help="Run only one permutation of parameters.",
+)
 def main(
     no_cache: bool = False,
-    parallel: bool = False
+    parallel: bool = False,
+    single_run: bool = False
 ):
     """Main entry point for the pipeline execution.
 
@@ -52,8 +66,8 @@ def main(
     Args:
         no_cache: If `True` cache will be disabled.
         parallel: If `True` multiprocessing will be used for running hyperparameter tuning in parallel.
+        single_run: If `True` only one training run will be started.
     """
-    client = Client()
     config_path = os.path.join(
         os.path.dirname(os.path.realpath(__file__)),
         "configs",
@@ -63,22 +77,26 @@
 
     # Run the feature engineering pipeline, this way all invocations within the training pipelines
     # will use the cached output from this pipeline
-    feature_engineering()
+    # feature_engineering()
 
     # Here is our set of parameters that we want to explore to find the best combination
-    alpha_values = [0.0001, 0.001] # , 0.01]
-    penalties = ["l2", "l1"] # , "elasticnet"]
-    losses = ["hinge", "squared_hinge"] #, "modified_huber"]
+    alpha_values = [0.0001, 0.001, 0.01]
+    penalties = ["l2", "l1", "elasticnet"]
+    losses = ["hinge", "squared_hinge", "modified_huber"]
 
-    # Lets loop over these
-    # Create a list of all parameter combinations
-    parameter_combinations = list(product(alpha_values, penalties, losses))
 
-    if parallel:
-        parallel_training(config_path, enable_cache, parameter_combinations)
+    if single_run:
+        train_model(alpha_values[0], penalties[0], losses[0], config_path, enable_cache)
     else:
-        for alpha_value, penalty, loss in parameter_combinations:
-            train_model(alpha_value, penalty, loss, config_path, enable_cache)
+        # Let's loop over these
+        # Create a list of all parameter combinations
+        parameter_combinations = list(product(alpha_values, penalties, losses))
+
+        if parallel:
+            parallel_training(config_path, enable_cache, parameter_combinations)
+        else:
+            for alpha_value, penalty, loss in parameter_combinations:
+                train_model(alpha_value, penalty, loss, config_path, enable_cache)
 
 
 def parallel_training(config_path, enable_cache, parameter_combinations):
@@ -110,9 +128,9 @@ def train_model(alpha_value: float, penalty: str, loss: str, config_path: str, e
         )
 
         logger.info(f"Training finished successfully for alpha: {alpha_value}, penalty: {penalty}, loss: {loss}")
-    # except ValueError:
-    #     logger.info("Pipeline run aborted!\n\n")
-    #     pass
+    except InvalidParameterError:
+        logger.info("Pipeline run aborted due to parameter mismatch!\n\n")
+        pass
     except Exception as e:
         logger.error(f"Error in training with alpha: {alpha_value}, penalty: {penalty}, loss: {loss}")
         logger.error(f"Exception: {str(e)}")

native-experiment-tracking/steps/__init__.py

Lines changed: 0 additions & 3 deletions
@@ -27,9 +27,6 @@
 from .model_evaluator import (
     model_evaluator,
 )
-from .model_promoter import (
-    model_promoter,
-)
 from .model_trainer import (
     model_trainer,
 )
