
Commit 316dc60

Finished the story

1 parent b837c6f commit 316dc60
12 files changed, +133 -37 lines

native-experiment-tracking/README.md

Lines changed: 84 additions & 10 deletions
@@ -28,15 +28,6 @@ By running this pipeline iteratively
 
 ## :running: Run locally
 
-
-#### Option 1 - Interactively explore the quickstart using Jupyter Notebook:
-```bash
-pip install notebook
-jupyter notebook
-# open quickstart.ipynb
-```
-
-#### Option 2 - Execute the whole ML pipeline from a Python script:
 ```bash
 # Pip install all requirements
 pip install -r requirements.txt
@@ -46,11 +37,94 @@ zenml integration install sklearn pandas -y
 
 # Initialize ZenML
 zenml init
+
+# Connect to your ZenML server
+zenml connect --url ...
+
+python run.py --parallel
+```
+
+This will run a grid search across the following parameter space:
+
+```python
+alpha_values = [0.0001, 0.001, 0.01]
+penalties = ["l2", "l1", "elasticnet"]
+losses = ["hinge", "squared_hinge", "modified_huber"]
+```
+
+If you include the `--parallel` flag, all training runs execute in parallel.
+Because ZenML caches across pipelines, and the feature pipeline has run
+ahead of the parallel training runs, all training pipelines should start at the
+`model_trainer` step.
+![Pipeline DAG with cached steps](./assets/pipeline_dag_caching.png)
+
+After running, you should have 27 training runs and 27
+produced model versions. If you are running with [ZenML Pro](https://docs.zenml.io/getting-started/zenml-pro),
+you can inspect these model versions in the dashboard:
+![Model Versions Page](./assets/model_versions.png)
+
+Additionally, if you ran with a remote [data backend](https://docs.zenml.io/stack-components/artifact-stores),
+you can inspect the confusion matrix for any specific training run directly in the
+frontend.
+![Confusion Matrix Visualization](./assets/cm_visualization.png)
+
+If you want to create your own visualization, check out the implementation
+at `native-experiment-tracking/steps/model_trainer.py:generate_cm`. In short: create a
+matplotlib plot, convert it into a `PIL.Image`, and return it from your
+step. Don't forget to annotate your [step output accordingly](https://docs.zenml.io/how-to/build-pipelines/step-output-typing-and-annotation).
+```python
+from typing import Tuple
+from typing_extensions import Annotated
+from PIL import Image
+from zenml import ArtifactConfig, step
+
+@step
+def func(...) -> Tuple[
+    Annotated[
+        ...
+    ],
+    Annotated[
+        Image.Image, "confusion_matrix"
+    ]
+]:
 ```
 
 ## 📈 Explore your experiments
 
-...
+Once all pipelines have run, it is time to analyze the experiment.
+For this we have written an `analyze.py` script.
+```commandline
+python analyze.py
+```
+This will generate two plots for you:
+
+**3D Plot**
+![3D Plot](./assets/3d_plot.png)
+
+**2D Plot**
+![2D Plot](./assets/2d_plot.png)
+
+Feel free to use this file as a starting point to write your very own
+analysis.
+
+## The moral of the story
+
+So what's the point? We at ZenML believe that any good experiment should be set up in a
+repeatable, scalable way while storing all the relevant metadata, so that the experiment
+can be analyzed after the fact. This project shows how you can do this with ZenML.
+
+Once you have accomplished this on a toy dataset with a tiny SGDClassifier, you can start
+scaling up in all dimensions: data, parameters, model, and more, all while staying infrastructure
+agnostic. When your experiment outgrows your local machine, you can simply move
+to the stack of your choice.
+
+## 🤝 Contributing
+
+Contributions to improve the pipeline are welcome! Please feel free to submit a Pull Request.
+
+## 📄 License
+
+This project is licensed under the Apache License 2.0. See the LICENSE file for details.
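For reference, the recipe the README describes (render the matrix with matplotlib, serialize the figure, hand it back as a `PIL.Image`) can be sketched as follows. This is a minimal, hypothetical helper, not the actual code in `native-experiment-tracking/steps/model_trainer.py`; the signature and plotting details are assumptions:

```python
import io

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix


def generate_cm(y_true: np.ndarray, y_pred: np.ndarray) -> Image.Image:
    """Render a confusion matrix with matplotlib and return it as a PIL image."""
    cm = confusion_matrix(y_true, y_pred)
    fig, ax = plt.subplots(figsize=(6, 6))
    ConfusionMatrixDisplay(confusion_matrix=cm).plot(ax=ax)

    # Serialize the figure to an in-memory PNG and load it back as a PIL.Image,
    # which the step can then return as its annotated "confusion_matrix" output.
    buf = io.BytesIO()
    fig.savefig(buf, format="png", bbox_inches="tight")
    plt.close(fig)
    buf.seek(0)
    return Image.open(buf)
```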
native-experiment-tracking/analyze.py

Lines changed: 1 addition & 3 deletions
@@ -9,7 +9,7 @@
 def main():
     client = Client()
 
-    model_versions = client.list_model_versions(model_name_or_id="breast_cancer_classifier", size=30, hydrate=True)
+    model_versions = client.list_model_versions(model_name_or_id="breast_cancer_classifier", size=27, hydrate=True)
 
     alpha_values = []
     losses = []
@@ -41,8 +41,6 @@ def generate_2d_plots(alpha_values, losses, penalties, test_accuracies):
 
     # Get unique values
     unique_penalties = df['Penalty'].unique()
-    unique_losses = df['Loss'].unique()
-    unique_alphas = sorted(df['Alpha'].unique())
 
     # Create a figure with subplots for each penalty
     fig, axes = plt.subplots(1, len(unique_penalties), figsize=(20, 6), sharey=True)
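For illustration, the 2D-plot logic this hunk touches can be sketched roughly as below. The `DataFrame` column names and the `plt.subplots(...)` call follow the diff; the loop body and styling are assumptions, not the repository's exact code:

```python
import matplotlib.pyplot as plt
import pandas as pd


def generate_2d_plots(alpha_values, losses, penalties, test_accuracies):
    """Draw one accuracy subplot per penalty: alpha on x, one line per loss."""
    df = pd.DataFrame({
        "Alpha": alpha_values,
        "Loss": losses,
        "Penalty": penalties,
        "Accuracy": test_accuracies,
    })

    # Get unique values
    unique_penalties = df["Penalty"].unique()

    # Create a figure with subplots for each penalty
    fig, axes = plt.subplots(1, len(unique_penalties), figsize=(20, 6), sharey=True)

    for ax, penalty in zip(axes, unique_penalties):
        subset = df[df["Penalty"] == penalty]
        for loss, group in subset.groupby("Loss"):
            group = group.sort_values("Alpha")
            ax.plot(group["Alpha"], group["Accuracy"], marker="o", label=loss)
        ax.set_xscale("log")  # alpha values span orders of magnitude
        ax.set_title(f"penalty={penalty}")
        ax.set_xlabel("alpha")
        ax.legend(title="loss")
    axes[0].set_ylabel("test accuracy")
    fig.savefig("2d_plot.png")
```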
5 binary asset files changed: 54.1 KB, 106 KB, 90.8 KB, 126 KB, 33.9 KB

native-experiment-tracking/requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -6,3 +6,4 @@ pandas
 pillow
 matplotlib
 numpy
+seaborn

native-experiment-tracking/run.py

Lines changed: 34 additions & 16 deletions
@@ -21,6 +21,7 @@
 from itertools import product
 
 import click
+from sklearn.utils._param_validation import InvalidParameterError
 from zenml import Model
 from zenml.client import Client
 from zenml.logger import get_logger
@@ -36,9 +37,22 @@
     default=False,
     help="Disable caching for the pipeline run.",
 )
+@click.option(
+    "--parallel",
+    is_flag=True,
+    default=False,
+    help="Run training across the complete parameter grid in parallel.",
+)
+@click.option(
+    "--single_run",
+    is_flag=True,
+    default=False,
+    help="Run only one permutation of parameters.",
+)
 def main(
     no_cache: bool = False,
-    parallel: bool = False
+    parallel: bool = False,
+    single_run: bool = False
 ):
     """Main entry point for the pipeline execution.
 
@@ -52,8 +66,8 @@ def main(
     Args:
         no_cache: If `True` cache will be disabled.
         parallel: If `True` multiprocessing will be used for running hyperparameter tuning in parallel.
+        single_run: If `True` only one training run will be started.
     """
-    client = Client()
     config_path = os.path.join(
         os.path.dirname(os.path.realpath(__file__)),
         "configs",
@@ -63,22 +77,26 @@
 
     # Run the feature engineering pipeline, this way all invocations within the training pipelines
     # will use the cached output from this pipeline
-    feature_engineering()
+    # feature_engineering()
 
     # Here is our set of parameters that we want to explore to find the best combination
-    alpha_values = [0.0001, 0.001] # , 0.01]
-    penalties = ["l2", "l1"] # , "elasticnet"]
-    losses = ["hinge", "squared_hinge"] #, "modified_huber"]
+    alpha_values = [0.0001, 0.001, 0.01]
+    penalties = ["l2", "l1", "elasticnet"]
+    losses = ["hinge", "squared_hinge", "modified_huber"]
 
-    # Lets loop over these
-    # Create a list of all parameter combinations
-    parameter_combinations = list(product(alpha_values, penalties, losses))
 
-    if parallel:
-        parallel_training(config_path, enable_cache, parameter_combinations)
+    if single_run:
+        train_model(alpha_values[0], penalties[0], losses[0], config_path, enable_cache)
     else:
-        for alpha_value, penalty, loss in parameter_combinations:
-            train_model(alpha_value, penalty, loss, config_path, enable_cache)
+        # Let's loop over these
+        # Create a list of all parameter combinations
+        parameter_combinations = list(product(alpha_values, penalties, losses))
+
+        if parallel:
+            parallel_training(config_path, enable_cache, parameter_combinations)
+        else:
+            for alpha_value, penalty, loss in parameter_combinations:
+                train_model(alpha_value, penalty, loss, config_path, enable_cache)
 
 
 def parallel_training(config_path, enable_cache, parameter_combinations):
@@ -110,9 +128,9 @@ def train_model(alpha_value: float, penalty: str, loss: str, config_path: str, e
         )
 
         logger.info(f"Training finished successfully for alpha: {alpha_value}, penalty: {penalty}, loss: {loss}")
-    # except ValueError:
-    #     logger.info("Pipeline run aborted!\n\n")
-    #     pass
+    except InvalidParameterError:
+        logger.info("Pipeline run aborted due to parameter mismatch!\n\n")
+        pass
     except Exception as e:
         logger.error(f"Error in training with alpha: {alpha_value}, penalty: {penalty}, loss: {loss}")
         logger.error(f"Exception: {str(e)}")

native-experiment-tracking/steps/__init__.py

Lines changed: 0 additions & 3 deletions
@@ -27,9 +27,6 @@
 from .model_evaluator import (
     model_evaluator,
 )
-from .model_promoter import (
-    model_promoter,
-)
 from .model_trainer import (
     model_trainer,
 )
