Commit 21926ed

Merge branch 'main' into refactor-deprecate-merge
2 parents 679257a + 3268ae9 commit 21926ed

49 files changed, +1214 -558 lines changed

classifier-e2e/README.md

Lines changed: 59 additions & 41 deletions
@@ -11,58 +11,76 @@ pinned: false
 license: apache-2.0
 ---
 
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# ZenML MLOps Breast Cancer Classification Demo
 
-# 📜 ZenML Stack Show Case
+## 🌍 Project Overview
 
-This project aims to demonstrate the power of stacks. The code in this
-project assumes that you have quite a few stacks registered already.
+This is a minimalistic MLOps project demonstrating how to put machine learning
+workflows into production using ZenML. The project focuses on building a breast
+cancer classification model with end-to-end ML pipeline management.
 
-## default
-* `default` Orchestrator
-* `default` Artifact Store
+### Key Features
 
-```commandline
-zenml stack set default
-python run.py --training-pipeline
+- 🔬 Feature engineering pipeline
+- 🤖 Model training pipeline
+- 🧪 Batch inference pipeline
+- 📊 Artifact and model lineage tracking
+- 🔗 Integration with Weights & Biases for experiment tracking
+
+## 🚀 Installation
+
+1. Clone the repository
+2. Install requirements:
+```bash
+pip install -r requirements.txt
+```
+3. Install ZenML integrations:
+```bash
+zenml integration install sklearn xgboost wandb -y
+zenml login
+zenml init
+```
+4. You need to register a stack with a [Weights & Biases Experiment Tracker](https://docs.zenml.io/stack-components/experiment-trackers/wandb).
+
+## 🧠 Project Structure
+
+- `steps/`: Contains individual pipeline steps
+- `pipelines/`: Pipeline definitions
+- `run.py`: Main script to execute pipelines
+
+## 🔍 Workflow and Execution
+
+First, you need to set your stack:
+
+```bash
+zenml stack set stack-with-wandb
 ```
 
-## local-sagemaker-step-operator-stack
-* `default` Orchestrator
-* `s3` Artifact Store
-* `local` Image Builder
-* `aws` Container Registry
-* `Sagemaker` Step Operator
+### 1. Data Loading and Feature Engineering
 
-```commandline
-zenml stack set local-sagemaker-step-operator-stack
-zenml integration install aws -y
-python run.py --training-pipeline
+- Uses the Breast Cancer dataset from scikit-learn
+- Splits data into training and inference sets
+- Preprocesses data for model training
+
+```bash
+python run.py --feature-pipeline
 ```
 
-## sagemaker-airflow-stack
-* `Airflow` Orchestrator
-* `s3` Artifact Store
-* `local` Image Builder
-* `aws` Container Registry
-* `Sagemaker` Step Operator
-
-```commandline
-zenml stack set sagemaker-airflow-stack
-zenml integration install airflow -y
-pip install apache-airflow-providers-docker apache-airflow~=2.5.0
-zenml stack up
+### 2. Model Training
+
+- Supports multiple model types (SGD, XGBoost)
+- Evaluates and compares model performance
+- Tracks model metrics with Weights & Biases
+
+```bash
 python run.py --training-pipeline
 ```
 
-## sagemaker-stack
-* `Sagemaker` Orchestrator
-* `s3` Artifact Store
-* `local` Image Builder
-* `aws` Container Registry
-* `Sagemaker` Step Operator
+### 3. Batch Inference
 
-```commandline
-zenml stack set sagemaker-stack
-python run.py --training-pipeline
+- Loads production model
+- Generates predictions on new data
+
+```bash
+python run.py --inference-pipeline
 ```
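
The new README pairs each workflow stage with one `run.py` flag (`--feature-pipeline`, `--training-pipeline`, `--inference-pipeline`). The commit does not show `run.py` itself, so the following is only a minimal sketch of how such a dispatcher could look, assuming `argparse` and placeholder functions in place of the real ZenML pipelines. The flag names come from the README; the function names, the `PIPELINES` mapping, and `main` are hypothetical.

```python
import argparse
from typing import Callable, Dict, List, Optional


# Hypothetical stand-ins for the real pipelines; each only reports its name.
def feature_pipeline() -> str:
    return "feature pipeline"


def training_pipeline() -> str:
    return "training pipeline"


def inference_pipeline() -> str:
    return "inference pipeline"


# Flag names mirror the README; the mapping itself is an assumption.
PIPELINES: Dict[str, Callable[[], str]] = {
    "feature_pipeline": feature_pipeline,
    "training_pipeline": training_pipeline,
    "inference_pipeline": inference_pipeline,
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Run one or more pipelines.")
    for name in PIPELINES:
        # "feature_pipeline" becomes the CLI flag "--feature-pipeline".
        parser.add_argument(f"--{name.replace('_', '-')}", action="store_true")
    return parser


def main(argv: Optional[List[str]] = None) -> List[str]:
    args = build_parser().parse_args(argv)
    # Run the selected pipelines in declaration order, not in flag order.
    return [run() for name, run in PIPELINES.items() if getattr(args, name)]
```

With this sketch, `main(["--training-pipeline"])` runs only the training placeholder, and passing several flags runs the selected stages in the order feature, training, inference.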

classifier-e2e/requirements.txt

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-zenml[server]>=0.55.2
+zenml[server]>=0.70.0
 notebook
 scikit-learn<1.3
 s3fs>2022.3.0,<=2023.4.0

classifier-e2e/run_full.ipynb

Lines changed: 17 additions & 5 deletions
@@ -38,7 +38,7 @@
    "source": [
     "! pip3 install -r requirements.txt\n",
     "! zenml integration install sklearn xgboost -y\n",
-    "! zenml connect --url https://1cf18d95-zenml.cloudinfra.zenml.io \n",
+    "! zenml login https://1cf18d95-zenml.cloudinfra.zenml.io \n",
     "\n",
     "import IPython\n",
     "IPython.Application.instance().kernel.do_shutdown(restart=True)"
@@ -941,10 +941,17 @@
     "        .ravel()\n",
     "        .tolist(),\n",
     "    }\n",
-    "    log_model_metadata(metadata={\"wandb_url\": wandb.run.url})\n",
-    "    log_artifact_metadata(\n",
+    "\n",
+    "    try:\n",
+    "        if get_step_context().model:\n",
+    "            log_metadata(metadata=metadata, infer_model=True)\n",
+    "    except StepContextError:\n",
+    "        # If a model is not configured, it is not able to log metadata\n",
+    "        pass\n",
+    "\n",
+    "    log_metadata(\n",
     "        metadata=metadata,\n",
-    "        artifact_name=\"breast_cancer_classifier\",\n",
+    "        artifact_version_id=get_step_context().inputs[\"model\"].id,\n",
     "    )\n",
     "\n",
     "    wandb.log({\"train_accuracy\": metadata[\"train_accuracy\"]})\n",
@@ -1073,6 +1080,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
+   "id": "1e2130b9",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -1083,6 +1091,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
+   "id": "476cbf5c",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -1091,6 +1100,7 @@
   },
   {
    "cell_type": "markdown",
+   "id": "75df10e7",
    "metadata": {},
    "source": [
     "Now full run executed on local stack and experiment is tracked using Model Control Plane and Weights&Biases.\n",
@@ -1103,6 +1113,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
+   "id": "bfd6345f",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -1113,6 +1124,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
+   "id": "24358031",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -1136,7 +1148,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.9.18"
+   "version": "3.11.3"
   }
  },
  "nbformat": 4,

classifier-e2e/run_skip_basics.ipynb

Lines changed: 12 additions & 5 deletions
@@ -38,7 +38,7 @@
    "source": [
     "! pip3 install -r requirements.txt\n",
     "! zenml integration install sklearn xgboost -y\n",
-    "! zenml connect --url https://1cf18d95-zenml.cloudinfra.zenml.io \n",
+    "! zenml login https://1cf18d95-zenml.cloudinfra.zenml.io \n",
     "\n",
     "import IPython\n",
     "IPython.Application.instance().kernel.do_shutdown(restart=True)"
@@ -829,10 +829,17 @@
     "        .ravel()\n",
     "        .tolist(),\n",
     "    }\n",
-    "    log_model_metadata(metadata={\"wandb_url\": wandb.run.url})\n",
-    "    log_artifact_metadata(\n",
+    "\n",
+    "    try:\n",
+    "        if get_step_context().model:\n",
+    "            log_metadata(metadata=metadata, infer_model=True)\n",
+    "    except StepContextError:\n",
+    "        # If a model is not configured, it is not able to log metadata\n",
+    "        pass\n",
+    "\n",
+    "    log_metadata(\n",
     "        metadata=metadata,\n",
-    "        artifact_name=\"breast_cancer_classifier\",\n",
+    "        artifact_version_id=get_step_context().inputs[\"model\"].id,\n",
     "    )\n",
     "\n",
     "    wandb.log({\"train_accuracy\": metadata[\"train_accuracy\"]})\n",
@@ -1211,7 +1218,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.9.18"
+   "version": "3.11.3"
   }
  },
  "nbformat": 4,

classifier-e2e/steps/deploy_endpoint.py

Lines changed: 5 additions & 1 deletion
@@ -7,6 +7,7 @@
 from utils.aws import get_aws_config
 from utils.sagemaker_materializer import SagemakerPredictorMaterializer
 from zenml import ArtifactConfig, get_step_context, log_artifact_metadata, step
+from zenml.enums import ArtifactType
 
 
 @step(
@@ -16,7 +17,10 @@
 def deploy_endpoint() -> (
     Annotated[
         Predictor,
-        ArtifactConfig(name="sagemaker_endpoint", is_deployment_artifact=True),
+        ArtifactConfig(
+            name="sagemaker_endpoint",
+            artifact_type=ArtifactType.SERVICE
+        ),
     ]
 ):
     role, session, region = get_aws_config()

classifier-e2e/steps/model_evaluator.py

Lines changed: 16 additions & 19 deletions
@@ -21,12 +21,7 @@
 import wandb
 from sklearn.base import ClassifierMixin
 from sklearn.metrics import confusion_matrix
-from zenml import (
-    get_step_context,
-    log_artifact_metadata,
-    log_model_metadata,
-    step,
-)
+from zenml import step, log_metadata, get_step_context
 from zenml.client import Client
 from zenml.exceptions import StepContextError
 from zenml.logger import get_logger
@@ -60,12 +55,12 @@ def model_evaluator(
     step to force the pipeline run to fail early and all subsequent steps to
     be skipped.
 
-    This step is parameterized to configure the step independently of the step code,
-    before running it in a pipeline. In this example, the step can be configured
-    to use different values for the acceptable model performance thresholds and
-    to control whether the pipeline run should fail if the model performance
-    does not meet the minimum criteria. See the documentation for more
-    information:
+    This step is parameterized to configure the step independently of the step
+    code, before running it in a pipeline. In this example, the step can be
+    configured to use different values for the acceptable model performance
+    thresholds and to control whether the pipeline run should fail if the model
+    performance does not meet the minimum criteria. See the documentation for
+    more information:
 
     https://docs.zenml.io/user-guide/advanced-guide/configure-steps-pipelines
 
@@ -89,17 +84,19 @@ def model_evaluator(
         dataset_tst.drop(columns=[target]),
         dataset_tst[target],
     )
-    logger.info(f"Train accuracy={trn_acc*100:.2f}%")
-    logger.info(f"Test accuracy={tst_acc*100:.2f}%")
+    logger.info(f"Train accuracy={trn_acc * 100:.2f}%")
+    logger.info(f"Test accuracy={tst_acc * 100:.2f}%")
 
     messages = []
     if trn_acc < min_train_accuracy:
         messages.append(
-            f"Train accuracy {trn_acc*100:.2f}% is below {min_train_accuracy*100:.2f}% !"
+            f"Train accuracy {trn_acc * 100:.2f}% is below "
+            f"{min_train_accuracy * 100:.2f}% !"
         )
     if tst_acc < min_test_accuracy:
         messages.append(
-            f"Test accuracy {tst_acc*100:.2f}% is below {min_test_accuracy*100:.2f}% !"
+            f"Test accuracy {tst_acc * 100:.2f}% is below "
+            f"{min_test_accuracy * 100:.2f}% !"
         )
     else:
         for message in messages:
@@ -115,14 +112,14 @@ def model_evaluator(
     }
     try:
         if get_step_context().model:
-            log_model_metadata(metadata={"wandb_url": wandb.run.url})
+            log_metadata(metadata=metadata, infer_model=True)
     except StepContextError:
         # if model not configured not able to log metadata
         pass
 
-    log_artifact_metadata(
+    log_metadata(
         metadata=metadata,
-        artifact_name="breast_cancer_classifier",
+        artifact_version_id=get_step_context().inputs["model"].id,
     )
 
     wandb.log(
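
The accuracy checks in the `model_evaluator.py` hunks can be exercised on their own, without ZenML or Weights & Biases. The following is a minimal sketch that reuses the message format from the diff. The function name `collect_accuracy_warnings` and its default thresholds are hypothetical, and what the real step does with the collected messages (log them or fail the run) is not fully visible in this commit, so the sketch simply returns them.

```python
from typing import List


def collect_accuracy_warnings(
    trn_acc: float,
    tst_acc: float,
    min_train_accuracy: float = 0.0,
    min_test_accuracy: float = 0.0,
) -> List[str]:
    """Return one message for each accuracy that is below its threshold."""
    messages: List[str] = []
    if trn_acc < min_train_accuracy:
        messages.append(
            f"Train accuracy {trn_acc * 100:.2f}% is below "
            f"{min_train_accuracy * 100:.2f}% !"
        )
    if tst_acc < min_test_accuracy:
        messages.append(
            f"Test accuracy {tst_acc * 100:.2f}% is below "
            f"{min_test_accuracy * 100:.2f}% !"
        )
    return messages
```

Keeping the checks in a pure function like this makes the thresholds easy to unit-test; the caller can then decide whether a non-empty list should produce warnings or abort the pipeline run.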

classifier-e2e/steps/model_trainer.py

Lines changed: 5 additions & 2 deletions
@@ -13,7 +13,6 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-#
 
 from typing import Optional
 
@@ -23,6 +22,7 @@
 from typing_extensions import Annotated
 from utils.sagemaker_materializer import SagemakerMaterializer
 from zenml import ArtifactConfig, step
+from zenml.enums import ArtifactType
 from zenml.logger import get_logger
 
 logger = get_logger(__name__)
@@ -39,7 +39,10 @@ def model_trainer(
     target: Optional[str] = "target",
 ) -> Annotated[
     ClassifierMixin,
-    ArtifactConfig(name="breast_cancer_classifier", is_model_artifact=True),
+    ArtifactConfig(
+        name="breast_cancer_classifier",
+        artifact_type=ArtifactType.MODEL,
+    ),
 ]:
     """Configure and train a model on the training dataset.
 
