Commit 49063ad

Add RUN.md for training and deployment instructions; update train_config.yaml for Kubernetes settings
1 parent bb6b04b commit 49063ad

2 files changed: +100 -0 lines changed

train_and_deploy/RUN.md

Lines changed: 97 additions & 0 deletions
@@ -0,0 +1,97 @@
# Train and Deploy ML Project

This README provides step-by-step instructions for running the training and deployment pipeline using ZenML.

## Prerequisites

- Git installed
- Python environment set up
- ZenML installed
- Access to the ZenML project repository
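
The command-line prerequisites above can be checked before starting. The following is a minimal sketch and not part of the repository: `missing_tools` is a hypothetical helper, and it assumes the executables are named `git`, `python`, and `zenml` on your `PATH` (on some systems Python is `python3`).

```bash
# Hypothetical helper (not part of the repository): print every tool
# from the argument list that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    # `command -v` exits non-zero when the tool cannot be resolved.
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

Calling `missing_tools git python zenml` prints nothing when all three are available; any name it prints still needs to be installed. It does not check repository access, which has to be verified separately.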

## Project Setup

1. Clone the repository and check out the feature branch:
   ```bash
   git clone git@github.com:zenml-io/zenml-projects.git
   cd zenml-projects
   git checkout feature/update-train-deploy
   ```

2. Navigate to the project directory:
   ```bash
   cd train_and_deploy
   ```

3. Initialize ZenML in the project:
   ```bash
   zenml init
   ```

## Running the Pipeline

### Training

You have two options for running the training pipeline:

#### Option 1: Automatic via CI
Push any code change. The push automatically triggers the CI pipeline, which launches training on SkyPilot.

#### Option 2: Manual Execution
1. First, set up your stack. You can choose between:
   - Local stack (uses local orchestrator):
     ```bash
     zenml stack set LocalGitGuardian
     ```
   - Remote stack (uses SkyPilot orchestrator):
     ```bash
     zenml stack set RemoteGitGuardian
     ```

2. Run the training pipeline:
   ```bash
   python run.py --training
   ```
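
The stack choice in step 1 of the manual option can be wrapped in a small helper so the stack name is not typed by hand. This is a sketch under stated assumptions: `stack_for_mode` is a hypothetical function and the `local`/`remote` mode names are invented for illustration; only the stack names `LocalGitGuardian` and `RemoteGitGuardian` come from the instructions above.

```bash
# Hypothetical helper: map a mode name to one of the two stacks named above.
stack_for_mode() {
  case "$1" in
    local)  printf '%s\n' "LocalGitGuardian" ;;   # local orchestrator
    remote) printf '%s\n' "RemoteGitGuardian" ;;  # SkyPilot orchestrator
    *)      printf 'unknown mode: %s\n' "$1" >&2; return 1 ;;
  esac
}
```

With the helper defined, `zenml stack set "$(stack_for_mode remote)"` would select the remote stack. An unknown mode returns a non-zero status, so a typo fails before any stack is changed.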

### Model Deployment

1. After training completes, deploy the model:
   ```bash
   python run.py --deployment
   ```

   Note: At this stage, the pipeline deploys the model version marked as "staging" (configured in `target_env`) and serves it locally using BentoML.

2. Test the deployed model:
   ```bash
   python run.py --inference
   ```

### Production Deployment

If the staging model performs well and you want to proceed with production deployment:

1. Deploy to Kubernetes:
   ```bash
   python run.py --production
   ```
   This pipeline will:
   - Build a Docker image from the BentoML service
   - Deploy it to Kubernetes

## Additional Resources

- [ZenML Projects Tenant Dashboard](https://cloud.zenml.io/organizations/fc992c14-d960-4db7-812e-8f070c99c6f0/tenants/12ec0fd2-ed02-4479-8ff9-ecbfbaae3285)
- [Example GitHub Actions Pipeline](https://github.com/zenml-io/zenml-projects/actions/runs/12075854945/job/33676323427)

## Pipeline Flow Overview

1. Training → Creates and trains the model
2. Deployment → Deploys model to staging environment (local BentoML)
3. Inference → Tests the deployed model
4. Production → Deploys to production Kubernetes environment
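
The four stages above can be chained so that a failure in one stage stops the sequence. The following is a minimal sketch, not the project's own tooling: `run_flow` is a hypothetical wrapper that appends each stage flag, in the order listed, to whatever command it receives as arguments. The flag names are the ones used in this guide.

```bash
# Hypothetical wrapper: append each stage flag, in the documented order,
# to the command given as arguments, stopping at the first failure.
run_flow() {
  for stage in training deployment inference production; do
    "$@" "--$stage" || return 1
  done
}
```

For a dry run, `run_flow echo my-launcher` only prints the four commands. Replacing `echo my-launcher` with the launcher command used in the steps above (everything before the stage flag) would execute them. In practice you would likely stop before `--production` and review the staging results first, as the production step is meant to run only if the staging model performs well.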

## Notes

- The deployment configurations are controlled by the `target_env` setting in the configs
- Make sure you have the necessary permissions and access rights before running the pipelines
- When using the automatic CI option, monitor the CI/CD pipeline in GitHub Actions

train_and_deploy/configs/train_config.yaml

Lines changed: 3 additions & 0 deletions
@@ -23,6 +23,9 @@ settings:
       - sklearn
       - slack
       - bentoml
+  orchestrator.vm_kubernetes:
+    down: True
+    idle_minutes_to_autostop: 2
 
 # configuration of steps
 steps:
