Commit 72c3fb3

Merge pull request #176 from zenml-io/productize-projects
Update projects to make them look like products
2 parents f14eb80 + 3f981c2 commit 72c3fb3

File tree

178 files changed

+863
-671
lines changed

airflow-cloud-composer-etl-feature-train/README.md

Lines changed: 0 additions & 171 deletions
This file was deleted.
File renamed without changes.
File renamed without changes.

eurorate-predictor/README.md

Lines changed: 166 additions & 0 deletions
@@ -0,0 +1,166 @@
# EuroRate Predictor

Turn European Central Bank data into actionable interest rate forecasts with this comprehensive MLOps solution.

## 🚀 Product Overview

EuroRate Predictor is a production-ready MLOps solution that transforms raw European Central Bank (ECB) interest rate data into accurate forecasts to inform your financial decision-making. Built on ZenML's robust framework, it delivers enterprise-grade machine learning pipelines that can be deployed in both development and production environments.

![EuroRate Predictor Pipeline Architecture](.assets/zenml_airflow_vertex_gcp_mlops.png)

### Key Features

- **End-to-End MLOps Pipeline**: From data extraction to model deployment
- **Cloud-Ready Architecture**: Seamlessly runs on Google Cloud Platform
- **Flexible Deployment Options**: Development mode for quick iteration, Production mode for enterprise deployment
- **Automated Model Evaluation**: Ensures only high-quality models are promoted to production
- **Scalable Infrastructure**: Leverages Airflow and Vertex AI for enterprise-grade performance

## 💡 How It Works

EuroRate Predictor consists of three integrated pipelines:

1. **Data Processing Pipeline** (Powered by Airflow)
   - Extracts raw ECB interest rate data from authoritative sources
   - Performs robust data cleaning and transformation
   - Produces standardized datasets ready for feature engineering

2. **Feature Engineering Pipeline** (Powered by Airflow)
   - Enriches datasets with financial domain-specific features
   - Implements time-series specific transformations
   - Creates feature-rich datasets optimized for predictive modeling

3. **Predictive Modeling Pipeline** (Hybrid Airflow/Vertex AI)
   - Trains advanced XGBoost regression models on Google's Vertex AI
   - Implements rigorous model evaluation protocols
   - Automatically promotes high-performing models to production

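To make the feature engineering and promotion ideas above more concrete, here is a minimal, standard-library-only sketch. It is not the project's code: the function names, the sample rates, and the two stand-in "models" (a last-value baseline and a simple trend extrapolation instead of XGBoost on Vertex AI) are all assumptions made for illustration. It shows one common time-series transformation (lag features) and one possible promotion rule (promote only if the candidate's error is no worse than the current production model's).

```python
from statistics import mean


def add_lag_features(rates, n_lags=2):
    """Build (features, target) rows where features are the previous n_lags values."""
    rows = []
    for i in range(n_lags, len(rates)):
        rows.append((rates[i - n_lags:i], rates[i]))
    return rows


def mean_absolute_error(predict, rows):
    """Average absolute difference between predictions and targets."""
    return mean(abs(predict(features) - target) for features, target in rows)


def should_promote(candidate_mae, production_mae=None, tolerance=0.0):
    """Promote if no production model exists or the candidate is at least as good."""
    if production_mae is None:
        return True
    return candidate_mae <= production_mae + tolerance


# Hypothetical rate series in percent; NOT real ECB data.
rates = [0.0, 0.0, 0.5, 1.25, 2.0, 2.5, 3.0, 3.5, 3.75, 4.0]
rows = add_lag_features(rates, n_lags=2)


def last_value(features):
    """Stand-in 'production model': predict the most recent observed rate."""
    return features[-1]


def linear_trend(features):
    """Stand-in 'candidate model': extrapolate the last observed change."""
    return features[-1] + (features[-1] - features[-2])


production_mae = mean_absolute_error(last_value, rows)
candidate_mae = mean_absolute_error(linear_trend, rows)
promote = should_promote(candidate_mae, production_mae)

print(f"production MAE={production_mae}, candidate MAE={candidate_mae}, promote={promote}")
```

In a real run the metric, the threshold, and the comparison target would come from the training pipeline's own evaluation step; the comparison rule here is only one reasonable choice.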
## 🔧 Getting Started

EuroRate Predictor offers two operational modes:

- **Development Mode**: Perfect for data scientists to iterate quickly on local machines
- **Production Mode**: Enterprise-ready deployment using GCP's Airflow/Vertex AI infrastructure

### Prerequisites

- Python 3.8+
- Google Cloud Platform account (for production deployment)
- ZenML installed and configured

### Installation

1. Set up your environment:

```bash
# Create and activate a Python virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install EuroRate Predictor and dependencies
pip install -r requirements.txt

# Install required integrations
zenml integration install gcp airflow
```

### Configuration

#### Development Mode
For quick iteration and testing, the default configuration works out of the box with the included sample dataset.

#### Production Mode
For enterprise deployment, configure your cloud infrastructure:

1. **Set up your GCP Stack** using the ZenML [GCP Stack Terraform module](https://registry.terraform.io/modules/zenml-io/zenml-stack/gcp/latest):

```hcl
module "zenml_stack" {
  source = "zenml-io/zenml-stack/gcp"

  project_id       = "your-gcp-project-id"
  region           = "europe-west1"
  orchestrator     = "vertex" # or "skypilot" or "airflow"
  zenml_server_url = "https://your-zenml-server-url.com"
  zenml_api_key    = "ZENKEY_1234567890..."
}

output "zenml_stack_id" {
  value = module.zenml_stack.zenml_stack_id
}

output "zenml_stack_name" {
  value = module.zenml_stack.zenml_stack_name
}
```
To learn more about the Terraform script, read the [ZenML documentation](https://docs.zenml.io/how-to/stack-deployment/deploy-a-cloud-stack-with-terraform) or see the [Terraform registry](https://registry.terraform.io/modules/zenml-io/zenml-stack).

2. **Configure your data sources and destinations**:

   - Update the `data_path` and `table_id` in [`configs/etl_production.yaml`](configs/etl_production.yaml)
   - Set the output `table_id` in [`configs/feature_engineering_production.yaml`](configs/feature_engineering_production.yaml)

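Before a production run, it can help to check that `data_path` and `table_id` were actually filled in. The sketch below is an assumption-laden illustration, not part of the project: it takes a configuration that has already been parsed into a Python dictionary (for example by a YAML loader), and because the real layout of the YAML files is not shown here, it searches nested structures for the keys instead of assuming where they live. The sample dictionary and its values are hypothetical placeholders.

```python
def find_key(config, key):
    """Collect every value stored under `key` anywhere in nested dicts and lists."""
    found = []
    if isinstance(config, dict):
        for name, value in config.items():
            if name == key:
                found.append(value)
            found.extend(find_key(value, key))
    elif isinstance(config, list):
        for item in config:
            found.extend(find_key(item, key))
    return found


def missing_settings(config, required_keys):
    """Return the required keys that are absent or set to None or an empty string."""
    missing = []
    for key in required_keys:
        values = find_key(config, key)
        if not values or any(value in (None, "") for value in values):
            missing.append(key)
    return missing


# Hypothetical parsed content; the real file layout may differ.
etl_config = {
    "steps": {
        "extract": {"parameters": {"data_path": "path/to/your/ecb_rates.csv"}},
        "load": {"parameters": {"table_id": ""}},
    }
}

missing = missing_settings(etl_config, ["data_path", "table_id"])
print("Missing settings:", missing)
```

Running the same check against the feature engineering configuration with only `table_id` as the required key would cover the second bullet above.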
### Running EuroRate Predictor

Execute the pipelines in sequence to generate your interest rate forecasts:

```bash
# Run the ETL pipeline
python run.py --etl

# Run the ETL pipeline in production mode, i.e., with the production configuration
python run.py --etl --mode production

# Run the feature engineering pipeline with the latest transformed dataset version
python run.py --feature --mode production

# Run the model training pipeline with the latest augmented dataset version
python run.py --training --mode production

# Run the feature engineering pipeline with a specific transformed dataset version (for reproducibility)
python run.py --feature --transformed_dataset_version "200"

# Run the model training pipeline with a specific augmented dataset version
python run.py --training --augmented_dataset_version "120"
```

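For readers who want to see how a command-line interface with these flags could be wired, here is a hedged `argparse` sketch. It mirrors only the flag names used in the commands above; the actual `run.py` may be built differently (for example with another CLI library), and the name of the non-production mode value (`development`), the defaults, and the helper functions are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that accepts the flags shown in the commands above."""
    parser = argparse.ArgumentParser(description="Sketch of a run.py-style CLI")
    parser.add_argument("--etl", action="store_true", help="Run the ETL pipeline")
    parser.add_argument("--feature", action="store_true", help="Run feature engineering")
    parser.add_argument("--training", action="store_true", help="Run model training")
    # Assumption: the default mode is called "development".
    parser.add_argument("--mode", choices=["development", "production"], default="development")
    # None means "use the latest available version".
    parser.add_argument("--transformed_dataset_version", default=None)
    parser.add_argument("--augmented_dataset_version", default=None)
    return parser


def selected_stages(args):
    """Return the requested stages in execution order."""
    ordered = [("etl", args.etl), ("feature", args.feature), ("training", args.training)]
    return [name for name, enabled in ordered if enabled]


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--feature", "--transformed_dataset_version", "200"])
print(selected_stages(args), args.mode, args.transformed_dataset_version)
```

Keeping the dataset versions as optional strings lets the same entry point serve both "use the latest version" and pinned, reproducible runs.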
After execution, access detailed visualizations and metrics in the ZenML dashboard.

## 📊 Results and Visualization

EuroRate Predictor provides comprehensive visualizations of:
- Data quality metrics
- Feature importance analysis
- Model performance evaluations
- Prediction accuracy over time

Access these insights through the ZenML UI by following the link displayed after pipeline execution.

## 📁 Product Structure

EuroRate Predictor follows a modular architecture:

```
├── configs                  # Pipeline configuration profiles
├── data                     # Sample and processed datasets
├── materializers            # Custom data handlers
├── pipelines                # Core pipeline definitions
├── steps                    # Individual pipeline components
│   ├── extract_data_local.py
│   ├── extract_data_remote.py
│   └── transform.py
├── feature_engineering      # Feature creation components
├── training                 # Model training components
└── run.py                   # Command-line interface
```

## 📚 Documentation

For detailed documentation on using ZenML to build your own MLOps pipelines, please refer to our [ZenML documentation](https://docs.zenml.io/).

## 🔄 Continuous Improvement

EuroRate Predictor is designed for continuous improvement of your interest rate forecasts. As new ECB data becomes available, simply re-run the pipelines to generate updated predictions.
