Learn how to deploy a trained scikit-learn model as a production-ready HTTP service with warm model loading and an interactive web UI.
- Create training and inference pipelines for customer churn prediction
- Deploy a pipeline as a warm, long-running HTTP service with sub-100ms latency
- Load models once at startup to eliminate cold-start delays
- Serve an interactive web interface alongside your REST API
- Track every prediction with full ZenML lineage and artifacts
pip install -r requirements.txt
zenml init
zenml login

First train the model (see code). This will train a scikit-learn based churn prediction model on sample data, and tag the resulting artifact as production.
python run.py --train

Deploy the tagged model as a real-time FastAPI service (see code):
# See deployment with custom frontend at http://localhost:8000
zenml pipeline deploy pipelines.inference_pipeline.churn_inference_pipeline

You can also run batch inference:
python run.py --predict # Run inference on sample customer data
python run.py --predict --features '{"account_length": 50, ...}' # Custom prediction

Visit http://localhost:8000 for the interactive UI (see code).
Make predictions via API
curl -X POST http://localhost:8000/invoke \
-H "Content-Type: application/json" \
-d '{
"parameters": {
"customer_features": {
"account_length": 24,
"customer_service_calls": 2,
"monthly_charges": 45.0,
"total_charges": 1080.0,
"has_internet_service": 1,
"has_phone_service": 1,
"contract_length": 12,
"payment_method_electronic": 0
}
}
}'

Use the ZenML Deployment Playground
The ZenML dashboard includes a built-in playground for deployed pipelines, allowing you to test your service directly from the UI without writing any code. Simply navigate to your deployment in the dashboard, fill in the input parameters interactively, and send requests to see real-time predictions. This makes it easy to validate your deployment, debug issues, and share working examples with your team—all without leaving the browser or crafting curl commands.
View API documentation
Visit http://localhost:8000/docs for interactive Swagger documentation.
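As an alternative to curl, the same /invoke endpoint can be called from Python. This is a minimal sketch using only the standard library, assuming the service is running on http://localhost:8000 as above (the payload shape mirrors the curl example):

```python
import json
import urllib.request


def build_invoke_payload(features: dict) -> dict:
    """Wrap customer features in the payload shape the /invoke endpoint expects."""
    return {"parameters": {"customer_features": features}}


def invoke(features: dict, url: str = "http://localhost:8000/invoke") -> dict:
    """POST the features to the deployed pipeline and return the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_invoke_payload(features)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example usage (requires the deployed service to be running):
# result = invoke({
#     "account_length": 24,
#     "customer_service_calls": 2,
#     "monthly_charges": 45.0,
#     "total_charges": 1080.0,
#     "has_internet_service": 1,
#     "has_phone_service": 1,
#     "contract_length": 12,
#     "payment_method_electronic": 0,
# })
```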
deploying_ml_model/
├── pipelines/
│ ├── training_pipeline.py - Generate synthetic data and train model
│ ├── inference_pipeline.py - Real-time prediction service
│ └── hooks.py - Warm model loading at startup/shutdown
├── steps/
│ ├── data.py - Customer data generation
│ ├── train.py - Model training and evaluation
│ └── inference.py - Fast prediction step
├── ui/
│ └── index.html - Interactive web form
├── run.py - CLI for training and testing
└── requirements.txt - Dependencies
How it works: The training pipeline generates synthetic customer data and trains a Random Forest classifier. The inference pipeline loads this model once at deployment startup (via the on_init hook) and uses it for fast per-request predictions. The web UI connects directly to the deployed service for real-time predictions.
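The training side can be sketched as follows. This is a simplified stand-in, not the project's actual code: it uses scikit-learn's make_classification in place of the customer data generation in steps/data.py, and fits a Random Forest as steps/train.py does:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the churn dataset: 8 numeric features,
# binary churn label (the real data generation lives in steps/data.py).
X, y = make_classification(n_samples=1000, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train and evaluate, as steps/train.py does with the real data.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
```

In the actual pipeline, the fitted model is then stored as a ZenML artifact and tagged as production so the inference pipeline can pick it up.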
ZenML allows you to deploy pipelines as real-time APIs. Just decorate your pipeline with DeploymentSettings:
from typing import Dict

from zenml.pipelines import pipeline
from zenml.config import DeploymentSettings

@pipeline(
    settings={
        "deployment": DeploymentSettings(
            app_title="Churn Prediction API",
            dashboard_files_path="ui",
            cors={"allow_origins": ["*"]},
        ),
    }
)
def churn_inference_pipeline(customer_features: Dict) -> Dict:
    return predict_churn(customer_features)

You can configure various deployers in your active stack to push the deployed pipeline onto your infrastructure (e.g. AWS App Runner Deployer, GCP Cloud Run Deployer, Docker Deployer, Local Deployer).
The on_init hook runs once when your pipeline deploys, loading the model into memory, where it stays warm for all requests and eliminates the 8-15 second cold start typical of serverless ML solutions:
@pipeline(
    on_init=init_model,        # Runs once at startup
    on_cleanup=cleanup_model,  # Clean shutdown
)
def churn_inference_pipeline(customer_features: Dict) -> Dict:
    return predict_churn(customer_features=customer_features)

Use DeploymentSettings to configure your HTTP service, including authentication, CORS, and static file serving:
settings={
    "deployment": DeploymentSettings(
        app_title="Churn Prediction API",
        dashboard_files_path="ui",  # Serve web UI at root
        cors=CORSConfig(allow_origins=["*"]),
    ),
}
