- Problem Statement
- Key Features
- Customer Churn Data Source
- Platform Processes
- Platform Infrastructure Diagram
- S3 File Drop Folder Structure
- Project Folders & Files
- Security Limitations & Future Improvements
- Installation Prerequisites
- Docker Local Image Storage Space Requirements
- Library Dependencies & Version Numbers
- How to Install Platform
- Platform ECS Services
- How to Upload Data
- How to Evaluate Data Drift & Model
- Data Drift & Prediction Score Email Alerts
- Unit Test Examples
- Integration Test Examples
- Pre-Commit Hooks
- Makefile Targets
- CI-CD Implementation
- DataTalksClub MLOps Zoomcamp Evaluation Criteria
- Companies rely on churn prediction models to proactively retain valuable customers, an effort that is typically more cost-effective than acquiring new ones.
- However, once deployed, these models risk losing accuracy over time as customer behavior and demographics shift.
- This project addresses this challenge by providing Data Scientists, Machine Learning Engineers, and their stakeholders a platform to continuously train, deploy, and monitor churn models, enabling organizations to detect drift, maintain model quality, and adapt to changing customer dynamics.
| Area | Features |
|---|---|
| 🧠 Model Development | Jupyter-based EDA and churn model training with Optuna hyperparameter tuning |
| 📊 Model Evaluation | Experiment tracking and model comparison via MLflow; hyperparameter trial analysis via the Optuna UI |
| 🚀 Model Deployment | Terraform-provisioned AWS infrastructure (ECS, S3, RDS, Lambda) with Prefect-orchestrated inference on S3 file drops |
| 📈 Model Monitoring | Evidently data drift and prediction score reports, Grafana dashboards, and SNS email alerts |
| 🔁 Model Maintenance | Drift and prediction score thresholds that signal when model retraining is needed |
| 🤝 Team Collaboration | Pre-commit hooks, unit/integration tests, Makefile targets, and GitHub Actions CI/CD |
- The labeled customer churn data used to train the model was randomly collected from an Iranian telecom company on 4/8/2020 and made available for download by the UC Irvine Machine Learning Repository ↗.
- This repository contains a `data/` folder with several CSV files prefixed with `customer_churn_*`.
- The following files were split from the original dataset:
  - `customer_churn_0.csv` (used as training set)
  - `customer_churn_1.csv`
  - `customer_churn_2.csv`
- Additional testing was performed using `customer_churn_synthetic_*.csv` files, which were generated from the original dataset using Gretel.ai ↗.
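The three `customer_churn_*.csv` files were split from a single original dataset. The exact split code is not part of this README, but a minimal pandas sketch of such a split (the function name and chunking logic below are illustrative, not the repo's actual code) looks like:

```python
import pandas as pd

def split_dataset(df: pd.DataFrame, n_chunks: int = 3, seed: int = 42) -> list:
    """Shuffle a dataframe and split it into n roughly equal chunks."""
    shuffled = df.sample(frac=1, random_state=seed).reset_index(drop=True)
    return [shuffled.iloc[i::n_chunks].reset_index(drop=True)
            for i in range(n_chunks)]

# Hypothetical usage: write the chunks out as customer_churn_0/1/2.csv
# df = pd.read_csv("customer_churn_original.csv")
# for i, chunk in enumerate(split_dataset(df)):
#     chunk.to_csv(f"data/customer_churn_{i}.csv", index=False)
```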
This project enables two independent processes, joined by the MLflow Model Registry:
The following process is implemented with two files inside the `code/orchestration/modeling/` folder:

- Jupyter Notebook: `churn_model_training.ipynb`
  - Main location for EDA, hyperparameter tuning, and calling the `modeling.churn_model_training` module below
- Python Module: `churn_model_training.py`
  - Contains functions used in both the Jupyter Notebook and the Prefect Flow `churn_prediction_pipeline.py` (see Model Inference, Reporting, and Evaluation section below)
```mermaid
flowchart TD
    A[Download training data]
    B[Prepare data]
    C[Tune hyperparameters]
    D[Narrow parameter search space using Optuna UI]
    E[Train model]
    F[Evaluate model on training set using MLflow UI]
    G[Evaluate model on holdout set using MLflow UI]
    H[Is model performance sufficient?]
    I[Promote model in MLflow Registry]
    A --> B --> C --> E --> F --> G --> H;
    C --> D --> C;
    H --> |Yes| I;
    H --> |No| B;
```
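The tune/train/evaluate loop above uses Optuna in the training notebook. As a dependency-light illustration of the idea only, here is a random-search sketch using scikit-learn's `GradientBoostingClassifier` as a stand-in for the real Optuna study and the project's XGBoost model:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Toy stand-in for the churn dataset
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
best_score, best_params = -1.0, None
for _ in range(5):  # a few random trials in place of an Optuna study
    params = {"n_estimators": int(rng.integers(50, 200)),
              "max_depth": int(rng.integers(2, 6)),
              "learning_rate": float(rng.uniform(0.01, 0.3))}
    model = GradientBoostingClassifier(random_state=0, **params)
    model.fit(X_train, y_train)
    score = f1_score(y_test, model.predict(X_test))
    if score > best_score:  # keep the best trial, as Optuna would
        best_score, best_params = score, params

print(f"best F1={best_score:.3f} with {best_params}")
```

In the real pipeline, each trial would be logged as an MLflow experiment run and the winning model registered with the `staging` alias.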
- The following process is orchestrated by the Prefect Flow `code/orchestration/churn_prediction_pipeline.py`.
- A new flow run is created for each file dropped into the `input` folder of the S3 File Drop Folder (see S3 File Drop Folder Structure).
```mermaid
flowchart TD
    A[Drop new customer<br/>churn data into S3]
    B[Load latest promoted model in MLflow Registry]
    C[Validate file input]
    D[Prepare data, reusing training logic]
    E[Generate predictions]
    F[Append predictions to file input]
    G[Generate Evidently data drift and prediction performance report]
    H[Save report to database]
    I[Did drift exceed threshold?]
    J[Send drift email alert]
    K[Did prediction performance drop below threshold?]
    L[Send prediction score email alert]
    M[Evaluate detailed drift and performance report in Evidently UI]
    N[Evaluate drift and performance over time in Grafana UI]
    A-->B-->C-->D-->E-->F-->G-->H-->I-->K-->M-->N;
    I--> |Yes| J;
    K--> |Yes| L;
```
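The alerting branches at the end of the flow can be sketched with plain functions standing in for the Prefect `@task` steps. The names, signatures, and toy model below are illustrative, not the repo's actual API:

```python
DRIFT_THRESHOLD = 0.5   # alert if more than 50% of columns drift
SCORE_THRESHOLD = 0.7   # alert if any prediction score drops below 70%

def predict(row):
    # Placeholder for the model loaded from the MLflow Registry
    return int(sum(row) > 0)

def run_pipeline(rows, drift_share, scores):
    """Illustrative stand-in for the flow's final steps: generate
    predictions, then decide which email alerts to send."""
    predictions = [predict(r) for r in rows]
    alerts = []
    if drift_share > DRIFT_THRESHOLD:
        alerts.append("drift")
    if any(s < SCORE_THRESHOLD for s in scores.values()):
        alerts.append("score")
    return predictions, alerts

preds, alerts = run_pipeline([[1, 2], [-3, 1]],
                             drift_share=0.6,
                             scores={"f1": 0.9, "recall": 0.65})
# Both thresholds are violated here, so both alerts fire.
```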
```
s3://your_project_id
└── data
    ├── input        # Customer churn files uploaded here
    ├── processing   # Files moved here during processing
    ├── logs         # Log file created for each dropped file
    ├── processed    # Files moved here on successful processing
    └── errored      # Files moved here if error occurred during processing
```
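A file moves through these folders by being copied to a new key and deleted from the old one. A small helper sketch for computing the destination key (the helper name is hypothetical, not from the repo):

```python
def lifecycle_key(key: str, stage: str) -> str:
    """Map an S3 object key from data/input/ to its lifecycle folder.
    Stages mirror the bucket layout above."""
    assert stage in {"processing", "processed", "errored", "logs"}
    prefix, _, filename = key.rpartition("/")
    assert prefix == "data/input", f"unexpected source folder: {prefix}"
    return f"data/{stage}/{filename}"

# With boto3 (assumed, not shown in the repo docs), a "move" is a
# copy_object to the new key followed by delete_object on the old one.
```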
This project consists mainly of the following folders and files:
| Folder/File | Purpose |
|---|---|
| `code/grafana/` | Custom Grafana bundle packaging database configuration and dashboard files with Grafana Enterprise |
| `code/orchestration/` | Model training notebook/module, the `churn_prediction_pipeline.py` Prefect Flow, and unit/integration tests |
| `code/s3_to_prefect_lambda/` | Lambda function that invokes the orchestration flow when new files are dropped into S3 |
| `data/` | Customer churn CSV files used for training and pipeline simulation |
| `infrastructure/` | Terraform configuration for provisioning the AWS infrastructure |
| `readme-assets/` | Images and diagrams used by this README |
| `.env` (generated) | Tool URLs and connection strings written by `make apply` for export to the shell environment |
| `.pre-commit-config.yaml` (generated) | Pre-commit hook configuration generated by `make quality` |
| `Makefile` | Targets to accelerate platform deployment, development, and testing |
| `upload_simulation_script.py` | Uploads each non-training data file to the S3 File Drop `input` folder |
The full project folder tree contents can be viewed here.
| Security Limitation | Future Improvement |
|---|---|
| The IAM policy used is intentionally broad to reduce setup complexity. | Replace with least-privilege policies tailored to each service role. |
| Public subnets are required to simplify RDS access from ECS and local machines. | Migrate to private subnets with NAT Gateway and use bastion or VPN access for local clients. |
| The Prefect API ALB endpoint is publicly accessible to enable GitHub Actions deployment. | Restrict access to GitHub Actions IP ranges using ingress rules or CloudFront. |
| The MLflow ALB endpoint is publicly accessible to allow ECS Workers to reach the Model Registry. | Limit access to internal ECS security groups only. |
| The Prefect API ALB endpoint is visible in cleartext as an environment variable in the `.github/workflows/deploy-prefect.yml` file. This may pose a security risk if your GitHub repo is publicly visible. | Consider migrating this variable to a GitHub Repository secret and automatically upserting this value as a new Terraform action post-apply. |
- Python 3.10.x
- AWS Account ↗
- AWS Account required to deploy the pipeline to the cloud and run it as a user
- AWS Account NOT required to run unit and integration tests
- AWS User with the Required IAM Permissions policies
- AWS CLI ↗ installed, with `aws configure` run to store AWS credentials locally
- Docker ↗ installed and Docker Engine running
- Pip ↗ and Pipenv ↗
- Terraform ↗
- Prefect ↗
- Pre-commit ↗
- GitHub Account
- At this time, committing the repo to your GitHub account and running the GitHub Actions workflow is the only way to deploy the Prefect flow to the Prefect Server (short of manual workarounds)
An AWS user with the following AWS managed permissions policies was used when creating this Platform. Note that this list is overly permissive and may be tightened in the future.
- `AmazonEC2ContainerRegistryFullAccess`
- `AmazonEC2FullAccess`
- `AmazonECS_FullAccess`
- `AmazonRDSFullAccess`
- `AmazonS3FullAccess`
- `AmazonSNSFullAccess`
- `AWSLambda_FullAccess`
- `CloudWatchLogsFullAccess`
- `IAMFullAccess`
The Docker images required for the following components occupy approximately 5.4 GB of local disk space:
- Custom Grafana Bundle
  - Packages database configuration and dashboard files with Grafana Enterprise
  - Uses the Grafana `grafana/grafana-enterprise:12.0.2-security-01` image
- S3-to-Prefect Lambda Function
  - Invokes the orchestration flow when new files are dropped into S3
  - Uses the AWS `public.ecr.aws/lambda/python:3.12` image
- Testcontainers + LocalStack
  - Used by integration tests to mock the AWS S3 service using LocalStack
  - Uses the `localstack/localstack:4.7.0` image
After deployment, remove these local Docker images to conserve space.
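The S3-to-Prefect Lambda's core job is extracting the bucket and key from the S3 put event it receives. A minimal handler sketch (the repo's real handler, and its call into the Prefect API, are project-specific and omitted here):

```python
from urllib.parse import unquote_plus

def handler(event, context=None):
    """Sketch of an s3_to_prefect-style Lambda entry point: pull the
    bucket name and object key out of the S3 event record."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    # S3 event keys arrive URL-encoded (e.g. spaces become '+')
    key = unquote_plus(record["s3"]["object"]["key"])
    return {"bucket": bucket, "key": key}

# Abbreviated S3 put-event shape
example_event = {"Records": [{"s3": {
    "bucket": {"name": "my-project"},
    "object": {"key": "data/input/customer_churn_1.csv"}}}]}
```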
See the Pipfile and Pipfile.lock files within the following folders for the full lists of library dependencies and version numbers used:
- `code/orchestration/`
- `code/s3_to_prefect_lambda/`
1. Install the prerequisites
2. Ensure your Docker Engine is running
3. Create an S3 bucket to store the state of your Terraform infrastructure (e.g. `churn-platform-tf-state-<some random number>`)
4. Clone the `churn-model-evaluation-platform` repository locally
5. Edit the root Terraform configuration to store state within S3
    - Edit file: `{REPO_DIR}/infrastructure/main.tf`
    - Change `terraform.backend.s3.bucket` to the name of the bucket you created
    - Change `terraform.backend.s3.region` to your AWS region
6. Copy the Terraform `infrastructure/vars/stg.template.tfvars` file to a new `infrastructure/vars/stg.tfvars` file and define values for each key within:
| Key Name | Purpose | Example Value |
|---|---|---|
| `project_id` | Used as a prefix for many AWS resources, including the S3 bucket where files will be dropped and generated. Must be a valid S3 name (e.g. unique, no underscores) and 20 characters or less to avoid exceeding resource naming limits. | `mlops-churn-pipeline` |
| `vpc_id` | Your AWS VPC ID | `vpc-0a1b2c3d4e5f6g7h8` |
| `aws_region` | Your AWS Region | `us-east-2` |
| `db_username` | Username for the Postgres database used to store MLflow, Prefect, and Evidently metrics. Must conform to Postgres rules (e.g. lowercase letters, numbers, and underscores only). | `my_super_secure_db_name` |
| `db_password` | Password for the Postgres database. Use best practices and avoid spaces. | `Th1s1sAStr0ng#Pwd!` |
| `grafana_admin_user` | Username for the Grafana account used to view data drift and model prediction scores over time | `grafana_FTW` |
| `grafana_admin_password` | Password for the Grafana account | `Grafana4Lyfe!123` |
| `subnet_ids` | AWS Subnet IDs. Must be public subnet IDs from different Availability Zones so the Postgres RDS instance can be reached by ECS services (and optionally your IP address). | `["subnet-123abc456def78901", "subnet-234bcd567efg89012"]` |
| `my_ip` | IP address that will be granted access to the Grafana UI, Optuna UI, and Postgres DB | `203.0.113.42` |
| `my_email_address` | Email address notified when the majority of inferenced data columns exhibit drift or prediction scores fall below threshold | `your.name@example.com` |
7. Run `cd {REPO_DIR}/infrastructure` then `terraform init`. If successful, this command populates the Terraform state S3 bucket you created earlier with the files needed to capture the state of your infrastructure across Terraform command invocations.
8. Run `cd {REPO_DIR}/code/orchestration` then `pipenv shell`
9. Run `cd {REPO_DIR}`
10. Run `make plan` and review the infrastructure to be created (see Platform Infrastructure Diagram)
11. Run `make apply` to build the Terraform infrastructure, set Prefect Secrets, update the GitHub Actions workflow YAML, and start the ECS services.
    - After Terraform finishes instantiating each ECS Service, it executes the `wait_for_services.sh` script to poll the ALB URLs until each service instantiates its ECS Task and is ready for use.
    - For convenience, each tool's URL is displayed once it is ready (see Platform ECS Services).
12. Click each of the 5 ECS Service URLs to confirm they are running: MLflow, Optuna, Prefect Server, Evidently, Grafana
13. Run `make deploy-model` to train an `XGBoostChurnModel` churn model and upload it to the MLflow Model Registry with the `staging` alias.
    - Confirm it was created and aliased by visiting the Model Registry within the MLflow UI
    - Note the following:
        - Two versions of the model are visible in the registry, evaluated using the training and holdout datasets (`X_train` and `X_test`, respectively)
        - The data used to train the `staging` model was logged as the artifact `reference_data.csv` in its experiment run
14. Deploy the `churn_prediction_pipeline` Prefect Flow to your Prefect Server using GitHub Actions
    - Commit your cloned repo (including `{REPO_DIR}/.github/workflows/deploy-prefect.yml` updated with the generated `PREFECT_API_URL`)
    - Log in to your GitHub account, navigate to your committed repo project, and create the following Repository Secrets ↗ (used by `deploy-prefect.yml`):
        - `AWS_ACCOUNT_ID`
        - `AWS_ACCESS_KEY_ID`
        - `AWS_SECRET_ACCESS_KEY`
        - `AWS_REGION`
    - Navigate to the GitHub project's Actions tab, select the `Build and Deploy Prefect Flow to ECR` workflow, and click the green "Run workflow" button to deploy the Prefect flow
        - Confirm it deployed successfully by visiting the "Deployments" section of the Prefect UI
15. Confirm your email subscription to the pipeline SNS topic
    - Navigate to the inbox of the email address you configured in `stg.tfvars` and look for an email with the subject `AWS Notification - Subscription Confirmation`.
    - Open the email and click the `Confirm Subscription` link within.
    - Verify you see a green message confirming your subscription.
Once the `make apply` command completes successfully, you should see output similar to the following, providing URLs to each of the created tools:
```
🎉 All systems go! 🎉
MLflow, Optuna, Prefect, Evidently, and Grafana UI URLs
-------------------------------------------------------
🧪 MLflow UI:    http://your-project-id-alb-123456789.us-east-2.elb.amazonaws.com:5000
🔍 Optuna UI:    http://your-project-id-alb-123456789.us-east-2.elb.amazonaws.com:8080
⚙️ Prefect UI:   http://your-project-id-alb-123456789.us-east-2.elb.amazonaws.com:4200
📈 Evidently UI: http://your-project-id-alb-123456789.us-east-2.elb.amazonaws.com:8000
📈 Grafana UI:   http://your-project-id-alb-123456789.us-east-2.elb.amazonaws.com:3000
```
Clicking on each URL should render each tool's UI successfully in your browser (the Terraform command includes invoking a script that polls the services' URLs until they return successful responses).
If any of the URLs return an error (e.g. 503 Service Unavailable), investigate the root cause by logging into the AWS Elastic Container Service (ECS) console and inspecting the logs of the ECS Task that is failing.
If all the services started successfully, your ECS Task list should look similar to this screenshot:
These URLs were also written to the {REPO_DIR}/.env file for future retrieval and export to shell environment when needed.
```
OPTUNA_DB_CONN_URL=postgresql+psycopg2://USERNAME:PASSWORD@your-project-id-postgres.abcdefghijk.us-east-2.rds.amazonaws.com:5432/optuna_db
MLFLOW_TRACKING_URI=http://your-project-id-alb-123456789.us-east-2.elb.amazonaws.com:5000
PREFECT_API_URL=http://your-project-id-alb-123456789.us-east-2.elb.amazonaws.com:4200/api
EVIDENTLY_UI_URL=http://your-project-id-alb-123456789.us-east-2.elb.amazonaws.com:8000
PREFECT_UI_URL=http://your-project-id-alb-123456789.us-east-2.elb.amazonaws.com:4200
GRAFANA_UI_URL=http://your-project-id-alb-123456789.us-east-2.elb.amazonaws.com:3000
```
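These `KEY=VALUE` pairs can be pulled back into the process environment when needed. A minimal stdlib-only parser sketch (python-dotenv is a more robust alternative; neither is confirmed as what the repo's scripts actually use):

```python
import os

def load_env(path: str = ".env") -> dict:
    """Minimal .env parser: KEY=VALUE lines, skipping blanks and
    comments, exported into os.environ."""
    values = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                values[key.strip()] = value.strip()
    os.environ.update(values)
    return values
```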
The following sections give a brief overview of the tool features made available in this project:
- Lists model experiment runs that track model metrics and parameters used
- Captures details of each experiment run, including model type and training dataset used
- Automatically creates images to aid evaluation (e.g. confusion matrix, SHAP summary plot)
- Stores models in Model Registry for future use (e.g. loaded by Model Evaluation Pipeline on file drop)
Gain insight into Optuna hyperparameter tuning trials to narrow parameter search spaces and find optimal parameters more quickly.
View completed, running, and failed model evaluation runs to monitor pipeline health and address any unexpected issues.
Assess dataset drift and model performance for each new churn data drop to decide whether model retraining is needed.
Provides a pre-created dashboard plotting model data drift and performance metrics over time to distinguish anomalies from signals suggesting model development is needed.
- Navigate to `{REPO_DIR}` (and run `cd code/orchestration && pipenv shell` if you haven't already)
- You can process the labeled Customer Churn data in one of two ways:
  - Run `make simulate-file-drops` from `{REPO_DIR}` to run the script `upload_simulation_script.py`, which uploads each file in the `data` folder (except `customer_churn_0.csv`) to the S3 bucket folder
  - Manually upload files from the `{REPO_DIR}/data` folder into the S3 bucket `{PROJECT_ID}/data/input` folder
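The simulation script's documented behavior is to upload everything except the training file. A sketch of that file selection (the helper name is illustrative; the actual script internals may differ):

```python
from pathlib import Path

def files_to_upload(data_dir: str) -> list:
    """Select every customer_churn_*.csv except the training file
    customer_churn_0.csv, per upload_simulation_script.py's
    documented behavior."""
    return sorted(p.name for p in Path(data_dir).glob("customer_churn_*.csv")
                  if p.name != "customer_churn_0.csv")

# Uploading each selected file (boto3 assumed, not shown in the docs):
#   s3.upload_file(str(path), bucket, f"data/input/{name}")
```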
- Run
Once your files have finished processing (visible via the Prefect UI, or by their appearance in the S3 `data/processed/` folder), you can evaluate their data in two ways:
- Navigate to the Evidently UI to view detailed data drift metrics and prediction scores for each file
- Navigate to the Grafana UI and view the pre-built "Customer Churn Model Evaluation" dashboard to view how the drift metrics and prediction scores have behaved over time
The pipeline will send an email to the address configured within `stg.tfvars` in each of the following scenarios:
Sent if Evidently finds more than 50% of the new customer data set columns have drifted from the reference data set:

Sent if Evidently reports any of the observed prediction scores drop below 70%:
- F1 Score
- Precision
- Recall
- Accuracy
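The four monitored scores above map directly onto `sklearn.metrics`. A sketch of the threshold check (the 70% constant comes from this README; the function shape is illustrative):

```python
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

SCORE_THRESHOLD = 0.70  # alert if any observed score drops below 70%

def score_alerts(y_true, y_pred) -> dict:
    """Compute the four monitored scores and return any below threshold."""
    scores = {"f1": f1_score(y_true, y_pred),
              "precision": precision_score(y_true, y_pred),
              "recall": recall_score(y_true, y_pred),
              "accuracy": accuracy_score(y_true, y_pred)}
    return {name: value for name, value in scores.items()
            if value < SCORE_THRESHOLD}

# Example: recall suffers because one churned customer was missed
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0]
alerts = score_alerts(y_true, y_pred)
```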
Example unit tests can be found within the code/orchestration/tests/unit/ folder for select Prefect @task functions of churn_prediction_pipeline.py.
The `unittest.TestCase` class and the `unittest.mock.patch` and `unittest.mock.MagicMock` utilities were used to create reusable test fixture code that overrides ("patches") object references with mock objects.
```
├── code
│   ├── orchestration
│   │   ├── tests
│   │   │   ├── unit
│   │   │   │   ├── test_fetch_model.py
│   │   │   │   ├── test_generate_predictions.py
│   │   │   │   ├── test_move_to_folder.py
│   │   │   │   ├── test_prepare_dataset.py
│   │   │   │   └── test_validate_file_input.py
│   │   └── churn_prediction_pipeline.py
```
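A self-contained example in the same style, patching an MLflow client dependency with a mock. The `ModelFetcher` class and its methods are hypothetical stand-ins, not the repo's actual `test_fetch_model.py`:

```python
import unittest
from unittest.mock import MagicMock, patch

class ModelFetcher:
    """Hypothetical pipeline component that reads from MLflow."""
    def get_client(self):
        raise RuntimeError("would contact a real MLflow server")

    def fetch_model(self, name, alias="staging"):
        client = self.get_client()
        return client.get_model_version_by_alias(name, alias).version

class TestFetchModel(unittest.TestCase):
    def test_returns_aliased_version(self):
        client = MagicMock()
        client.get_model_version_by_alias.return_value = MagicMock(version="3")
        # Patch the client factory so no real network call happens
        with patch.object(ModelFetcher, "get_client", return_value=client):
            version = ModelFetcher().fetch_model("XGBoostChurnModel")
        self.assertEqual(version, "3")
        client.get_model_version_by_alias.assert_called_once_with(
            "XGBoostChurnModel", "staging")
```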
Example integration tests can be found within the code/orchestration/tests/integration/ folder for the validate_file_input @task function of churn_prediction_pipeline.py.
To verify that the function correctly reads files from S3, the `testcontainers.localstack` module ↗ was used to dynamically create a LocalStack ↗ container serving as a mock S3 endpoint for the `s3_client` calls made by the `validate_file_input` function.
```
├── code
│   ├── orchestration
│   │   ├── tests
│   │   │   ├── integration
│   │   │   │   └── test_validate_file_input.py
│   │   └── churn_prediction_pipeline.py
```
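Running LocalStack through testcontainers requires a Docker Engine, so here is a Docker-free sketch of the same idea: an in-memory fake with the S3 client's call shape, exercising a simplified `validate_file_input`. Both the fake client and the validation logic are illustrative, not the repo's actual code:

```python
import io

class FakeS3Client:
    """In-memory stand-in for the S3 client role that the LocalStack
    container plays in the real integration tests."""
    def __init__(self):
        self._objects = {}

    def put_object(self, Bucket, Key, Body):
        self._objects[(Bucket, Key)] = Body

    def get_object(self, Bucket, Key):
        return {"Body": io.BytesIO(self._objects[(Bucket, Key)])}

def validate_file_input(s3_client, bucket, key):
    """Simplified stand-in for the pipeline task: read the dropped
    CSV's header row and confirm a 'Churn' label column is present."""
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
    header = body.decode().splitlines()[0].split(",")
    return "Churn" in header

s3 = FakeS3Client()
s3.put_object(Bucket="my-project", Key="data/input/customer_churn_1.csv",
              Body=b"Call Failure,Complains,Churn\n8,0,1\n")
ok = validate_file_input(s3, "my-project", "data/input/customer_churn_1.csv")
```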
The following steps are required to activate pre-commit hooks for this repository:
- Navigate to `{REPO_DIR}/code/orchestration/` and run `pipenv shell` if you haven't already
- Navigate to `{REPO_DIR}` and run `pre-commit install`
- Ensure your Docker Engine is running (needed for LocalStack-based integration tests)
- Run `make quality` to generate the required `.pre-commit-config.yaml` file and execute the hooks
Generating `.pre-commit-config.yaml` is needed to inject the absolute path to the `code/orchestration/modeling` module folder for pylint (a future improvement is to use a relative path instead). For this reason, `.pre-commit-config.yaml` is listed in `.gitignore` so the cleartext absolute path is not committed in case you publish your repo publicly.
The following hooks are used to maintain notebook and module code quality and execute tests prior to committing files to Git:
- `nbqa-pylint`
- `nbqa-flake8`
- `nbqa-black`
- `nbqa-isort`
- `trailing-whitespace`
- `end-of-file-fixer`
- `check-yaml`
- `check-added-large-files`
- `isort`
- `black`
- `pylint`
- `pytest-check`
The following table lists the make targets available to accelerate platform deployment, development, and testing:
| Target Name | Purpose |
|---|---|
| `test` | Runs all unit and integration tests defined within the `code/orchestration` and `code/s3_to_prefect_lambda` folders |
| `quality` | Runs `pre-commit run --all-files`. See Pre-Commit Hooks |
| `commit` | Stages all changed files, prompts the user for a commit message, and attempts to commit the files (barring pre-commit errors) |
| `plan` | Runs `terraform plan --var-file=vars/stg.tfvars` from the `infrastructure` directory |
| `apply` | Runs `terraform apply --var-file=vars/stg.tfvars --auto-approve` and outputs an emoji-filled message with UI URLs upon successful deploy and ECS Task activation |
| `destroy` | Runs `terraform destroy -var-file=vars/stg.tfvars --auto-approve` |
| `disable-lambda` | Facilitates local dev/testing: disables notification of the `s3_to_prefect` Lambda function so files aren't automatically picked up by the deployed service. Lets you drop file(s) manually in S3 and run the pipeline locally when you're ready (see `process-test-data` target below). |
| `enable-lambda` | Re-enables the `s3_to_prefect` Lambda notification to resume creating new Prefect flow runs on S3 file drop |
| `deploy-model` | Trains an `XGBoostChurnModel` churn model and uploads it to the MLflow Model Registry with the `staging` alias |
| `log-model-nopromote` | Trains and logs a model to MLflow without promoting it to the `staging` alias |
| `process-test-data` | Manually invokes the flow after running the `disable-lambda` target. Upload `customer_churn_1.csv` into the S3 `data/input/` folder before use. Runs `python churn_prediction_pipeline.py your-project-id data/input/customer_churn_1.csv` and instantiates an ephemeral local Prefect Server to execute the flow. |
| `simulate-file-drops` | Runs `upload_simulation_script.py` to automatically upload each non-training data file in the `data/` folder to the S3 File Drop `input` folder |
- GitHub Actions ↗ was used to execute the following Continuous Integration and Continuous Delivery (CI/CD) process.
- See `.github/workflows/deploy-prefect.yml` for details.
```mermaid
flowchart TD
    A[Commit changes to code/orchestration/ tree]
    B[Manually run workflow from GitHub UI]
    C[Initialize ubuntu-latest Runner VM]
    D[Checkout code]
    E[Set up Python]
    F[Install Pipenv]
    G[Install code/orchestration/ Pipfile dependencies]
    H[Configure AWS credentials]
    I[Run unit and integration tests]
    J[Log in to Amazon ECR]
    K[Construct Prefect Flow Docker image name & tag]
    L[Inject Docker image name & tag into Prefect Flow YAML]
    M[Display YAML in GitHub Actions log for verification]
    N[Install Prefect]
    O[Build Docker container & deploy to Prefect Server]
    A-->C;
    B-->C;
    C-->D-->E-->F-->G-->H-->I-->J-->K-->L-->M-->N-->O;
```
This project earned the highest-tier score (top 10 of 183 participants ↗) in the peer-reviewed project assessment.
Source: https://github.com/DataTalksClub/mlops-zoomcamp/tree/main/07-project
See Problem Statement section.
Target: The project is developed on the cloud and IaC tools are used for provisioning the infrastructure
- See Project Folders & Files section for summary of Terraform files used to create AWS infrastructure.
- See Platform Infrastructure Diagram for diagram of cloud resources created and collaborations for each.
See MLflow Tracking Server & Model Registry section for screenshots of experiments tracked and model stored in registry.
See Prefect Orchestration Server and Worker Service section for screenshots of the fully deployed workflow within the Prefect UI and examples of workflow executions ("runs").
Target: The model deployment code is containerized and could be deployed to cloud or special tools for model deployment are used
See the orchestration and s3_to_prefect_lambda folders of Project Folders & Files to see how the model deployment code was containerized and deployed to the cloud.
Target: Comprehensive model monitoring that sends alerts or runs a conditional workflow (e.g. retraining, generating debugging dashboard, switching to a different model) if the defined metrics threshold is violated
See Data Drift & Prediction Score Email Alerts section for examples of email alerts that are sent when new customer data files exhibit the majority of their columns drifting from reference data or when the model prediction scores drop below pre-defined threshold.
Target: Instructions are clear, it's easy to run the code, and it works. The versions for all the dependencies are specified.
- See How to Install Platform section for instructions on how to set up the platform.
- See Library Dependencies & Version Numbers section for instructions on how to determine the libraries used and their version numbers.
See Unit Test Examples section for summary of unit tests that were implemented.
See Integration Test Examples section for summary of integration tests that were implemented.
See Hooks List section to see which linter and code formatters were used.
See Makefile Targets section for list of Makefile targets that were implemented.
See Pre-Commit Hooks section to see which hooks were used.
See CI-CD Implementation section for summary of how CI/CD was implemented.




























