
Customer Churn Prediction: MLOps Capstone


Introduction

This repository contains the implementation of an end-to-end MLOps pipeline for predicting customer churn at a telecom company. The project demonstrates core MLOps principles, including experiment tracking, orchestration, model deployment, monitoring, and best practices for reproducibility and scalability. It serves as the capstone project for the DataTalksClub MLOps Zoomcamp 2025 course, showcasing a cloud-ready, extensible, and reproducible machine learning service.


Table of Contents

  • Introduction
  • Problem Statement
  • System Architecture
  • Data
  • Experiment Tracking
  • Orchestration
  • Deployment
  • Monitoring
  • Best Practices
  • Setup
  • Running Tests
  • Project Structure
  • Sample Run
  • AWS Credentials for Terraform
  • Future Work

Problem Statement

Telecom providers face significant revenue loss due to customer churn. By predicting which customers are likely to leave one month in advance, companies can offer targeted retention incentives to reduce churn. This project builds a binary classifier to flag high-risk customers, enabling proactive retention strategies.


System Architecture

(System architecture diagram)


Data

The dataset is sourced from the IBM Telco Customer Churn dataset, publicly available as a CSV file.

  • Source: IBM Telco Customer Churn CSV
  • Size: ~1 MB, ~7043 rows, 21 features
  • Target Variable: Churn (Outcome: Yes/No)
  • Features: Customer demographics, account information, and service usage details

The dataset is small, enabling fast iteration during development, while still being rich enough to support a complete demonstration of the MLOps workflow.
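For a quick local sanity check of the raw CSV, something like the following works (a minimal sketch; the file path is illustrative and the real loading and cleaning logic lives in src/data_processing.py):

import pandas as pd

# Illustrative path; point this at wherever the Telco CSV is downloaded
df = pd.read_csv("data/WA_Fn-UseC_-Telco-Customer-Churn.csv")

# In the raw file, TotalCharges is a string column with blanks for brand-new customers
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")

print(df.shape)                                   # expected: (7043, 21)
print(df["Churn"].value_counts(normalize=True))   # roughly 73% "No" / 27% "Yes"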


Experiment Tracking

Experiment tracking is managed using MLflow with a local server backed by S3 for artifact storage. All model training runs, hyperparameters, and metrics are logged to MLflow. The best-performing model is registered in the MLflow model registry as the "Production" model for deployment.

  • Tool: MLflow
  • Metrics Tracked: PR-AUC (primary), Accuracy, F1 (secondary)
  • Artifacts: Trained models, preprocessing pipelines
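In outline, a training run logs its parameters, metrics, and model to the local MLflow server roughly as follows. This is a simplified sketch, not the actual src/train.py; the data path, experiment name, and registered model name are placeholders:

import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, average_precision_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

mlflow.set_tracking_uri("http://localhost:5000")   # local server started by `make mlflow-ui`
mlflow.set_experiment("customer-churn")            # placeholder experiment name

# Placeholder: a prepared dataset with numeric features and a 0/1 Churn label
df = pd.read_csv("data/telco_prepared.csv")
X, y = df[["tenure", "MonthlyCharges"]], df["Churn"]
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

with mlflow.start_run():
    mlflow.log_param("model_type", "logistic_regression")
    pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    pipeline.fit(X_train, y_train)
    y_prob = pipeline.predict_proba(X_val)[:, 1]
    mlflow.log_metric("pr_auc", average_precision_score(y_val, y_prob))
    mlflow.log_metric("accuracy", accuracy_score(y_val, y_prob >= 0.5))
    mlflow.log_metric("f1", f1_score(y_val, y_prob >= 0.5))
    # Registering under a model name lets the best version be promoted to "Production"
    mlflow.sklearn.log_model(pipeline, artifact_path="model",
                             registered_model_name="customer-churn-classifier")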

Orchestration

Workflow orchestration is handled by Prefect 2, which manages the training pipeline. Prefect flows are Python-native, with built-in retries and scheduling capabilities.

  • Pipelines:
    • mvp_training_flow: Runs the src/train.py script as a subprocess within the project's virtual environment using pipenv.
  • Tool: Prefect 2
  • Storage: AWS S3 for input data and logs
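The flow itself is thin: it shells out to the training script with retries, roughly like the sketch below (the real flow lives in orchestration/flows.py and may differ in details):

import subprocess

from prefect import flow, task

@task(retries=2, retry_delay_seconds=60)
def run_training() -> None:
    # Matches the setup described above: train.py runs as a subprocess
    # inside the project's pipenv-managed virtual environment
    subprocess.run(["pipenv", "run", "python", "src/train.py"], check=True)

@flow(name="mvp_training_flow")
def mvp_training_flow() -> None:
    run_training()

if __name__ == "__main__":
    mvp_training_flow()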

Deployment

The model is deployed as a FastAPI inference service, containerized using Docker for portability. The service loads the Production model from the MLflow registry and exposes an API endpoint for real-time predictions.

  • Tools: FastAPI, Uvicorn, Docker
  • Model: Scikit-learn pipeline (preprocessing + LogisticRegression)
  • Artifact Source: MLflow model registry
  • Deployment: Local or cloud-ready (e.g., ECS Fargate as a stretch goal)
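Conceptually, the service pulls the Production model from the registry at startup and wraps it in a FastAPI app, along the lines of the sketch below. The registered model name, the /predict route, and the request fields are illustrative; the real implementation is in deployment/app.py:

import mlflow.sklearn
import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Placeholder registry name; load whichever model version is promoted to "Production"
model = mlflow.sklearn.load_model("models:/customer-churn-classifier/Production")

class Customer(BaseModel):
    tenure: int
    MonthlyCharges: float
    Contract: str
    # ...remaining Telco features as defined by the real schema

@app.get("/health")
def health():
    return {"status": "ok"}

@app.post("/predict")
def predict(customer: Customer):
    df = pd.DataFrame([customer.model_dump()])            # pydantic v2
    churn_prob = float(model.predict_proba(df)[:, 1][0])  # pipeline exposes predict_proba
    return {"churn_probability": churn_prob, "churn": churn_prob >= 0.5}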

Monitoring

(This feature is not yet functional.) Model and data monitoring are to be implemented using Evidently, which generates HTML reports and JSON flags for drift and performance issues. If drift is detected, a retrain flow would be triggered via Prefect.

  • Tool: Evidently
  • Metrics Monitored: Data drift, model performance (PR-AUC, Accuracy, F1)
  • Output: HTML report, JSON flag for retraining
  • Alerting: Slack alerts as a stretch goal
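Since this piece is not implemented yet, the snippet below is only a sketch of the intended approach using Evidently's Report API (names follow the 0.4-era API and may differ by version; the data paths and flag format are placeholders):

import json

import pandas as pd
from evidently.metric_preset import DataDriftPreset
from evidently.report import Report

# Placeholders: training-time reference data vs. recent production data
reference_data = pd.read_csv("data/reference.csv")
current_data = pd.read_csv("data/current.csv")

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference_data, current_data=current_data)
report.save_html("reports/drift_report.html")    # human-readable report

# Machine-readable flag that a Prefect flow could check to trigger retraining
drift_detected = report.as_dict()["metrics"][0]["result"]["dataset_drift"]
with open("reports/drift_flag.json", "w") as f:
    json.dump({"retrain_required": bool(drift_detected)}, f)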

Best Practices

The project adheres to MLOps best practices to ensure reproducibility, maintainability, and quality:

  • Unit Tests: Pytest for testing data preprocessing, model training, and API endpoints.
  • Integration Tests: Exercise the live, containerized prediction endpoint over REST by posting a sample customer record and checking the response.
  • Linting & Formatting: Pre-commit hooks with isort, black, and pylint.
  • CI/CD: GitHub Actions for linting, testing, and Docker image builds.
  • Automation: Makefile for managing dependencies, builds, and runs.
  • Reproducibility: Pinned dependencies in requirements.txt, clear README with setup instructions.

Setup

Prerequisites

  • Python 3.12
  • pipenv
  • Docker and Docker Compose
  • A Unix-based system (macOS/Linux)
  • Terraform v1.12.2+
  • AWS CLI (for Terraform and S3 configuration)
  • An AWS account and an administrative IAM user

🚀 Setup Instructions

Follow these steps to set up the project environment and run the MLOps pipeline on your local machine.

1. Initial Setup (One-Time Action)

Follow these steps the first time you clone the repository.

1.1. Clone the Repository

git clone https://github.com/Dcwind/customer-churn-mlops.git
cd customer-churn-mlops

1.2. Create and Activate a Virtual Environment

It is crucial to work inside a virtual environment to manage project dependencies and avoid conflicts with your global Python installation. You have two options:

Option A: Using pipenv (Recommended)

This is the simplest method as it handles environment creation and package installation in one step.

# This command creates a virtual environment and installs all dependencies
pipenv install --dev

# Activate the virtual environment shell
pipenv shell

Option B: Using venv (Standard Python)

If you prefer not to use pipenv, you can use Python's built-in venv module.

# Create a virtual environment named 'venv'
python3 -m venv venv

# Activate the environment
# On macOS/Linux:
source venv/bin/activate
# On Windows:
# venv\Scripts\activate

# Install all necessary dependencies for development
make install

Note: All subsequent commands in this setup guide should be run from within the activated virtual environment.

1.3. Set Up Pre-commit Hooks

This project uses pre-commit to automatically run code quality checks before each commit. This only needs to be set up once per project clone.

# Install the git hooks
pre-commit install
pre-commit autoupdate

1.4. Configure AWS Admin Credentials

Configure the AWS CLI with your main administrative IAM user credentials. This is required to run Terraform.

aws configure

Verify your identity with:

aws sts get-caller-identity

1.5. Provision Cloud Infrastructure

Navigate to the terraform directory and run the following commands to create the S3 bucket and a dedicated, low-privilege IAM user for the application.

cd terraform
terraform init
terraform apply

When prompted, type yes to approve. After the command completes, use terraform output to get the access keys for the newly created mlflow-s3-user.

1.6. Configure Application AWS Profile

Configure a new AWS CLI profile for the application using the credentials you just generated from Terraform.

# Go back to the project root
cd ..

# Configure the new profile
aws configure --profile mlflow-app

1.7. Initialize Project Directories

This command creates the necessary mlruns directory with the correct user permissions.

make init

2. Development Workflow

This is the standard day-to-day workflow for running the application. You will need at least two separate terminals, both running inside the activated pipenv shell.

2.1. Start the MLflow Server (Terminal 1)

In your first terminal, start the MLflow server. It will now use your S3 bucket for artifact storage.

make mlflow-ui

You can now access the MLflow dashboard in your browser at http://localhost:5000.

2.2. Run an Initial Training Job (Terminal 2)

Before starting the prediction service, you must train and register at least one model. With the MLflow server running, execute the training script.

make train

Alternatively, to run the orchestrated training pipeline:

make orchestrate

2.3. Run the Prediction Service (Terminal 2 or 3)

Once a model has been registered, you can build and run the containerized prediction service.

# Build the Docker image
make docker-build

# Run the container
make docker-run

You can now access the prediction service's health check at http://localhost:8000/health and its interactive API documentation at http://localhost:8000/docs.
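To send a quick test prediction once the container is up, something like the following works. The /predict path and the payload fields are assumptions; check http://localhost:8000/docs for the actual schema defined in deployment/app.py:

import requests

# Hypothetical customer record; the required fields are whatever the API schema expects
customer = {
    "tenure": 5,
    "MonthlyCharges": 80.5,
    "Contract": "Month-to-month",
}

response = requests.post("http://localhost:8000/predict", json=customer, timeout=10)
print(response.status_code, response.json())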

2.4. Run Subsequent Training Jobs (Optional)

You can now run make train or make orchestrate at any time to register new model versions. The prediction service will automatically load the latest version the next time it restarts.

3. Stopping the Application

To stop the services, press Ctrl+C in the respective terminals.


🧪 Running Tests

This project includes a suite of tests to ensure code quality and correctness. The tests are written using pytest.

Test Types

  • Unit Tests: Located in tests/, these tests check small, isolated pieces of code, such as individual functions. They are fast and do not require any external services to be running.
  • Integration Tests: Also in tests/, these tests verify that different components of the system work together correctly. For this project, the integration test checks the live, containerized prediction service.
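To give a feel for the unit-test style, here is a sketch in the spirit of tests/test_data_processing.py. The preprocess function name and the exact assertion are hypothetical; the real tests may differ:

import pandas as pd

# Hypothetical import: the actual function names live in src/data_processing.py
from src.data_processing import preprocess

def test_preprocess_encodes_churn_as_binary():
    raw = pd.DataFrame({
        "customerID": ["0001", "0002"],
        "tenure": [1, 24],
        "MonthlyCharges": [29.85, 56.95],
        "TotalCharges": ["29.85", "1367.8"],
        "Churn": ["Yes", "No"],
    })
    processed = preprocess(raw)
    assert set(processed["Churn"].unique()) <= {0, 1}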

How to Run the Tests

All tests can be run using a single command from the Makefile. Make sure you are inside the activated pipenv shell before running the commands.

Running All Tests

To run the entire test suite (both unit and integration tests), you must have the application services running first.

Step 1: Start the Services (in separate terminals)

# In Terminal 1
make mlflow-ui

# In Terminal 2
make serve-docker

Step 2: Run the Test Suite (in Terminal 3)

make test

pytest will automatically discover and run all test files in the tests/ directory.

Running Only Unit Tests

If you want to quickly run only the unit tests without starting the full application stack, you can run pytest and tell it to ignore the integration test file.

# This command runs all tests EXCEPT the integration test
pipenv run python -m pytest --ignore=tests/test_prediction_service.py

Project Structure

customer-churn-mlops/
β”‚
β”œβ”€β”€ src/                       # Source code modules
|   β”œβ”€β”€ data_processing.py     # Data loader and preprocessor
β”‚   └── train.py               # Model training pipeline with MLflow logging
β”‚
β”œβ”€β”€ orchestration/             # Workflow orchestration
|   β”œβ”€β”€ generate_report.py     # Generate a data-drift report using Evidently
β”‚   └── flows.py               # Automated training pipeline using Prefect
β”‚
β”œβ”€β”€ deployment/                # Model deployment
|   β”œβ”€β”€ Dockerfile             # Installs dependencies and starts the Uvicorn server
β”‚   └── app.py                 # API endpoints
β”‚
β”œβ”€β”€ terraform/                 # Terraform configurations
|   β”œβ”€β”€ main.tf                # AWS S3 and IAM configuration
|   └── variables.tf           # AWS variables
|
β”œβ”€β”€ tests/                           # Test suite
|   β”œβ”€β”€ test_data_processing.py      # Data processor testing
β”‚   └── test_prediction_service.py   # Prediction service testing
β”‚
β”œβ”€β”€ requirements.txt           # Python dependencies
β”œβ”€β”€ requirements-dev.txt       # Python dependencies for dev mode
β”œβ”€β”€ pyproject.toml             # Project configuration
β”œβ”€β”€ Makefile                   # Automation commands
β”œβ”€β”€ docker-compose.yml         # Multi-container orchestration
β”œβ”€β”€ .pre-commit-config.yaml    # Code quality hooks
β”œβ”€β”€ .github/                   # GitHub Actions workflows
β”‚   └── workflows/
β”œβ”€β”€ Pipfile                    # Pipenv dependency specification
└── README.md                  # Project documentation

Sample Run

Below are screenshots showcasing key components of the MLOps pipeline for the Customer Churn Prediction project.

Pre-commit Runs

(Pre-commit run screenshots)

First Run

First run by calling train.py:

First Run

Metric Snapshot:

| Metric   | Value  | Interpretation |
|----------|--------|----------------|
| PR-AUC   | 0.6365 | Very strong: the model catches churners well while keeping false positives low. |
| Accuracy | 0.7977 | Good, but not very informative on imbalanced data (most customers don't churn). |
| F1 Score | 0.5568 | Solid balance between precision and recall; the model is doing much better than guessing. |

Running make orchestrate also triggers a training run:

Orchestration

MLflow UI

Results of the first run are shown in the MLflow UI:

MLflow UI

After a few more runs, the MLflow UI shows the accumulated training results:

Multiple Runs

Miscellaneous Screenshots

FastAPI Documentation

FastAPI Documentation

AWS Credentials for Terraform

This project uses a secure "bootstrap" pattern for provisioning infrastructure with Terraform.

  1. An initial, high-privilege IAM user (with AdministratorAccess) is used only to run the initial terraform apply.

  2. Terraform then provisions all necessary resources, including:

    • An S3 bucket for MLflow artifacts.
    • A new, dedicated IAM user with a least-privilege policy, granting it access only to that specific S3 bucket.
  3. The application (MLflow) is then configured with the credentials of this new, limited user.

This approach ensures that the application operates with the minimum permissions required, decoupling it from the powerful credentials needed to manage the infrastructure and significantly enhancing the security posture of the project.

AWS Access

Evidently Project List

Evidently Project List

Pytest Run

Pytest Run

GitHub Actions

GitHub Actions Running

GitHub Actions Ran


Future Work

  • Implement the monitoring component described above (Evidently reports and drift-triggered retraining).
  • Deploy the FastAPI service to ECS Fargate for scalability.
  • Set up Slack alerts for drift detection.
