Decision-Aware Customer Churn Platform

An end-to-end Data Engineering + Data Science project

Overview

This project demonstrates how to build a decision-oriented data platform using a real-world open dataset.
Rather than focusing solely on predictive modeling, the project emphasizes the full lifecycle from raw data ingestion to actionable business decisions under resource constraints.

The use case is customer churn prevention in a subscription-based telecom business.

Business Problem

Customer churn directly impacts recurring revenue.
While predictive models can estimate churn probability, business value is only realized when predictions are translated into decisions.

Constraints:

- Retention actions (e.g., offers, calls) draw on a limited budget
- Not every high-risk customer is worth intervening on
- Data must be reliable, reproducible, and explainable

Goal

Design a system that:

- Produces stable, well-defined datasets
- Trains a churn prediction model
- Converts predictions into a resource-constrained intervention policy
- Evaluates expected business impact offline

Dataset

Telco Customer Churn Dataset (open-source)
Source: Kaggle / OpenML
Granularity: one row per customer (snapshot)

Project Structure

Telco-Customer-Churn/
├── data/
│   ├── raw/            # Original dataset (immutable)
│   ├── staging/        # Cleaned, typed data
│   ├── features/       # Business features
│   └── artifacts/      # Models, scores, decisions
│
├── src/
│   ├── ingestion/      # Raw → staging
│   ├── features/       # Feature construction
│   ├── datasets/       # Train / inference datasets
│   ├── models/         # Prediction models
│   ├── decision/       # Decision policies
│   └── evaluation/     # Offline evaluation
│
├── docs/
│   ├── data_contracts.md
│   ├── feature_definitions.md
│   ├── decision_policy.md
│   └── evaluation.md
│
├── pipelines/
│   └── run_all.sh
│
└── README.md

Methodology

Data Engineering
- Schema definition and validation
- Separation of raw, staging, and feature layers
- Reproducible dataset construction
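
As an illustration of the raw → staging step, the sketch below validates and types one raw row against a small contract. It is a minimal example, not the project's actual implementation: the `Column` class, `CONTRACT` list, and `to_staging` function are hypothetical names, and the column names are assumed from the public Kaggle version of the dataset. The real contract would be the one described in docs/data_contracts.md.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical contract entry: one expected column and how to type it.
@dataclass(frozen=True)
class Column:
    name: str
    cast: Callable[[str], object]  # converts the raw string to a typed value
    nullable: bool = False


# Assumed subset of columns; the real contract may differ.
CONTRACT = [
    Column("customerID", str),
    Column("tenure", int),
    Column("MonthlyCharges", float),
    Column("TotalCharges", float, nullable=True),
    Column("Churn", lambda s: {"Yes": True, "No": False}[s]),
]


def to_staging(raw_row: dict[str, str]) -> dict[str, object]:
    """Validate one raw row against CONTRACT and return a typed copy."""
    typed = {}
    for col in CONTRACT:
        if col.name not in raw_row:
            raise ValueError(f"missing column: {col.name}")
        value = raw_row[col.name].strip()
        if value == "":
            # Blank strings become None only where the contract allows it.
            if not col.nullable:
                raise ValueError(f"empty value in non-nullable column: {col.name}")
            typed[col.name] = None
            continue
        try:
            typed[col.name] = col.cast(value)
        except (ValueError, KeyError) as exc:
            raise ValueError(f"bad value {value!r} in column {col.name}") from exc
    return typed
```

Failing loudly on a missing column or an unparseable value keeps the staging layer stable: downstream feature code can rely on the types instead of re-checking them.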

Data Science
- Baseline churn prediction model
- Transparent evaluation (ROC-AUC, Precision@K)
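
The two metrics named above can be written in a few lines of plain Python, which makes their meaning explicit. This is a sketch for small inputs, assuming binary labels (1 = churned) and higher scores meaning higher risk; the function names are illustrative, and a real pipeline would more likely use a library implementation.

```python
def precision_at_k(y_true, scores, k):
    """Fraction of actual churners among the k highest-scored customers."""
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of customers")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


def roc_auc(y_true, scores):
    """Probability that a random churner outscores a random non-churner.

    Ties count as half a win. This pairwise form is O(n^2), so it is only
    meant for illustration on small samples.
    """
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
print(precision_at_k(y_true, scores, k=2))  # 0.5: one churner in the top two
print(round(roc_auc(y_true, scores), 4))    # 0.8333: five of six pairs ordered correctly
```

Precision@K is the metric closest to the decision problem: if the budget covers K interventions, it measures how many of them would reach customers who actually churn.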

Decision Science
- Budget-constrained intervention policy
- Ranking-based decision making
- Offline value estimation
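
One way to turn churn scores into a budget-constrained policy is to rank customers by expected net value and intervene from the top until the budget is spent. The sketch below is an assumption-laden illustration, not the project's policy (see docs/decision_policy.md for that): the `Customer` class, the uniform `cost`, the `success_rate`, and the `horizon_months` are all hypothetical, and since the dataset contains no intervention outcomes, the resulting value is an estimate under those assumptions rather than a measured effect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    customer_id: str
    churn_prob: float       # model score, treated here as a probability
    monthly_charges: float  # used as a proxy for revenue at risk


def expected_net_value(c, cost, success_rate, horizon_months):
    """Expected revenue retained by intervening, minus the cost of the action."""
    return c.churn_prob * success_rate * c.monthly_charges * horizon_months - cost


def select_interventions(customers, budget, cost, success_rate=0.3, horizon_months=12):
    """Rank by expected net value and treat from the top while budget remains.

    success_rate and horizon_months are assumed values, not estimates from data.
    """
    scored = [(expected_net_value(c, cost, success_rate, horizon_months), c) for c in customers]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    chosen, total_value, spent = [], 0.0, 0.0
    for value, c in scored:
        # Cost is uniform, so once one customer fails either test, all later ones do too.
        if value <= 0 or spent + cost > budget:
            break
        chosen.append(c.customer_id)
        total_value += value
        spent += cost
    return chosen, total_value


customers = [
    Customer("A", churn_prob=0.8, monthly_charges=100.0),
    Customer("B", churn_prob=0.9, monthly_charges=20.0),
    Customer("C", churn_prob=0.2, monthly_charges=30.0),
    Customer("D", churn_prob=0.5, monthly_charges=80.0),
]
chosen, value = select_interventions(customers, budget=100.0, cost=50.0)
print(chosen)  # ['A', 'D']
```

Customer B has the highest churn probability but is not selected, because the revenue at risk is small relative to the cost of the action. That is the gap between prediction and decision that the project is about.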

Key Concepts Demonstrated

- Prediction vs. decision
- Data contracts and schema stability
- Feature engineering with business meaning
- Resource-constrained decision policies
- Offline evaluation of strategies

Disclaimer

This project is designed for learning and demonstration purposes.
The dataset is a static snapshot and does not include real intervention outcomes; therefore, causal impact is approximated rather than identified.

Repository: SakigamiYang/telco-customer-churn
