upGrad - Telecom Churn Prediction Project

Overview

This project aims to predict customer churn in a telecom company using various machine learning models. The notebook explores several data preprocessing steps, feature selection techniques, and model evaluation approaches to identify customers who are likely to leave the service.

Dataset

The dataset used for this project includes historical data of telecom customers, such as call minutes, recharge amounts, and usage statistics over several months. The target variable is churn_probability, a binary label indicating whether a customer churns (1) or not (0).

Steps in the Notebook

Data Loading and Cleaning:
- Loaded the dataset and performed initial exploratory data analysis (EDA).
- Cleaned data by handling missing values and removing redundant or less informative features.
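
The cleaning step could look roughly like the sketch below. The 70% missing-value cutoff, the median imputation, and the helper name `drop_uninformative` are assumptions for illustration, not values taken from the notebook.

```python
import numpy as np
import pandas as pd


def drop_uninformative(df, max_missing=0.7):
    """Drop columns that are mostly missing or hold a single value.

    max_missing is a hypothetical cutoff, not the notebook's setting.
    """
    missing_frac = df.isna().mean()
    too_sparse = set(missing_frac[missing_frac > max_missing].index)
    constant = {c for c in df.columns if df[c].nunique(dropna=True) <= 1}
    return df.drop(columns=sorted(too_sparse | constant))


# Toy frame: "b" is mostly missing, "c" is constant, "d" has one gap.
toy = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [np.nan, np.nan, np.nan, 1.0],
    "c": [5, 5, 5, 5],
    "d": [1.0, np.nan, 3.0, 4.0],
})

cleaned = drop_uninformative(toy)
# Fill the remaining gaps with each column's median (an assumed strategy).
cleaned = cleaned.fillna(cleaned.median(numeric_only=True))
```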
Feature Engineering:
- Created additional features based on existing columns to derive more insights.
- Employed quantile-based thresholds to identify high-value customers.
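
A minimal sketch of a quantile-based high-value flag follows. The 0.7 quantile, the column names `total_rech_amt_6` and `total_rech_amt_7`, and the helper `flag_high_value` are hypothetical; the notebook may use a different quantile and different columns.

```python
import pandas as pd


def flag_high_value(df, amount_cols, quantile=0.7):
    """Mark rows whose mean recharge amount is at or above the quantile."""
    avg_amount = df[amount_cols].mean(axis=1)
    threshold = avg_amount.quantile(quantile)
    return avg_amount >= threshold


# Toy data with assumed column names.
toy = pd.DataFrame({
    "total_rech_amt_6": [100, 200, 300, 400, 500],
    "total_rech_amt_7": [100, 200, 300, 400, 500],
})
toy["high_value"] = flag_high_value(
    toy, ["total_rech_amt_6", "total_rech_amt_7"]
)
```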
Feature Selection:
- Used Recursive Feature Elimination (RFE) to select important features and Principal Component Analysis (PCA) to reduce dimensionality.
- Top 20 features selected by RFE include 'total_ic_mou_8', 'total_rech_amt_8', 'vol_2g_mb_8', etc.
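
Selecting 20 features with RFE might be sketched as follows. The data here is synthetic, and the choice of Logistic Regression as the ranking estimator is an assumption; the notebook's estimator is not stated.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the telecom features.
X, y = make_classification(
    n_samples=500, n_features=40, n_informative=8, random_state=0
)
X_scaled = StandardScaler().fit_transform(X)

# RFE repeatedly fits the estimator and drops the weakest feature
# until 20 remain.
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=20)
selector.fit(X_scaled, y)

# Column indices of the features RFE kept.
selected_idx = [i for i, keep in enumerate(selector.support_) if keep]
```

With a DataFrame, the same boolean mask (`selector.support_`) can be applied to `df.columns` to recover feature names.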
Data Splitting:
- Split the data into training and testing sets, ensuring the same proportion of churn and non-churn customers in both sets.
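
In scikit-learn, a split that preserves class proportions is usually done with the `stratify` argument of `train_test_split`. The sketch below uses toy labels with a 10% churn rate; the 80/20 split and the random seed are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 churners (1) and 900 non-churners (0).
y = np.array([1] * 100 + [0] * 900)
X = np.arange(1000).reshape(-1, 1)

# stratify=y keeps the churn rate the same in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

train_churn_rate = y_train.mean()
test_churn_rate = y_test.mean()
```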
Model Building:
- Trained multiple machine learning models, including Logistic Regression, Decision Tree, Random Forest, and XGBoost, using RFE and PCA for feature reduction.
- Hyperparameter tuning was performed using GridSearchCV to optimize each model's performance.
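
A tuning run with GridSearchCV could be sketched as below, here with PCA and Logistic Regression in one pipeline. The parameter grid, the `roc_auc` scoring, the 3-fold CV, and the synthetic data are all assumptions rather than the notebook's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the churn data (about 10% positives).
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=6,
    weights=[0.9, 0.1], random_state=0,
)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hypothetical grid: step name, double underscore, parameter name.
param_grid = {
    "pca__n_components": [5, 10],
    "clf__C": [0.1, 1.0],
}

search = GridSearchCV(pipe, param_grid, scoring="roc_auc", cv=3)
search.fit(X, y)

best_params = search.best_params_
best_auc = search.best_score_
```

Putting the scaler and PCA inside the pipeline means they are refit on each training fold, which avoids leaking information from the validation fold.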
Evaluation:
- Evaluated model performance using metrics like accuracy, precision, recall, F1-score, and ROC-AUC.
- Plotted ROC curves for each model to visualize their ability to distinguish between churn and non-churn customers.
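
The listed metrics can be computed with `sklearn.metrics`, as in this small sketch. The labels and scores are made up for illustration, and the 0.5 decision threshold is an assumption.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
    roc_curve,
)

# Made-up labels and predicted churn scores.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.35, 0.8, 0.9, 0.65]

# Hard predictions at an assumed 0.5 threshold.
y_pred = [1 if s >= 0.5 else 0 for s in y_score]

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    # ROC-AUC uses the raw scores, not the thresholded predictions.
    "roc_auc": roc_auc_score(y_true, y_score),
}

# Points of the ROC curve; these can be passed to matplotlib's plot().
fpr, tpr, thresholds = roc_curve(y_true, y_score)
```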
Results:
- Compared the models and discussed their strengths and weaknesses in predicting churn, focusing on the minority churn class.

Key Insights

Feature Selection: Feature engineering and RFE helped narrow the inputs down to the most impactful features, improving model interpretability.
Model Performance: Random Forest performed best, with high accuracy, recall, and AUC, effectively distinguishing between churn and non-churn customers.
Challenges: The main challenge was dealing with class imbalance, which led to difficulties in predicting churn with high precision. The models often struggled with false positives for the churn class.
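
One common way to explore the false-positive problem is to vary the decision threshold and watch precision and recall move in opposite directions. The sketch below is illustrative only: the data is made up, the helper `precision_recall_at` is hypothetical, and nothing here is stated to be part of the notebook.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score


def precision_recall_at(y_true, y_score, threshold):
    """Return (precision, recall) for the churn class at a threshold."""
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    return (
        precision_score(y_true, y_pred, zero_division=0),
        recall_score(y_true, y_pred, zero_division=0),
    )


# Made-up imbalanced example: 7 non-churners, 3 churners.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_score = [0.05, 0.1, 0.2, 0.3, 0.45, 0.55, 0.7, 0.6, 0.8, 0.9]

default_pr = precision_recall_at(y_true, y_score, 0.5)
strict_pr = precision_recall_at(y_true, y_score, 0.75)
```

On this toy data, raising the threshold removes the false positives but also misses one churner, which is the tradeoff described above.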

How to Run the Notebook

Clone the repository and navigate to the project directory.
Install the necessary dependencies using the command:
```
pip install -r requirements.txt
```

Open the Jupyter Notebook:

jupyter notebook "Amit Mohite - Telecom Churn Prediction.ipynb"

Run the cells sequentially to replicate the results and observe the model performance.

Dependencies

Python 3.7+
Jupyter Notebook
pandas, numpy, scikit-learn, xgboost, matplotlib

Future Work

Improve Churn Precision: Focus on improving the precision for the churn class to reduce false positives.
Model Optimization: Experiment with ensemble techniques or neural networks for better performance.
Customer Segmentation: Explore customer segmentation to understand different customer groups better and target retention strategies more effectively.

Author

Amit Mohite

For any queries, feel free to contact me via GitHub or email.
