- This project is the first project in the data science methodologies course. It will go through the data science pipeline and attempt to discover why customers are churning at Telco. This project is important for two reasons:
- For experience with a project that cycles through a full data science pipeline
- For experience merging the different business and writing components neccessary to succesfully deliver information to stakeholders
- Find drivers for customer churn at Telco
- Construct a ML classification model that accurately predicts customer churn
- Deliver a report that a non-data scientist can read through and understand
- What is the companies overall churn rate?
- Does the churn rate change based on the core service the customer has?
- Do supplementary internet support services impact churn?
- Do streaming services impact churn?
- Acquire data from MySQL
- Prepare data
- Remove unnecessary features
- Identify and replace missing values
- Alter innapropriate data types
- Explore the data to find drivers of churn and answer intital questions above
- Create a model to predict if a customer will churn
- Use features identified in explore to build predictive models
- Evaluate models on train and validate data
- Select the best model based on highest accuracy, lowest false negative rate, and lowest difference between test and validate scores
- Evaluate the best model on test data
- Conclude with recommendations and next steps
| Feature | Definition | Data Type |
|---|---|---|
| gender | The customer's gender. | object |
| senior_citizen | An indicator of old age. Information on what age is used to determine senior citizen status was not found | int64 |
| partner | Indicates if a customer has a partner. | object |
| dependents | Indicates if a customer has dependents. | object |
| phone_service | Indicates if a customer has phone service. | object |
| online_security | Indicates if a customer has online security. | object |
| online_backup | Indicates if a customer has online backup. | object |
| device_protection | Indicates if a customer has device protection. | object |
| tech_support | Indicates if a customer has tech support. | object |
| streaming_tv | Indicates if a customer has tv streaming. | object |
| streaming_movies | Indicates if a customer has movie streaming. | object |
| paperless_billing | Indicates if a customer has paperless billing. | object |
| churn | Indicates if a customer has churned. | object |
| contract_type | The customer's contract type | object |
| internet_service_type | The customer's internet service type | object |
| payment_type | The customer's payment method. | object |
| Additional Features | Encoded categorical data | uint8 |
- Clone this repo
- Insert credentials in the blank_eny.py file and save
- Run notebook
- Roughly 27% of customers churn
- Churn drivers for churn include internet service type (specifically fiber optic), the lack of supplementary internet support services, and streaming services.
- The discovered drivers led to a successful model when only surpassing the baseline is used as a metric, but when incorporating false negative rates I consider all of the models a failure.
- Compile customer complaint data from emails, phone calls, and website submissions. This data can be used to increase awareness of what drives to churn.
- Apply research-based retention strategies for customers with fiber optic internet. Fiber optic internet service had the highest churn rate out of all features. While specifics of what exactly causes this customer group to churn are not yet known, it is an area the requires customer retention efforts.