In the intensely competitive telecom industry, customer churn, particularly among prepaid users, presents a significant hurdle. With yearly churn rates reaching 12% to 30%, keeping high-value customers is crucial for maintaining profitability, as bringing in new customers can be 5 to 10 times more expensive than retaining existing ones.
This project tackles churn prediction for a telecom provider. By analyzing customer-level data, we aim to identify customers at high risk of churning and reveal the factors most correlated with churn behavior. This project utilizes CHAID (Chi-squared Automatic Interaction Detector) for market segmentation and Google’s Gemini Flash LLM to generate in-depth, analyst-style insights.

Chi-square automatic interaction detection (CHAID) is a decision-tree technique suited to this kind of segmentation. By running chi-square tests of independence on customer or respondent data, it shows how different factors relate to an outcome such as your sales and marketing results. Arguably, CHAID’s key advantage over trade-off analysis techniques is its highly visual output, which is easy to understand and share. That makes it a good choice when insights need to be reviewed by different teams across the business.
CHAID analyzes the relationship between a response variable and a set of predictor variables, so you can forecast where you will have the biggest impact. At each node, the algorithm evaluates candidate splits using the chi-square statistic.
A chi-square value measures how far the counts actually observed in your data deviate from the counts you would expect if the two variables were unrelated.
The maximum chi-square value marks the most statistically significant result in your CHAID decision tree. In other words, it is the strongest relationship between two variables among the chi-square values found.
Splits with higher chi-square values suggest stronger associations between the variables, i.e. more significant differences between the resulting branches of the decision tree.
By finding these associations in market research, you can discover different segments in your customer base, each with specific traits that will inform your targeting tactics.
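To make the chi-square idea concrete, here is a minimal sketch in plain Python that computes the Pearson chi-square statistic for a contingency table. The 2x2 counts are made up for illustration and do not come from this project's data. A full CHAID implementation also merges similar categories and compares adjusted p-values, which this sketch leaves out.

```python
def chi_square(table):
    """Pearson chi-square statistic for a table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows = two customer groups, columns = (stayed, churned)
toy_table = [[30, 10], [20, 40]]
toy_stat = chi_square(toy_table)
print(round(toy_stat, 3))
```

A larger statistic means the observed counts sit further from what independence would predict, which is the sense in which a split is "stronger".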
- Predictive Modeling: Build models to predict customer churn risk.
- Segment Analysis: Identify churn patterns across different customer segments.
- Analytical Insights with LLMs: Leverage Google Gemini Flash to provide detailed insights, helping business stakeholders develop targeted strategies.
Churn behavior differs between postpaid and prepaid customers:
- Postpaid Customers: Easily tracked when they decide to switch to another provider.
- Prepaid Customers: More difficult to track, as they may stop using services without notification. This makes churn prediction critical for prepaid models, which are more common in Indian and Southeast Asian markets.
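Because prepaid customers rarely announce that they are leaving, churn often has to be inferred from usage. The sketch below shows one possible labeling rule, assuming a hypothetical 30-day inactivity window and made-up last-activity dates. The source does not state which churn definition this project uses, so treat the threshold and names as placeholders.

```python
from datetime import date, timedelta

INACTIVITY_WINDOW = timedelta(days=30)  # assumed threshold, not taken from the project

def is_churned(last_activity, as_of, window=INACTIVITY_WINDOW):
    """Label a prepaid customer as churned if no activity occurred within the window."""
    return (as_of - last_activity) > window

# Hypothetical customers and their last recorded activity
last_seen = {
    "cust_a": date(2024, 5, 28),
    "cust_b": date(2024, 3, 15),
}
as_of = date(2024, 6, 1)
labels = {cid: is_churned(seen, as_of) for cid, seen in last_seen.items()}
print(labels)
```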
Figure: CHAID tree visualization, produced with IBM SPSS software.
This project leverages CHAID for customer segmentation and churn analysis. The following findings were generated with the model and then refined with Google Gemini Flash:
- Month-to-Month Customers:
  - Total: 1,997 (No: 1,457, Yes: 540)
  - Significant churn rate of 27%, indicating higher risk.
- One-Year Customers:
  - Total: 397 (No: 360, Yes: 37)
  - Churn rate of 9.3%, suggesting better retention than month-to-month plans.
- Two-Year Customers:
  - Total: 486 (No: 471, Yes: 15)
  - The lowest churn rate at 3.1%, indicating high retention potential.
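The contract-level percentages follow directly from the reported counts. As a quick check, this sketch recomputes each contract type's churn rate and ranks the contract types by risk. The dictionary layout and variable names are illustrative.

```python
# Counts reported for each contract type: (did not churn, churned)
contract_counts = {
    "Month-to-Month": (1457, 540),
    "One-Year": (360, 37),
    "Two-Year": (471, 15),
}

churn_rates = {}
for contract, (stayed, churned) in contract_counts.items():
    total = stayed + churned
    churn_rates[contract] = round(100 * churned / total, 1)

# Contract types ordered from highest to lowest churn rate
ranked = sorted(churn_rates, key=churn_rates.get, reverse=True)
for contract in ranked:
    print(f"{contract}: {churn_rates[contract]}%")
```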
- DSL Customers: Churn rate of 31%, mainly affecting non-premium customers.
- Fiber Optic Customers: Churn rate of 57%, higher among premium customers spending over 2k.
- DSL Customers: Churn rate of 9.3%, matching the one-year contract rate above and showing good retention.
- Churn Behavior: Only 3.1% churned, matching the two-year contract rate above and indicating strong retention among long-term customers.
- Premium Customers: Higher churn in month-to-month and fiber optic segments, despite higher spending (2k to 10k).
- Non-Premium Customers: Higher churn tendency, especially in month-to-month plans, potentially influenced by payment method choices.
- Bank Transfer (Automatic): Associated with lower churn across segments.
- Credit Card (Automatic): Higher churn rates among premium customers, potentially indicating cost-related dissatisfaction.
- Electronic Check: Popular among both premium and non-premium customers, though premium users show higher churn.
- Focus on Month-to-month, Fiber optic, Electronic Check segment:
  - Offer loyalty programs or incentives to encourage switching to longer contract terms.
  - Provide tailored promotions and offers to entice customers to switch to alternative payment methods.
  - Implement targeted marketing campaigns highlighting the benefits of alternative internet services or bundles.
- Promote automatic payment methods:
  - Offer discounts or rewards for automatic payments.
  - Provide clear communication about the benefits of automatic payment methods.
- Enhance customer service:
  - Improve the customer experience with Electronic check payments to address potential issues.
  - Implement proactive customer service interventions for high-risk segments.
- Target customers with No Internet:
  - Offer competitive packages to attract these customers to internet services.
- Leverage contract commitment:
  - Implement strategies to encourage longer contract commitments.
  - Highlight the benefits of long-term contracts through targeted promotions.
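As one way to act on the first recommendation, the sketch below flags customers in the month-to-month, fiber optic, electronic check segment for a retention campaign. The field names (`contract`, `internet_service`, `payment_method`) and the sample records are hypothetical and would need to be adapted to the actual dataset schema.

```python
# Segment the recommendations single out as the main retention focus
FOCUS_SEGMENT = {
    "contract": "Month-to-month",
    "internet_service": "Fiber optic",
    "payment_method": "Electronic check",
}

def in_focus_segment(customer, segment=FOCUS_SEGMENT):
    """Return True if the customer matches every attribute of the focus segment."""
    return all(customer.get(field) == value for field, value in segment.items())

# Hypothetical customer records
customers = [
    {"id": 1, "contract": "Month-to-month", "internet_service": "Fiber optic",
     "payment_method": "Electronic check"},
    {"id": 2, "contract": "Two year", "internet_service": "DSL",
     "payment_method": "Bank transfer (automatic)"},
]
flagged_ids = [c["id"] for c in customers if in_focus_segment(c)]
print(flagged_ids)
```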
The insights from CHAID analysis and LLM-based interpretation reveal key factors in churn behavior, highlighting areas to focus on for retaining high-value customers and improving loyalty. With effective, targeted strategies based on these insights, telecom companies can better mitigate churn and foster long-term customer relationships.
- Python 3.x
- Necessary libraries: pandas, scikit-learn, CHAID, Google Gemini API, matplotlib, spss
- Data Preprocessing: Load and preprocess telecom customer data.
- Model Training: Run the CHAID-based churn prediction model.
- Generate Insights: Use Google Gemini Flash to generate analyst-style insights.
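The three steps above might be wired together roughly as follows. This is a sketch using only the standard library, under several assumptions: the column names (`Contract`, `Churn`), the inline sample rows, and the helper names are invented for illustration; the CHAID model is replaced by a simple per-segment churn summary; and the call to Gemini Flash is deliberately left out, so only the prompt text that would be sent is built.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data standing in for the telecom customer file
SAMPLE_CSV = """Contract,Churn
Month-to-month,Yes
Month-to-month,No
One year,No
Two year,No
Month-to-month,Yes
"""

def load_rows(text):
    """Step 1: parse CSV text and strip stray whitespace from values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{k: v.strip() for k, v in row.items()} for row in reader]

def churn_by_segment(rows, column):
    """Step 2 (stand-in for the CHAID model): churn rate per category of one column."""
    counts = defaultdict(lambda: [0, 0])  # category -> [churned, total]
    for row in rows:
        counts[row[column]][1] += 1
        if row["Churn"] == "Yes":
            counts[row[column]][0] += 1
    return {cat: churned / total for cat, (churned, total) in counts.items()}

def build_prompt(rates, column):
    """Step 3: format segment statistics as a prompt for an LLM."""
    lines = [f"- {cat}: {rate:.0%} churn" for cat, rate in sorted(rates.items())]
    return (f"You are a telecom analyst. Churn rate by {column}:\n"
            + "\n".join(lines)
            + "\nSummarize the key retention risks.")

rows = load_rows(SAMPLE_CSV)
rates = churn_by_segment(rows, "Contract")
prompt = build_prompt(rates, "Contract")
print(prompt)
```

Keeping prompt construction separate from the model call makes the text sent to the LLM easy to inspect and test before any API key is involved.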