Predicting Borrower Risk to Reduce Financial Losses
We built and evaluated three classification models: Logistic Regression, Random Forest, and XGBoost; to predict loan defaults on a retail lending portfolio. Our goal was to identify high-risk borrowers early, enabling targeted interventions and minimizing losses.
Key Findings:
- Dataset size: 2M+ loan records
- Default rate in data: 12.4%
- Best recall (catching most defaults): Random Forest (0.86)
- Best ROC-AUC (overall discrimination): Random Forest (0.65)
- Highest F1-score (balance between precision and recall): Logistic Regression & Random Forest (0.26)
Critical Insights:
- High Recall Priority: In credit risk modeling, recall is more critical than precision. Catching more defaulters, even with some false alarms—prevents financial losses. A missed defaulter (false negative) can result in unrecoverable capital, whereas a false positive can be handled via manual review or adjusted loan terms.
- Model Comparison:
- Random Forest achieves the highest recall (0.86) and ROC-AUC (0.65), making it ideal for minimizing undetected defaults.
- Logistic Regression provides better interpretability and similar F1-score (0.26) with fewer false positives (precision 0.17).
- XGBoost underperforms on all metrics relative to others.
- Model Selection Rationale: We recommend Random Forest as the primary model due to its superior recall and discrimination power. Logistic Regression can still be useful for interpretability and quick deployment.
Loan default prediction is critical for retail lenders to manage credit risk and maintain profitability. Our data contained demographic, credit bureau, and historical repayment features. The low base rate of defaults (12.4%) and noisy real-world signals make high-precision, high-recall models difficult to achieve simultaneously.
| Model | Precision | Recall | F1-score | ROC-AUC |
|---|---|---|---|---|
| Logistic Regression | 0.17 | 0.52 | 0.26 | 0.60 |
| Random Forest | 0.15 | 0.86 | 0.26 | 0.65 |
| XGBoost | 0.15 | 0.41 | 0.22 | 0.57 |
The ROC-AUC curve shows Random Forest delivers the best trade-off between sensitivity and specificity. Its recall of 0.86 means it successfully flags 86% of all actual defaulters. While its precision (0.15) is low, this is an acceptable compromise for use cases focused on minimizing undetected risk.
- Financial Risk Reduction: Catching more defaulters reduces charge-offs and improves portfolio health.
- Business Justification: False positives (non-defaulters flagged as defaulters) can be managed operationally with manual checks or conservative credit terms.
- Compliance: Models must show robustness in identifying risk accurately, even at the cost of precision.
- Random Forest is the most effective model, delivering:
- Highest recall (0.86)
- Best ROC-AUC (0.65)
- Competitive F1-score (0.26)
Given its predictive strength and robustness, Random Forest is recommended as the production candidate, especially in high-stakes lending environments.
- Issue: The low precision across all models (0.15–0.17) indicates a high false positive rate. Many non-defaulters are incorrectly flagged.
- Next Steps:
- Threshold Tuning: Adjust probability thresholds to optimize for a better precision-recall balance suited to business goals.
- Segment Analysis: Identify borrower subgroups where the model struggles, such as thin-file or new-to-credit applicants.
- Cost-Sensitive Training: Penalize false positives more explicitly to force the model to improve its precision.
-
Deploy Random Forest
- Launch in a pilot phase to monitor recall and operational false positives.
- Create a fallback review process for flagged borrowers.
-
Enhance Dataset
- Add alternative data (social data, utility payments).
- Enrich features to improve signal-to-noise ratio.
-
Optimize Decision Thresholds
- Calibrate probability cutoffs using business impact simulations.
Target Metrics:
- ROC-AUC: ≥ 0.75
- Recall: ≥ 0.70 (even if precision is ~0.20)
- Precision: ≥ 0.25 (at recall ≥ 0.50)
Implementation Timeline:
- Phase 1 (30 days): Deploy Random Forest and monitor KPI drift.
- Phase 2 (60 days): Conduct more robust feature engineering and threshold tuning.
- Phase 3 (90 days): Retrain with improved data and refine business integration.
All metrics derived from cross-validation on hold-out datasets. Model selection and thresholds should be reviewed jointly by data science and credit policy teams.


