A comprehensive machine learning application for predicting loan defaults and optimizing investment portfolios using the LendingClub dataset.
This project demonstrates advanced machine learning techniques applied to financial risk assessment. The application trains multiple classification models on historical LendingClub loan data to predict default probabilities, then uses these predictions to construct an IRR-optimized investment portfolio.
- Multiple ML Models: 8+ different algorithms including Logistic Regression, Random Forest, Gradient Boosting, and Neural Networks
- Interactive Dashboard: Real-time visualization of loan data, model performance, and portfolio optimization
- Portfolio Optimization: IRR-based portfolio construction with customizable investment criteria
- Live Predictions: Real-time loan default predictions via REST API
- Advanced Analytics: Comprehensive EDA with interactive choropleth maps and statistical analysis
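As a rough illustration of portfolio construction with customizable investment criteria, the sketch below filters loans by term and by a maximum predicted default probability, then ranks them by a simple expected-return proxy. The `Loan` fields, the `expected_return` proxy, and the `select_portfolio` helper are hypothetical names chosen for this example; the project's actual IRR-based optimizer may work differently.

```python
from dataclasses import dataclass


@dataclass
class Loan:
    loan_id: str
    term_months: int   # 36 or 60
    int_rate: float    # annual interest rate as a fraction, e.g. 0.12
    p_default: float   # model-predicted probability of default


def expected_return(loan: Loan, loss_given_default: float = 1.0) -> float:
    """Crude proxy (an assumption, not a true IRR): earn the rate if the
    loan is repaid, lose a share of principal if it defaults."""
    return (1.0 - loan.p_default) * loan.int_rate - loan.p_default * loss_given_default


def select_portfolio(loans, term_months, max_p_default, n_loans):
    """Keep loans that meet the criteria, then take the top n by expected return."""
    eligible = [
        loan for loan in loans
        if loan.term_months == term_months and loan.p_default <= max_p_default
    ]
    ranked = sorted(eligible, key=expected_return, reverse=True)
    return ranked[:n_loans]


# Illustrative, made-up loans
loans = [
    Loan("A", 36, 0.10, 0.02),
    Loan("B", 36, 0.20, 0.15),
    Loan("C", 36, 0.14, 0.05),
    Loan("D", 60, 0.18, 0.04),
]
picked = select_portfolio(loans, term_months=36, max_p_default=0.10, n_loans=2)
print([loan.loan_id for loan in picked])
```

Loan B is excluded by the default-probability cap and loan D by the term filter, so only A and C compete for the two slots.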
- 7.40% IRR for 36-month loans (vs. 6.30% baseline)
- 10.63% IRR for 60-month loans (vs. 8.11% baseline)
- 1.51% and 0.99% alpha over baseline for 36-month and 60-month loans respectively
- Statistically significant results at the 1% significance level
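For readers unfamiliar with the metric, IRR is the discount rate at which the net present value of a loan's cash flows is zero. The following is a minimal sketch that solves for a monthly IRR by bisection and annualizes it by compounding. The helper names, the compounding convention, and the example loan are assumptions for illustration; this README does not specify how the reported figures were computed.

```python
def npv(rate, cash_flows):
    """Net present value of per-period cash flows at a per-period rate."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows, lo=-0.99, hi=1.0, tol=1e-10, max_iter=200):
    """Per-period IRR found by bisection on the NPV function."""
    f_lo, f_hi = npv(lo, cash_flows), npv(hi, cash_flows)
    if f_lo * f_hi > 0:
        raise ValueError("IRR is not bracketed by [lo, hi]")
    for _ in range(max_iter):
        mid = (lo + hi) / 2.0
        f_mid = npv(mid, cash_flows)
        if abs(f_mid) < tol or (hi - lo) < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid                # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid   # root lies in [mid, hi]
    return (lo + hi) / 2.0


def level_payment(principal, annual_rate, n_months):
    """Monthly payment of a fully amortizing loan."""
    r = annual_rate / 12.0
    return principal * r / (1.0 - (1.0 + r) ** -n_months)


def annualize(monthly_rate):
    """Compound a monthly rate into an annual rate (one possible convention)."""
    return (1.0 + monthly_rate) ** 12 - 1.0


payment = level_payment(1000.0, 0.12, 36)
repaid = [-1000.0] + [payment] * 36      # borrower pays in full
defaulted = [-1000.0] + [payment] * 12   # borrower stops after 12 payments

print(f"repaid:    {annualize(irr(repaid)):.4f}")
print(f"defaulted: {annualize(irr(defaulted)):.4f}")
```

A fully repaid loan returns its contractual rate, while an early default drives the IRR negative, which is why default predictions matter for the portfolio's return.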
- Docker & Docker Compose
- Python 3.9+ (for local development)
- Git
Option 1: Docker Compose (Recommended)
# Clone the repository
git clone https://github.com/yourusername/LendingClub_ML_App.git
cd LendingClub_ML_App
# Run the entire application
bash build_e2e.sh
Option 2: Manual Docker Build
# Build and run backend
docker build -t flask_backend:v1 -f ./app/backend/Dockerfile.backend .
docker run -d -p 5000:5000 --name flask_backend flask_backend:v1
# Build and run frontend
docker build -t dash_frontend:v1 -f ./app/frontend/Dockerfile.frontend .
docker run -d -p 8050:8050 --name dash_frontend dash_frontend:v1
Option 3: Local Development
# Backend
cd app/backend
pip install -r requirements_backend.txt
python flask_serve.py
# Frontend (in another terminal)
cd app/frontend
pip install -r requirements_frontend.txt
python app.py
- Frontend Dashboard: http://localhost:8050
- Backend API: http://localhost:5000
- API Documentation: http://localhost:5000/api/v1/predict
POST /api/v1/predict
Predict loan default probability using trained ML models.
Request body:
{
  "query": [[feature1, feature2, ..., featureN]],
  "model": "GBC"
}
Response:
{
  "prediction": "No Default",
  "confidence": [0.123, 0.877]
}
Supported models:
- QDA - Quadratic Discriminant Analysis
- LDA - Linear Discriminant Analysis
- LOGIT - Logistic Regression
- GBC - Gradient Boosting Classifier
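The same request can be sent from Python. This is a minimal client sketch using only the standard library. It assumes the backend is running locally on port 5000 and accepts exactly the request shape documented above; `build_payload` and `predict` are hypothetical helper names, not part of the project.

```python
import json
from urllib import request

API_URL = "http://localhost:5000/api/v1/predict"
SUPPORTED_MODELS = {"QDA", "LDA", "LOGIT", "GBC"}


def build_payload(features, model="GBC"):
    """Build the JSON body: one feature row wrapped in a list of rows."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"Unsupported model: {model!r}")
    return {"query": [list(features)], "model": model}


def predict(features, model="GBC", url=API_URL, timeout=10):
    """POST one feature row to the API and return the decoded JSON response."""
    body = json.dumps(build_payload(features, model)).encode("utf-8")
    req = request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the payload needs no server; the values are illustrative.
print(json.dumps(build_payload([50000, 700, 5, 10])))
```

With the backend up, `predict([50000, 700, 5, 10])` should return a dictionary with the `prediction` and `confidence` keys shown in the response above.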
curl -X POST http://localhost:5000/api/v1/predict \
-H "Content-Type: application/json" \
-d '{"query": [[50000, 700, 5, 10]], "model": "GBC"}'
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Frontend     │    │     Backend     │    │   Data Layer    │
│  (Dash/Flask)   │◄──►│   (Flask API)   │◄──►│ (Pickle Files)  │
│   Port: 8050    │    │   Port: 5000    │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
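To make the backend's role in this architecture concrete, here is a framework-free sketch of how a prediction handler might turn a request into a response. It assumes models expose a scikit-learn-style `predict_proba` method and that the class order is `["Default", "No Default"]`; both are guesses based on the example response, and `handle_predict` and `StubModel` are hypothetical. In the real application the models would presumably be loaded from the pickle files in the data layer rather than stubbed.

```python
LABELS = ["Default", "No Default"]  # assumed class order, not confirmed by the project


def handle_predict(payload, models):
    """Validate a request payload and return (response_dict, http_status)."""
    name = payload.get("model")
    if name not in models:
        return {"error": f"unknown model: {name!r}"}, 400

    query = payload.get("query")
    is_valid = (
        isinstance(query, list)
        and len(query) > 0
        and all(isinstance(row, list) for row in query)
    )
    if not is_valid:
        return {"error": "query must be a non-empty list of feature rows"}, 400

    # Class probabilities for the first row, converted to plain floats
    # so the result can be serialized as JSON.
    probs = [float(p) for p in models[name].predict_proba(query)[0]]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"prediction": LABELS[best], "confidence": probs}, 200


class StubModel:
    """Stand-in for an unpickled classifier; returns fixed probabilities."""

    def predict_proba(self, rows):
        return [[0.123, 0.877] for _ in rows]


models = {"GBC": StubModel()}
response, status = handle_predict(
    {"query": [[50000, 700, 5, 10]], "model": "GBC"}, models
)
print(status, response)
```

Keeping the handler logic separate from the web framework makes it easy to test without starting a server.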
| Model | Non-2018 AUC | 2018 AUC | Performance |
|---|---|---|---|
| CatBoost Classifier | 0.892 | 0.841 | 🥇 Best |
| MLP Neural Net | 0.884 | 0.816 | 🥈 Excellent |
| Gradient Boosting | 0.831 | 0.766 | 🥉 Good |
| Random Forest | 0.769 | 0.697 | Good |
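The AUC values in the table can be read as the probability that a randomly chosen defaulted loan receives a higher predicted default score than a randomly chosen non-defaulted loan. The sketch below computes it directly from that pairwise definition. It is a pure-Python illustration with made-up labels and scores, not the project's evaluation code, which more likely relies on a library routine.

```python
def roc_auc(y_true, scores):
    """ROC AUC via pairwise comparison of positive and negative scores.

    Ties count as half a win. This is O(n_pos * n_neg), which is fine for
    illustration but slow on large datasets.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# 1 = default, 0 = no default; scores are predicted default probabilities
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of defaults above non-defaults.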
LendingClub_ML_App/
├── app/
│   ├── backend/                 # Flask API server
│   │   ├── flask_serve.py
│   │   ├── requirements_backend.txt
│   │   └── Dockerfile.backend
│   ├── frontend/                # Dash web application
│   │   ├── app.py
│   │   ├── constants/
│   │   ├── requirements_frontend.txt
│   │   └── Dockerfile.frontend
│   └── data/                    # ML models and datasets
├── notebooks/                   # Jupyter notebooks for EDA
├── presentation/                # Project presentation materials
├── docker-compose.yml           # Multi-container orchestration
└── build_e2e.sh                 # End-to-end build script
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
- Dataset: LendingClub 2007-2020Q1
- Blog Post: Predicting Loan Defaults using ML
- Video Presentation: YouTube
- Presentation Slides: PDF
This project is licensed under the MIT License - see the LICENSE file for details.
Philippe Heitzmann
- Email: philheitz6[at]gmail[dot]com
- LinkedIn: [Your LinkedIn Profile]
- GitHub: @yourusername
⭐ Star this repository if you found it helpful!


