A production-grade machine learning system for predicting loan default risk in P2P lending. This project implements a full end-to-end pipeline including data cleaning, feature engineering, validation, and a Stacked Ensemble Model (XGBoost, LightGBM, CatBoost).
It also features an Interactive Dashboard for real-time risk assessment.
- 📄 Overview
- ⭐ Features
- 📊 Interactive Dashboard
- 🔧 Configuration
- ⚙️ Installation & Usage
- 🏗 Architecture
- 📜 License
The goal is to help investors minimize risk. The system analyzes loan application data (based on Lending Club data from 2007-2020) to predict the probability of default.
Key capabilities:
- Robust Pipeline: Automated cleaning, capping, and feature engineering.
- Advanced Modeling: A meta-model stacking approach that outperforms individual classifiers.
- Real-time Inference: A Streamlit app to test scenarios instantly.
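To make the stacking idea concrete, here is a minimal, dependency-free sketch. It is not the code in utils/stacking.py: the base learners are simple per-feature threshold models standing in for XGBoost, LightGBM, and CatBoost, and every function name (`fit_base`, `out_of_fold`, `fit_meta`, `fit_stack`) is hypothetical. The point it illustrates is that the meta-learner is trained on out-of-fold base predictions, so it never sees a prediction made on a row the base model was fitted on.

```python
import math
from statistics import median

def fit_base(rows, labels, feature):
    """Stand-in base learner: smoothed default rate on each side of the median."""
    cut = median(r[feature] for r in rows)
    above = [y for r, y in zip(rows, labels) if r[feature] > cut]
    below = [y for r, y in zip(rows, labels) if r[feature] <= cut]
    p_above = (sum(above) + 1) / (len(above) + 2)  # Laplace smoothing
    p_below = (sum(below) + 1) / (len(below) + 2)
    return lambda r: p_above if r[feature] > cut else p_below

def out_of_fold(rows, labels, features, k=5):
    """Base predictions for each row come from models that never saw that row."""
    oof = [[0.0] * len(features) for _ in rows]
    for fold in range(k):
        train = [i for i in range(len(rows)) if i % k != fold]
        for j, feature in enumerate(features):
            model = fit_base([rows[i] for i in train],
                             [labels[i] for i in train], feature)
            for i in range(fold, len(rows), k):  # rows held out in this fold
                oof[i][j] = model(rows[i])
    return oof

def fit_meta(X, y, lr=0.5, epochs=500):
    """Meta-learner: logistic regression fitted by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, target in zip(X, y):
            z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
            err = 1 / (1 + math.exp(-z)) - target
            grad[0] += err
            for j, xi in enumerate(x):
                grad[j + 1] += err * xi
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w

def fit_stack(rows, labels, features):
    """Train the meta-learner on out-of-fold outputs, then refit bases on all data."""
    w = fit_meta(out_of_fold(rows, labels, features), labels)
    bases = [fit_base(rows, labels, f) for f in features]

    def predict(row):
        z = w[0] + sum(wi * base(row) for wi, base in zip(w[1:], bases))
        return 1 / (1 + math.exp(-z))  # estimated probability of default

    return predict
```

Calling `fit_stack(rows, labels, ["fico", "dti"])` on a list of dict rows with 0/1 default labels returns a `predict(row)` function. The feature names here are placeholders, not the project's actual columns.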
- 🧠 Stacked Ensemble: Combines XGBoost, LightGBM, and CatBoost via a meta-learner.
- 📱 Interactive Dashboard: Web interface for business users to test loan applicants.
- ⚙️ Configurable: Hyperparameters are managed via params.yaml, not hardcoded.
- 🔒 Reproducible: Pinned dependency versions and seed control.
- 🛡 Leakage-Free: Strict separation of fitting and transforming.
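The "Leakage-Free" point can be illustrated with a small sketch of outlier capping. The class below is hypothetical (it is not taken from src/) and the percentile defaults are assumptions; what matters is the pattern: bounds are learned in `fit` from the training split only, and `transform` reuses those bounds on any other split.

```python
from statistics import quantiles

class Capper:
    """Clips a numeric column to percentile bounds learned from training data only."""

    def __init__(self, lower_pct=1, upper_pct=99):
        self.lower_pct = lower_pct
        self.upper_pct = upper_pct
        self.bounds = None

    def fit(self, values):
        # quantiles(n=100) returns the 99 cut points for percentiles 1..99.
        cuts = quantiles(values, n=100, method="inclusive")
        self.bounds = (cuts[self.lower_pct - 1], cuts[self.upper_pct - 1])
        return self

    def transform(self, values):
        if self.bounds is None:
            raise RuntimeError("call fit() on the training split first")
        low, high = self.bounds
        return [min(max(v, low), high) for v in values]
```

A typical use would be `capper = Capper().fit(train_income)` followed by `capper.transform(test_income)`. Fitting on the combined data instead would let test-set extremes influence the bounds, which is the leakage this separation avoids.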
Test the model in real time using the Streamlit app:
streamlit run app.py
Capabilities:
- Input financial details (FICO, Income, Loan Amount, etc.)
- Visualize risk score via gauge charts.
- Get instant "Approve" or "Reject" recommendations.
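As a rough illustration of the last capability, a recommendation can be derived from the predicted default probability by comparing it with a cutoff. The function name and the 0.5 threshold below are assumptions for the sake of the example; the cutoff actually used by app.py is not stated in this README.

```python
def recommend(default_probability, reject_threshold=0.5):
    """Map a predicted default probability to a dashboard-style recommendation."""
    if not 0.0 <= default_probability <= 1.0:
        raise ValueError("default_probability must be between 0 and 1")
    # reject_threshold is a hypothetical business cutoff, not the project's value.
    return "Reject" if default_probability >= reject_threshold else "Approve"
```

In practice the threshold would likely be tuned to the lender's tolerance for missed defaults versus rejected good loans.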
All model hyperparameters are stored in params.yaml. You can modify them without touching the code.
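The config file can also be swapped through the PARAMS_PATH environment variable, as the smoke-test command in this section does. How main.py reads that variable is not shown here, so the helper below is only one plausible sketch; the function name is hypothetical.

```python
import os

def resolve_params_path(environ=os.environ, default="params.yaml"):
    """Return the config path from PARAMS_PATH, or the default when unset or empty."""
    return environ.get("PARAMS_PATH") or default
```

Passing the environment as an argument keeps the helper easy to test without mutating the real process environment.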
For fast testing (smoke tests), you can use the lightweight config:
PARAMS_PATH=params_fast.yaml python main.py
Clone the repo and install dependencies:
git clone https://github.com/Falak-Parmar/P2P-lending-risk-assesment.git
cd P2P-lending-risk-assesment
pip install -r requirements.txt
To train the model from scratch (cleaning -> engineering -> training):
python main.py
Artifacts will be saved in models/ and data/ directories.
To launch the dashboard:
streamlit run app.py
.
├── app.py # Streamlit Dashboard entry point
├── main.py # Pipeline Orchestrator
├── params.yaml # Model Hyperparameters
├── requirements.txt # Pinned dependencies
├── data/ # Data storage (Raw, Cleaned, Processed)
├── models/ # Saved model artifacts (.pkl)
├── logs/ # Execution logs
├── src/ # Pipeline Source Code
│ ├── data_cleaning.py
│ ├── data_feature_engineering.py
│ ├── data_preprocessing.py
│ └── model.py
└── utils/ # Utility functions
    ├── stacking.py # Ensemble logic
    └── ...
MIT License