This repository contains machine learning models that predict loan approval status, built with classification algorithms from scikit-learn, TensorFlow, and PyTorch.
This project is based on the Kaggle competition: Loan Approval Prediction by Walter Reade and Ashley Chow. The goal is to predict whether a loan will be approved or denied (binary classification) based on various features related to the applicant's profile and loan requirements.
The dataset contains information about loan applicants, including:
- Personal information (age, income, employment length, home ownership)
- Loan details (amount, interest rate, purpose, grade)
- Credit bureau information (credit history length, past defaults)
The target label is 'loan_status'.
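
As a minimal sketch, the features and the `loan_status` target could be separated with pandas roughly as follows. Only `loan_status` comes from the description above. The other column names and the three rows are made-up placeholders so the snippet runs on its own; in practice you would read `data/train.csv` instead.

```python
import io

import pandas as pd

# Tiny made-up sample standing in for data/train.csv (values are illustrative only).
# Column names other than loan_status are assumptions, not taken from the dataset.
SAMPLE_CSV = """person_age,person_income,loan_grade,loan_amnt,loan_status
25,40000,B,5000,0
41,85000,A,12000,0
33,30000,D,15000,1
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))  # real use: pd.read_csv("data/train.csv")

# Separate the binary target from the features.
y = df["loan_status"]
X = df.drop(columns=["loan_status"])

# One-hot encode the categorical column so scikit-learn models can consume it.
X_encoded = pd.get_dummies(X, columns=["loan_grade"])

print(X_encoded.columns.tolist())
print(y.value_counts().to_dict())
```

One-hot encoding turns each category into its own column, which is presumably how a feature such as "Loan grade D" ends up in the feature importance ranking.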
loan-approval-prediction/
├── data/
│ └── train.csv
├── notebooks/
│ ├── sklearn_playground.ipynb
│ └── tf_playground.ipynb
├── models/
│ └── loan_model.pth
├── requirements.txt
└── README.md
Various classification models have been implemented and compared:
- Logistic Regression
- Random Forest
- Gradient Boosting
- Support Vector Machine (SVM)
- K-Nearest Neighbors (KNN)
- Decision Tree
- Stacking Ensemble
- Multi-Layer Perceptron (Neural Network)
- Custom neural network models (TensorFlow and PyTorch)
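
Of the models listed, the Stacking Ensemble is the least self-explanatory: it trains several base classifiers and feeds their cross-validated predictions to a final meta-learner. The sketch below shows the idea with scikit-learn's `StackingClassifier` on synthetic data. The base learners, meta-learner, and hyperparameters are illustrative assumptions, not the configuration used in this repository.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data standing in for the loan dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Base learners and meta-learner are illustrative choices (assumptions).
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=25, random_state=0)),
        ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,  # out-of-fold predictions from the base learners train the meta-learner
)
stack.fit(X_train, y_train)

test_accuracy = stack.score(X_test, y_test)
print(f"Stacking accuracy on held-out data: {test_accuracy:.3f}")
```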
Model comparison based on Accuracy and ROC AUC scores from cross-validation:
| Model | Accuracy | ROC AUC |
|---|---|---|
| Logistic Regression | 0.9141 | 0.7615 |
| Random Forest | 0.9513 | 0.8527 |
| Gradient Boosting | 0.9517 | 0.8566 |
| Support Vector Machine | 0.9439 | 0.8312 |
| K-Nearest Neighbors | 0.9318 | 0.8129 |
| Decision Tree | 0.9143 | 0.8351 |
| Multi-Layer Perceptron | 0.9280 | N/A |
| Stacking Ensemble | 0.9518 | 0.8565 |

Mean ROC AUC and standard deviation by model:

| Model | Mean ROC AUC | Std |
|---|---|---|
| Logistic Regression | 0.9020 | 0.0021 |
| Random Forest | 0.9343 | 0.0023 |
| Gradient Boosting | 0.9398 | 0.0027 |
| Support Vector Machine | 0.8916 | 0.0064 |
| K-Nearest Neighbors | 0.8773 | 0.0048 |
| Decision Tree | 0.8325 | 0.0083 |
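Gradient Boosting achieved the highest ROC AUC in both comparisons (0.8566 and a mean of 0.9398), while the Stacking Ensemble had marginally higher accuracy (0.9518 vs. 0.9517).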
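
For reference, a mean and standard deviation of ROC AUC like those reported above can be obtained with scikit-learn's `cross_val_score`. This is a sketch under stated assumptions: the data is synthetic, and the fold count, random seeds, and hyperparameters are illustrative rather than the values used to produce the tables.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic data standing in for the preprocessed loan features (assumption).
X, y = make_classification(n_samples=400, n_features=10, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=50, random_state=42),
}

# Stratified folds keep the class ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

results = {}
for name, model in models.items():
    # scoring="roc_auc" ranks by predicted scores rather than hard class labels.
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    results[name] = (scores.mean(), scores.std())
    print(f"{name}: mean ROC AUC = {scores.mean():.4f}, std = {scores.std():.4f}")
```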
| Model | Accuracy | ROC AUC |
|---|---|---|
| TensorFlow Neural Net | 0.9487 | 0.8454 |
| PyTorch Neural Net | 0.9503 | N/A |
Random Forest feature importance analysis revealed that the top 5 most important features for predicting loan approval are:
- Loan percentage of income (0.2358)
- Loan interest rate (0.1186)
- Person income (0.1065)
- Loan grade D (0.0916)
- Loan amount (0.0727)
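
A ranking like the one above can be read from a fitted Random Forest's `feature_importances_` attribute. The sketch below uses synthetic data with placeholder feature names (`feature_0`, `feature_1`, ...), so its output will not match the numbers listed here. The hyperparameters are also assumptions.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with placeholder feature names (assumption, not the loan columns).
feature_names = [f"feature_{i}" for i in range(8)]
X_values, y = make_classification(
    n_samples=300, n_features=8, n_informative=3, random_state=0
)
X = pd.DataFrame(X_values, columns=feature_names)

forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X, y)

# Impurity-based importances are normalized to sum to 1 across all features.
importances = pd.Series(forest.feature_importances_, index=X.columns)
top5 = importances.sort_values(ascending=False).head(5)
print(top5)
```

Impurity-based importances can favor high-cardinality or continuous features, so it may be worth cross-checking the ranking with `sklearn.inspection.permutation_importance`.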
- Clone this repository:
git clone https://github.com/yourusername/loan-approval-prediction.git
cd loan-approval-prediction
- Install required packages:
pip install -r requirements.txt
Python 3.6+ with the following packages:
- pandas
- numpy
- scikit-learn
- tensorflow
- torch
- tqdm
- jupyter
- Apply more advanced feature engineering techniques, such as PCA
- Try more complex models such as XGBoost and LightGBM
- Improve model interpretability with SHAP values
- Add visualizations of model performance and data distributions
This project is licensed under the MIT License - see the LICENSE file for details.
If you use or reference this code in your work, please cite it as:
Mardianto Hadiputro, Loan Approval Prediction Models, GitHub repository, 2025. Available at: https://github.com/MardiantoS/loan-prediction-model
- Walter Reade and Ashley Chow for creating the Kaggle competition
- The scikit-learn, TensorFlow, and PyTorch teams for their excellent machine learning libraries