neelkantnewra/Loan-Approval-Prediction

Loan Approval Prediction

1. Problem Statement

Goal: Build a machine learning classifier that predicts whether a loan applicant should be Approved or Rejected.

Manual loan approval processes are time-consuming and prone to human error. By analyzing historical data—including applicant income, credit score, debt levels, and loan intent—this system automates the decision-making process. This solution aims to help financial institutions minimize default risks while accelerating the approval workflow for eligible candidates.

2. Dataset Description

The dataset consists of financial and demographic information about loan applicants. It satisfies the assignment requirement of at least 12 features and more than 500 instances (total: 50,000 instances and 18 features, excluding the primary key and the target).

  • Target Variable: loan_status (Binary: 0 = Rejected, 1 = Approved)
  • Key Features:
    • Demographics: age, occupation_status, years_employed.
    • Financial Health: annual_income, savings_assets, current_debt.
    • Credit History: credit_score, credit_history_years, defaults_on_file, delinquencies_last_2yrs, derogatory_marks.
    • Loan Details: loan_amount, interest_rate, loan_intent, product_type.
    • Ratios: debt_to_income_ratio, loan_to_income_ratio, payment_to_income_ratio.
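As a quick sanity check on the feature list above, the groups can be written down as a small schema and used to spot missing columns in an applicant record. This is a minimal sketch: the feature names come from the list above, while `FEATURE_GROUPS`, `ALL_FEATURES`, and `missing_features` are hypothetical names introduced here for illustration and are not taken from the repository code.

```python
# Feature names as listed in this README, grouped the same way.
FEATURE_GROUPS = {
    "demographics": ["age", "occupation_status", "years_employed"],
    "financial_health": ["annual_income", "savings_assets", "current_debt"],
    "credit_history": [
        "credit_score",
        "credit_history_years",
        "defaults_on_file",
        "delinquencies_last_2yrs",
        "derogatory_marks",
    ],
    "loan_details": ["loan_amount", "interest_rate", "loan_intent", "product_type"],
    "ratios": [
        "debt_to_income_ratio",
        "loan_to_income_ratio",
        "payment_to_income_ratio",
    ],
}
TARGET = "loan_status"  # 0 = Rejected, 1 = Approved

# Flatten the groups into one ordered list of feature names.
ALL_FEATURES = [name for group in FEATURE_GROUPS.values() for name in group]


def missing_features(record):
    """Return the feature names that are absent from an applicant record (a dict)."""
    return [name for name in ALL_FEATURES if name not in record]


# Usage: a placeholder record with one feature removed.
record = {name: 0 for name in ALL_FEATURES}
del record["credit_score"]
print(len(ALL_FEATURES), missing_features(record))
```

The flattened list has 18 entries, which matches the feature count stated in the dataset description.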

3. Models Used & Evaluation Metrics

Six different classification algorithms were implemented and evaluated to find the best-performing model.
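A comparison of this kind can be sketched with scikit-learn as shown below. This is not the repository's `main.py`: it uses synthetic data from `make_classification` as a stand-in for the loan dataset, the hyperparameters are arbitrary, `GaussianNB` is an assumed choice of Naive Bayes variant, and XGBoost is left out because it lives in the separate `xgboost` package.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 18 numeric features, binary target.
X, y = make_classification(
    n_samples=1000, n_features=18, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Five of the six model families named in this README (hyperparameters are illustrative).
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "kNN Classifier": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]  # probability of class 1
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "auc": roc_auc_score(y_test, proba),
        "mcc": matthews_corrcoef(y_test, pred),
    }

for name, scores in results.items():
    print(name, {metric: round(value, 4) for metric, value in scores.items()})
```

Scores from this sketch will not match the reported table, since the data and settings differ.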

Performance Comparison Table

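| ML Model Name       | Accuracy | AUC Score | Precision | Recall | F1 Score | MCC Score |
|---------------------|----------|-----------|-----------|--------|----------|-----------|
| Logistic Regression | 0.7770   | 0.8599    | 0.7767    | 0.7770 | 0.7760   | 0.5476    |
| Decision Tree       | 0.8774   | 0.9058    | 0.8778    | 0.8774 | 0.8770   | 0.7520    |
| kNN Classifier      | 0.5737   | 0.5883    | 0.5762    | 0.5737 | 0.5745   | 0.1434    |
| Naive Bayes         | 0.7811   | 0.8684    | 0.7846    | 0.7811 | 0.7816   | 0.5630    |
| Random Forest       | 0.9061   | 0.9712    | 0.9062    | 0.9061 | 0.9061   | 0.8105    |
| XGBoost             | 0.9262   | 0.9833    | 0.9264    | 0.9262 | 0.9261   | 0.8508    |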
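The metrics in this table can all be derived from a binary confusion matrix. The sketch below computes them in plain Python using support-weighted averaging for precision, recall, and F1. The averaging mode is an assumption: the README does not state it, but Recall equals Accuracy in every row, which is what weighted averaging produces. The confusion-matrix counts are made up for illustration and are not results from this project.

```python
from math import sqrt


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, support-weighted precision/recall/F1, and MCC from a 2x2 confusion matrix.

    Class 1 = Approved, class 0 = Rejected. Assumes both classes are
    present and both are predicted at least once (no zero denominators).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total

    # Per-class precision, recall, and F1.
    prec1, rec1 = tp / (tp + fp), tp / (tp + fn)
    prec0, rec0 = tn / (tn + fn), tn / (tn + fp)
    f1_1 = 2 * prec1 * rec1 / (prec1 + rec1)
    f1_0 = 2 * prec0 * rec0 / (prec0 + rec0)

    # Weights are the class supports: the share of true samples in each class.
    w1 = (tp + fn) / total
    w0 = (tn + fp) / total

    # Matthews correlation coefficient.
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den

    return {
        "accuracy": accuracy,
        "precision": w1 * prec1 + w0 * prec0,
        "recall": w1 * rec1 + w0 * rec0,
        "f1": w1 * f1_1 + w0 * f1_0,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=50, fp=10, fn=5, tn=35)
print({name: round(value, 4) for name, value in metrics.items()})
```

With weighted averaging, recall reduces to (TP + TN) / total, so it always matches accuracy. MCC uses all four cells of the matrix, which makes it a stricter summary when one class is more frequent than the other.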

Observations on Model Performance

| ML Model Name       | Observation about model performance |
|---------------------|-------------------------------------|
| Logistic Regression | Performed moderately (77.7% Accuracy). It struggled to capture the complex, non-linear relationships between financial variables like debt ratios and loan intent. |
| Decision Tree       | Significant improvement over linear models (87.7% Accuracy). This suggests the dataset has strong non-linear decision boundaries that a simple split-based approach can capture effectively. |
| kNN                 | Poor performance (57.4% Accuracy). The model struggled significantly, likely due to the "curse of dimensionality", where distance-based metrics become less effective with many financial features. |
| Naive Bayes         | Comparable to Logistic Regression (78.1% Accuracy). Its assumption of feature independence is likely violated by correlated features (e.g., loan_amount vs loan_to_income_ratio), limiting its potential. |
| Random Forest       | Excellent performance (90.6% Accuracy). The ensemble approach successfully reduced the variance seen in the single Decision Tree, providing a robust and high-confidence classifier. |
| XGBoost             | Best overall model (92.6% Accuracy, 0.85 MCC). It achieved the highest AUC (0.98), indicating superior ability to distinguish between approved and rejected applicants. It effectively handled class imbalances and complex feature interactions. |

4. Project Structure

This repository follows the required folder structure:

BITS-ML-WILP-SEM1-Assignment-2/
│
├── model/                  # Saved trained models (*.joblib)
├── preprocessing/          # Saved pipeline preprocessors (*.pkl)
├── app.py                  # Streamlit Application (Frontend)
├── main.py                 # Training Pipeline Script
├── requirements.txt        # Project Dependencies
├── Makefile                # Automation commands
├── Dockerfile              # Containerization setup
└── README.md               # Project Documentation
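To check that the saved artifacts are where this layout says they should be, a short standard-library script can list them. This is a sketch under the assumption that it is run from the repository root; `list_artifacts` is a hypothetical helper, and only the directory names and file extensions are taken from the tree above.

```python
from pathlib import Path


def _names(directory, pattern):
    """Sorted file names matching pattern, or an empty list if the directory is absent."""
    if not directory.is_dir():
        return []
    return sorted(path.name for path in directory.glob(pattern))


def list_artifacts(root="."):
    """List saved models (model/*.joblib) and preprocessors (preprocessing/*.pkl)."""
    root = Path(root)
    return {
        "models": _names(root / "model", "*.joblib"),
        "preprocessors": _names(root / "preprocessing", "*.pkl"),
    }


print(list_artifacts())
```

An empty list for either key suggests the training pipeline has not been run yet or the artifacts were saved elsewhere.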

5. How to Run Locally

Step 1: Clone the Repository

git clone https://github.com/neelkantnewra/Loan-Approval-Prediction.git
cd Loan-Approval-Prediction

Step 2: Install Dependencies

Make sure all dependencies are installed under a compatible Python version:

pip install -r requirements.txt
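If the app fails to start, one way to confirm that the installation succeeded is to compare `requirements.txt` against the packages visible to the current interpreter. The sketch below uses only the standard library. `requirement_names` and `missing_packages` are hypothetical helpers, and the parser handles only simple `name` or `name==version` style lines, not every form that pip accepts.

```python
import re
from importlib import metadata


def requirement_names(lines):
    """Extract bare package names from simple requirements.txt-style lines."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that are not installed for the running interpreter."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Usage, run from the repository root:
#   with open("requirements.txt") as handle:
#       print(missing_packages(requirement_names(handle)))
```

An empty result means every listed package is installed. It does not check that the installed versions satisfy the pinned ones.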

Step 3: Run the Streamlit app

streamlit run app.py

The app will open in your browser at http://localhost:8501

6. Deployment

The application is deployed on Streamlit Community Cloud and can be accessed here:
https://loan-approval-prediction-bits-wilp-2025ab05299.streamlit.app/

7. BITS Virtual Lab Screenshot

Screenshots of the lab, showing that the project was built inside the BITS environment:

[Screenshots: 2026-01-25 at 2:10:57 PM, 4:40:16 PM, and 4:40:33 PM]

8. Final Submission Checklist (Before You Submit)

  • GitHub repo link works
  • Streamlit app link opens correctly
  • App loads without errors
  • All required features implemented
  • README.md updated and added to the submitted PDF
  • Screenshot attached

About

This repository contains the ML assignment for the BITS WILP course, with a deployed Streamlit application that compares the performance of various classification models.
