
A comprehensive end-to-end machine learning system that predicts loan default risk using advanced algorithms and provides a professional web interface for risk assessment.


d-kavinraja/AI-Powered-Loan-Eligibility-Risk-Scoring-System


AI-Powered Loan Eligibility & Risk Scoring System

An end-to-end machine learning system that trains and serves a robust model to predict loan default risk, exposed via a high-performance FastAPI backend.





🎯 Project Objective

The primary goal of this project is to develop a reliable, scalable, and end-to-end system for assessing loan default risk. By leveraging a machine learning model trained on historical data, the system provides instant risk scores and actionable recommendations. The entire solution is served through a well-documented RESTful API, making it easy to integrate into existing financial workflows and applications.


✨ Key Features

  • Advanced ML Model: Uses a LightGBM classifier, tuned with GridSearchCV hyperparameter search, to improve predictive accuracy and robustness.
  • Comprehensive Feature Engineering: Creates interaction features (e.g., LoanIncomeRatio, MonthsEmployedCreditScore) that capture relationships between borrower attributes.
  • High-Performance API: Built with FastAPI for asynchronous, high-speed request handling, making it suitable for production environments.
  • Robust Input Validation: Uses Pydantic schemas for strict, type-safe validation of all incoming request data, rejecting malformed requests before they reach the model.
  • Model Insights Endpoint: Offers transparency by providing detailed model performance metrics, feature importances, and the best hyperparameters used.
  • Production-Ready Codebase: A modular, clean, and well-organized project structure that simplifies maintenance and future development.
  • Interactive Documentation: Automatically generates interactive API documentation (via Swagger UI and ReDoc) for easy testing and exploration.
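The README names the interaction features but not their formulas. As a minimal sketch, assuming LoanIncomeRatio is LoanAmount divided by Income and MonthsEmployedCreditScore is the product of MonthsEmployed and CreditScore (both definitions, and the helper name `add_interaction_features`, are assumptions), the feature-engineering step might look like this:

```python
from typing import Any, Dict


def add_interaction_features(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a borrower record with two engineered features added.

    Hypothetical helper: the formulas below are assumed, not taken from the project.
    """
    enriched = dict(record)  # leave the caller's dict untouched
    income = record["Income"]
    # Assumed definition: loan size relative to annual income (guard against division by zero).
    enriched["LoanIncomeRatio"] = record["LoanAmount"] / income if income else float("inf")
    # Assumed definition: tenure and credit quality combined as a product.
    enriched["MonthsEmployedCreditScore"] = record["MonthsEmployed"] * record["CreditScore"]
    return enriched


borrower = {"Income": 55000, "LoanAmount": 25000, "MonthsEmployed": 60, "CreditScore": 650}
features = add_interaction_features(borrower)
print(round(features["LoanIncomeRatio"], 4), features["MonthsEmployedCreditScore"])
```

Returning a new dict keeps the raw request data intact, so validation errors can still be reported against the fields the client actually sent.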

📊 Visual Analysis & Model Insights

This section provides a visual overview of the dataset characteristics and the final model's performance. The charts are generated automatically during the model training process.

Exploratory Data Analysis (EDA)

Understanding the data is the first step toward building an effective model. These charts reveal key patterns and distributions in the loan dataset.

Target Variable Imbalance
Insight: The pie chart reveals a significant class imbalance: non-defaulted loans make up the vast majority of the dataset. This justifies evaluating the model with ROC AUC and the Precision-Recall curve rather than plain accuracy, which can be misleading when one class dominates.

Credit Score & Engineered Feature Analysis
Insight: The density plot shows that applicants with higher credit scores are less likely to default. The box plot for the engineered LoanIncomeRatio feature shows that defaulted loans tend to have a higher ratio, suggesting that this feature carries useful signal for predicting risk.
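To make the class-imbalance point concrete, the sketch below uses a hypothetical 90/10 label split (not the project's data). A baseline that always predicts "no default" reaches 90% accuracy while catching none of the actual defaults, which is why accuracy alone is a poor yardstick here.

```python
from collections import Counter
from typing import List


def majority_baseline_accuracy(labels: List[int]) -> float:
    """Accuracy of a 'model' that always predicts the most common class."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


def recall_of_positive_class(labels: List[int], predictions: List[int]) -> float:
    """Fraction of actual defaults (label 1) that were predicted as defaults."""
    true_pos = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    actual_pos = sum(labels)
    return true_pos / actual_pos if actual_pos else 0.0


# Made-up labels: 90 non-defaults (0) and 10 defaults (1).
labels = [0] * 90 + [1] * 10
always_no_default = [0] * len(labels)

print(majority_baseline_accuracy(labels))                    # 0.9
print(recall_of_positive_class(labels, always_no_default))   # 0.0
```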

Model Performance Evaluation

After training and tuning, the model's performance is evaluated on a hold-out test set. These charts provide a clear picture of its predictive accuracy and reliability.

Confusion Matrix & ROC Curve
Insight: The Confusion Matrix breaks down correct and incorrect predictions for each class, showing the model's ability to identify both defaulting and non-defaulting applicants. The ROC Curve, with an Area Under the Curve (AUC) of 0.92, shows that the model separates the two groups well; AUC summarizes this ranking quality across all classification thresholds.

Feature Importance & Precision-Recall Curve
Insight: The Feature Importance chart supports interpretability, showing that the engineered LoanIncomeRatio and the original CreditScore are the most influential features in the model's decisions. The Precision-Recall Curve is especially informative for imbalanced datasets; it indicates that the model keeps both precision and recall high, supporting its use as a classifier for this task.
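The metrics behind these charts can be computed by hand, as in this small sketch on made-up labels and scores (it does not reproduce the reported AUC of 0.92). ROC AUC is computed as the probability that a randomly chosen defaulter receives a higher score than a randomly chosen non-defaulter, with ties counted as one half; the 0.5 decision threshold is an assumption. In practice, scikit-learn's `confusion_matrix`, `precision_score`, `recall_score` and `roc_auc_score` provide the same quantities.

```python
from typing import Dict, List, Tuple


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Dict[str, int]:
    """Count true/false positives and negatives (1 = default)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, p in zip(y_true, y_pred):
        if y == 1 and p == 1:
            counts["tp"] += 1
        elif y == 0 and p == 1:
            counts["fp"] += 1
        elif y == 0 and p == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def precision_recall(counts: Dict[str, int]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    predicted_pos = counts["tp"] + counts["fp"]
    actual_pos = counts["tp"] + counts["fn"]
    precision = counts["tp"] / predicted_pos if predicted_pos else 0.0
    recall = counts["tp"] / actual_pos if actual_pos else 0.0
    return precision, recall


def roc_auc(y_true: List[int], scores: List[float]) -> float:
    """AUC as P(score of random positive > score of random negative), ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example data, not the project's test set.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.55, 0.80, 0.90, 0.15]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

counts = confusion_counts(y_true, y_pred)
print(counts)
print(precision_recall(counts))
print(roc_auc(y_true, scores))
```

The pairwise AUC loop is quadratic and only meant for illustration; `roc_auc_score` uses a sort-based computation that scales to real test sets.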

🏗️ System Architecture

The system follows a standard machine learning model deployment architecture. The core components are decoupled for maintainability and scalability.

  1. Client: A user or service sends a POST request with borrower data in JSON format.
  2. FastAPI Backend:
    • Receives and validates the incoming data using Pydantic models.
    • Passes the validated data to the feature engineering module.
  3. ML Pipeline (.joblib artifact):
    • The loaded Scikit-learn pipeline preprocesses the data (scaling, encoding).
    • The trained LightGBM model predicts the probability of default.
  4. Response Generation: The API formats the prediction into a clear JSON response, including a risk score, category, and recommendation, and sends it back to the client.
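The four steps above can be sketched end to end with the standard library. Everything here is an assumption for illustration: the helper names, the stub model standing in for `loan_default_pipeline.joblib`, the 0.3/0.6 category thresholds, the "Medium Risk"/"High Risk" labels and their recommendations, and the 0.5 cut-off for `prediction`. The stub will not reproduce the sample response shown in the API guide.

```python
from typing import Any, Dict, List

# Hypothetical subset of required fields, enough for this sketch.
REQUIRED_FIELDS = ("Income", "LoanAmount", "CreditScore", "MonthsEmployed")


class StubPipeline:
    """Stand-in for the loaded scikit-learn pipeline (preprocessing + LightGBM)."""

    def predict_proba(self, rows: List[Dict[str, Any]]) -> List[List[float]]:
        # Toy rule for illustration only: risk grows with LoanIncomeRatio, capped at 1.
        probs = []
        for row in rows:
            p_default = min(row["LoanIncomeRatio"], 1.0)
            probs.append([1.0 - p_default, p_default])
        return probs


def validate(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Step 2a: reject incomplete input (the project uses Pydantic for this)."""
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if payload["Income"] <= 0:
        raise ValueError("Income must be positive")
    return payload


def engineer(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Step 2b: add an engineered feature (assumed definition)."""
    return {**payload, "LoanIncomeRatio": payload["LoanAmount"] / payload["Income"]}


def respond(p_default: float) -> Dict[str, Any]:
    """Step 4: turn a probability into the JSON response (hypothetical thresholds)."""
    if p_default < 0.3:
        category, recommendation = "Low Risk", "Approved"
    elif p_default < 0.6:
        category, recommendation = "Medium Risk", "Manual Review"
    else:
        category, recommendation = "High Risk", "Rejected"
    return {
        "prediction": int(p_default >= 0.5),
        "risk_score": round(p_default, 3),
        "risk_category": category,
        "recommendation": recommendation,
    }


def handle_predict(payload: Dict[str, Any], pipeline: StubPipeline) -> Dict[str, Any]:
    row = engineer(validate(payload))
    p_default = pipeline.predict_proba([row])[0][1]  # column 1 = P(default)
    return respond(p_default)


applicant = {"Income": 55000, "LoanAmount": 25000, "CreditScore": 650, "MonthsEmployed": 60}
medium = handle_predict(applicant, StubPipeline())
low = handle_predict({**applicant, "LoanAmount": 5000}, StubPipeline())
print(medium)
print(low)
```

Keeping validation, feature engineering, scoring and response formatting as separate functions mirrors the decoupled components described above, so the stub can be swapped for the real pipeline without touching the other steps.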

📂 Repository Structure

The project is organized into distinct modules, each with a specific responsibility.

LOAN-RISK-SYSTEM/
│
├── models/                     # Pydantic schemas and ML model artifacts
│   ├── artifacts/
│   │   ├── charts/             # Generated performance charts
│   │   │   ├── confusion_matrix.png
│   │   │   ├── credit_score_density.png
│   │   │   ├── feature_importance.png
│   │   │   ├── loan_income_ratio_boxplot.png
│   │   │   ├── precision_recall_curve.png
│   │   │   ├── roc_curve.png
│   │   │   └── target_imbalance.png
│   │   ├── loan_default_pipeline.joblib  # The serialized ML pipeline
│   │   └── model_insights.json           # Performance metrics & feature importance
│   └── schemas.py              # Pydantic models for API validation
│
├── static/                     # Simple frontend files
│
├── utils/                      # Helper modules for the application
│   ├── data_validation.py      # Data validation logic
│   └── feature_engineering.py  # Feature engineering functions
│
├── .gitignore
├── main.py                     # Main FastAPI application file
├── README.md                   # This file
└── requirements.txt            # Project dependencies


🚀 Getting Started

Follow these steps to get the application running on your local machine.

Prerequisites

  • Python 3.8 or higher
  • pip package manager
  • A git client

Installation & Setup

  1. Clone the repository:

    git clone https://github.com/d-kavinraja/AI-Powered-Loan-Eligibility-Risk-Scoring-System.git
    cd AI-Powered-Loan-Eligibility-Risk-Scoring-System
  2. Create and activate a virtual environment:

    # For macOS/Linux
    python3 -m venv venv
    source venv/bin/activate
    
    # For Windows
    python -m venv venv
    venv\Scripts\activate
  3. Install the required dependencies:

    pip install -r requirements.txt

Running the Application

  1. Start the FastAPI server using Uvicorn:

    uvicorn main:app --host 127.0.0.1 --port 8000 --reload

    The --reload flag enables hot-reloading for development.

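  2. Access the application:

    • Frontend: http://127.0.0.1:8000
    • Interactive API docs (Swagger UI, FastAPI default path): http://127.0.0.1:8000/docs
    • Alternative API docs (ReDoc, FastAPI default path): http://127.0.0.1:8000/redoc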

⚙️ API Endpoints Guide

The API provides the following endpoints for interaction.

POST /api/predict

Predicts the loan default risk based on borrower data.

  • Request Body:

    {
      "Age": 30,
      "Income": 55000,
      "LoanAmount": 25000,
      "CreditScore": 650,
      "MonthsEmployed": 60,
      "NumCreditLines": 4,
      "InterestRate": 12.5,
      "LoanTerm": 36,
      "DTIRatio": 0.4,
      "Education": "Bachelor's",
      "EmploymentType": "Full-time",
      "MaritalStatus": "Married",
      "HasMortgage": "Yes",
      "HasDependents": "Yes",
      "LoanPurpose": "Business",
      "HasCoSigner": "No"
    }
  • Success Response (200 OK):

    {
      "prediction": 0,
      "risk_score": 0.253,
      "risk_category": "Low Risk",
      "recommendation": "Approved"
    }
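A Python client for this endpoint can be written with only the standard library. This sketch assumes the server is running locally as described in Running the Application; the helper names `build_request` and `send` are hypothetical. The request is built separately so it can be inspected without a live server.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumes the local Uvicorn server from "Running the Application".
API_URL = "http://127.0.0.1:8000/api/predict"


def build_request(payload: Dict[str, Any]) -> urllib.request.Request:
    """Encode the borrower data as JSON and prepare a POST request."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> Dict[str, Any]:
    """Send the request and decode the JSON reply.

    Raises urllib.error.HTTPError on non-2xx responses (FastAPI returns 422 for invalid bodies).
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


borrower = {
    "Age": 30, "Income": 55000, "LoanAmount": 25000, "CreditScore": 650,
    "MonthsEmployed": 60, "NumCreditLines": 4, "InterestRate": 12.5,
    "LoanTerm": 36, "DTIRatio": 0.4, "Education": "Bachelor's",
    "EmploymentType": "Full-time", "MaritalStatus": "Married",
    "HasMortgage": "Yes", "HasDependents": "Yes", "LoanPurpose": "Business",
    "HasCoSigner": "No",
}

request = build_request(borrower)
# result = send(request)  # uncomment once the server is running
```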

GET /api/insights

Retrieves the performance metrics, feature importances, and parameters of the trained model.

GET /api/charts/{chart_name}

Serves static image files of the model's performance charts.

  • URL Parameters:
    • chart_name: e.g., confusion_matrix.png, feature_importance.png, etc.
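One plausible way to resolve `chart_name` safely is to serve only files that actually exist in the charts directory from the repository layout. This is a hypothetical sketch (the function name and the PNG-only rule are assumptions), not the project's implementation; matching against the directory listing also rejects inputs such as `../main.py`.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Directory taken from the repository layout.
CHARTS_DIR = Path("models/artifacts/charts")


def resolve_chart(chart_name: str, charts_dir: Path = CHARTS_DIR) -> Optional[Path]:
    """Return the chart's path if chart_name is a PNG directly inside charts_dir, else None."""
    if not chart_name.endswith(".png"):
        return None
    # Compare against real file names only, so "../x.png" or "sub/x.png" never match.
    available = {p.name for p in charts_dir.glob("*.png")} if charts_dir.is_dir() else set()
    if chart_name not in available:
        return None
    return charts_dir / chart_name


# Usage demo against a throwaway directory standing in for the charts folder.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    (demo_dir / "roc_curve.png").write_bytes(b"")
    found = resolve_chart("roc_curve.png", demo_dir)
    found_ok = found == demo_dir / "roc_curve.png"
    traversal = resolve_chart("../roc_curve.png", demo_dir)
    not_png = resolve_chart("main.py", demo_dir)
```

In the API, a `None` result would map to a 404 response rather than an attempt to open the file.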

GET /health

A simple health check endpoint to verify that the API is running and artifacts are loaded.
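A minimal sketch of the check behind `/health`, assuming it simply confirms that the two artifacts from the repository layout exist on disk. The function name and response shape are assumptions; a real service would more likely report whether the objects were loaded into memory at startup.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Union

# Paths taken from the repository layout; the response shape below is hypothetical.
ARTIFACTS_DIR = Path("models/artifacts")
REQUIRED_ARTIFACTS = ("loan_default_pipeline.joblib", "model_insights.json")


def health_status(artifacts_dir: Path = ARTIFACTS_DIR) -> Dict[str, Union[str, List[str]]]:
    """Report 'ok' only when every required artifact file is present."""
    missing = [name for name in REQUIRED_ARTIFACTS if not (artifacts_dir / name).is_file()]
    return {"status": "ok" if not missing else "degraded", "missing": missing}


# Usage demo against a throwaway directory standing in for models/artifacts/.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp)
    (demo / "loan_default_pipeline.joblib").write_bytes(b"")
    before = health_status(demo)
    (demo / "model_insights.json").write_text("{}")
    after = health_status(demo)
```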


🤖 Model Training & Retraining

The model can be retrained with new data to improve its performance or adapt to new patterns. The complete training pipeline is documented in the training script.

To retrain the model:

  1. Prepare Your Data: Place your updated dataset in the designated data directory.
  2. Run the Training Script: Execute the training script from the root of the project.
  3. Verify Artifacts: The script overwrites the existing artifacts in the models/artifacts/ directory; confirm that the pipeline, model_insights.json, and charts have been updated.
  4. Restart the API: Restart the Uvicorn server to load the newly trained model.

🤝 Contributing

Contributions are welcome! If you have suggestions for improvements, please open an issue or submit a pull request.


📜 License

This project is distributed under the MIT License. See LICENSE for more information.
