An end-to-end machine learning system that trains and serves a robust model to predict loan default risk, exposed via a high-performance FastAPI backend.
- Project Objective
- Key Features
- Visual Analysis & Model Insights
- System Architecture
- Repository Structure
- Getting Started
- API Endpoints Guide
- Model Training & Retraining
- Contributing
- License
The primary goal of this project is to develop a reliable, scalable, and end-to-end system for assessing loan default risk. By leveraging a machine learning model trained on historical data, the system provides instant risk scores and actionable recommendations. The entire solution is served through a well-documented RESTful API, making it easy to integrate into existing financial workflows and applications.
- Advanced ML Model: Utilizes a LightGBM Classifier with `GridSearchCV` for hyperparameter tuning to ensure high accuracy and robustness in predictions.
- Comprehensive Feature Engineering: Creates powerful interaction features (e.g., `LoanIncomeRatio`, `MonthsEmployedCreditScore`) to capture complex borrower behaviors.
- High-Performance API: Built with FastAPI for asynchronous, high-speed request handling, making it suitable for production environments.
- Robust Input Validation: Employs Pydantic schemas for strict, type-safe validation of all incoming request data, preventing common errors.
- Model Insights Endpoint: Offers transparency by providing detailed model performance metrics, feature importances, and the best hyperparameters used.
- Production-Ready Codebase: A modular, clean, and well-organized project structure that simplifies maintenance and future development.
- Interactive Documentation: Automatically generates interactive API documentation (via Swagger UI and ReDoc) for easy testing and exploration.
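As a rough illustration of the interaction features mentioned above, here is a minimal sketch in plain Python. The formulas are assumptions inferred from the feature names (`LoanIncomeRatio` as loan amount divided by income, `MonthsEmployedCreditScore` as a product); the project's `feature_engineering.py` may define them differently, and the helper name `add_interaction_features` is hypothetical.

```python
from typing import Dict, Union

Number = Union[int, float]


def add_interaction_features(borrower: Dict[str, Number]) -> Dict[str, Number]:
    """Return a copy of the borrower record with two interaction features added.

    The formulas are assumptions inferred from the feature names, not the
    project's confirmed definitions.
    """
    income = borrower["Income"]
    if income <= 0:
        raise ValueError("Income must be positive to compute LoanIncomeRatio")

    enriched = dict(borrower)  # leave the caller's record untouched
    # Assumed: loan amount relative to income.
    enriched["LoanIncomeRatio"] = borrower["LoanAmount"] / income
    # Assumed: product of employment tenure (months) and credit score.
    enriched["MonthsEmployedCreditScore"] = (
        borrower["MonthsEmployed"] * borrower["CreditScore"]
    )
    return enriched


sample = {"Income": 55000, "LoanAmount": 25000, "MonthsEmployed": 60, "CreditScore": 650}
features = add_interaction_features(sample)
```

Returning a new dictionary rather than mutating the input keeps the validated request data unchanged, which can make debugging easier.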
This section provides a visual overview of the dataset characteristics and the final model's performance. The charts are generated automatically during the model training process.
Understanding the data is the first step toward building an effective model. These charts reveal key patterns and distributions in the loan dataset.
After training and tuning, the model's performance is evaluated on a hold-out test set. These charts provide a clear picture of its predictive accuracy and reliability.
The system follows a standard machine learning model deployment architecture. The core components are decoupled for maintainability and scalability.
- Client: A user or service sends a POST request with borrower data in JSON format.
- FastAPI Backend:
  - Receives and validates the incoming data using Pydantic models.
  - Passes the validated data to the feature engineering module.
- ML Pipeline (`.joblib` artifact):
  - The loaded Scikit-learn pipeline preprocesses the data (scaling, encoding).
  - The trained LightGBM model predicts the probability of default.
- Response Generation: The API formats the prediction into a clear JSON response, including a risk score, category, and recommendation, and sends it back to the client.
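The response generation step could look roughly like the sketch below, which maps a predicted default probability to the response fields. The thresholds, the "Medium Risk" and "High Risk" labels, and the "Manual Review" and "Rejected" recommendations are hypothetical; only "Low Risk" and "Approved" appear in this README, and the real cut-offs live in the application code.

```python
from typing import Dict, Union


def build_response(
    default_probability: float,
    decision_threshold: float = 0.5,  # hypothetical cut-off for the 0/1 prediction
    medium_risk_from: float = 0.3,    # hypothetical lower bound of the medium band
    high_risk_from: float = 0.6,      # hypothetical lower bound of the high band
) -> Dict[str, Union[int, float, str]]:
    """Turn a default probability into a JSON-serializable response dict."""
    if not 0.0 <= default_probability <= 1.0:
        raise ValueError("default_probability must lie in [0, 1]")

    if default_probability >= high_risk_from:
        category, recommendation = "High Risk", "Rejected"
    elif default_probability >= medium_risk_from:
        category, recommendation = "Medium Risk", "Manual Review"
    else:
        category, recommendation = "Low Risk", "Approved"

    return {
        "prediction": int(default_probability >= decision_threshold),
        "risk_score": round(default_probability, 3),
        "risk_category": category,
        "recommendation": recommendation,
    }


response = build_response(0.253)
```

Keeping the thresholds as parameters makes it straightforward to tune the risk bands without touching the model itself.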
The project is organized into distinct modules, each with a specific responsibility.

    LOAN-RISK-SYSTEM/
    │
    ├── models/                               # Pydantic schemas and ML model artifacts
    │   ├── artifacts/
    │   │   ├── charts/                       # Generated performance charts
    │   │   │   ├── confusion_matrix.png
    │   │   │   ├── credit_score_density.png
    │   │   │   ├── feature_importance.png
    │   │   │   ├── loan_income_ratio_boxplot.png
    │   │   │   ├── precision_recall_curve.png
    │   │   │   ├── roc_curve.png
    │   │   │   └── target_imbalance.png
    │   │   ├── loan_default_pipeline.joblib  # The serialized ML pipeline
    │   │   └── model_insights.json           # Performance metrics & feature importance
    │   └── schemas.py                        # Pydantic models for API validation
    │
    ├── static/                               # Simple frontend files
    │
    ├── utils/                                # Helper modules for the application
    │   ├── data_validation.py                # Data validation logic
    │   └── feature_engineering.py            # Feature engineering functions
    │
    ├── .gitignore
    ├── main.py                               # Main FastAPI application file
    ├── README.md                             # This file
    └── requirements.txt                      # Project dependencies

Follow these steps to get the application running on your local machine.
- Python 3.8 or higher
- `pip` package manager
- A `git` client
- Clone the repository:

      git clone https://github.com/d-kavinraja/AI-Powered-Loan-Eligibility-Risk-Scoring-System.git
      cd AI-Powered-Loan-Eligibility-Risk-Scoring-System

- Create and activate a virtual environment:

      # For macOS/Linux
      python3 -m venv venv
      source venv/bin/activate

      # For Windows
      python -m venv venv
      venv\Scripts\activate

- Install the required dependencies:

      pip install -r requirements.txt

- Start the FastAPI server using Uvicorn:

      uvicorn main:app --host 127.0.0.1 --port 8000 --reload

  The `--reload` flag enables hot-reloading for development.

- Access the API:
  - Frontend: http://127.0.0.1:8000
The API provides the following endpoints for interaction.
Predicts the loan default risk based on borrower data.
- Request Body:

      {
        "Age": 30,
        "Income": 55000,
        "LoanAmount": 25000,
        "CreditScore": 650,
        "MonthsEmployed": 60,
        "NumCreditLines": 4,
        "InterestRate": 12.5,
        "LoanTerm": 36,
        "DTIRatio": 0.4,
        "Education": "Bachelor's",
        "EmploymentType": "Full-time",
        "MaritalStatus": "Married",
        "HasMortgage": "Yes",
        "HasDependents": "Yes",
        "LoanPurpose": "Business",
        "HasCoSigner": "No"
      }

- Success Response (200 OK):

      {
        "prediction": 0,
        "risk_score": 0.253,
        "risk_category": "Low Risk",
        "recommendation": "Approved"
      }

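For integration from a Python client, the request above might be built with the standard library as sketched below. The endpoint path `/predict` is an assumption, since this README does not state the route; check the interactive API documentation for the actual path. The helper name `build_predict_request` is also hypothetical.

```python
import json
from urllib import request

# Hypothetical route; confirm the real path in the API documentation.
PREDICT_PATH = "/predict"


def build_predict_request(
    payload: dict, base_url: str = "http://127.0.0.1:8000"
) -> request.Request:
    """Build (but do not send) a JSON POST request carrying borrower data."""
    body = json.dumps(payload).encode("utf-8")
    return request.Request(
        url=base_url.rstrip("/") + PREDICT_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


borrower = {
    "Age": 30, "Income": 55000, "LoanAmount": 25000, "CreditScore": 650,
    "MonthsEmployed": 60, "NumCreditLines": 4, "InterestRate": 12.5,
    "LoanTerm": 36, "DTIRatio": 0.4, "Education": "Bachelor's",
    "EmploymentType": "Full-time", "MaritalStatus": "Married",
    "HasMortgage": "Yes", "HasDependents": "Yes",
    "LoanPurpose": "Business", "HasCoSigner": "No",
}
req = build_predict_request(borrower)

# With the server running, the request could be sent like this:
# with request.urlopen(req) as resp:
#     print(json.load(resp))
```

Building the request separately from sending it keeps the payload easy to inspect before any network call is made.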
Retrieves the performance metrics, feature importances, and parameters of the trained model.
Serves static image files of the model's performance charts.
- URL Parameters:
  - `chart_name`: e.g., `confusion_matrix.png`, `feature_importance.png`, etc.
A simple health check endpoint to verify that the API is running and artifacts are loaded.
The model can be retrained with new data to improve its performance or adapt to new patterns. The complete training pipeline is documented in the training script.
To retrain the model:
- Prepare Your Data: Place your updated dataset in the designated data directory.
- Run the Training Script: Execute the training script from the root of the project.
- Verify Artifacts: The script will automatically overwrite the existing artifacts in the `models/artifacts/` directory.
- Restart the API: Restart the Uvicorn server to load the newly trained model.
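The artifact verification step can be partly automated. The following is a minimal sketch, assuming the two artifact file names shown in the repository structure; the function name `missing_or_invalid_artifacts` is hypothetical, and the check only confirms that files exist, are non-empty, and (for the insights file) parse as JSON.

```python
import json
from pathlib import Path
from typing import List

# File names taken from the repository structure section.
EXPECTED_ARTIFACTS = ("loan_default_pipeline.joblib", "model_insights.json")


def missing_or_invalid_artifacts(artifacts_dir: Path) -> List[str]:
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    for name in EXPECTED_ARTIFACTS:
        path = artifacts_dir / name
        if not path.is_file() or path.stat().st_size == 0:
            problems.append(f"{name}: missing or empty")

    # The insights file should additionally be well-formed JSON.
    insights = artifacts_dir / "model_insights.json"
    if insights.is_file() and insights.stat().st_size > 0:
        try:
            json.loads(insights.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            problems.append("model_insights.json: not valid JSON")
    return problems


if __name__ == "__main__":
    issues = missing_or_invalid_artifacts(Path("models/artifacts"))
    print("\n".join(issues) if issues else "All expected artifacts look present.")
```

Running a check like this before restarting the server helps avoid loading a half-written pipeline file.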
Contributions are welcome! If you have suggestions for improvements, please open an issue or submit a pull request.
This project is distributed under the MIT License. See LICENSE for more information.






