Dev-Toledo/AI-Credit-Score-Predictor

🤖 AI Credit Score Prediction

A Machine Learning project developed to automatically evaluate and predict customer credit scores for a financial institution.

📋 About the Project

A bank needs to define the credit score of its customers to safely approve or deny loans and financial products. Instead of analyzing each customer's history manually, this project uses Artificial Intelligence to analyze complex financial data and automatically classify their credit score into three categories: Poor, Standard, or Good.

Key Financial Features Analyzed:

  • Annual Salary & Profession
  • Number of Bank Accounts & Credit Cards
  • Payment Behavior & Delay Days
  • Total Debt & Credit Mix
  • Monthly Investments

🛠️ Technologies Used

  • Python 3
  • Pandas: For data manipulation, cleaning, and preprocessing.
  • Scikit-Learn: For building, training, and evaluating the Machine Learning models (Random Forest and K-Nearest Neighbors classifiers).
  • Jupyter Notebook / VS Code: For interactive development and data exploration.

🧠 Machine Learning Pipeline

  1. Data Preprocessing: Loading the historical data (clientes.csv) and applying Label Encoding to transform categorical text data (like profissao and comportamento_pagamento) into numerical values that the AI can understand.
  2. Splitting the Data: Separating the dataset into the feature columns ($X$) and the target labels ($y$, the score_credito column).
  3. Model Training: Fitting each machine learning algorithm on thousands of historical customer records.
  4. Prediction: Using the trained and validated model to evaluate a new, unseen list of customers (novos_clientes.csv) and outputting their predicted credit scores.
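
The first two steps above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the tiny DataFrame stands in for clientes.csv, the salario_anual column name and all values are invented, and the hold-out ratio (test_size=0.33) is an assumption, since the README does not state how the data was split for testing.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Synthetic stand-in for clientes.csv (all values are invented for illustration).
tabela = pd.DataFrame({
    "profissao": ["engenheiro", "professor", "medico",
                  "professor", "engenheiro", "medico"],
    "comportamento_pagamento": ["baixo", "alto", "alto",
                                "baixo", "alto", "baixo"],
    "salario_anual": [52000.0, 31000.0, 98000.0,
                      29000.0, 61000.0, 87000.0],  # hypothetical column name
    "score_credito": ["Standard", "Poor", "Good",
                      "Poor", "Standard", "Good"],
})

# Step 1: Label Encoding. Each text category becomes an integer code.
# The fitted encoders are kept so the same mapping can be reused later.
encoders = {}
for coluna in ["profissao", "comportamento_pagamento"]:
    encoder = LabelEncoder()
    tabela[coluna] = encoder.fit_transform(tabela[coluna])
    encoders[coluna] = encoder

# Step 2: separate the features (X) from the target label (y).
X = tabela.drop(columns="score_credito")
y = tabela["score_credito"]

# Hold out part of the rows for testing (ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=1
)

print(tabela)
```

When the new customers are loaded, their text columns would need the same mapping, for example `encoders["profissao"].transform(...)`, rather than a freshly fitted encoder, so that each profession keeps the code it had during training.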

🤖 Machine Learning Models Used

To find the more accurate classifier for this data, the project trained, tested, and compared two classic Machine Learning algorithms:

1. Random Forest

Random Forest is an ensemble learning algorithm. Instead of relying on a single decision tree to evaluate the customer, it builds a "forest" of dozens or hundreds of trees, each trained on a random sample of the data, and the final decision is made by majority vote among them.

  • Why it was used: It is a robust model. It copes well with a large number of financial variables and is less prone to overfitting (when the model "memorizes" the training data instead of learning general patterns) than a single decision tree, which makes its credit-risk predictions more reliable.
  • Accuracy Achieved: 82.63%
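
A minimal sketch of training and scoring a Random Forest with Scikit-Learn is shown here. It runs on synthetic data from make_classification, not on clientes.csv, so its accuracy will not match the figure above; the number of trees, the split ratio, and the random seeds are all assumptions. Note also that Scikit-Learn's implementation combines the trees by averaging their predicted class probabilities, which usually agrees with a plain majority vote but is not identical to it.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic three-class data standing in for the Poor / Standard / Good scores.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=6,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# 100 trees, each fitted on a bootstrap sample of the training rows.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Accuracy = share of held-out rows whose class was predicted correctly.
forest_accuracy = accuracy_score(y_test, forest.predict(X_test))
print(f"Trees in the forest: {len(forest.estimators_)}")
print(f"Accuracy on held-out rows: {forest_accuracy:.2%}")
```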

2. K-Nearest Neighbors (KNN)

KNN is a distance-based algorithm. It classifies a new customer by measuring how close their feature values are to those of the customers in the historical database and assigning the score that is most common among the "K" nearest ones.

  • Why it was used: Its reasoning is intuitive for credit analysis. The premise is simple: if a new customer has a salary, debt level, and payment habits very close to a group of customers who already have a "Good" score, the model assumes this new customer will have the same score.
  • Accuracy Achieved: 73.46%
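
The nearest-neighbor idea can be illustrated with a toy example. The six historical customers and their two features (annual salary, total debt) are invented for illustration. Scaling the features with StandardScaler is an assumption of this sketch rather than something the project states it does; it is included because distances are otherwise dominated by whichever column has the largest numeric range.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Invented historical customers: [annual salary, total debt].
historical = [
    [90000, 1000], [95000, 1500], [88000, 800],    # labelled "Good"
    [20000, 9000], [22000, 9500], [18000, 8700],   # labelled "Poor"
]
scores = ["Good", "Good", "Good", "Poor", "Poor", "Poor"]

# Scale both columns, then classify by the 3 nearest neighbors.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
knn.fit(historical, scores)

# Two new customers, each lying close to one of the groups above.
new_customers = [[91000, 1200], [21000, 9100]]
predictions = knn.predict(new_customers).tolist()
print(predictions)  # ['Good', 'Poor']
```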

🏆 Final Model Selection: After training and testing both models on the historical database, Random Forest was chosen to predict the scores in the novos_clientes.csv file, as it achieved the higher accuracy (82.63%, versus 73.46% for KNN).
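
The comparison and selection step could look roughly like the sketch below. It uses synthetic data, so which model wins here says nothing about the real dataset; the last 50 rows merely play the role of novos_clientes.csv, and the variable names, split ratio, and default hyperparameters are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic three-class data; the last 50 rows act as "new customers".
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=6,
    n_classes=3, random_state=0,
)
X_hist, y_hist = X[:550], y[:550]
X_new = X[550:]

X_train, X_test, y_train, y_test = train_test_split(
    X_hist, y_hist, test_size=0.3, random_state=0
)

# Fit both candidates on the same training rows and score them
# on the same held-out rows so the accuracies are comparable.
candidates = {
    "random_forest": RandomForestClassifier(random_state=0),
    "knn": KNeighborsClassifier(),
}
accuracies = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

# Keep the model with the highest held-out accuracy and use it
# to predict the scores of the unseen customers.
best_name = max(accuracies, key=accuracies.get)
new_predictions = candidates[best_name].predict(X_new)

print(accuracies)
print(f"Selected model: {best_name}")
```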


⚙️ How to Run Locally

  1. Clone this repository to your computer.
  2. Ensure the datasets (clientes.csv and novos_clientes.csv) are in the root directory.
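  3. Set up your virtual environment and install the dependencies in your terminal (the activation command below is for macOS and Linux; on Windows, run venv\Scripts\activate instead):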
    python3 -m venv venv
    source venv/bin/activate
    pip install pandas scikit-learn ipykernel
  4. Open the main.ipynb file in your preferred editor (like VS Code), ensure your Python environment (venv) is selected as the kernel, and run the cells.

🎓 Credits

Project developed as part of the educational material provided by Hashtag Programação.
