🤖 AI Credit Score Prediction

A Machine Learning project developed to automatically evaluate and predict customer credit scores for a financial institution.

📋 About the Project

A bank needs to define the credit score of its customers to safely approve or deny loans and financial products. Instead of analyzing each customer's history manually, this project uses Artificial Intelligence to analyze complex financial data and automatically classify their credit score into three categories: Poor, Standard, or Good.

Key Financial Features Analyzed:

Annual Salary & Profession
Number of Bank Accounts & Credit Cards
Payment Behavior & Delay Days
Total Debt & Credit Mix
Monthly Investments

🛠️ Technologies Used

Python 3
Pandas: For data manipulation, cleaning, and preprocessing.
Scikit-Learn: For building, training, and evaluating the Machine Learning models (Random Forest and K-Nearest Neighbors classifiers).
Jupyter Notebook / VS Code: For interactive development and data exploration.

🧠 Machine Learning Pipeline

Data Preprocessing: Loading the historical data (clientes.csv) and applying Label Encoding to convert categorical text columns (like profissao and comportamento_pagamento) into numerical values that the model can process.
Splitting the Data: Separating the dataset into the input features ($X$) and the target label ($y$, the score_credito column).
Model Training: Fitting the machine learning algorithm on thousands of historical customer records.
Prediction: Using the trained and validated model to evaluate a new, unseen list of customers (novos_clientes.csv) and outputting their predicted credit scores.
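
The four steps above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the tiny DataFrames stand in for clientes.csv and novos_clientes.csv, the salario_anual column and all values are hypothetical, and the choice of 50 trees is an assumption. Only the column names profissao, comportamento_pagamento, and score_credito come from this README.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for clientes.csv (the real file has more rows and columns)
clientes = pd.DataFrame({
    "profissao": ["engenheiro", "professor", "advogado", "engenheiro", "professor", "advogado"],
    "comportamento_pagamento": ["baixo", "alto", "baixo", "alto", "baixo", "alto"],
    "salario_anual": [90000, 40000, 120000, 85000, 38000, 110000],  # hypothetical column
    "score_credito": ["Good", "Poor", "Good", "Standard", "Poor", "Standard"],
})

# 1. Data preprocessing: label-encode each text column, keeping one encoder per column
encoders = {}
for col in ["profissao", "comportamento_pagamento"]:
    encoders[col] = LabelEncoder()
    clientes[col] = encoders[col].fit_transform(clientes[col])

# 2. Splitting the data: features (X) and target (y)
X = clientes.drop(columns="score_credito")
y = clientes["score_credito"]

# 3. Model training
model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(X, y)

# 4. Prediction: encode the new customers with the SAME encoders, then predict
novos = pd.DataFrame({
    "profissao": ["professor", "advogado"],
    "comportamento_pagamento": ["alto", "baixo"],
    "salario_anual": [41000, 115000],
})
for col, enc in encoders.items():
    novos[col] = enc.transform(novos[col])

previsoes = model.predict(novos)
print(previsoes)
```

Reusing the fitted encoders for the new customers matters: fitting a fresh encoder on novos_clientes.csv could assign different numbers to the same category and silently corrupt the predictions.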

🤖 Machine Learning Models Used

To ensure the highest possible accuracy in classification, this project trained, tested, and compared two classic and powerful Machine Learning algorithms:

1. Random Forest

Random Forest is an ensemble learning algorithm. Instead of relying on a single decision tree to evaluate the customer, it builds a "forest" of dozens or hundreds of trees, each trained on a random sample of the data, and the final decision is made by majority vote.

Why it was used: It is a robust model. It copes well with a large number of financial variables and is less prone to overfitting (when the model "memorizes" training data instead of learning general patterns) than a single decision tree, which makes its credit risk predictions more reliable.
Accuracy Achieved: 82.63%
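
The majority-vote idea can be shown in a few lines of plain Python. This is a conceptual sketch only: the five votes are made up, and majority_vote is a hypothetical helper, not part of the project. Note also that scikit-learn's RandomForestClassifier combines its trees by averaging their predicted class probabilities rather than counting hard votes, although the intuition is the same.

```python
from collections import Counter

def majority_vote(tree_votes):
    """Return the label predicted by most trees, with its share of the votes."""
    counts = Counter(tree_votes)
    label, n = counts.most_common(1)[0]
    return label, n / len(tree_votes)

# Hypothetical predictions from five trees for one customer
votes = ["Good", "Standard", "Good", "Poor", "Good"]
label, share = majority_vote(votes)
print(label, share)  # Good 0.6
```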

2. K-Nearest Neighbors (KNN)

KNN is a distance-based algorithm. It classifies a new customer by measuring how close their data is to every customer in the historical database and assigning the score that is most common among the "K" most similar ones.

Why it was used: Its reasoning is intuitive in a financial context. The premise is simple: if a new customer has a salary, debt level, and payment habits very close to a group of customers who already have a "Good" score, the model assumes this new customer will also have the same score.
Accuracy Achieved: 73.46%
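
To make the "nearest neighbors" idea concrete, here is a small from-scratch sketch using only the standard library. It is not the project's code (which uses Scikit-Learn): knn_predict is a hypothetical helper, the customers are invented, and the features are assumed to be already scaled to the range 0 to 1.

```python
import math
from collections import Counter

def knn_predict(history, new_customer, k=3):
    """Classify new_customer by the most common label among its k nearest neighbors."""
    # Sort historical customers by Euclidean distance to the new customer
    by_distance = sorted(history, key=lambda row: math.dist(row[0], new_customer))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical, already-scaled features: (salary, debt, delay days)
history = [
    ((0.9, 0.1, 0.0), "Good"),
    ((0.8, 0.2, 0.1), "Good"),
    ((0.7, 0.2, 0.1), "Good"),
    ((0.3, 0.8, 0.9), "Poor"),
    ((0.2, 0.9, 0.8), "Poor"),
    ((0.5, 0.5, 0.5), "Standard"),
]

print(knn_predict(history, (0.85, 0.15, 0.05)))  # Good
```

Because KNN relies on raw distances, a feature with a large numeric range (such as annual salary) can dominate the result if the features are not brought to a comparable scale first.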

🏆 Final Model Selection: After the training and testing phase with the historical database, the model chosen to predict the data in the novos_clientes.csv file was the Random Forest, as it achieved the higher accuracy (82.63%, versus 73.46% for KNN).
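
A comparison of this kind could look roughly like the sketch below. It is illustrative only: the data is synthetic (generated with make_classification instead of loaded from clientes.csv), the 70/30 split and default hyperparameters are assumptions, and the accuracies it prints will not match the figures reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic three-class data standing in for the encoded clientes.csv
X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           n_classes=3, random_state=42)

# Hold out 30% of the rows for testing (the split ratio is an assumption)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

models = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "KNN": KNeighborsClassifier(),
}

# Fit each model on the training split and score it on the held-out split
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# Keep the model with the highest test accuracy
best = max(scores, key=scores.get)
print(scores, best)
```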

⚙️ How to Run Locally

Clone this repository to your computer.
Ensure the datasets (clientes.csv and novos_clientes.csv) are in the root directory.

Set up your virtual environment and install the dependencies in your terminal:

python3 -m venv venv
source venv/bin/activate
pip install pandas scikit-learn ipykernel

Open the main.ipynb file in your preferred editor (like VS Code), ensure your Python environment (venv) is selected as the kernel, and run the cells.
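
Before running the notebook, a quick check like the one below can confirm that the expected files are in place. This is an optional, hypothetical helper (missing_files is not part of the repository); it only looks for the three file names mentioned in this README.

```python
from pathlib import Path

# File names taken from this README
REQUIRED_FILES = ["clientes.csv", "novos_clientes.csv", "main.ipynb"]

def missing_files(root="."):
    """Return the required project files that are not present under root."""
    return [name for name in REQUIRED_FILES if not (Path(root) / name).is_file()]

missing = missing_files()
print("Missing files:", missing if missing else "none")
```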

🎓 Credits

Project developed as part of the educational material provided by Hashtag Programação.
