GitHub - DumindUdara/Loan-Default-Prediction: Data Science Coding Challenge in Coursera

Project Introduction: Loan Default Prediction

In this project, I tackled the challenge of predicting loan defaults using machine learning techniques. The objective was to build a model that predicts which borrowers are at the highest risk of defaulting on their loan payments. This project was designed to evaluate my skills in data science and machine learning, simulating a real-world scenario where financial institutions aim to minimize loan defaults and allocate resources effectively.

Dataset Description

I worked with two datasets provided for this challenge:

train.csv: This dataset contained information about 255,347 past borrowers, including a target label indicating whether each borrower defaulted on their loan.
test.csv: This dataset included similar information about 109,435 borrowers but without the target label, which I needed to predict.

Both datasets included various features related to the borrowers and their loans.
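
A minimal sketch of how the two files might be sanity-checked once loaded with pandas. The label column name `Default` is an assumption (this README only says a target label exists), `check_split` is a hypothetical helper, and the `Income` column and the tiny in-memory frames are made up so the snippet runs without the real CSV files.

```python
import pandas as pd

TARGET = "Default"  # assumed name of the label column; not stated in this README


def check_split(train: pd.DataFrame, test: pd.DataFrame, target: str = TARGET) -> list:
    """Return the feature columns shared by both frames after basic checks."""
    if target not in train.columns:
        raise ValueError(f"train is missing the target column {target!r}")
    if target in test.columns:
        raise ValueError("test should not contain the target column")
    features = [c for c in train.columns if c != target]
    missing = [c for c in features if c not in test.columns]
    if missing:
        raise ValueError(f"test is missing feature columns: {missing}")
    return features


# Stand-in data; with the real files this would be
#   train = pd.read_csv("train.csv"); test = pd.read_csv("test.csv")
train = pd.DataFrame(
    {"LoanID": ["A1", "A2"], "Income": [50000, 42000], "Default": [0, 1]}
)
test = pd.DataFrame({"LoanID": ["B1"], "Income": [61000]})

features = check_split(train, test)
```

With the real data, `train.shape` and `test.shape` should report the 255,347 and 109,435 rows mentioned above.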

Steps Taken

Data Exploration and Visualization:
- I began by exploring the datasets to understand the distributions, missing values, and relationships between features.
- Visualizations were created to gain insights into the data and the target variable distribution.
Data Cleaning and Preprocessing:
- I handled missing values appropriately and encoded categorical features using one-hot encoding.
- Numerical features were standardized to ensure uniformity and improve model performance.
Feature Engineering:
- I identified and processed both numerical and categorical features, preparing them for model training.
Model Building and Training:
- I used a RandomForestClassifier within a pipeline that included preprocessing steps.
- The data was split into training and validation sets to evaluate the model’s performance before making final predictions.
Model Evaluation:
- The model was trained on the training data and evaluated on the validation set using the ROC AUC metric.
- The performance was satisfactory, indicating the model's ability to distinguish between default and non-default cases.
Predictions on Test Data:
- After training and evaluating the model, I used it to predict the default probabilities for the test dataset.
- The results were compiled into a submission file following the required format.
Submission Preparation:
- I ensured the final predictions dataframe was formatted correctly with the columns LoanID and predicted_probability.
- The predictions were saved to a CSV file for submission.
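
The steps above can be sketched roughly as follows with scikit-learn. This is an illustration under stated assumptions, not the notebook's actual code: the label name `Default`, the feature names `Income` and `LoanPurpose`, the imputation strategies, the hyperparameters, and the output file name are all hypothetical, and synthetic data stands in for train.csv and test.csv so the snippet runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

TARGET, ID_COL = "Default", "LoanID"  # "Default" is an assumed label name


def build_pipeline(numeric_cols, categorical_cols):
    """Preprocessing (impute, scale, one-hot) followed by a random forest."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # assumed strategy
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),  # assumed strategy
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    preprocess = ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", categorical, categorical_cols),
    ])
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    return Pipeline([("preprocess", preprocess), ("model", model)])


# Synthetic stand-in for train.csv / test.csv (hypothetical columns).
rng = np.random.default_rng(0)
n = 400
income = rng.normal(50_000, 15_000, n)
train = pd.DataFrame({
    ID_COL: [f"T{i}" for i in range(n)],
    "Income": income,
    "LoanPurpose": rng.choice(["Auto", "Home", "Other"], n),
    TARGET: (income + rng.normal(0, 10_000, n) < 45_000).astype(int),
})
test = train.drop(columns=TARGET).head(50).assign(
    **{ID_COL: [f"S{i}" for i in range(50)]}
)

# Hold out a stratified validation set before touching the test data.
X, y = train.drop(columns=[TARGET, ID_COL]), train[TARGET]
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

pipe = build_pipeline(["Income"], ["LoanPurpose"])
pipe.fit(X_tr, y_tr)

# ROC AUC is computed from the predicted probability of the positive class.
val_auc = roc_auc_score(y_val, pipe.predict_proba(X_val)[:, 1])

# Submission frame with the two required columns.
submission = pd.DataFrame({
    ID_COL: test[ID_COL],
    "predicted_probability": pipe.predict_proba(test.drop(columns=ID_COL))[:, 1],
})
# submission.to_csv("submission.csv", index=False)  # hypothetical file name
```

Keeping the preprocessing inside the pipeline means the scaler and encoder are fitted only on the training split, so the validation ROC AUC is not inflated by information leaking from the held-out rows.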

Conclusion

This project provided a comprehensive experience in tackling a common machine learning problem in the financial domain. By following a structured approach from data exploration to generating test-set predictions, I built a predictive model that could help financial institutions manage loan defaults more effectively. The final submission was validated against the required format, and the model's performance was quantified using the ROC AUC metric.

