Skip to content

DumindUdara/Loan-Default-Prediction

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 

Repository files navigation

Project Introduction: Loan Default Prediction

In this project, I tackled the challenge of predicting loan defaults using machine learning techniques. The objective was to build a model that predicts which borrowers are at the highest risk of defaulting on their loan payments. This project was designed to evaluate my skills in data science and machine learning, simulating a real-world scenario where financial institutions aim to minimize loan defaults and allocate resources effectively.

Dataset Description

I worked with two datasets provided for this challenge:

  1. train.csv: This dataset contained information about 255,347 past borrowers, including a target label indicating whether each borrower defaulted on their loan.
  2. test.csv: This dataset included similar information about 109,435 borrowers but without the target label, which I needed to predict. . Both datasets included various features related to the borrowers and their loans.

GitHub Repo stars GitHub last commit GitHub issues GitHub forks Python scikit-learn Data Source Status Contributions


Steps Taken

  1. Data Exploration and Visualization:

    • I began by exploring the datasets to understand the distributions, missing values, and relationships between features.
    • Visualizations were created to gain insights into the data and the target variable distribution.
  2. Data Cleaning and Preprocessing:

    • I handled missing values appropriately and encoded categorical features using one-hot encoding.
    • Numerical features were standardized to ensure uniformity and improve model performance.
  3. Feature Engineering:

    • I identified and processed both numerical and categorical features, preparing them for model training.
  4. Model Building and Training:

    • I used a RandomForestClassifier within a pipeline that included preprocessing steps.
    • The data was split into training and validation sets to evaluate the model’s performance before making final predictions.
  5. Model Evaluation:

    • The model was trained on the training data and evaluated on the validation set using the ROC AUC metric.
    • The performance was satisfactory, indicating the model's ability to distinguish between default and non-default cases.
  6. Predictions on Test Data: After training and evaluating the model, I used it to predict the default probabilities for the test dataset. The results were compiled into a submission file following the required format.

  7. Submission Preparation: I ensured the final predictions dataframe was formatted correctly with the columns LoanID and predicted_probability. The predictions were saved to a CSV file for submission.

Conclusion

This project provided a comprehensive experience in tackling a common machine learning problem in the financial domain. By following a structured approach from data exploration to model deployment, I was able to build a predictive model that can assist financial institutions in managing loan defaults more effectively. The final submission was validated to ensure compliance with the required format, and the model's performance was quantified using the ROC AUC metric.

About

Data Science Coding Challange in Coursera

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published