ShuvoNewaz/Credit-Card-Default-Analysis

Determining the Likelihood of Defaulting Customers Based on Their Personal and Payment Information

Task Overview

The task is to perform exploratory data analysis and build a model that predicts whether a credit card customer will default on their payment, given basic information about their gender, age, marital status, education, and payment history.

Dataset

The dataset used is the Default of Credit Card Clients dataset from the UC Irvine Machine Learning Repository. The dataset contains 25 columns, which are described below.

  • Unique customer ID
  • Credit limit
  • Gender
  • Education level
  • Marital status
  • Age (years)
  • History of the past 6 payments
  • Amounts of the past 6 bills incurred
  • Amounts of the past 6 bills paid
  • Default status (boolean)

More information on the data can be found on the official website. The distribution of the defaulted and non-defaulted clients is shown below.

The dataset is heavily skewed towards the negative class (non-defaulters), so false negatives, i.e. defaulters predicted as non-defaulters, are expected in the analysis.
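To make the effect of the skew concrete, the following is a minimal sketch using hypothetical toy labels rather than the real dataset. It shows that a baseline which always predicts the majority class can reach a high accuracy while missing every defaulter. The helper names `class_balance` and `recall` are illustrative and do not come from the notebook.

```python
from collections import Counter


def class_balance(labels):
    """Return the fraction of samples that fall in each class."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical toy labels (1 = default, 0 = no default), not the real data.
y_true = [0] * 8 + [1] * 2

# A naive baseline that always predicts the majority (negative) class.
y_pred = [0] * len(y_true)

balance = class_balance(y_true)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
baseline_recall = recall(y_true, y_pred)

print(balance, accuracy, baseline_recall)
```

On skewed data like this, accuracy alone hides the missed defaulters, so recall on the positive class is worth tracking alongside it.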

Data Correlation

Some of the features are correlated with other features. The correlation heatmap is shown below. This heatmap informed the direction of subsequent analysis.
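As a rough illustration of what such a heatmap is built from, the sketch below computes a Pearson correlation matrix with NumPy and lists the strongly correlated feature pairs. The data is synthetic and the feature names (`bill_1`, `bill_2`, `age`) and the 0.8 threshold are assumptions made for this example, not values taken from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical synthetic features, not the real dataset.
bill_1 = rng.normal(50_000, 10_000, n)
bill_2 = bill_1 + rng.normal(0, 2_000, n)  # constructed to track bill_1 closely
age = rng.normal(35, 9, n)                 # generated independently of the bills

names = ["bill_1", "bill_2", "age"]
X = np.column_stack([bill_1, bill_2, age])

# Pearson correlation matrix; rowvar=False treats each column as a feature.
corr = np.corrcoef(X, rowvar=False)

# Collect feature pairs whose absolute correlation exceeds the threshold.
threshold = 0.8
pairs = [
    (names[i], names[j], round(float(corr[i, j]), 2))
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(corr[i, j]) > threshold
]

print(pairs)
```

A matrix like `corr` is what a plotting library would render as the heatmap; the pair list is one way to read off which features carry overlapping information.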

Predictive Analysis Summary

Note: The readme of this repository contains a brief outline of the findings. A more detailed report can be found in the notebook.

K-Means

Based on Gender, Education and Marital Status

These three categorical features say little about default status.

Based on Payment History

The payment history is a better indicator than gender, education and marital status. However, the non-defaulters are still difficult to find.
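One way to check how well clusters line up with default status is to cluster on the payment-history features and then compare the default rate inside each cluster. The sketch below does this with scikit-learn's `KMeans` on synthetic data; the two customer groups, their default rates (0.1 and 0.6), and the choice of two clusters are assumptions for illustration, not the notebook's setup.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical synthetic data: 6 monthly repayment-delay values per customer.
on_time = rng.normal(0.0, 0.5, size=(300, 6))
late = rng.normal(3.0, 0.5, size=(100, 6))
X = np.vstack([on_time, late])

# Synthetic labels: late payers are assumed to default more often.
y = np.concatenate([rng.random(300) < 0.1, rng.random(100) < 0.6]).astype(int)

# Cluster on payment history only; the labels are not used for fitting.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Default rate within each cluster, keyed by cluster id.
default_rate = {
    int(c): float(y[kmeans.labels_ == c].mean())
    for c in np.unique(kmeans.labels_)
}

print(default_rate)
```

If the per-cluster default rates are close to each other, the clustered features say little about default; a large gap suggests they are informative.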

Random Forest

The random forest classifier finds the most recent payment state, credit limit, and recent bill amount to be the most important indicators that a client will default.
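Rankings of this kind are typically read from the fitted model's feature importances. The following sketch uses scikit-learn's `RandomForestClassifier` on synthetic data in which the label is driven only by a hypothetical `recent_delay` feature, so the ranking it produces reflects that construction and not the results reported above. All feature names and hyperparameters here are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 1000

# Hypothetical synthetic features, not the real dataset.
recent_delay = rng.integers(-1, 4, n)          # delay of the latest payment, -1 to 3
credit_limit = rng.normal(150_000, 50_000, n)  # unrelated to the label here
noise = rng.normal(0, 1, n)                    # pure noise feature

# Synthetic label: a longer recent delay makes default more likely.
y = (recent_delay + rng.normal(0, 0.5, n) > 1.5).astype(int)

names = ["recent_delay", "credit_limit", "noise"]
X = np.column_stack([recent_delay, credit_limit, noise])

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances, sorted from most to least important.
ranking = sorted(
    zip(names, forest.feature_importances_),
    key=lambda item: item[1],
    reverse=True,
)

for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

Impurity-based importances sum to 1 and can favor features with many distinct values, so they are best treated as a first indication of which features matter.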

XGBoost

XGBoost finds the recent payment states, marital status, and gender to be good identifiers. While these categorical features may not be very meaningful on their own, XGBoost may have found interactions between them and other features that the random forest's independently grown trees did not capture.

Non-technical Summary

Critical Indicators

  • Customers’ recent payment history and credit utilization are key predictors of default risk. These metrics should be closely monitored by credit card issuers.

  • Customers’ age also appears to affect financial responsibility, with older individuals being more responsible. However, given the small sample of elderly customers in this study, this finding is inconclusive.

About

This work uses predictive modeling techniques to identify the features that best indicate how likely a credit card customer is to default.
