The task is to perform exploratory data analysis and build a model to determine whether a credit card customer will default on their payment, given basic information about their gender, age, marital status, education, and payment history.
The dataset used is the Default of Credit Card Clients dataset from the UC Irvine Machine Learning Repository. It contains 25 columns, which are grouped in the following table.
| Feature | Number of columns |
|---|---|
| Unique customer ID | 1 |
| Credit limit | 1 |
| Gender | 1 |
| Education level | 1 |
| Marital status | 1 |
| Age (years) | 1 |
| History of past 6 payments | 6 |
| Amount of past 6 bills incurred | 6 |
| Amount of past 6 bills paid | 6 |
| Default status (boolean) | 1 |
More information on the data can be found on the official website. The distribution of defaulted and non-defaulted clients is shown below.
The dataset is highly skewed toward the negative (non-default) class, so a relatively large number of false negatives (defaulters predicted as non-defaulters) is expected in the analysis.
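To see why the imbalance leads to false negatives, the sketch below uses a hypothetical 80/20 label split (not the real class proportions of this dataset) and a trivial classifier that always predicts the majority class. It reaches 80% accuracy while missing every defaulter, which is why recall on the default class is a more telling metric here than accuracy.

```python
from collections import Counter

def class_balance(labels):
    """Return the fraction of each class in a list of 0/1 labels."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in sorted(counts.items())}

def accuracy_and_recall(y_true, y_pred):
    """Overall accuracy and recall on the positive (default = 1) class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    false_neg = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    positives = true_pos + false_neg
    recall = true_pos / positives if positives else 0.0
    return correct / len(y_true), recall

# Hypothetical labels: 8 non-defaulters (0) and 2 defaulters (1).
y_true = [0] * 8 + [1] * 2
# A trivial "model" that always predicts the majority class (no default).
y_pred = [0] * len(y_true)

print(class_balance(y_true))                # {0: 0.8, 1: 0.2}
print(accuracy_and_recall(y_true, y_pred))  # (0.8, 0.0)
```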
Several features are correlated with one another, as shown in the correlation heatmap below. This heatmap guided the direction of the subsequent analysis.
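A correlation heatmap is a colour-coded view of a pairwise correlation matrix. Assuming Pearson correlation (the default method of pandas `DataFrame.corr()`), the dependency-free sketch below computes that matrix. The column names and values are hypothetical toy data, not taken from the dataset.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict mapping column name -> values."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

# Hypothetical toy columns: two consecutive monthly bills and age.
columns = {
    "bill_1": [100, 200, 300, 400],
    "bill_2": [110, 190, 310, 390],
    "age": [40, 25, 33, 29],
}
matrix = correlation_matrix(columns)
print(round(matrix[("bill_1", "bill_2")], 3))  # consecutive bills are strongly correlated
```

Consecutive bill amounts tend to move together, which is the kind of redundancy such a heatmap makes visible.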
Note: The readme of this repository contains a brief outline of the findings. A more detailed report can be found in the notebook.
These three categorical features (gender, education, and marital status) reveal little about default status.
Payment history is a better indicator of default than gender, education, and marital status. However, defaulters are still difficult to distinguish from non-defaulters.
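One simple way to compare how informative a categorical feature is, sketched here as an assumption about the kind of breakdown behind these observations, is to compute the default rate within each category. A feature whose rates barely differ across categories says little about default, while one whose rates climb steeply is more useful. The payment-status encoding and records below are hypothetical.

```python
from collections import defaultdict

def default_rate_by(categories, defaults):
    """Fraction of defaulters (label 1) within each category value."""
    totals = defaultdict(int)
    defaulted = defaultdict(int)
    for cat, label in zip(categories, defaults):
        totals[cat] += 1
        defaulted[cat] += label
    return {cat: defaulted[cat] / totals[cat] for cat in sorted(totals)}

# Hypothetical records: most recent payment status as months of delay (0 = on time).
recent_status = [0, 0, 0, 0, 1, 1, 2, 2]
default_flag = [0, 0, 0, 1, 0, 1, 1, 1]

print(default_rate_by(recent_status, default_flag))  # {0: 0.25, 1: 0.5, 2: 1.0}
```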
The random forest classifier ranks the most recent payment status, the credit limit, and the most recent bill amount as the most important indicators that a client will default.
XGBoost ranks the recent payment statuses, marital status, and gender as good identifiers. Although these categorical features may not be very informative on their own, XGBoost may have found interactions between them and other features that the random forest's independently grown trees did not pick up.
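The interaction hypothesis can be illustrated with a toy example (hypothetical data, not from this dataset). A categorical feature can have exactly the same default rate across its values, and so look useless on its own, while still being highly predictive in combination with a second feature. A tree model that splits on both features can exploit this; the marginal breakdown alone shows nothing.

```python
from collections import defaultdict

def default_rate(keys, labels):
    """Fraction of defaulters (label 1) for each distinct key."""
    totals, defaulted = defaultdict(int), defaultdict(int)
    for key, label in zip(keys, labels):
        totals[key] += 1
        defaulted[key] += label
    return {key: defaulted[key] / totals[key] for key in sorted(totals)}

# Hypothetical toy data: a categorical feature coded "A"/"B" and a second
# binary feature; default follows an XOR-style interaction of the two.
feature_a = ["A", "A", "B", "B"] * 2
feature_b = [0, 1, 0, 1] * 2
default = [0, 1, 1, 0] * 2

marginal = default_rate(feature_a, default)
joint = default_rate(list(zip(feature_a, feature_b)), default)
print(marginal)  # {'A': 0.5, 'B': 0.5}
print(joint)     # {('A', 0): 0.0, ('A', 1): 1.0, ('B', 0): 1.0, ('B', 1): 0.0}
```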
- Customers' recent payment history and credit utilization are key predictors of default risk. Credit card issuers should monitor these metrics closely.
- Customer age also appears to affect financial responsibility, with older customers being more responsible. However, given the small number of elderly customers in this sample, this finding is inconclusive.
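Credit utilization is not one of the raw columns. One common definition is the outstanding bill divided by the credit limit. The minimal sketch below assumes that definition, using the most recent bill amount, and adds a hypothetical review rule that combines utilization with a late recent payment. The function names and the 0.8 threshold are illustrative assumptions, not values derived from this analysis.

```python
def credit_utilization(bill_amount, credit_limit):
    """Outstanding bill as a fraction of the credit limit (hypothetical helper)."""
    if credit_limit <= 0:
        raise ValueError("credit_limit must be positive")
    return bill_amount / credit_limit

def flag_for_review(recent_delay_months, bill_amount, credit_limit,
                    utilization_threshold=0.8):
    """Flag a customer whose latest payment is late or whose utilization is high.

    The 0.8 default threshold is an arbitrary illustrative value.
    """
    late = recent_delay_months > 0
    high_use = credit_utilization(bill_amount, credit_limit) >= utilization_threshold
    return late or high_use

print(credit_utilization(45000, 50000))  # 0.9
print(flag_for_review(0, 45000, 50000))  # True  (high utilization)
print(flag_for_review(0, 10000, 50000))  # False (on time, low utilization)
print(flag_for_review(2, 10000, 50000))  # True  (recent payment two months late)
```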

