In this section we clean and prepare the dataset for the model, which involves the following steps:
- Download the data from the given link.
- Reformat categorical columns (`status`, `home`, `marital`, `records`, and `job`) by mapping them to appropriate values.
- Replace the maximum value of the `income`, `assets`, and `debt` columns with NaNs.
- Replace the NaNs in the dataframe with `0` (will be shown in the next lesson).
- Keep only the rows whose `status` value is either `ok` or `default`.
- Split the data in a two-step process that yields 60% train, 20% validation, and 20% test sets, with the random seed set to `11`.
- Prepare the target variable `status` by converting it from categorical to binary, where 0 represents `ok` and 1 represents `default`.
- Finally, delete the target variable from the train/val/test dataframes.
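
The steps above can be sketched roughly as follows. This is a minimal illustration, not the exact code from the video. It assumes pandas, NumPy, and scikit-learn's `train_test_split`, and it runs on a small made-up dataframe instead of the downloaded file. The numeric codes in `status_values`, the placeholder value `99999999`, and the names `df_full_train`, `df_train`, `df_val`, and `df_test` are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the downloaded data (all values are made up).
df = pd.DataFrame({
    "status": [1, 2] * 10 + [0],                       # 21 rows, last one "unknown"
    "income": [100 + i for i in range(20)] + [99999999],
    "assets": [99999999] + [1000 * i for i in range(1, 21)],
    "debt":   [0] * 20 + [99999999],
})

# Map numeric codes to readable labels (hypothetical mapping).
status_values = {0: "unk", 1: "ok", 2: "default"}
df["status"] = df["status"].map(status_values)

# Replace each column's maximum, used here as a missing-value placeholder, with NaN.
for col in ["income", "assets", "debt"]:
    df[col] = df[col].replace(df[col].max(), np.nan)

# Keep only the rows labelled ok or default.
df = df[df["status"].isin(["ok", "default"])].reset_index(drop=True)

# Two-step split: first 80/20, then 25% of the remaining 80% -> 60/20/20 overall.
df_full_train, df_test = train_test_split(df, test_size=0.2, random_state=11)
df_train, df_val = train_test_split(df_full_train, test_size=0.25, random_state=11)

df_train = df_train.reset_index(drop=True)
df_val = df_val.reset_index(drop=True)
df_test = df_test.reset_index(drop=True)

# Binary target: 1 for default, 0 for ok.
y_train = (df_train["status"] == "default").astype(int).values
y_val = (df_val["status"] == "default").astype(int).values
y_test = (df_test["status"] == "default").astype(int).values

# Remove the target from the feature dataframes.
del df_train["status"]
del df_val["status"]
del df_test["status"]
```

The second split uses `test_size=0.25` because 25% of the remaining 80% equals 20% of the original data, which is what produces the 60/20/20 distribution.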
Add notes from the video (PRs are welcome)
The notes are written by the community. If you see an error here, please create a PR with a fix.