Pandas attributes and methods:
- df[col].unique() - returns a list of unique values in the series
- df[col].nunique() - returns the number of unique values in the series
- df.isnull().sum() - returns the number of null values in the dataframe
Matplotlib and seaborn methods:
- %matplotlib inline - assure that plots are displayed in jupyter notebook's cells
- sns.histplot() - show the histogram of a series
Numpy methods:
- np.log1p() - applies log transformation to a variable and adds one to each result.
Long-tail distributions usually confuse the ML models, so the recommendation is to transform the target variable distribution to a normal one whenever possible.
The entire code of this project is available in this jupyter notebook.
|
The notes are written by the community. If you see an error here, please create a PR with a fix. |