Description
Memory increase of WOEEncoder for category_encoders version >=2.0.0
Hi, I noticed another memory issue, this time with WOEEncoder. I reported a similar bug in #335; the two reports differ in the encoder used and in the dataset. To keep the two encoder APIs separate, I am filing this as a new bug report.
Expected Behavior
Similar memory usage across category_encoders versions.
Actual Behavior
According to the experiment results, with category_encoders 2.0.0 or later, the memory usage of weight_enc.fit(train[weight_encode], train['target']) increases from 58 MB to about 209 MB, as shown in the table below.
| Memory (MB) | Version |
|---|---|
| 209 | 2.3.0 |
| 209 | 2.2.2 |
| 209 | 2.1.0 |
| 209 | 2.0.0 |
| 58 | 1.3.0 |
Steps to Reproduce the Problem
Step 1: Download the dataset
Step 2: Install category_encoders
pip install category_encoders==#version#
Step 3: Change the category_encoders version and record the memory usage
import numpy as np
import pandas as pd
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')
columns = [x for x in train.columns if x != 'target']
object_col_label = ['bin_0','bin_1','bin_2','bin_3','bin_4']
one_hot_encode = ['nom_0', 'nom_1', 'nom_2', 'nom_3', 'nom_4']
target_encode = ['nom_5', 'nom_6', 'nom_7', 'nom_8', 'nom_9']
weight_encode = target_encode + ['ord_4', 'ord_5' ,'ord_3'] + one_hot_encode + object_col_label
import category_encoders as ce
weight_enc = ce.woe.WOEEncoder(cols=weight_encode)
import tracemalloc
tracemalloc.start()
weight_enc.fit(train[weight_encode], train['target'])
current3, peak3 = tracemalloc.get_traced_memory()
print(f"WOEEncoder fit memory usage is {current3 / 1024 / 1024} MB; peak memory was {peak3 / 1024 / 1024} MB")
Specifications
Version: 2.3.0, 2.2.2, 2.1.0, 2.0.0, 1.3.0
Platform: Ubuntu 16.04
OS: Ubuntu
CPU: Intel(R) Core(TM) i9-9900K CPU
GPU: TITAN V
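For anyone repeating the measurement across versions, the tracemalloc pattern from Step 3 can be wrapped in a small helper so that every run is measured the same way. This is only a sketch: `measure_peak` is a hypothetical name and is not part of category_encoders, and the list allocation below is a stand-in workload rather than the encoder itself. Note that tracemalloc only sees allocations made through Python's memory allocators, so the numbers are not the same as the process's resident memory.

```python
import tracemalloc


def measure_peak(func, *args, **kwargs):
    """Call func(*args, **kwargs) under tracemalloc.

    Returns (result, current_mb, peak_mb). Hypothetical helper for
    illustration; not part of category_encoders.
    """
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # Read the counters while the result is still alive.
        current, peak = tracemalloc.get_traced_memory()
    finally:
        # Always stop tracing so later measurements start from zero.
        tracemalloc.stop()
    mb = 1024 * 1024
    return result, current / mb, peak / mb


if __name__ == "__main__":
    # Stand-in workload; replace with the call under test.
    _, current_mb, peak_mb = measure_peak(lambda: [0] * 1_000_000)
    print(f"current: {current_mb:.1f} MB; peak: {peak_mb:.1f} MB")
```

With the variables from Step 3, the call would look like `measure_peak(weight_enc.fit, train[weight_encode], train['target'])`, run once per installed category_encoders version in a fresh interpreter.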