Commit 4401af1

Improve hashing performance
Instead of constructing one pd.Series per input row, hash_fn now returns a plain list, and apply is called with result_type='expand' so pandas expands each list into columns.
1 parent 9dc65f1 commit 4401af1

1 file changed: +2 -2 lines changed

category_encoders/hashing.py

Lines changed: 2 additions & 2 deletions
@@ -306,14 +306,14 @@ def hash_fn(x):
                     else:
                         hasher.update(bytes(str(val), 'utf-8'))
                     tmp[int(hasher.hexdigest(), 16) % N] += 1
-            return pd.Series(tmp, index=new_cols)
+            return tmp

         new_cols = [f'col_{d}' for d in range(N)]

         X_cat = X.loc[:, cols]
         X_num = X.loc[:, [x for x in X.columns.values if x not in cols]]

-        X_cat = X_cat.apply(hash_fn, axis=1)
+        X_cat = X_cat.apply(hash_fn, axis=1, result_type='expand')
         X_cat.columns = new_cols

         X = pd.concat([X_cat, X_num], axis=1)
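
To illustrate why the two variants in this diff are interchangeable, here is a minimal, self-contained sketch that is not part of the commit. It hashes a small toy frame both ways and checks that the results match. The toy data, the value N = 4, the fixed 'md5' method, and the helper names hash_row and hash_fn_series are assumptions made for the example; only the apply(..., axis=1, result_type='expand') call and the column renaming mirror the change above.

```python
import hashlib

import pandas as pd

N = 4  # number of hash buckets (illustrative value, not from the commit)
new_cols = [f'col_{d}' for d in range(N)]


def hash_row(row):
    """Count how many values of one row fall into each of the N buckets."""
    tmp = [0] * N
    for val in row:
        if val is not None:
            hasher = hashlib.new('md5')
            hasher.update(bytes(str(val), 'utf-8'))
            tmp[int(hasher.hexdigest(), 16) % N] += 1
    return tmp  # plain list, as in the new version of hash_fn


def hash_fn_series(row):
    """Old behaviour: wrap the counts of every row in its own Series."""
    return pd.Series(hash_row(row), index=new_cols)


# Toy categorical data (hypothetical).
X_cat = pd.DataFrame({'a': ['x', 'y', 'z'], 'b': ['u', 'v', 'w']})

# Before: one Series is built per row, and apply assembles them into a frame.
old = X_cat.apply(hash_fn_series, axis=1)

# After: lists are returned and pandas expands them into columns 0..N-1,
# which are then renamed, as the diff does with X_cat.columns = new_cols.
new = X_cat.apply(hash_row, axis=1, result_type='expand')
new.columns = new_cols

assert old.values.tolist() == new.values.tolist()
assert list(old.columns) == list(new.columns)
```

The per-row Series construction is the overhead the commit message refers to. How much time is saved depends on the number of rows and the pandas version, so it is worth measuring on real data rather than assuming a fixed speedup.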
