Commit dc9fffd

samukweku, sammychoco, and Zeroto521 authored
[ENH] Numba support for single non-equi joins (#1151)
* add numba option for single join
* add comments to functions; add annotations
* clean up utils
* Updates based on feedback
* add Literal annotation
* update _convert_to_numpy_array based on feedback
* remove enums for keep and how, based on feedback; optimize numba functions based on feedback
* Update janitor/functions/conditional_join.py
  Co-authored-by: Zero <[email protected]>
* Update janitor/functions/conditional_join.py
  Co-authored-by: Zero <[email protected]>
* Update janitor/functions/conditional_join.py
  Co-authored-by: Zero <[email protected]>
* changelog
* changelog
* add comments
* changelog

Co-authored-by: sammychoco <[email protected]>
Co-authored-by: Zero <[email protected]>
1 parent 78d4297 commit dc9fffd
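
For readers unfamiliar with the term: a single non-equi join matches rows on one inequality (for example `left < right`) instead of on equality. The sketch below is not the code from this commit and does not use numba or the pyjanitor API. It is a plain-Python illustration, under the assumption that both inputs are simple lists of comparable values, of the sort-then-binary-search idea often used to avoid comparing every pair. The function name `non_equi_join_lt` is hypothetical.

```python
from bisect import bisect_right


def non_equi_join_lt(left, right):
    """Return (left_index, right_index) pairs where left[i] < right[j].

    Hypothetical helper for illustration only.
    """
    # Sort the right-hand values once, remembering their original positions.
    order = sorted(range(len(right)), key=right.__getitem__)
    sorted_right = [right[j] for j in order]

    pairs = []
    for i, value in enumerate(left):
        # First position in sorted_right holding a value strictly greater
        # than `value`; everything from there onward is a match.
        start = bisect_right(sorted_right, value)
        pairs.extend((i, j) for j in order[start:])
    return pairs


left = [1, 5, 3]
right = [2, 4, 6]
print(non_equi_join_lt(left, right))
# [(0, 0), (0, 1), (0, 2), (1, 2), (2, 1), (2, 2)]
```

Each left value needs only one binary search to locate its first match, although the total cost still grows with the number of matching pairs that must be emitted.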

5 files changed, +985 -370 lines changed

CHANGELOG.md

Lines changed: 4 additions & 3 deletions
@@ -3,10 +3,10 @@
 ## [Unreleased]
 
 - [DOC] Updated developer guide docs.
-- [ENH] Allow column selection/renaming within conditional_join. #1102. Also allow first or last match. #1020 @samukweku.
+- [ENH] Allow column selection/renaming within conditional_join. Issue #1102. Also allow first or last match. Issue #1020 @samukweku.
 - [ENH] New decorator `deprecated_kwargs` for breaking API. #1103 @Zeroto521
-- [ENH] Extend select_columns to support non-string columns. Also allow selection on MultiIndex columns via level parameter. #1105 @samukweku
-- [ENH] Performance improvement for groupby_topk. #1093 @samukweku
+- [ENH] Extend select_columns to support non-string columns. Also allow selection on MultiIndex columns via level parameter. Issue #1105 @samukweku
+- [ENH] Performance improvement for groupby_topk. Issue #1093 @samukweku
 - [ENH] `min_max_scale` drop `old_min` and `old_max` to fit sklearn's method API. Issue #1068 @Zeroto521
 - [ENH] Add `jointly` option for `min_max_scale` support to transform each column values or entire values. Default transform each column, similar behavior to `sklearn.preprocessing.MinMaxScaler`. (Issue #1067, PR #1112, PR #1123) @Zeroto521
 - [INF] Require pyspark minimal version is v3.2.0 to cut duplicates codes. Issue #1110 @Zeroto521
@@ -19,6 +19,7 @@
 - [INF] Set independent environment for building documentation. PR #1141 @Zeroto521
 - [DOC] Add local documentation preview via github action artifact. PR #1149 @Zeroto521
 - [ENH] Enable `encode_categorical` handle 2 (or more ) dimensions array. PR #1153 @Zeroto521
+- [ENH] Faster computation for a single non-equi join, with a numba engine. Issue #1102 @samukweku
 
 ## [v0.23.1] - 2022-05-03
 