`DataFrame.query` is very slow here and can easily be replaced with plain boolean indexing.
Here is the code that causes the bottleneck:
    def _eval_rule_perf(self, rule, X, y):
        detected_index = list(X.query(rule).index)
Profiling results:
1141.451 _eval_rule_perf skrules/skope_rules.py:614
└─ 1140.967 query pandas/core/frame.py:3316
An example of an improved version:
    tmp = X
    for part_rule in rule.split('and '):
        part_rule = part_rule.strip()
        column = part_rule.split()[0]
        # Binary features only: a '>' term keeps rows equal to 1, any other term keeps the rest.
        if '>' in part_rule:
            tmp = tmp[tmp[column] == 1]
        else:
            tmp = tmp[tmp[column] != 1]
    detected_index = list(tmp.index)
Note that this code only handles binary features; it should be generalized to arbitrary thresholds and operators.
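One possible shape for a more generic version is sketched below. It is not code from skope-rules: `rule_mask` and `detected_index` are hypothetical helpers, and the sketch assumes every rule is a conjunction of `<column> <op> <number>` terms joined by ` and `, with column names that contain no whitespace.

```python
import operator

import pandas as pd

# Comparison operators the sketch understands, mapped to their Python functions.
_OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
}


def rule_mask(rule, X):
    """Return a boolean Series marking the rows of X that satisfy `rule`.

    Assumption: `rule` looks like "a > 0.5 and b <= 2", i.e. terms of the
    form "<column> <op> <number>" joined by " and ".
    """
    mask = pd.Series(True, index=X.index)
    for term in rule.split(" and "):
        column, op, threshold = term.split()
        # Combine the conditions on a mask instead of re-slicing the frame.
        mask &= _OPS[op](X[column], float(threshold))
    return mask


def detected_index(rule, X):
    """Hypothetical stand-in for list(X.query(rule).index)."""
    return list(X.index[rule_mask(rule, X).to_numpy()])
```

Building a single boolean mask and indexing once avoids creating an intermediate DataFrame per term. Rules that use `or`, parentheses, or non-numeric values would need a real parser and are out of scope for this sketch.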
Profiling results:
8.658 <listcomp> skrules/skope_rules.py:357
└─ 8.609 _eval_rule_perf skrules/skope_rules.py:614
   └─ 6.739 __getitem__ pandas/core/frame.py:2987
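To check the effect on other data, a small `timeit` comparison along the following lines may help. The data, column names, and rule are invented for illustration, and the timings will differ from the profiler output in this issue.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic binary data; shape and column names are made up for illustration.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.integers(0, 2, size=(10_000, 3)), columns=["f0", "f1", "f2"])
rule = "f0 > 0.5 and f1 <= 0.5"


def with_query():
    # Current approach: let pandas parse and evaluate the rule string.
    return list(X.query(rule).index)


def with_mask():
    # Same rule written out by hand as boolean indexing.
    return list(X[(X["f0"] > 0.5) & (X["f1"] <= 0.5)].index)


# Both approaches must select the same rows before timing means anything.
assert with_query() == with_mask()

query_s = timeit.timeit(with_query, number=20)
mask_s = timeit.timeit(with_mask, number=20)
print(f"query: {query_s:.4f}s  mask: {mask_s:.4f}s")
```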