
Performance optimization #52

@iharshulhan

Pandas querying (DataFrame.query) is very slow and can easily be replaced with plain boolean indexing.
Here is the code that causes the bottleneck:

def _eval_rule_perf(self, rule, X, y):
    detected_index = list(X.query(rule).index)

Profiling results:

1141.451 _eval_rule_perf  skrules/skope_rules.py:614
         └─ 1140.967 query  pandas/core/frame.py:3316

An example of an improved version:

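# Binary features only: a term containing '>' keeps rows where the
# column equals 1; any other term keeps rows where it does not.
tmp = X
for part_rule in rule.split(' and '):
    column = part_rule.split()[0]
    if '>' in part_rule:
        tmp = tmp[tmp[column] == 1]
    else:
        tmp = tmp[tmp[column] != 1]
detected_index = list(tmp.index)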

Note that this code only handles the binary case; it should be changed to a more generic version.
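One possible shape for a more generic version is sketched below. It is not the library's implementation. The function name eval_rule_index is hypothetical, and the sketch assumes that every rule is a conjunction of "column operator number" terms joined by " and ", and that column names contain neither comparison operators nor the text " and ".

```python
import operator
import re

import numpy as np
import pandas as pd

# Comparison operators supported by this sketch.
_OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
}
# Two-character operators come first in the alternation so that "<=" is
# not read as "<" followed by "=".
_TERM = re.compile(r"^(?P<col>.+?)\s*(?P<op><=|>=|==|!=|<|>)\s*(?P<val>\S+)$")


def eval_rule_index(rule, X):
    """Return the index labels of the rows of X that satisfy `rule`.

    Hypothetical helper: assumes `rule` looks like
    "age > 30.5 and income <= 1000" (numeric thresholds only).
    """
    mask = np.ones(len(X), dtype=bool)
    for term in rule.split(" and "):
        match = _TERM.match(term.strip())
        if match is None:
            raise ValueError(f"Cannot parse rule term: {term!r}")
        compare = _OPS[match.group("op")]
        threshold = float(match.group("val"))
        # Combine the per-term masks; no intermediate DataFrame is built.
        mask &= compare(X[match.group("col")].to_numpy(), threshold)
    return X.index[mask].tolist()


# Illustrative data, not taken from the issue.
X = pd.DataFrame({"a": [0, 1, 1, 0], "b": [0.2, 0.7, 0.1, 0.9]})
print(eval_rule_index("a > 0.5 and b <= 0.5", X))  # [2]
```

Accumulating a single boolean mask and indexing once at the end avoids creating a filtered copy of the frame for every term, which the loop above does.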

Profiling results:

 8.658 <listcomp>  skrules/skope_rules.py:357
         └─ 8.609 _eval_rule_perf  skrules/skope_rules.py:614
            └─ 6.739 __getitem__  pandas/core/frame.py:2987
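To check the difference outside the library, a rough comparison can be run on synthetic data. This is a minimal sketch using the standard-library timeit module, not the profiler that produced the numbers above. The data, column names, and rule are made up for illustration, and absolute timings will depend on the machine and on the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical binary data and rule, chosen only for illustration.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.integers(0, 2, size=(10_000, 3)), columns=["f0", "f1", "f2"])
rule = "f0 > 0.5 and f1 <= 0.5"


def with_query():
    # The call currently used in _eval_rule_perf.
    return list(X.query(rule).index)


def with_mask():
    # Hand-written equivalent of `rule` using boolean indexing.
    return list(X[(X["f0"] > 0.5) & (X["f1"] <= 0.5)].index)


# Both approaches must select the same rows before timing means anything.
assert with_query() == with_mask()

query_time = timeit.timeit(with_query, number=100)
mask_time = timeit.timeit(with_mask, number=100)
print(f"query: {query_time:.3f}s  mask: {mask_time:.3f}s for 100 calls each")
```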
