PERF-#7657: Fork pandas eval and query implementation to improve performance. by sfc-gh-mvashishtha · Pull Request #7658 · modin-project/modin

sfc-gh-mvashishtha · 2025-09-03T00:35:46Z

Currently we use the pandas eval() and query() implementations almost entirely as is. That's not good practice in general, and #7657 shows a performance issue that applies to Modin but not pandas in the current implementation.

In this commit, fork the query() and eval() implementation and eliminate the .values call that causes numpy materialization.

The code here is mostly copied from pandas/pandas/core/computation, except:

- Replace the .values call that causes the performance issue described in #7657.
- Delete nearly all the numexpr code, since we default to pandas for the numexpr engine.
- Clean up code in a few places to get through linter and CodeQL.
- Delete the pytables code. I don't think that pandas uses this code currently.
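
To illustrate why dropping the .values call matters, here is a minimal sketch using a toy class rather than Modin itself. `LazyColumn`, `evaluate_via_values`, and `evaluate_directly` are hypothetical names invented for this example; the counter only simulates the cost of converting a distributed object to a local numpy array and does not reflect the actual forked code.

```python
import operator


class LazyColumn:
    """Toy stand-in for a distributed column; NOT a Modin class (assumption)."""

    materializations = 0  # class-wide count of simulated to_numpy() calls

    def __init__(self, data):
        self._data = list(data)

    @property
    def values(self):
        # Simulates the expensive step: pulling all data into local memory.
        LazyColumn.materializations += 1
        return list(self._data)

    def _binary(self, op, other):
        # Stays "distributed": the result is another LazyColumn.
        if isinstance(other, LazyColumn):
            other_data = other._data
        else:
            other_data = [other] * len(self._data)
        return LazyColumn(op(a, b) for a, b in zip(self._data, other_data))

    def __add__(self, other):
        return self._binary(operator.add, other)

    def __gt__(self, other):
        return self._binary(operator.gt, other)


def evaluate_via_values(op, left, right):
    """Evaluate by first converting both operands to local data (the slow path)."""
    return [op(a, b) for a, b in zip(left.values, right.values)]


def evaluate_directly(op, left, right):
    """Evaluate on the column objects themselves; nothing is materialized."""
    return op(left, right)


a, b = LazyColumn([1, 2, 3]), LazyColumn([10, 20, 30])

slow = evaluate_via_values(operator.add, a, b)
count_after_slow = LazyColumn.materializations  # two .values calls so far

fast = evaluate_directly(operator.add, a, b)
count_after_fast = LazyColumn.materializations  # unchanged by the direct path
```

Both paths compute the same element-wise sum, but only the first one touches `.values`. In this toy model that is the whole difference; in a real distributed backend the materialization step is where the cost would be expected to show up.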

Resolves #7657

Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>

github-advanced-security

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

modin/core/computation/engines.py

modin/core/computation/expr.py

modin/core/computation/scope.py

sfc-gh-joshi

One question regarding the license for forked code. Also, how much of the pandas code did you need to change besides pointing import paths to modin equivalents of pandas modules? If it was very little then we may want to make clearer from folder naming that this is essentially vendored pandas code.

modin/core/computation/align.py

sfc-gh-mvashishtha

> Also, how much of the pandas code did you need to change besides pointing import paths to modin equivalents of pandas modules? If it was very little then we may want to make clearer from folder naming that this is essentially vendored pandas code.

@sfc-gh-joshi I did make very few changes, but I don't think it's that important to point out with the directory structure that some code has been vendored from pandas. If we were to vendor an entire package I think it would make sense to put it in a new vendored directory. We are also putting some modified code in dataframe.py and other modified code in modin/core/computation/.

modin/core/computation/align.py

sfc-gh-joshi · 2025-09-04T21:23:30Z

> I did make very few changes, but I don't think it's that important to point out with the directory structure that some code has been vendored from pandas.

In that case, as long as all relevant files have something we can grep for if we want to pull in upstream changes it should be fine.
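
As a rough sketch of the "something we can grep for" idea, the snippet below scans a directory for Python files containing a marker string. The marker text `Forked from pandas` and the helper name `find_vendored_files` are assumptions made for illustration; the actual header wording added in this PR is not shown in this conversation.

```python
from pathlib import Path

# Hypothetical marker; substitute whatever string the license headers actually contain.
VENDOR_MARKER = "Forked from pandas"


def find_vendored_files(root, marker=VENDOR_MARKER):
    """Return sorted paths of .py files under `root` whose text contains `marker`."""
    matches = []
    for path in Path(root).rglob("*.py"):
        try:
            text = path.read_text(encoding="utf-8")
        except (OSError, UnicodeDecodeError):
            continue  # skip unreadable files rather than fail the scan
        if marker in text:
            matches.append(path)
    return sorted(matches)
```

Calling `find_vendored_files("modin/core/computation")` from the repository root would then list the files to revisit when pulling in upstream pandas changes, assuming each forked file carries the marker.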

sfc-gh-mvashishtha · 2025-09-04T23:38:47Z

@devin-petersohn could you PTAL at the licensing changes? Thanks!

LICENSE

Co-authored-by: Devin Petersohn <devin.petersohn@snowflake.com>

PERF-modin-project#7657: Fork pandas eval() implementation.

f59fdae

Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>

github-advanced-security bot found potential problems Sep 3, 2025

sfc-gh-mvashishtha added 2 commits September 2, 2025 21:18

Fix imports

8111cb6

Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>

Address some comments from CodeQL

c64ee7c

Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>

github-advanced-security bot found potential problems Sep 3, 2025

modin/core/computation/engines.py (dismissed)

modin/core/computation/expr.py (dismissed)

modin/core/computation/scope.py (dismissed)

sfc-gh-mvashishtha changed the title ~~PERF-#7657: Fork pandas eval() implementation.~~ PERF-#7657: Fork pandas eval and query implementation to improve performance. Sep 3, 2025

Add license headers

b3c9cef

Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>

sfc-gh-mvashishtha marked this pull request as ready for review September 3, 2025 22:19

sfc-gh-mvashishtha requested review from a team, RehanSD, YarShev, anmyachev, dchigarev, devin-petersohn, mvashishtha and vnlitvinov as code owners September 3, 2025 22:19

sfc-gh-joshi reviewed Sep 3, 2025

modin/core/computation/align.py (resolved)

Add license and fix the dtype issue properly

f90660c

Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>

sfc-gh-mvashishtha commented Sep 4, 2025

modin/core/computation/align.py (resolved)

sfc-gh-joshi approved these changes Sep 4, 2025

sfc-gh-dpetersohn reviewed Sep 5, 2025

LICENSE (outdated, resolved)

LICENSE (outdated, resolved)

Apply suggestions from code review

25e091a

Co-authored-by: Devin Petersohn <devin.petersohn@snowflake.com>

sfc-gh-mvashishtha merged commit 5ed69b5 into modin-project:main Sep 8, 2025
40 checks passed

sfc-gh-joshi mentioned this pull request Sep 9, 2025

BUG: modin.pandas.eval() does not work at all #7656

Open

3 tasks

PERF-#7657: Fork pandas eval and query implementation to improve performance. #7658
sfc-gh-mvashishtha merged 6 commits into modin-project:main from sfc-gh-mvashishtha:7657/perf/fork-eval-and-query-implementation