[BSE-4737] Implement interface for using bodo as engine in Pandas UDFs. #410
class BodoExecutionEngine(BaseExecutionEngine):
    @staticmethod
    def map(
This is not implemented yet in Pandas, so there is no way to test it. Leaving as a follow-up.
I'll open a PR adding engine to Series.map shortly. I have the implementation already, but map and apply tests are structured very differently, and I need to see how to implement the common fixtures without making the mess bigger or refactoring all the tests.
Codecov Report
❌ Your project check has failed because the head coverage (66.52%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.
Additional details and impacted files
@@ Coverage Diff @@
##             main     #410     +/-   ##
=========================================
- Coverage   66.59%   66.52%   -0.07%
=========================================
  Files         176      176
  Lines       63177    63288    +111
  Branches     8835     8856     +21
=========================================
+ Hits        42072    42103     +31
- Misses      18483    18561     +78
- Partials     2622     2624      +2
pixi.toml
Outdated
@@ -70,7 +70,6 @@ pip = "*"
# Core Python Deps
numba = "==0.61.0"
numpy = ">=1.24,<2.2"
pandas = ">=2.2,<2.3"
For testing locally, will restore before merging.
LGTM. Thanks.
        args_str += ","
    else:
        # Add dummy value for args for spawn mode compatibility.
        # TODO: fix in spawn mode.
I have a PR to fix this here: #414
Thanks @scott-routledge2! Looks good to me.
pyproject.toml
Outdated
"pandas>=2.2,<2.3",
"pandas>=2.2,<3.1",
Revert?
bodo/pandas_compat.py
Outdated
def f(func_text, glbls, loc_vars, __name__):
    bodo_exec(func_text, glbls, loc_vars, __name__)

spawner.submit_func_to_workers(f, [], apply_func_text, glbls, {}, __name__)
Looks like the indentation is wrong. This should be inside if spawn_mode.
Good catch. I just realized this codepath is taken even when engine=bodo.jit(spawn=False, distributed=False). Would there be a way to also case on the decorator?
What do you mean?
It will exec the func text on the workers even when spawn=False. We discussed offline that it would be better to just avoid func_text altogether.
def test_udf_cache():
    """Tests that we can call the same UDF multiple times with cache flag on
    without any errors. TODO: check cache."""
Have you verified caching manually at least?
Yes
…s. (#410) Subclass BaseExecutionEngine and implement BodoExecutionEngine, add as an attribute to bodo.jit / return_wrapped_fn. As a followup we can also implement kwargs (but not key word only arguments, basically do something similar to numba where they fold keyword arguments into args using the function signature.) Also need to do Series.map support, however that cannot be tested right now because it hasn't been implemented in Pandas. (cherry picked from commit 30b2bc1)
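The kwargs follow-up mentioned in this commit message (folding keyword arguments into args using the function signature) could look roughly like the sketch below. It uses only the standard library's inspect module. The helper name fold_udf_kwargs and the example UDF are hypothetical, and this is not how numba or Bodo implement it internally; it only illustrates the idea under the assumption that the first parameter of the UDF is the row or element supplied by the engine.

```python
import inspect

# Sentinel standing in for the row/element that the engine passes first.
_DATA = object()


def fold_udf_kwargs(func, args, kwargs):
    """Return a positional tuple equivalent to calling func(data, *args, **kwargs).

    Hypothetical helper: rejects keyword-only, *args and **kwargs parameters,
    since those cannot be folded into a fixed positional tuple.
    """
    sig = inspect.signature(func)
    for param in sig.parameters.values():
        if param.kind in (
            param.KEYWORD_ONLY,
            param.VAR_KEYWORD,
            param.VAR_POSITIONAL,
        ):
            raise TypeError(f"cannot fold parameter {param.name!r} into args")
    # Bind with a placeholder for the data argument, then fill in defaults.
    bound = sig.bind(_DATA, *args, **kwargs)
    bound.apply_defaults()
    folded = tuple(bound.arguments.values())
    # Drop the placeholder; what remains is the extra positional arguments.
    return folded[1:]


def udf(row, scale, offset=0):
    return row * scale + offset


folded = fold_udf_kwargs(udf, (), {"offset": 1, "scale": 3})
print(folded)             # (3, 1)
print(udf(5, *folded))    # 16
```

With the arguments folded this way, an engine that only accepts positional args can still honor a call that was written with keywords.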
Changes included in this PR
Subclass BaseExecutionEngine and implement BodoExecutionEngine, and add it as an attribute to bodo.jit/return_wrapped_fn. See pandas-dev/pandas#61032 for context.
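As a rough illustration of the mechanism described above (an engine class attached as an attribute of the jit decorator), here is a self-contained sketch in plain Python. Nothing in it is Bodo or pandas code: BaseExecutionEngine below is a local stand-in for the pandas class, and ToyExecutionEngine, toy_jit, and apply_rows are hypothetical names. The attribute name __pandas_udf__ is my assumption about what pandas looks up on the decorator; pandas-dev/pandas#61032 has the actual interface.

```python
class BaseExecutionEngine:
    """Local stand-in for the pandas base class (NOT the real pandas class)."""

    @staticmethod
    def apply(data, func, args, kwargs, decorator, axis):
        raise NotImplementedError


class ToyExecutionEngine(BaseExecutionEngine):
    """Hypothetical engine: wraps the UDF with the decorator, then runs it per row."""

    @staticmethod
    def apply(data, func, args, kwargs, decorator, axis):
        if kwargs:
            # Mirrors the PR: kwargs support is left as a follow-up.
            raise NotImplementedError("kwargs are not supported in this sketch")
        compiled = decorator(func)
        return [compiled(row, *args) for row in data]


def toy_jit(func):
    """Hypothetical stand-in for bodo.jit; it only wraps and tags the function."""

    def wrapper(*call_args):
        return func(*call_args)

    wrapper.is_compiled = True
    return wrapper


# Assumed attribute name: the decorator carries its engine class.
toy_jit.__pandas_udf__ = ToyExecutionEngine


def apply_rows(rows, func, engine=None, args=(), **kwargs):
    """Toy version of the dispatch an apply(..., engine=...) call could perform."""
    if engine is None:
        return [func(row, *args, **kwargs) for row in rows]
    executor = getattr(engine, "__pandas_udf__", None)
    if executor is None:
        raise ValueError("engine must be a decorator carrying an execution engine")
    return executor.apply(rows, func, args, kwargs, engine, axis=1)


rows = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]
result = apply_rows(
    rows, lambda r, k: (r["a"] + r["b"]) * k, engine=toy_jit, args=(10,)
)
print(result)  # [30, 70]
```

The point of the design is that the caller passes only the decorator; the library finds the engine through the attribute and hands the decorator back to it, so the engine decides how and where the UDF gets compiled.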
As a follow-up we can also implement kwargs (but not keyword-only arguments), doing something similar to numba, which folds keyword arguments into args using the function signature.
Also need to add Series.map support; however, that cannot be tested right now because it hasn't been implemented in Pandas.
Testing strategy
In order to test, I needed to build Pandas from source. Here are the steps I took to test:
pixi remove pandas. Also need to update pyproject.toml. Note pyproject.toml and pixi.lock/toml will be restored before merging.
pixi run clean
cd pandas; pip install .
pixi run build-bodo
Ran PR CI for changes to jit decorator.
Run tests locally in spawn mode.
User facing changes
Once Pandas 3 is released, users can use Bodo to accelerate their UDFs in Pandas.
e.g.
Checklist
[run CI]
in your commit message.