test: div by zero returns nan #2636

jrw34 · 2025-06-03T18:03:06Z

What type of PR is this? (check all applicable)

Related issues

Related issue #<issue number>
Closes test: add test for division by zero #2619

Checklist

Code follows style guide (ruff)
Tests added
Documented the changes

If you have comments or can explain your changes, please do so below

The floordiv test is raising a 'div by zero' error with the pyarrow backend
This div by zero error is reproducible with the following code and indicates that dividing by zero will not return NaN when a floordiv is applied with the pyarrow backend

>>> import narwhals as nw
>>> import pyarrow as pa
>>> data = {"a":[3], "b":[0]}
>>> test_pa_table = pa.table(data)
>>> test_df = nw.from_native(test_pa_table)
>>> test_df["a"]  / / test_df["b"]

returns
pyarrow.lib.ArrowInvalid: divide by zero

for more information, see https://pre-commit.ci

jrw34 · 2025-06-04T14:39:33Z

@FBruzzesi, given that the division by zero causes the pyarrow backend to error, should I incorporate that behavior into the tests? The original description of the issue stated that all divisions by zero returned NaN or None but that is not true for all backend cases (e.g. pyarrow and pandas[pyarrow]).

FBruzzesi · 2025-06-04T15:01:49Z

Hey @jrw34 here is how I would go about this:

Add a test for expressions as well, in order to test lazy constructors as well

xfail pyarrow backends by adding:

if "pyarrow" in str(constructor_eager):
    request.applymarker(pytest.mark.xfail)

Open an issue to track this inconsistency and eventually fix it - this was the main point for adding this such test i.e. figure out inconsistencies and eventually fix them

for more information, see https://pre-commit.ci

jrw34 · 2025-06-04T19:07:08Z

@FBruzzesi, I avoided using assert_equal_data because for every division by zero there are at least three different outputs (depending upon the numerator [+n, 0, -n]). Due to this constraint I called the .collect() method on the LazyFrames, are there any problems with doing that? One alternative would be to create three different expected LazyFrames instead and use the assert_equal_data for each of those 3 conditions. I am happy to try that instead if you think it will be a more robust testing structure.

I will also document all of the inconsistencies and create a new issue after this PR is closed. Thank you very much for walking me through all of this, it has been a great experience thus far!

FBruzzesi

Hey @jrw34 , sorry for the slow feedback, I have been sick in the last few days.

I left a couple of comments in the code. The new approach seems a bit too convoluted in my opinion but we are not too far from the end goal.

Regarding collect-ing, I left a suggestion for how to use assert_data_equal, which internally takes care of collecting whenever necessary

FBruzzesi · 2025-06-06T07:43:47Z

tests/expr_and_series/arithmetic_test.py

+        truediv_result = df["a"] / df["b"]  # truediv
+        floordiv_result = df["a"] // df["b"]  # floordiv
+        assert truediv_result[0] in div_by_zero_results or truediv_result.is_nan()[0]
+        assert floordiv_result[0] in div_by_zero_results or floordiv_result.is_nan()[0]


You can skip any operation and check if a branch is xfailing

FBruzzesi · 2025-06-06T07:47:13Z

tests/expr_and_series/arithmetic_test.py

+    assert truediv_result[0] in div_by_zero_results or truediv_result.is_nan()[0]
+    assert floordiv_result[0] in div_by_zero_results or floordiv_result.is_nan()[0]


What about having an expected parameter together with left and right so that you can specify the exact expected output for each pair instead of having it generic? Then you can use assert_data_equal with manually constructed dicts:

assert_data_equal({"x": truediv_result}, {"x": [expected]}

FBruzzesi · 2025-06-06T07:47:39Z

tests/expr_and_series/arithmetic_test.py

+
+
+@pytest.mark.parametrize(("left", "right"), [(-2, 0), (0, 0), (2, 0)])
+def test_div_by_zero(


Same ideas applied here, and using assert_data_equal should be even simpler since you can simply pass a frame as first object and that will take care of collecting when necessary

That drastically simplified everything, thanks for all the comments. I just pushed my changes and ended up separating the tests by their respective operation.

FBruzzesi · 2025-06-08T17:03:30Z

Hey @jrw34 there are a few tests failing, from what I can see there are many edge cases and different behaviors, would you be interested in trying to align the behavior with the latest polars version?

I know it's a shift from the original issue and PR, but I feel somehow uneasy to know that (almost) each backend has a different output

jrw34 · 2025-06-08T22:35:35Z

@FBruzzesi, the tests were all passing on my machine before I pushed this so I thought I was in the clear. I will resolve all of the failures before the next push.

For the polars behavior, I will add that in.

I also share the same sense of uneasiness pertaining to the discrepancies in backend behavior. It is interesting that floor division is causing all of the errors but not true division. I am currently documenting all of the differences for each backend and will include that in a new issue.

I imagine down the road it would be a nice feature to standardize the behavior of division by zero for all backends once they are ingested into Narwhals (everything either returns None or nan). Once this testing business gets settled I would be interesting in claiming that feature addition if it is something worth doing.

jrw34 · 2025-06-09T15:57:35Z

@FBruzzesi, I included the polars specific behavior in the floordiv tests. Should I rebase down to a single commit before this gets merged?

jrw34 · 2025-06-10T16:02:58Z

@FBruzzesi, how should I go about running the tests for all versions and platforms? When I run the test locally it passes but fails on many of the windows tests when it is on github. The failures are coming from the polars floordiv.

To provide a sanity check, I am attaching what it looks like when I run it locally below.

(narwhals-dev) *****@*****:~/narwhals-dev$ pytest tests/expr_and_series/arithmetic_test.py
================================================= test session starts ==================================================
platform linux -- Python 3.12.11, pytest-8.4.0, pluggy-1.6.0
Using --randomly-seed=1637843562
rootdir: /*****/*****/narwhals-dev
configfile: pyproject.toml
plugins: cov-6.1.1, randomly-3.16.0, env-1.1.5, hypothesis-6.135.4
collected 336 items

tests/expr_and_series/arithmetic_test.py .....x.........x........x..x...x.........................x......x...... [ 21%]
............x..x............x.x........................x...............................x.......s.............s.. [ 54%]
x........s...............x.....s..s..............s.....................................................x....x... [ 87%]
....xx.............xx...xx.............x.                                                                        [100%]

====================================== 306 passed, 6 skipped, 24 xfailed in 3.36s ======================================

FBruzzesi · 2025-06-10T20:46:02Z

@jrw34 We used to have nox setup, but it has gone a bit out of sync (an attempt to get it back on track is #2358)
I would suggest to tweak that to run multiple python and package versions locally. Sadly this feature seems to be very version dependent, so kudos for starting with it

jrw34 · 2025-06-10T20:57:44Z

@FBruzzesi, I will delve into running multiple versions/packages. It looks like skipping some pandas versions dwindled down the number of failures, so we are on the right track.

This has been an unexpected adventure after the very welcoming issue title, but it has definitely enhanced my appreciation for floor division by zero.

Also, would it be acceptable to add a conditional for pandas[nullable], because after reviewing the failures it appears that this constructor is likely the last edge case without coverage.
Example:

elif "pandas_nullable" in str(constructor):
     assert_equal_data(floordiv_result, {"a": [0]})

I feel as thought it will add even more bulk to test but it would help rule out more of the edge cases, let me know your thoughts!

FBruzzesi · 2025-06-10T21:37:15Z

This has been an unexpected adventure after the very welcoming issue title, but it has definitely enhanced my appreciation for floor division by zero.

I can 100% related - I have been there sooo many times 😂

would it be acceptable to add a conditional for pandas[nullable], because after reviewing the failures it appears that this constructor is likely the last edge case without coverage.

Sure! I think it's fine to catch all the cases, so that we know what we need to address afterwards

Also I forgot to address the following:

Should I rebase down to a single commit before this gets merged?

Don't worry about it, we will squash and merge as one commit at the end 👌

FBruzzesi · 2025-06-11T16:30:02Z

tests/expr_and_series/arithmetic_test.py

+    elif all(x in str(constructor_eager) for x in ["pandas", "nullable"]):
+        floordiv_result = df["a"] // df["b"]
+        assert_equal_data({"a": floordiv_result}, {"a": [0]})


@MarcoGorelli this might be worth reporting upstream as well. WDYT? Is this expected?
Repro:

pd.Series([-1, 0, 1]).convert_dtypes("numpy_nullable")//0 Out[4]: 0 0 1 0 2 0 dtype: Int64

@FBruzzesi upstream is aware: pandas-dev/pandas#30188 on this one. This corner of NaN producing operations on "numpy_nullable" backed values likely won't be resolved (pandas may be at yet another crossroads on this) so I think we should just xfail for this specific test.

Pinging @MarcoGorelli to verify.

@camriddell, @FBruzzesi, and @MarcoGorelli. Thank you for all of the review, I am happy to change this case to xfail if needed so just let me know,

On second thought it let's keep the current code as it captures the behaviors of each of these backends, so if they change in the future we'll be aware of this shift.

Would it be possible to also add another variant of these tests that works with floating point values as the inputs for the numerator/denominator? We should see much more consistent results returned (e.g. see [inf, NaN, -inf]) across each of the backends.

Some thoughts on the oddities you observed. Feel free to ignore, I wanted to capture this in case we ever need to revisit this decision.

It seems that there are a few camps on the set of results that one would obtain from floor dividing two integer arrays (where the denom is 0):

return i64 of nulls (Polars): if you divide floats by 0.0, you end with values of inf, -inf, or NaN which do not exist in integer dtypes, therefore the result is (shortcutted?) i64 with Null values.

return i64 all 0s (pandas[nullable], numpy). Tough to reason about, numpy issues a RuntimeWarning but pandas[nullable] does not.

return f64 output (pandas). Perhaps more mathematically sound, but some may be surprised at seeing a floats returned when floor-dividing two integers.

raise (PyArrow pandas[pyarrow]) this is an opt-in behavior and we followed pandas

Interesting with the floating point differences, I will add that in. Given that there are now going to be 8 different test functions, should I create a test_divsion_by_zero.py file to house the all these tests instead of bulking up arithmetic_tests.py even more?

MarcoGorelli · 2025-06-20T16:54:53Z

thanks all for discussions

i haven't read everything yet, just a quick comment:

for truediv, i think we want to check that dividing by zero consistently returns inf/nan/-inf
for floordiv, it looks like it's the wild west out there. i don't know who's right, but i'm honestly less bothered about this one. i don't advise bending over backwards to deal with it. we could just make a page in the docs about it in concepts explaining how backends differ

happy to keep this PR focused to the truediv case and split the more contentious floordiv into a separate one if that helps get this in

jrw34 · 2025-06-20T17:08:28Z

@MarcoGorelli, the treudiv behavior is consistent across all backends for float and int so I think it is reasonable to make a new PR to address all of the differences for the floordiv (as soon as I added floats and ints the floordiv tests all failed). I do think it would be beneficial to keep the floordiv by zero tests that are already there because that will make the PR more clear to follow when it is getting worked on.

for more information, see https://pre-commit.ci

FBruzzesi

Thanks @jrw34 for the contribution (and your patience) 🙌🏼

I just created #2748 as a follow up issue to track the need to align behavior for floordiv operation across backends

MarcoGorelli · 2025-06-28T17:10:38Z

this is failing for pyspark, see #2753

jrw34 and others added 3 commits June 2, 2025 14:54

Add tests for div by zero

fa391dc

Use constructor eager for to cover backend testing

2a6f16d

[pre-commit.ci] auto fixes from pre-commit.com hooks

6e6ee79

for more information, see https://pre-commit.ci

jrw34 changed the title ~~Add test for div by zero~~ [test]: div by zero returns nan Jun 3, 2025

jrw34 changed the title ~~[test]: div by zero returns nan~~ test: div by zero returns nan Jun 3, 2025

dangotbanned added the tests label Jun 4, 2025

jrw34 and others added 4 commits June 4, 2025 12:36

add lazy div by zero testing

bbb6d8c

add lazy div by zero testing

191f901

add collect call to lazy constructors

cc9c2fa

[pre-commit.ci] auto fixes from pre-commit.com hooks

57afe7e

for more information, see https://pre-commit.ci

FBruzzesi reviewed Jun 6, 2025

View reviewed changes

Created individual tests for floordiv and truediv by zero

afde36b

jrw34 and others added 2 commits June 9, 2025 09:12

Merge branch 'main' into add-test-for-div-by-zero

81cebec

Add polars backend specific division by zero return value

a87b195

jrw34 marked this pull request as ready for review June 9, 2025 15:57

Merge branch 'main' into add-test-for-div-by-zero

8f7087c

Exclude floordiv tests for pandas versions < 2.0

901921f

Added pandas[nullable] conditional for floordiv by zero

84d88be

FBruzzesi reviewed Jun 11, 2025

View reviewed changes

jrw34 and others added 5 commits June 20, 2025 11:13

Add float div by zero test for truediv, make new test file

2a2883b

Merge branch 'main' into add-test-for-div-by-zero

260265d

[pre-commit.ci] auto fixes from pre-commit.com hooks

830a9fc

for more information, see https://pre-commit.ci

Ruff fixes

1ac2297

Merge branch 'main' into add-test-for-div-by-zero

964122f

jrw34 requested review from FBruzzesi and camriddell June 27, 2025 17:59

Merge branch 'main' into add-test-for-div-by-zero

33a00c5

FBruzzesi mentioned this pull request Jun 28, 2025

[Enh]: Align floordiv behavior across different backends #2748

Open

FBruzzesi approved these changes Jun 28, 2025

View reviewed changes

FBruzzesi merged commit 3aba9bb into narwhals-dev:main Jun 28, 2025
31 of 32 checks passed

MarcoGorelli mentioned this pull request Jun 28, 2025

ci: division by zero test failing for pyspark #2753

Closed

		assert truediv_result[0] in div_by_zero_results or truediv_result.is_nan()[0]
		assert floordiv_result[0] in div_by_zero_results or floordiv_result.is_nan()[0]



		@pytest.mark.parametrize(("left", "right"), [(-2, 0), (0, 0), (2, 0)])
		def test_div_by_zero(

test: div by zero returns nan #2636

test: div by zero returns nan #2636

Uh oh!

Conversation

jrw34 commented Jun 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What type of PR is this? (check all applicable)

Related issues

Checklist

If you have comments or can explain your changes, please do so below

Uh oh!

jrw34 commented Jun 4, 2025

Uh oh!

FBruzzesi commented Jun 4, 2025

Uh oh!

jrw34 commented Jun 4, 2025

Uh oh!

FBruzzesi left a comment

Choose a reason for hiding this comment

Uh oh!

FBruzzesi Jun 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

FBruzzesi Jun 6, 2025

Choose a reason for hiding this comment

Uh oh!

FBruzzesi Jun 6, 2025

Choose a reason for hiding this comment

Uh oh!

jrw34 Jun 8, 2025

Choose a reason for hiding this comment

Uh oh!

FBruzzesi commented Jun 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jrw34 commented Jun 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jrw34 commented Jun 9, 2025

Uh oh!

jrw34 commented Jun 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

FBruzzesi commented Jun 10, 2025

Uh oh!

jrw34 commented Jun 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

FBruzzesi commented Jun 10, 2025

Uh oh!

FBruzzesi Jun 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

camriddell Jun 16, 2025

Choose a reason for hiding this comment

Uh oh!

jrw34 Jun 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

camriddell Jun 19, 2025

Choose a reason for hiding this comment

Uh oh!

jrw34 Jun 20, 2025

Choose a reason for hiding this comment

Uh oh!

MarcoGorelli commented Jun 20, 2025

Uh oh!

jrw34 commented Jun 20, 2025

Uh oh!

FBruzzesi left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

MarcoGorelli commented Jun 28, 2025

Uh oh!

Reviewers

Assignees

jrw34 commented Jun 3, 2025 •

edited

Loading

FBruzzesi Jun 6, 2025 •

edited

Loading

FBruzzesi commented Jun 8, 2025 •

edited

Loading

jrw34 commented Jun 8, 2025 •

edited

Loading

jrw34 commented Jun 10, 2025 •

edited

Loading

jrw34 commented Jun 10, 2025 •

edited

Loading

FBruzzesi Jun 11, 2025 •

edited

Loading

jrw34 Jun 19, 2025 •

edited

Loading