Skip to content

Conversation

sharkipelago
Copy link
Contributor

@sharkipelago sharkipelago commented Aug 25, 2025

I rewrote the method using _mgr.apply but still struggling to figure out why 2 of test_rank.py test cases are failing. Any help or tips are appreciated! They are as follows:

self = <pandas.tests.frame.methods.test_rank.TestRank object at 0x77da84fae490>

    def test_rank2(self):
        df = DataFrame([[1, 3, 2], [1, 2, 3]])
        expected = DataFrame([[1.0, 3.0, 2.0], [1, 2, 3]]) / 3.0
        result = df.rank(1, pct=True)
        tm.assert_frame_equal(result, expected)

        df = DataFrame([[1, 3, 2], [1, 2, 3]])
        expected = df.rank(0) / 2.0
        result = df.rank(0, pct=True)
        tm.assert_frame_equal(result, expected)

        df = DataFrame([["b", "c", "a"], ["a", "c", "b"]])
        expected = DataFrame([[2.0, 3.0, 1.0], [1, 3, 2]])
        result = df.rank(1, numeric_only=False)
>       tm.assert_frame_equal(result, expected)

pandas/tests/frame/methods/test_rank.py:84:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas/_libs/testing.pyx:53: in pandas._libs.testing.assert_almost_equal
    cpdef assert_almost_equal(a, b,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   raise_assert_detail(
E   AssertionError: DataFrame.iloc[:, 1] (column name="1") are different
E
E   DataFrame.iloc[:, 1] (column name="1") values are different (100.0 %)
E   [index]: [0, 1]
E   [left]:  [1.5, 1.5]
E   [right]: [3.0, 3.0]
E   At positional index 0, first diff: 1.5 != 3.0

pandas/_libs/testing.pyx:171: AssertionError
self = <pandas.tests.frame.methods.test_rank.TestRank object at 0x77da84faf290>
float_string_frame =                A         B         C         D  foo                   datetime       timedelta
foo_0   0.189053 -0.522...5.169587 1 days 00:00:01
foo_29 -0.967681  1.678419  0.765355  0.045808  bar 2025-08-25 12:57:35.169587 1 days 00:00:01

    def test_rank_mixed_frame(self, float_string_frame):
        float_string_frame["datetime"] = datetime.now()
        float_string_frame["timedelta"] = timedelta(days=1, seconds=1)

        float_string_frame.rank(numeric_only=False)
>       with pytest.raises(TypeError, match="not supported between instances of"):
E       Failed: DID NOT RAISE <class 'TypeError'>

- :meth:`.DataFrameGroupBy.transform`, :meth:`.SeriesGroupBy.transform`, :meth:`.DataFrameGroupBy.agg`, :meth:`.SeriesGroupBy.agg`, :meth:`.SeriesGroupBy.apply`, :meth:`.DataFrameGroupBy.apply` now support ``kurt`` (:issue:`40139`)
- :meth:`DataFrame.apply` supports using third-party execution engines like the Bodo.ai JIT compiler (:issue:`60668`)
- :meth:`DataFrame.iloc` and :meth:`Series.iloc` now support boolean masks in ``__getitem__`` for more consistent indexing behavior (:issue:`60994`)
- :meth:`DataFrame.rank` now uses internal ``_mgr.apply`` and preserves the dtype for extension arrays (:issue:`52829`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don’t need to mention mgr, just the user-facing bit

@jbrockmendel
Copy link
Member

In the axis=1 case you need to transpose the whole dataframe, not block-by-block

- :meth:`.DataFrameGroupBy.transform`, :meth:`.SeriesGroupBy.transform`, :meth:`.DataFrameGroupBy.agg`, :meth:`.SeriesGroupBy.agg`, :meth:`.SeriesGroupBy.apply`, :meth:`.DataFrameGroupBy.apply` now support ``kurt`` (:issue:`40139`)
- :meth:`DataFrame.apply` supports using third-party execution engines like the Bodo.ai JIT compiler (:issue:`60668`)
- :meth:`DataFrame.iloc` and :meth:`Series.iloc` now support boolean masks in ``__getitem__`` for more consistent indexing behavior (:issue:`60994`)
- :meth:`DataFrame.rank` now preserves the dtype for extension arrays (:issue:`52829`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be dtype_backend, not dtype?

@sharkipelago
Copy link
Contributor Author

Hello! Any other changes I should make to this?

ranks = values._rank(
axis=axis_int,
def ranker(blk_values):
if isinstance(blk_values, ExtensionArray) and blk_values.ndim == 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about 2D EAs? in particular DatetimeArray, TimedeltaArray

)
def test_rank_extension_array_dtype(self, method, og_dtype, expected_dtype):
# GH#52829
result = DataFrame([4, 89, 33], dtype=og_dtype).rank()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't you be passing method?

@sharkipelago
Copy link
Contributor Author

I removed the raise axis != 0 in arrays/base.py because I thought it might be safe now with the other changes but not sure if there are other things to consider?

See Series.rank.__doc__.
"""
if axis != 0:
raise NotImplementedError
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have cases that get here?

Copy link
Contributor Author

@sharkipelago sharkipelago Sep 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the two 2d array tests I just added , test_rank2() raises that Not Implemented Error when that line is there

=============================================== short test summary info ================================================
FAILED pandas/tests/frame/methods/test_rank.py::TestRank::test_rank2 - NotImplementedError
FAILED pandas/tests/frame/methods/test_rank.py::TestRank::test_2d_extension_array_datetime - NotImplementedError
FAILED pandas/tests/frame/methods/test_rank.py::TestRank::test_2d_extension_array_timedelta - NotImplementedError
============================================ 3 failed, 131 passed in 5.58s =============================================

Copy link
Contributor Author

@sharkipelago sharkipelago Sep 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the third "sub test" in test_rank2:

df = DataFrame([["b", "c", "a"], ["a", "c", "b"]])
expected = DataFrame([[2.0, 3.0, 1.0], [1, 3, 2]])
result = df.rank(1, numeric_only=False)

ranks = values._rank(
def ranker(blk_values):
if axis_int == 0:
blk_values = blk_values.T
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we can avoid transposing at the block level and only transpose at the dataframe level (L9317). then pass axis=blk_values.ndim-1 (which will require updating the EA method)

ranks_obj = self._constructor(ranks, **data._construct_axes_dict())
return ranks_obj.__finalize__(self, method="rank")
if axis_int == 0:
ranks = ranks.T
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this transpose should also be avoidable

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: DataFrame.rank does not return EA types when original type was an EADtype

2 participants