Commit 3ce19ea

Author: Rehan Durrani
add comments to test
Signed-off-by: Rehan Durrani <[email protected]>
1 parent a547d32 commit 3ce19ea

1 file changed: +12 -1 lines changed

modin/pandas/test/test_groupby.py

Lines changed: 12 additions & 1 deletion
@@ -2035,6 +2035,14 @@ def test_sum_with_level():
 
 
 def test_reset_index_groupby():
+    # Due to `reset_index` deferring the actual reindexing of partitions,
+    # when we call groupby after a `reset_index` with a `by` column name
+    # that was moved from the index to the columns via `from_labels` the
+    # algebra layer incorrectly thinks that the `by` key is duplicated
+    # across both the columns and labels, and fails, when it should
+    # succeed. We have this test to ensure that that case is correctly
+    # handled, and passes. For more details, checkout
+    # https://github.com/modin-project/modin/issues/4522.
     frame_data = np.random.randint(97, 198, size=(2**6, 2**4))
     pandas_df = pandas.DataFrame(
         frame_data,
@@ -2043,11 +2051,14 @@ def test_reset_index_groupby():
         ),
     ).add_prefix("col")
     pandas_df.index.names = [f"index_{i}" for i in range(len(pandas_df.index.names))]
-    # Convert every other column to string
+    # Convert every even column to string
     for col in pandas_df.iloc[
         :, [i for i in range(len(pandas_df.columns)) if i % 2 == 0]
     ]:
         pandas_df[col] = [str(chr(i)) for i in pandas_df[col]]
+    # The `pandas_df` contains a multi-index with 3 levels, named `index_0`, `index_1`,
+    # and `index_2`, and 16 columns, named `col0` through `col15`. Every even column
+    # has dtype `str`, while odd columns have dtype `int64`.
     modin_df = from_pandas(pandas_df)
     eval_general(
         modin_df,
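
The frame layout described in the new comment can be checked with plain pandas. The sketch below is not part of the commit. The index construction is hidden between the two hunks, so the `MultiIndex.from_tuples` call here is an assumption chosen only to give three levels. The `even_cols` and `odd_cols` names are illustrative.

```python
import numpy as np
import pandas

# Same shape and value range as the test: 64 rows, 16 columns, values 97..197.
frame_data = np.random.randint(97, 198, size=(2**6, 2**4))

# ASSUMPTION: the real index construction is not visible in the diff.
# Any three-level MultiIndex matches the layout the comment describes.
index = pandas.MultiIndex.from_tuples([(i // 4, i // 2, i) for i in range(2**6)])

pandas_df = pandas.DataFrame(frame_data, index=index).add_prefix("col")
pandas_df.index.names = [f"index_{i}" for i in range(len(pandas_df.index.names))]

# Convert every even-positioned column (col0, col2, ...) to one-character strings.
for col in pandas_df.columns[::2]:
    pandas_df[col] = [chr(i) for i in pandas_df[col]]

even_cols = list(pandas_df.columns[::2])  # hypothetical helper names
odd_cols = list(pandas_df.columns[1::2])
```

The exact dtype reported for the string columns depends on the pandas version, so it is safer to inspect the values themselves than to compare dtype names.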

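
For context on the failure mode the comment describes, here is a small pandas-only sketch (not Modin code and not taken from the commit). It shows the behavior the test expects after `reset_index`, and the ambiguity error pandas raises when a `by` name really is both an index level and a column label. The frame and the `key`/`value` names are made up for illustration.

```python
import pandas

df = pandas.DataFrame(
    {"value": [1, 2, 3, 4]},
    index=pandas.Index(["a", "a", "b", "b"], name="key"),
)

# After reset_index, "key" exists only as a column, so grouping by it succeeds.
counts = df.reset_index().groupby("key").count()

# Here "key" is genuinely both an index level and a column label,
# so pandas refuses to guess which one is meant.
ambiguous = df.assign(key=["a", "a", "b", "b"])
try:
    ambiguous.groupby("key").count()
    raised = False
except ValueError:
    raised = True
```

Per the comment in the diff, the Modin bug was that the first case was treated like the second because the reindexing was deferred.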