afonso-antunes commented Apr 2, 2025

Fix Summary:

Previously, the _make_concat_multiindex function could silently downgrade extension dtypes (such as ArrowDtype) to object when building the MultiIndex levels. This PR makes the _concat_indexes helper use dtype-aware construction (array(..., dtype=...)) so that the levels keep the original dtype of the first index.
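A minimal sketch of why dtype-aware construction matters, assuming pandas 2.x, where an Index can wrap an extension array. This is not the PR's diff: the nullable Int64 dtype stands in for timestamp[pyarrow] so the example runs without pyarrow, and the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# An index backed by an extension array. Int64 is a stand-in for
# timestamp[pyarrow] so this sketch does not require pyarrow.
original = pd.Index(pd.array([1, 2, None], dtype="Int64"))

# Round-tripping the values through a plain NumPy object array drops the
# extension dtype: the rebuilt index is object-dtype.
values = list(original)
naive = pd.Index(np.array(values, dtype=object))

# Passing the original dtype to pd.array rebuilds the extension array,
# so the dtype of the first index survives.
aware = pd.Index(pd.array(values, dtype=original.dtype))

print(naive.dtype)  # object
print(aware.dtype)  # Int64
```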

Test added:

Added a test in pandas/tests/frame/methods/test_concat_arrow_index.py that checks that extension dtypes are preserved when pd.concat is called with keys=, which triggers MultiIndex creation.

The test creates two DataFrames with timestamp[pyarrow] indices, then concatenates them with pd.concat(..., keys=...) and asserts that:

  • The resulting index is a MultiIndex
  • The second level (levels[1]) retains the ArrowDtype('timestamp[us][pyarrow]') instead of being downgraded to object.

This validates the dtype-preservation fix and guards it against regressions.
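A sketch of what a test like the one described might look like. The helper and test names, the sample data, and the keys are assumptions, not taken from the PR. It avoids pytest helpers so it can run standalone and returns early when pyarrow is not installed; the final assertion encodes the behavior the PR targets, so it may fail on a pandas version without the fix.

```python
import datetime
import importlib.util

import pandas as pd


def make_arrow_frames():
    """Return two DataFrames indexed by timestamp[us][pyarrow], or None without pyarrow."""
    if importlib.util.find_spec("pyarrow") is None:
        return None
    import pyarrow as pa

    dtype = pd.ArrowDtype(pa.timestamp("us"))
    days1 = [datetime.datetime(2024, 1, 1), datetime.datetime(2024, 1, 2)]
    days2 = [datetime.datetime(2024, 1, 3), datetime.datetime(2024, 1, 4)]
    df1 = pd.DataFrame({"v": [1, 2]}, index=pd.Index(pd.array(days1, dtype=dtype)))
    df2 = pd.DataFrame({"v": [3, 4]}, index=pd.Index(pd.array(days2, dtype=dtype)))
    return df1, df2


def test_concat_keys_preserves_arrow_timestamp_dtype():
    # Hypothetical test name and data; not the PR's actual test.
    frames = make_arrow_frames()
    if frames is None:
        return  # pyarrow is not installed, so there is nothing to check
    df1, df2 = frames
    result = pd.concat([df1, df2], keys=["a", "b"])

    # keys= adds an outer level, so the result index is a MultiIndex.
    assert isinstance(result.index, pd.MultiIndex)
    # The inner level should keep the Arrow timestamp dtype instead of object.
    assert result.index.levels[1].dtype == df1.index.dtype
```

Inside a pytest suite, pytest.importorskip("pyarrow") would be the more idiomatic guard than the early return.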

afonso-antunes (Author) commented Apr 2, 2025

Note on test failures

Some tests are failing because they expect the old behavior where pd.concat(..., keys=...) would return an Index of tuples with dtype=object.

This PR intentionally changes that behavior: it preserves the dtype of the original index (e.g., ArrowDtype) and produces a proper MultiIndex with names and levels, which is more consistent and resolves the issue.

Errors such as:

  • AttributeError: 'Index' object has no attribute 'levels'
  • AssertionError from comparing an Index against a MultiIndex

...are a direct result of this behavior change.
These test failures are expected and reflect outdated assumptions.
If needed, I'm happy to follow up with updates to the relevant tests to align with the new behavior.
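To make the two shapes concrete, the sketch below builds an object-dtype Index of tuples and a MultiIndex from the same pairs; the data and names are illustrative. Code that reads .levels works only on the MultiIndex, which is the kind of mismatch behind the AttributeError listed above.

```python
import pandas as pd

pairs = [("a", 1), ("a", 2), ("b", 1)]

# A flat, object-dtype Index whose elements happen to be tuples.
# tupleize_cols=False stops pandas from turning the tuples into a MultiIndex.
flat = pd.Index(pairs, tupleize_cols=False)

# A real MultiIndex built from the same tuples.
multi = pd.MultiIndex.from_tuples(pairs, names=["key", None])

print(flat.dtype)             # object
print(list(multi.levels[0]))  # ['a', 'b']

# Code written against the MultiIndex shape breaks on the flat Index.
try:
    _ = flat.levels
except AttributeError as err:
    print(f"AttributeError: {err}")
```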

mroeschke (Member) commented

Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

mroeschke closed this May 15, 2025
Linked issue (would be closed by merging this pull request):

BUG: Index[timestamp[pyarrow]].union with itself return object type
