Skip call to `.tolist()` when creating pd.Index #10619

y4n9squared · 2025-08-08T14:13:23Z

`np.tile` returns an NDArray, so there is no need to convert it to a Python `list` before passing it to `pd.MultiIndex`. The Pandas interface only requires the object to be array-like, and one of the first things the constructor does is coerce the list back to an NDArray.
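To illustrate the point, here is a minimal sketch, not the actual xarray code. The helper `cartesian_codes` and the small `shape` are hypothetical names chosen for illustration. It builds cartesian-product codes with `np.repeat` and `np.tile` and checks that `pd.MultiIndex` produces the same index whether the codes arrive as arrays or as lists.

```python
import numpy as np
import pandas as pd


def cartesian_codes(shape):
    """Hypothetical helper: integer codes for the row-major cartesian
    product of axes with the given lengths, one ndarray per axis."""
    total = int(np.prod(shape))
    codes = []
    repeat = total  # how many times each value repeats consecutively
    tile = 1        # how many times the repeated block is tiled
    for n in shape:
        repeat //= n
        codes.append(np.tile(np.repeat(np.arange(n), repeat), tile))
        tile *= n
    return codes


shape = (4, 3, 2)  # small stand-in for a large coordinate grid
names = ["dim_0", "dim_1", "dim_2"]
levels = [pd.Index(np.arange(n)) for n in shape]

codes_arr = cartesian_codes(shape)

# Codes passed as ndarrays (no intermediate Python lists).
idx_from_arrays = pd.MultiIndex(levels=levels, codes=codes_arr, names=names)

# Codes passed as Python lists, which Pandas converts back to arrays.
idx_from_lists = pd.MultiIndex(
    levels=levels, codes=[c.tolist() for c in codes_arr], names=names
)

# Both routes describe the same index.
assert idx_from_arrays.equals(idx_from_lists)
```

Since the two indexes compare equal, dropping `.tolist()` should only change how fast the index is built, not what it contains.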

For arrays with large coordinate axes, `to_dataframe()` is extremely slow because Pandas has to iterate through a `list` object rather than an array.

For a (1000, 500, 20) array, which yields 10M rows in the cartesian product, this results in a ~20x speed-up for `xr.Dataset.to_dataframe()` (tested on x86 and Apple Silicon).

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.ones((1000, 500, 20)), name="foo")
da.to_dataframe()
```
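For anyone who wants to reproduce the measurement, here is a rough timing sketch. The helper `time_to_dataframe` is a hypothetical name, and the shape is reduced to keep the run short. Absolute timings depend on the machine and on the installed xarray and Pandas versions, so no expected numbers are given.

```python
import time

import numpy as np
import xarray as xr


def time_to_dataframe(shape):
    """Hypothetical helper: time a single DataArray.to_dataframe() call."""
    da = xr.DataArray(np.ones(shape), name="foo")
    start = time.perf_counter()
    df = da.to_dataframe()
    elapsed = time.perf_counter() - start
    return elapsed, df


# 100 * 50 * 20 = 100,000 rows; scale up to (1000, 500, 20) for the 10M-row case.
elapsed, df = time_to_dataframe((100, 50, 20))
print(f"{len(df):,} rows in {elapsed:.3f} s")
```

Running this on the commits before and after the change gives a like-for-like comparison on the same machine.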

Closes #10617 (Why call tolist() when constructing pandas MultiIndex?)

welcome · 2025-08-08T14:13:26Z

Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient.
If you have questions, some answers may be found in our contributing guidelines.

dcherian

Nice, thank you. Welcome to Xarray!

welcome · 2025-08-13T01:22:48Z

Congratulations on completing your first pull request! Welcome to Xarray! We are proud of you, and hope to see you again!

y4n9squared force-pushed the faster-to-dataframe branch from 81a836e to 24bfcc9 on August 8, 2025 14:14

Illviljan added the run-benchmark (Run the ASV benchmark workflow) label Aug 8, 2025

TomNicholas added the topic-performance label Aug 11, 2025

dcherian approved these changes Aug 13, 2025

dcherian merged commit 3c9217e into pydata:main Aug 13, 2025
53 of 57 checks passed
