Commit 24bfcc9

Skip call to .tolist() when creating pd.Index
`np.tile` returns an NDArray, so there is no need to convert it to a Python `list` before passing it to `pd.MultiIndex`. The Pandas interface only requires the object to be array-like, and one of the first things the constructor does is coerce the list back to an NDArray.

For arrays with large coordinate axes, `to_dataframe()` is extremely slow because Pandas has to iterate through a `list` object rather than an array. For a (1000, 500, 20) array (10M rows in the cartesian product), this change results in a ~20x speed-up for `xr.Dataset.to_dataframe()` (tested on x86 and Apple Silicon).

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.ones((1000, 500, 20)), name="foo")
da.to_dataframe()
```

Closes #10617
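The claim that `pd.MultiIndex` accepts array-like codes directly can be checked on a small case. The following is a minimal sketch, not code from the commit: the level values, the dimension names `x` and `y`, and all variable names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Two small levels standing in for coordinate axes (illustrative values).
levels = [pd.Index(np.arange(4)), pd.Index(np.arange(3))]

# Integer codes for the 4 x 3 cartesian product, kept as NumPy arrays.
codes_arr = [np.repeat(np.arange(4), 3), np.tile(np.arange(3), 4)]

# The same codes converted to Python lists, as the code did before this commit.
codes_list = [c.tolist() for c in codes_arr]

from_arrays = pd.MultiIndex(levels=levels, codes=codes_arr, names=["x", "y"])
from_lists = pd.MultiIndex(levels=levels, codes=codes_list, names=["x", "y"])

# Both constructions should describe the same index.
same = from_arrays.equals(from_lists)
print(same)
```

If the two indexes compare equal, the `.tolist()` round trip adds conversion cost without changing the result.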
1 parent 938e186 commit 24bfcc9

1 file changed: +1 -1 lines changed


xarray/core/coordinates.py

Lines changed: 1 addition & 1 deletion
@@ -177,7 +177,7 @@ def to_index(self, ordered_dims: Sequence[Hashable] | None = None) -> pd.Index:

                 # compute the cartesian product
                 code_list += [
-                    np.tile(np.repeat(code, repeat_counts[i]), tile_counts[i]).tolist()
+                    np.tile(np.repeat(code, repeat_counts[i]), tile_counts[i])
                     for code in codes
                 ]
                 level_list += levels
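To illustrate how `np.repeat` followed by `np.tile` produces the codes of a cartesian product, here is a rough sketch under simplifying assumptions. The helper `cartesian_codes` and the way it derives the repeat and tile counts are hypothetical; the actual `repeat_counts` and `tile_counts` in `to_index` are computed elsewhere in xarray and are not shown in this diff.

```python
import numpy as np
import pandas as pd


def cartesian_codes(sizes):
    """Return one integer code array per axis (hypothetical helper)."""
    sizes = list(sizes)
    total = int(np.prod(sizes))
    codes = []
    repeat = total
    for n in sizes:
        # Each code on this axis repeats once per combination of later axes.
        repeat //= n
        # The repeated block is then tiled once per combination of earlier axes.
        tile = total // (n * repeat)
        codes.append(np.tile(np.repeat(np.arange(n), repeat), tile))
    return codes


levels = [pd.Index(["a", "b"]), pd.Index([10, 20, 30])]
codes = cartesian_codes([2, 3])

# Codes stay as NumPy arrays; no .tolist() is needed.
index = pd.MultiIndex(levels=levels, codes=codes, names=["x", "y"])
expected = pd.MultiIndex.from_product(levels, names=["x", "y"])
print(index.equals(expected))
```

For sizes `[2, 3]` the first axis yields codes `0, 0, 0, 1, 1, 1` and the second yields `0, 1, 2, 0, 1, 2`, which matches the row order of `pd.MultiIndex.from_product`.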
