Closed
What is your issue?
I have a workload that calls ds.to_dataframe(), which takes several seconds because the output DataFrame has 10M+ rows. On my machine, the vast majority (>99%) of the time spent in xr.Dataset.to_dataframe() goes into constructing the pd.MultiIndex. Within that, >80% of the time is spent calling tolist(), which forces the pd.MultiIndex constructor to iterate through a list rather than an ndarray.
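A minimal timing sketch of the cost described above, using only pandas and NumPy rather than xarray. It assumes that the integer codes for the index are available as np.intp arrays, and it compares passing them to pd.MultiIndex as plain ndarrays against converting each one with .tolist() first. The helper names (product_codes, build, best_time) are hypothetical and exist only for this illustration.

```python
import time

import numpy as np
import pandas as pd


def product_codes(shape):
    """Return (codes, levels) for the cartesian product of range(n) for each n in shape.

    Codes are cast to np.intp, the dtype that repeat/tile-style code would produce.
    """
    mi = pd.MultiIndex.from_product([np.arange(n) for n in shape])
    codes = [np.asarray(c, dtype=np.intp) for c in mi.codes]
    return codes, list(mi.levels)


def build(codes, levels, as_list):
    """Construct the MultiIndex, optionally converting each code array with .tolist() first."""
    if as_list:
        codes = [c.tolist() for c in codes]  # mimics the .tolist() call in question
    return pd.MultiIndex(levels=levels, codes=codes)


def best_time(codes, levels, as_list, repeats=3):
    """Best-of-`repeats` wall-clock time in seconds for one build() call."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        build(codes, levels, as_list)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # 1M rows; raise toward the 10M+ rows of the original workload if memory allows.
    codes, levels = product_codes((100, 100, 100))
    print(f"codes as lists:   {best_time(codes, levels, as_list=True):.3f} s")
    print(f"codes as ndarray: {best_time(codes, levels, as_list=False):.3f} s")
```

Both variants produce the same index; only the path through the constructor differs, so any gap in the printed timings reflects the list conversion and iteration alone. Absolute numbers will vary by machine and pandas version.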
On line 180 of xarray/core/coordinates.py:
https://github.com/pydata/xarray/blob/main/xarray/core/coordinates.py#L180
is there a reason to call .tolist() rather than just keeping the object as an ndarray? Removing .tolist() results in a significant performance improvement for me.