
Commit 235b2e5

Optimize slice_slice for faster isel of huge datasets (#4560)
* optimize slice_slice for faster isel of huge datasets
* satisfy isort
* update whats-new.rst
* add benchmark for huge axis small index
* add some more comments
* fix benchmark
* lint
* Update doc/whats-new.rst
* use range instead of computing slice length
* compare to 0

Co-authored-by: Maximilian Roos <[email protected]>
1 parent 0b0fb40 commit 235b2e5
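The core of this commit is composing two slices arithmetically instead of materializing the first slice as an index array. Below is a minimal self-contained sketch of that idea in plain Python; the name `compose_slices` and the demo values are illustrative, not xarray's API:

```python
# Sketch of the slice-composition idea behind this commit: instead of
# expanding the first slice into a concrete index array (O(n) in the
# dimension size), compute the composed slice arithmetically (O(1)).
# `compose_slices` is an illustrative name, not part of xarray.

def compose_slices(old, applied, size):
    # slice.indices() clamps start/stop into range and fills in defaults,
    # so the arithmetic below only sees normalized values.
    old = slice(*old.indices(size))
    n = len(range(old.start, old.stop, old.step))
    if n == 0:
        return slice(0)  # first slice selects nothing
    applied = slice(*applied.indices(n))
    start = old.start + applied.start * old.step
    if start < 0:
        return slice(0)  # second slice selects nothing (negative-step case)
    stop = old.start + applied.stop * old.step
    if stop < 0:
        stop = None  # a negative stop would wrap around; None means "to the end"
    return slice(start, stop, old.step * applied.step)

data = list(range(1000))
composed = compose_slices(slice(3, 900, 7), slice(10, 50, 2), len(data))
assert data[3:900:7][10:50:2] == data[composed]
```

Applying the composed slice in one step gives the same elements as slicing twice, without ever building the intermediate index array.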

3 files changed: +50 −15 lines


asv_bench/benchmarks/indexing.py

Lines changed: 21 additions & 0 deletions
@@ -1,3 +1,5 @@
+import os
+
 import numpy as np
 import pandas as pd

@@ -138,3 +140,22 @@ def setup(self):

     def time_indexing(self):
         self.ds.isel(time=self.time_filter)
+
+
+class HugeAxisSmallSliceIndexing:
+    # https://github.com/pydata/xarray/pull/4560
+    def setup(self):
+        self.filepath = "test_indexing_huge_axis_small_slice.nc"
+        if not os.path.isfile(self.filepath):
+            xr.Dataset(
+                {"a": ("x", np.arange(10_000_000))},
+                coords={"x": np.arange(10_000_000)},
+            ).to_netcdf(self.filepath, format="NETCDF4")
+
+        self.ds = xr.open_dataset(self.filepath)
+
+    def time_indexing(self):
+        self.ds.isel(x=slice(100))
+
+    def cleanup(self):
+        self.ds.close()
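For context on what this benchmark measures, here is a rough standalone comparison (not the asv harness) of the old expand-to-array approach against slice arithmetic. `old_style` and `new_style` are illustrative simplifications; `new_style` omits the empty-result and negative-step edge cases the real patch handles, and the timings are machine-dependent.

```python
import timeit

SIZE = 1_000_000  # stand-in for the benchmark's 10-million-element axis

def old_style(old, applied, size):
    # Old approach (simplified): expand the first slice into a concrete
    # index sequence, then index that sequence -- O(size) work.
    return list(range(size))[old][applied]

def new_style(old, applied, size):
    # New approach (simplified): pure slice arithmetic -- O(1) work.
    old = slice(*old.indices(size))
    n = len(range(old.start, old.stop, old.step))
    applied = slice(*applied.indices(n))
    return slice(old.start + applied.start * old.step,
                 old.start + applied.stop * old.step,
                 old.step * applied.step)

# Both select the first 100 elements of the full axis.
t_old = timeit.timeit(lambda: old_style(slice(None), slice(100), SIZE), number=5)
t_new = timeit.timeit(lambda: new_style(slice(None), slice(100), SIZE), number=5)
print(f"expand-to-array: {t_old:.4f}s  slice arithmetic: {t_new:.6f}s")
```

The gap grows linearly with axis size, which is why the benchmark uses a deliberately huge dimension and a tiny slice.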

doc/whats-new.rst

Lines changed: 2 additions & 1 deletion
@@ -64,6 +64,7 @@ Bug fixes
   By `Mathias Hauser <https://github.com/mathause>`_.
 - :py:func:`combine_by_coords` now raises an informative error when passing coordinates
   with differing calendars (:issue:`4495`). By `Mathias Hauser <https://github.com/mathause>`_.
+- Improve performance where reading small slices from huge dimensions was slower than necessary (:pull:`4560`). By `Dion Häfner <https://github.com/dionhaefner>`_.

 Documentation
 ~~~~~~~~~~~~~
@@ -72,7 +73,7 @@ Documentation
   (:pull:`4532`);
   By `Jimmy Westling <https://github.com/illviljan>`_.
 - Raise a more informative error when :py:meth:`DataArray.to_dataframe` is
-    is called on a scalar, (:issue:`4228`);
+  is called on a scalar, (:issue:`4228`);
   By `Pieter Gijsbers <https://github.com/pgijsbers>`_.
 - Fix grammar and typos in the :doc:`contributing` guide (:pull:`4545`).
   By `Sahid Velji <https://github.com/sahidvelji>`_.

xarray/core/indexing.py

Lines changed: 27 additions & 14 deletions
@@ -275,25 +275,38 @@ def remap_label_indexers(data_obj, indexers, method=None, tolerance=None):
     return pos_indexers, new_indexes


+def _normalize_slice(sl, size):
+    """Ensure that given slice only contains positive start and stop values
+    (stop can be -1 for full-size slices with negative steps, e.g. [-10::-1])"""
+    return slice(*sl.indices(size))
+
+
 def slice_slice(old_slice, applied_slice, size):
     """Given a slice and the size of the dimension to which it will be applied,
     index it with another slice to return a new slice equivalent to applying
     the slices sequentially
     """
-    step = (old_slice.step or 1) * (applied_slice.step or 1)
-
-    # For now, use the hack of turning old_slice into an ndarray to reconstruct
-    # the slice start and stop. This is not entirely ideal, but it is still
-    # definitely better than leaving the indexer as an array.
-    items = _expand_slice(old_slice, size)[applied_slice]
-    if len(items) > 0:
-        start = items[0]
-        stop = items[-1] + int(np.sign(step))
-        if stop < 0:
-            stop = None
-    else:
-        start = 0
-        stop = 0
+    old_slice = _normalize_slice(old_slice, size)
+
+    size_after_old_slice = len(range(old_slice.start, old_slice.stop, old_slice.step))
+    if size_after_old_slice == 0:
+        # nothing left after applying first slice
+        return slice(0)
+
+    applied_slice = _normalize_slice(applied_slice, size_after_old_slice)
+
+    start = old_slice.start + applied_slice.start * old_slice.step
+    if start < 0:
+        # nothing left after applying second slice
+        # (can only happen for old_slice.step < 0, e.g. [10::-1], [20:])
+        return slice(0)
+
+    stop = old_slice.start + applied_slice.stop * old_slice.step
+    if stop < 0:
+        stop = None
+
+    step = old_slice.step * applied_slice.step
+
     return slice(start, stop, step)
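One way to sanity-check the new arithmetic is to exhaustively compare it against sequential slicing on a small list, which exercises negative steps, clamped bounds, and empty results. The sketch below restates the patched function standalone (without importing xarray), under the assumption that plain-list slicing is the ground truth:

```python
import itertools

def _normalize_slice(sl, size):
    # Same trick as the patch: slice.indices() resolves None and
    # negative start/stop values against the given size.
    return slice(*sl.indices(size))

def slice_slice(old_slice, applied_slice, size):
    # Standalone re-statement of the patched function, for checking only.
    old_slice = _normalize_slice(old_slice, size)
    size_after = len(range(old_slice.start, old_slice.stop, old_slice.step))
    if size_after == 0:
        return slice(0)  # nothing left after the first slice
    applied_slice = _normalize_slice(applied_slice, size_after)
    start = old_slice.start + applied_slice.start * old_slice.step
    if start < 0:
        return slice(0)  # nothing left after the second slice
    stop = old_slice.start + applied_slice.stop * old_slice.step
    if stop < 0:
        stop = None  # negative stop would wrap; None means "to the end"
    return slice(start, stop, old_slice.step * applied_slice.step)

# Exhaustively compare against sequential slicing on a small axis.
size = 7
data = list(range(size))
vals = [None, -9, -3, -1, 0, 2, 5, 9]
steps = [None, -3, -1, 1, 2]
cases = [slice(*c) for c in itertools.product(vals, vals, steps)]
for old, applied in itertools.product(cases, repeat=2):
    composed = slice_slice(old, applied, size)
    assert data[old][applied] == data[composed], (old, applied)
print("all cases agree")
```

Because `slice(0)` is empty for every step sign, collapsing both "nothing left" branches to it keeps the composed result correct without tracking orientation.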