
Commit d237a71

jsignell, kmuehlbauer, keewis, dcherian, and Illviljan authored
New defaults for concat, merge, combine_* (#10062)
Co-authored-by: Kai Mühlbauer <[email protected]>
Co-authored-by: Justus Magin <[email protected]>
Co-authored-by: Deepak Cherian <[email protected]>
Co-authored-by: Illviljan <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Deepak Cherian <[email protected]>
1 parent 0751b72 commit d237a71

25 files changed (+1442 −372 lines)

doc/user-guide/combining.rst

Lines changed: 37 additions & 6 deletions
@@ -64,15 +64,18 @@ dimension:

 .. jupyter-execute::

-    xr.concat([da.isel(x=0), da.isel(x=1)], "new_dim")
+    da0 = da.isel(x=0, drop=True)
+    da1 = da.isel(x=1, drop=True)
+
+    xr.concat([da0, da1], "new_dim")

 The second argument to ``concat`` can also be an :py:class:`~pandas.Index` or
 :py:class:`~xarray.DataArray` object as well as a string, in which case it is
 used to label the values along the new dimension:

 .. jupyter-execute::

-    xr.concat([da.isel(x=0), da.isel(x=1)], pd.Index([-90, -100], name="new_dim"))
+    xr.concat([da0, da1], pd.Index([-90, -100], name="new_dim"))

 Of course, ``concat`` also works on ``Dataset`` objects:

@@ -87,6 +90,12 @@ between datasets. With the default parameters, xarray will load some coordinate
 variables into memory to compare them between datasets. This may be prohibitively
 expensive if you are manipulating your dataset lazily using :ref:`dask`.

+.. note::
+
+    In a future version of xarray the default values for many of these options
+    will change. You can opt into the new default values early using
+    ``xr.set_options(use_new_combine_kwarg_defaults=True)``.
+
 .. _merge:

 Merge

@@ -109,10 +118,18 @@ If you merge another dataset (or a dictionary including data array objects), by
 default the resulting dataset will be aligned on the **union** of all index
 coordinates:

+.. note::
+
+    In a future version of xarray the default value for ``join`` and ``compat``
+    will change. This change will mean that xarray will no longer attempt
+    to align the indices of the merged dataset. You can opt into the new default
+    values early using ``xr.set_options(use_new_combine_kwarg_defaults=True)``.
+    Or explicitly set ``join='outer'`` to preserve old behavior.
+
 .. jupyter-execute::

     other = xr.Dataset({"bar": ("x", [1, 2, 3, 4]), "x": list("abcd")})
-    xr.merge([ds, other])
+    xr.merge([ds, other], join="outer")

 This ensures that ``merge`` is non-destructive. ``xarray.MergeError`` is raised
 if you attempt to merge two variables with the same name but different values:

@@ -123,6 +140,16 @@ if you attempt to merge two variables with the same name but different values:
     xr.merge([ds, ds + 1])


+.. note::
+
+    In a future version of xarray the default value for ``compat`` will change
+    from ``compat='no_conflicts'`` to ``compat='override'``. In this scenario
+    the values in the first object override all the values in other objects.
+
+    .. jupyter-execute::
+
+        xr.merge([ds, ds + 1], compat="override")
+
 The same non-destructive merging between ``DataArray`` index coordinates is
 used in the :py:class:`~xarray.Dataset` constructor:

@@ -156,6 +183,11 @@ For datasets, ``ds0.combine_first(ds1)`` works similarly to
 there are conflicting values in variables to be merged, whereas
 ``.combine_first`` defaults to the calling object's values.

+.. note::
+
+    In a future version of xarray the default options for ``xr.merge`` will change
+    such that the behavior matches ``combine_first``.
+
 .. _update:

 Update

@@ -248,7 +280,7 @@ coordinates as long as any non-missing values agree or are disjoint:

     ds1 = xr.Dataset({"a": ("x", [10, 20, 30, np.nan])}, {"x": [1, 2, 3, 4]})
     ds2 = xr.Dataset({"a": ("x", [np.nan, 30, 40, 50])}, {"x": [2, 3, 4, 5]})
-    xr.merge([ds1, ds2], compat="no_conflicts")
+    xr.merge([ds1, ds2], join="outer", compat="no_conflicts")

 Note that due to the underlying representation of missing values as floating
 point numbers (``NaN``), variable data type is not always preserved when merging

@@ -311,12 +343,11 @@ coordinates, not on their position in the list passed to ``combine_by_coords``.

 .. jupyter-execute::

-
     x1 = xr.DataArray(name="foo", data=np.random.randn(3), coords=[("x", [0, 1, 2])])
     x2 = xr.DataArray(name="foo", data=np.random.randn(3), coords=[("x", [3, 4, 5])])
     xr.combine_by_coords([x2, x1])

-These functions can be used by :py:func:`~xarray.open_mfdataset` to open many
+These functions are used by :py:func:`~xarray.open_mfdataset` to open many
 files as one dataset. The particular function used is specified by setting the
 argument ``'combine'`` to ``'by_coords'`` or ``'nested'``. This is useful for
 situations where your data is split across many files in multiple locations,
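To make the ``join``/``compat`` note above concrete, here is a hedged sketch of how the opt-in changes ``xr.merge`` alignment. The datasets are constructed here for illustration and are not part of the commit; the exact exception type and message raised under ``join="exact"`` may vary by version.

    # Minimal sketch of the merge-alignment change described in the note above.
    import xarray as xr

    ds = xr.Dataset({"foo": ("x", [1, 2, 3])}, coords={"x": list("abc")})
    other = xr.Dataset({"bar": ("x", [1, 2, 3, 4])}, coords={"x": list("abcd")})

    # Current default (join="outer"): indexes are unioned and gaps are filled with NaN.
    xr.merge([ds, other], join="outer")

    # Opting into the new defaults: with join="exact", mismatched indexes are no
    # longer unioned and merge is expected to raise an alignment error instead.
    with xr.set_options(use_new_combine_kwarg_defaults=True):
        try:
            xr.merge([ds, other])
        except ValueError as err:
            print("merge raised:", err)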

doc/user-guide/terminology.rst

Lines changed: 1 addition & 1 deletion
@@ -220,7 +220,7 @@ complete examples, please consult the relevant documentation.*
        )

        # combine the datasets
-       combined_ds = xr.combine_by_coords([ds1, ds2])
+       combined_ds = xr.combine_by_coords([ds1, ds2], join="outer")
        combined_ds

    lazy

doc/whats-new.rst

Lines changed: 20 additions & 1 deletion
@@ -21,6 +21,18 @@ Breaking changes

 Deprecations
 ~~~~~~~~~~~~

+- Start a deprecation cycle for changing the default keyword arguments to :py:func:`concat`, :py:func:`merge`,
+  :py:func:`combine_nested`, :py:func:`combine_by_coords`, and :py:func:`open_mfdataset`.
+  Emits a :py:class:`FutureWarning` when using old defaults and new defaults would result in different behavior.
+  Adds an option: ``use_new_combine_kwarg_defaults`` to opt in to new defaults immediately.
+  New values are:
+
+  - ``data_vars``: None which means ``all`` when concatenating along a new dimension, and ``"minimal"`` when concatenating along an existing dimension
+  - ``coords``: "minimal"
+  - ``compat``: "override"
+  - ``join``: "exact"
+
+  (:issue:`8778`, :issue:`1385`, :pull:`10062`). By `Julia Signell <https://github.com/jsignell>`_.

 Bug fixes
 ~~~~~~~~~

@@ -8425,8 +8437,15 @@ Backwards incompatible changes

 .. code:: python

-    ds = xray.Dataset({"x": 0})
+    In [1]: ds = xray.Dataset({"x": 0})

+    In [2]: xray.concat([ds, ds], dim="y")
+    Out[2]:
+    <xarray.Dataset> Size: 16B
+    Dimensions:  (y: 2)
+    Dimensions without coordinates: y
+    Data variables:
+        x        (y) int64 16B 0 0
 .. code:: python

     xray.concat([ds, ds], dim="y")
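Because the deprecation entry above warns only when a call relies on the old defaults, one way to keep today's behavior through the transition is to pass those defaults explicitly, which is what the updated internal call sites in this commit do. An illustrative sketch (the dataset is made up for the example):

    # Illustrative only: spell out the pre-change defaults instead of relying on them.
    import xarray as xr

    ds = xr.Dataset({"a": ("x", [1, 2, 3])})
    other = xr.Dataset({"b": ("x", [4, 5, 6])})

    # Old concat defaults, written out (the same values pinned in groupby.py below).
    xr.concat([ds, ds], dim="y", data_vars="all", coords="different",
              compat="equals", join="outer")

    # Old top-level merge defaults, written out.
    xr.merge([ds, other], compat="no_conflicts", join="outer")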

xarray/backends/api.py

Lines changed: 16 additions & 5 deletions
@@ -36,7 +36,7 @@
 )
 from xarray.backends.locks import _get_scheduler
 from xarray.coders import CFDatetimeCoder, CFTimedeltaCoder
-from xarray.core import indexing
+from xarray.core import dtypes, indexing
 from xarray.core.coordinates import Coordinates
 from xarray.core.dataarray import DataArray
 from xarray.core.dataset import Dataset

@@ -53,6 +53,13 @@
     _nested_combine,
     combine_by_coords,
 )
+from xarray.util.deprecation_helpers import (
+    _COMPAT_DEFAULT,
+    _COORDS_DEFAULT,
+    _DATA_VARS_DEFAULT,
+    _JOIN_DEFAULT,
+    CombineKwargDefault,
+)

 if TYPE_CHECKING:
     try:

@@ -1459,14 +1466,17 @@ def open_mfdataset(
         | Sequence[Index]
         | None
     ) = None,
-    compat: CompatOptions = "no_conflicts",
+    compat: CompatOptions | CombineKwargDefault = _COMPAT_DEFAULT,
     preprocess: Callable[[Dataset], Dataset] | None = None,
     engine: T_Engine = None,
-    data_vars: Literal["all", "minimal", "different"] | list[str] = "all",
-    coords="different",
+    data_vars: Literal["all", "minimal", "different"]
+    | None
+    | list[str]
+    | CombineKwargDefault = _DATA_VARS_DEFAULT,
+    coords=_COORDS_DEFAULT,
     combine: Literal["by_coords", "nested"] = "by_coords",
     parallel: bool = False,
-    join: JoinOptions = "outer",
+    join: JoinOptions | CombineKwargDefault = _JOIN_DEFAULT,
     attrs_file: str | os.PathLike | None = None,
     combine_attrs: CombineAttrsOptions = "override",
     **kwargs,

@@ -1711,6 +1721,7 @@ def open_mfdataset(
             ids=ids,
             join=join,
             combine_attrs=combine_attrs,
+            fill_value=dtypes.NA,
         )
     elif combine == "by_coords":
         # Redo ordering from coordinates, ignoring how they were ordered
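Since the ``open_mfdataset`` signature above now carries sentinel defaults, callers can pin the pre-change values explicitly during the deprecation cycle. A hedged usage sketch; the glob pattern is a placeholder, not something from this commit:

    # Hypothetical usage sketch; "data/*.nc" is a placeholder path.
    import xarray as xr

    ds = xr.open_mfdataset(
        "data/*.nc",
        combine="by_coords",
        # Pre-change defaults from the removed signature lines, spelled out:
        compat="no_conflicts",
        data_vars="all",
        coords="different",
        join="outer",
    )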

xarray/core/dataset.py

Lines changed: 17 additions & 4 deletions
@@ -122,7 +122,13 @@
     merge_coordinates_without_align,
     merge_data_and_coords,
 )
-from xarray.util.deprecation_helpers import _deprecate_positional_args, deprecate_dims
+from xarray.util.deprecation_helpers import (
+    _COMPAT_DEFAULT,
+    _JOIN_DEFAULT,
+    CombineKwargDefault,
+    _deprecate_positional_args,
+    deprecate_dims,
+)

 if TYPE_CHECKING:
     from dask.dataframe import DataFrame as DaskDataFrame

@@ -5321,7 +5327,14 @@ def stack_dataarray(da):

         # concatenate the arrays
         stackable_vars = [stack_dataarray(da) for da in self.data_vars.values()]
-        data_array = concat(stackable_vars, dim=new_dim)
+        data_array = concat(
+            stackable_vars,
+            dim=new_dim,
+            data_vars="all",
+            coords="different",
+            compat="equals",
+            join="outer",
+        )

         if name is not None:
             data_array.name = name

@@ -5565,8 +5578,8 @@ def merge(
         self,
         other: CoercibleMapping | DataArray,
         overwrite_vars: Hashable | Iterable[Hashable] = frozenset(),
-        compat: CompatOptions = "no_conflicts",
-        join: JoinOptions = "outer",
+        compat: CompatOptions | CombineKwargDefault = _COMPAT_DEFAULT,
+        join: JoinOptions | CombineKwargDefault = _JOIN_DEFAULT,
         fill_value: Any = xrdtypes.NA,
         combine_attrs: CombineAttrsOptions = "override",
     ) -> Self:
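``Dataset.merge`` picks up the same sentinel defaults as the top-level functions, so a caller who wants today's behavior can state it explicitly. A minimal sketch with invented data:

    # Minimal sketch; datasets are made up for illustration.
    import xarray as xr

    a = xr.Dataset({"t": ("x", [1.0, 2.0])}, coords={"x": [0, 1]})
    b = xr.Dataset({"u": ("x", [3.0, 4.0])}, coords={"x": [1, 2]})

    # The defaults being deprecated, written out: outer-join the indexes and
    # require any overlapping values to be compatible.
    merged = a.merge(b, join="outer", compat="no_conflicts")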

xarray/core/groupby.py

Lines changed: 16 additions & 2 deletions
@@ -1628,7 +1628,14 @@ def _combine(self, applied, shortcut=False):
         if shortcut:
             combined = self._concat_shortcut(applied, dim, positions)
         else:
-            combined = concat(applied, dim)
+            combined = concat(
+                applied,
+                dim,
+                data_vars="all",
+                coords="different",
+                compat="equals",
+                join="outer",
+            )
         combined = _maybe_reorder(combined, dim, positions, N=self.group1d.size)

         if isinstance(combined, type(self._obj)):

@@ -1789,7 +1796,14 @@ def _combine(self, applied):
         """Recombine the applied objects like the original."""
         applied_example, applied = peek_at(applied)
         dim, positions = self._infer_concat_args(applied_example)
-        combined = concat(applied, dim)
+        combined = concat(
+            applied,
+            dim,
+            data_vars="all",
+            coords="different",
+            compat="equals",
+            join="outer",
+        )
         combined = _maybe_reorder(combined, dim, positions, N=self.group1d.size)
         # assign coord when the applied function does not return that coord
         if dim not in applied_example.dims:

xarray/core/options.py

Lines changed: 13 additions & 0 deletions
@@ -30,6 +30,7 @@
     "keep_attrs",
     "warn_for_unclosed_files",
     "use_bottleneck",
+    "use_new_combine_kwarg_defaults",
     "use_numbagg",
     "use_opt_einsum",
     "use_flox",

@@ -59,6 +60,7 @@ class T_Options(TypedDict):
     warn_for_unclosed_files: bool
     use_bottleneck: bool
     use_flox: bool
+    use_new_combine_kwarg_defaults: bool
     use_numbagg: bool
     use_opt_einsum: bool

@@ -87,6 +89,7 @@ class T_Options(TypedDict):
     "warn_for_unclosed_files": False,
     "use_bottleneck": True,
     "use_flox": True,
+    "use_new_combine_kwarg_defaults": False,
     "use_numbagg": True,
     "use_opt_einsum": True,
 }

@@ -117,6 +120,7 @@ def _positive_integer(value: Any) -> bool:
     "file_cache_maxsize": _positive_integer,
     "keep_attrs": lambda choice: choice in [True, False, "default"],
     "use_bottleneck": lambda value: isinstance(value, bool),
+    "use_new_combine_kwarg_defaults": lambda value: isinstance(value, bool),
     "use_numbagg": lambda value: isinstance(value, bool),
     "use_opt_einsum": lambda value: isinstance(value, bool),
     "use_flox": lambda value: isinstance(value, bool),

@@ -256,6 +260,15 @@ class set_options:
     use_flox : bool, default: True
         Whether to use ``numpy_groupies`` and `flox`` to
         accelerate groupby and resampling reductions.
+    use_new_combine_kwarg_defaults : bool, default False
+        Whether to use new kwarg default values for combine functions:
+        :py:func:`~xarray.concat`, :py:func:`~xarray.merge`,
+        :py:func:`~xarray.open_mfdataset`. New values are:
+
+        * ``data_vars``: None
+        * ``coords``: "minimal"
+        * ``compat``: "override"
+        * ``join``: "exact"
     use_numbagg : bool, default: True
         Whether to use ``numbagg`` to accelerate reductions.
         Takes precedence over ``use_bottleneck`` when both are True.
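Like the other flags registered above, the new option can be set globally or scoped with the ``set_options`` context manager. A short sketch of both styles:

    import xarray as xr

    # Opt in globally for the rest of the session...
    xr.set_options(use_new_combine_kwarg_defaults=True)

    # ...or only within a block; the previous value is restored on exit.
    with xr.set_options(use_new_combine_kwarg_defaults=True):
        pass  # concat/merge/combine_*/open_mfdataset see the new defaults here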

xarray/core/parallel.py

Lines changed: 12 additions & 3 deletions
@@ -351,7 +351,9 @@ def _wrapper(
         result = func(*converted_args, **kwargs)

         merged_coordinates = merge(
-            [arg.coords for arg in args if isinstance(arg, Dataset | DataArray)]
+            [arg.coords for arg in args if isinstance(arg, Dataset | DataArray)],
+            join="exact",
+            compat="override",
         ).coords

         # check all dims are present

@@ -441,7 +443,11 @@ def _wrapper(
     # rechunk any numpy variables appropriately
     xarray_objs = tuple(arg.chunk(arg.chunksizes) for arg in xarray_objs)

-    merged_coordinates = merge([arg.coords for arg in aligned]).coords
+    merged_coordinates = merge(
+        [arg.coords for arg in aligned],
+        join="exact",
+        compat="override",
+    ).coords

     _, npargs = unzip(
         sorted(

@@ -474,7 +480,10 @@ def _wrapper(
     )

     coordinates = merge(
-        (preserved_coords, template.coords.to_dataset()[new_coord_vars])
+        (preserved_coords, template.coords.to_dataset()[new_coord_vars]),
+        # FIXME: this should be join="exact", but breaks a test
+        join="outer",
+        compat="override",
     ).coords
     output_chunks: Mapping[Hashable, tuple[int, ...]] = {
         dim: input_chunks[dim] for dim in template.dims if dim in input_chunks

xarray/plot/dataarray_plot.py

Lines changed: 7 additions & 1 deletion
@@ -196,7 +196,13 @@ def _prepare_plot1d_data(
         dim = coords_to_plot.get(v, None)
         if (dim is not None) and (dim in darray.dims):
             darray_nan = np.nan * darray.isel({dim: -1})
-            darray = concat([darray, darray_nan], dim=dim)
+            darray = concat(
+                [darray, darray_nan],
+                dim=dim,
+                coords="minimal",
+                compat="override",
+                join="exact",
+            )
             dims_T.append(coords_to_plot[v])

     # Lines should never connect to the same coordinate when stacked,
