-
Weather/climate data often uses two time dimensions: start time (S; the initial time of the forecast integration) and lead time (L; how far the forecast integration goes beyond the initial time). There is also target time (T; the date being forecast). The three are related by T = S + L, so any two of the three time dimensions are sufficient. My question: what is the xarray way of changing dimensions from S & L to T & L, where T = S + L? Example dataset with S & L:
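The example dataset itself was not included here; below is a minimal sketch of one, assuming weekly starts, leads as timedeltas, a data variable named data, and a 2-D target-time coordinate T = S + L (the coordinate that the swap_dims solution further down relies on):

import numpy as np
import pandas as pd
import xarray as xr

S = pd.date_range("2000-01-01", periods=10, freq="7D")  # start (initial) times
L = pd.to_timedelta(np.arange(5), unit="W")             # lead times: 0..4 weeks
ds = xr.Dataset(
    {"data": (("S", "L"), np.random.rand(len(S), len(L)))},
    coords={"S": S, "L": L},
)
ds = ds.assign_coords(T=ds["S"] + ds["L"])               # target time, T = S + L (2-D over S x L)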
And the data can be plotted (such plots are sometimes called chiclet charts) using any two of the three coordinates (albeit with some slanted boxes).
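For reference, a minimal plotting sketch (assuming the example ds above), with start time on the x-axis and lead in days on the y-axis; plotting against the 2-D T coordinate instead of S is what gives the slanted boxes mentioned above:

import matplotlib.pyplot as plt

lead_days = ds["L"].dt.days                      # timedelta leads -> integer days for the axis
fig, ax = plt.subplots()
mesh = ax.pcolormesh(ds["S"].values, lead_days.values,
                     ds["data"].transpose("L", "S").values, shading="auto")
fig.colorbar(mesh, ax=ax, label="data")
ax.set_xlabel("Start time (S)")
ax.set_ylabel("Lead (days)")
plt.show()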
To create a dataset with the same data but on T & L, I stacked S & L, replaced the values of S with T (using an idea from here), and unstacked. It seems to work.
-
Stacking is a reshape, which will get you in trouble if you use dask chunking along S, L. Does climpred help? cc @aaronspring
-
@mktippett I tried your code above and ran into the error below. EDIT: it works now. Traceback:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [4], in <cell line: 3>()
1 ds_stacked = ds.chunk({"L":1}).stack(SL=['S', 'L'])
2 T = ds_stacked['S'] + ds_stacked['L']
----> 3 ds_stacked_TL = ds_stacked.reset_index('S', drop=True).assign_coords(S=('SL', T)).set_index(SL='S',append=True)
4 ds_TL = ds_stacked_TL.unstack().rename({'S': 'Target', 'SL_level_0': 'Lead'})
5 display(ds_TL.data) # 250 tasks
File ~/mambaforge/envs/xr/lib/python3.9/site-packages/xarray/core/common.py:592, in DataWithCoords.assign_coords(self, coords, **coords_kwargs)
590 data = self.copy(deep=False)
591 results: dict[Hashable, Any] = self._calc_assign_results(coords_combined)
--> 592 data.coords.update(results)
593 return data
File ~/mambaforge/envs/xr/lib/python3.9/site-packages/xarray/core/coordinates.py:162, in Coordinates.update(self, other)
160 other_vars = getattr(other, "variables", other)
161 self._maybe_drop_multiindex_coords(set(other_vars))
--> 162 coords, indexes = merge_coords(
163 [self.variables, other_vars], priority_arg=1, indexes=self.xindexes
164 )
165 self._update_coords(coords, indexes)
File ~/mambaforge/envs/xr/lib/python3.9/site-packages/xarray/core/merge.py:564, in merge_coords(objects, compat, join, priority_arg, indexes, fill_value)
560 coerced = coerce_pandas_values(objects)
561 aligned = deep_align(
562 coerced, join=join, copy=False, indexes=indexes, fill_value=fill_value
563 )
--> 564 collected = collect_variables_and_indexes(aligned)
565 prioritized = _get_priority_vars_and_indexes(aligned, priority_arg, compat=compat)
566 variables, out_indexes = merge_collected(collected, prioritized, compat=compat)
File ~/mambaforge/envs/xr/lib/python3.9/site-packages/xarray/core/merge.py:365, in collect_variables_and_indexes(list_of_mappings, indexes)
362 indexes.pop(name, None)
363 append_all(coords, indexes)
--> 365 variable = as_variable(variable, name=name)
366 if name in indexes:
367 append(name, variable, indexes[name])
File ~/mambaforge/envs/xr/lib/python3.9/site-packages/xarray/core/variable.py:125, in as_variable(obj, name)
123 elif isinstance(obj, tuple):
124 if isinstance(obj[1], DataArray):
--> 125 raise TypeError(
126 "Using a DataArray object to construct a variable is"
127 " ambiguous, please extract the data using the .data property."
128 )
129 try:
130 obj = Variable(*obj)
TypeError: Using a DataArray object to construct a variable is ambiguous, please extract the data using the .data property.
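The message itself points at the fix: pass the underlying array rather than the DataArray when assigning the coordinate. A sketch of the offending line with that one change (names as in the traceback):

ds_stacked_TL = ds_stacked.reset_index('S', drop=True).assign_coords(S=('SL', T.data)).set_index(SL='S', append=True)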
-
@mktippett I also found an alternative solution with a loop, which I previously used:
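The loop itself was not included here; a minimal sketch of what it might look like, assuming ds carries a 2-D target-time coordinate T (for a single lead, T is 1-D along S, so the dims can simply be swapped and the per-lead pieces concatenated back together):

swapped = xr.concat(
    [ds.sel(L=lead).swap_dims({"S": "T"}) for lead in ds["L"]],
    dim="L",
)

Non-overlapping target times are filled with NaN by concat's default outer join, giving the rectangular version of the slanted chiclet.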
-
OK, following @dcherian's chunking point: when the data is chunked, the stack/unstack route

ds_stacked = ds.chunk({"L": 1}).stack(SL=['S', 'L'])
T = ds_stacked['S'] + ds_stacked['L']
ds_stacked_TL = ds_stacked.reset_index('S', drop=True).assign_coords(S=('SL', T.data)).set_index(SL='S', append=True)  # .data, per the TypeError above
ds_TL = ds_stacked_TL.unstack().rename({'S': 'Target', 'SL_level_0': 'Lead'})
display(ds_TL.data)  # 250 tasks

whereas the per-lead swap_dims route

swap = xr.concat([ds.chunk({"L": 1}).sel(L=l).swap_dims({"S": "T"}) for l in ds.L], "L").data
display(swap)  # 50 tasks

But overall the skill result is much less data heavy than the initialized inputs, so the skill calculation beforehand is probably much more demanding in memory and compute.
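If you want to compare the graph sizes directly rather than reading them off the repr, xarray and dask objects expose their task graphs (a sketch, using the names from the snippets above):

print(len(ds_TL.data.__dask_graph__()))  # stack/unstack route
print(len(swap.__dask_graph__()))        # per-lead swap_dims route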