Description
@norlandrhagen and I just came across what we believe is a bug when I manually set variables as coordinates on a virtual dataset.
To reproduce, I take a single CMIP6 output file and virtualize it:
from virtualizarr import open_virtual_dataset
url = 's3://esgf-world/CMIP6/CMIP/CCCma/CanESM5/historical/r10i1p1f1/Omon/uo/gn/v20190429/uo_Omon_CanESM5_historical_r10i1p1f1_gn_185001-186012.nc'
vds = open_virtual_dataset(url, indexes={}, reader_options={'storage_options':{'anon':True}})
vds
This works great, but some coordinates are declared as data variables (maybe this is related to #189?). Either way, if I try to correct this on the virtualized dataset, everything seems fine:
vds_modified = vds.set_coords(['latitude'])
vds_modified
I expected these modifications to be preserved when I write the references to disk and reload the dataset:
import xarray as xr
vds_modified.virtualize.to_kerchunk(
    'testing.parquet', format="parquet"
)
ds_reopened = xr.open_dataset(
    'testing.parquet',
    engine='kerchunk',
    backend_kwargs={
        'storage_options': {"remote_options": {'anon': True}}
    }
)
ds_reopened
But somehow I am getting another variable as a coordinate? Note that 'longitude' is now a coordinate all of a sudden...
Note that this is a simplified version of a more complex multi-file case, where we set all variables other than 'uo' as coordinates and the round-tripped xarray dataset did not reflect this at all. I am pretty confused about what is going on above, but I hope that investigating this will clear up the underlying bug.
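For anyone trying to reproduce this, a small check along the following lines might make the mismatch explicit instead of relying on comparing the reprs by eye. This is only a sketch: `coord_diff` is a hypothetical helper (not part of VirtualiZarr or xarray), and the names passed in below are illustrative stand-ins based on the symptom described above, not actual output.

```python
def coord_diff(expected, actual):
    """Compare two collections of coordinate names.

    Returns a tuple (missing, unexpected):
      missing    - names in `expected` that are absent from `actual`
      unexpected - names in `actual` that were not in `expected`
    Both are returned as sorted lists.
    """
    expected, actual = set(expected), set(actual)
    return sorted(expected - actual), sorted(actual - expected)


# Illustrative names only (assumption): they mirror the symptom described
# above, where 'longitude' shows up as a coordinate after the round trip.
missing, unexpected = coord_diff(
    expected=["latitude"],
    actual=["latitude", "longitude"],
)
print("missing after round trip:", missing)
print("unexpected after round trip:", unexpected)
```

With the real datasets, the same helper could presumably be called as `coord_diff(vds_modified.coords, ds_reopened.coords)`, since iterating over an xarray coordinates mapping yields the coordinate names.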