Manipulation of coordinates does not materialize to kerchunk refs  #281

@jbusecke

@norlandrhagen and I just came across what we believe is a bug when I manually set variables as coordinates on a virtual dataset.

To reproduce, I take a single CMIP6 output file and virtualize it:

from virtualizarr import open_virtual_dataset

url = 's3://esgf-world/CMIP6/CMIP/CCCma/CanESM5/historical/r10i1p1f1/Omon/uo/gn/v20190429/uo_Omon_CanESM5_historical_r10i1p1f1_gn_185001-186012.nc'

vds = open_virtual_dataset(url, indexes={}, reader_options={'storage_options':{'anon':True}})
vds
[image: repr of vds]

This works great, but some coordinates are declared as data variables (maybe this is related to #189?). Either way, if I try to correct this on the virtualized dataset, everything seems fine:

vds_modified = vds.set_coords(['latitude'])
vds_modified
[image: repr of vds_modified]

I expected these modifications to be saved when I materialize the references and reload the dataset:

import xarray as xr

# Write the virtual dataset out as kerchunk references in parquet format
vds_modified.virtualize.to_kerchunk(
    'testing.parquet', format="parquet"
)

# Reopen the references through the kerchunk engine
ds_reopened = xr.open_dataset(
    'testing.parquet',
    engine='kerchunk',
    backend_kwargs={
        'storage_options':{"remote_options":{'anon':True}}
    }
)
ds_reopened

But somehow another variable shows up as a coordinate: note that 'longitude' is now a coordinate all of a sudden.

[image: repr of ds_reopened]
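To make the mismatch explicit rather than reading it off the reprs, the coordinate names before and after the round trip can be diffed. The sketch below uses a hypothetical helper, `coord_diff`, which is not part of xarray or VirtualiZarr. It only assumes that each object exposes a `coords` mapping keyed by coordinate name, as `xarray.Dataset` does.

```python
def coord_diff(before, after):
    """Return (lost, gained) coordinate names between two dataset-like objects.

    Hypothetical helper: it only assumes each object exposes a ``coords``
    mapping whose keys are coordinate names, as xarray.Dataset does.
    """
    before_names = set(before.coords)
    after_names = set(after.coords)
    # Names that were coordinates before the round trip but not after
    lost = sorted(before_names - after_names)
    # Names that became coordinates only after the round trip
    gained = sorted(after_names - before_names)
    return lost, gained


# Intended usage with the objects from this issue (not run here):
# lost, gained = coord_diff(vds_modified, ds_reopened)
```

If the round trip preserved the coordinate assignment, both lists would be empty.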

Note that this is my attempt to simplify a more complex multi-file situation, where we set all variables other than 'uo' as coordinates and the round-tripped xarray dataset did not reflect this at all. I am pretty confused about what is going on above, but I hope that investigating this curious behavior will clear up the bug entirely.
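For reference, the multi-file case boils down to promoting every data variable except 'uo' to a coordinate. A minimal sketch of that selection follows. The helper name `names_to_promote` is hypothetical, and it only assumes the dataset exposes a `data_vars` mapping, as `xarray.Dataset` does.

```python
def names_to_promote(ds, keep=("uo",)):
    """List the data variable names that should become coordinates.

    Hypothetical helper: ``ds`` only needs a ``data_vars`` mapping keyed
    by variable name; names listed in ``keep`` stay data variables.
    """
    return [name for name in ds.data_vars if name not in keep]


# Intended usage on the virtual dataset from this issue (not run here):
# vds_modified = vds.set_coords(names_to_promote(vds))
```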

Labels: bug (Something isn't working)
