MultiZarrToZarr uses first file's x/y coordinates for all inputs, causing spatial misalignment in concatenated output #584

@mliukis

Description

When using MultiZarrToZarr to combine multiple (time, y, x) NetCDF files along the time dimension, the resulting Zarr store inherits the x and y coordinate arrays exclusively from the first input file. If subsequent input files have different (but overlapping) spatial extents — e.g., different bounding boxes at the same grid resolution — their data seems to be written at the original chunk offsets without remapping to the correct coordinate positions. This causes spatial displacement of data from all files except the first.

All 3D variables in the input NetCDF files have identical (1, 512, 512) chunking along the (time, y, x) dimensions.
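The identical chunking is what makes the misplacement silent: chunk key "0.0" exists in every file, but it covers a different spatial window once the grid origins differ. A toy sketch of that arithmetic (the origins and the 120 m resolution below are hypothetical, for illustration only):

```python
# Illustration only: the same chunk key (here, column index j) maps to a
# different x-window in each file when the grid origins differ.
CHUNK = 512   # chunk size along x, matching the (1, 512, 512) chunking
DX = 120.0    # hypothetical grid resolution in metres

def chunk_window(x_origin, j, size=CHUNK, dx=DX):
    """x-range covered by chunk column j of a file whose grid starts at x_origin."""
    start = x_origin + j * size * dx
    return (start, start + size * dx)

print(chunk_window(0.0, 0))       # file A, chunk column 0 -> (0.0, 61440.0)
print(chunk_window(61_440.0, 0))  # file B, same chunk key -> (61440.0, 122880.0)
```

So copying chunk references by key, without remapping against a common coordinate grid, places file B's chunk "0.0" at file A's coordinates.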

Is there a way to merge the data and consolidate x and y extents across such input files?

Minimal reproducible example:

import fsspec
import s3fs
import os
from kerchunk.hdf import SingleHdf5ToZarr
from kerchunk.combine import MultiZarrToZarr
import xarray as xr

s3 = s3fs.S3FileSystem()

bucket = 's3://its-live-data'

granules = [
    "velocity_image_pair/landsatOLI/v02/S80W170/LC08_L1GT_020121_20231013_20231102_02_T2_X_LC09_L1GT_020121_20231106_20231106_02_T2_G0120V02_P084.nc",
    "velocity_image_pair/landsatOLI/v02/S80W170/LC08_L1GT_020120_20201121_20210315_02_T2_X_LC08_L1GT_020120_20210124_20210305_02_T2_G0120V02_P051.nc",
]

so = dict(
    anon=True, default_fill_cache=False, default_cache_type='first'
)

# Read input granules in
singles = []
for each_granule in granules:
    s3_path = os.path.join(bucket, each_granule)
    print(f'Reading {s3_path}...')
    
    with fsspec.open(s3_path, **so) as inf:
        h5chunks = SingleHdf5ToZarr(inf, s3_path, inline_threshold=300)
        singles.append(h5chunks.translate())

# Combine granules
mzz = MultiZarrToZarr(
    singles,
    remote_protocol="s3",
    remote_options={'anon': True},
    concat_dims=["time"],
    identical_dims=['mapping']
)

out = mzz.translate()

# Open merged cube as xarray.Dataset for exploration (plotting)
fs = fsspec.filesystem(
    "reference",
    fo=out,          # the combined reference dict returned by translate()
    remote_protocol="s3",
    remote_options={
        "anon": True,
        "asynchronous": False,    # must match ReferenceFileSystem's mode
    },
    skip_instance_cache=True,
)

mapper = fs.get_mapper("")
ds = xr.open_zarr(mapper, consolidated=False, decode_timedelta=False, chunks={})

This is (time, y, x) v data variable from original input data (please note the x extents):
[screenshot: plot of v over the granule's native x extent]

And the same input data as it appears in the virtual cube with x extent of original first input file:

[screenshot: same data plotted at the first input file's x extent]

Examining x and y chunking shows reference to the first input file:

for key, ref in out['refs'].items():
    if (key.startswith('x/') or key.startswith('y/')) and '/.' not in key:
        print(f'{key=}: {ref}')
[screenshot: printed x/ and y/ chunk references, all pointing at the first granule]
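For comparison, this is the behaviour I would expect: an outer join on the spatial coordinates, which xarray performs when concatenating in-memory datasets. A minimal sketch with synthetic stand-ins for the two granules (coordinate values are made up, same 120 m grid, x extents offset by two cells):

```python
import numpy as np
import xarray as xr

# Two synthetic granules on the same grid but with offset x extents.
y = np.arange(4) * 120.0
x1 = np.arange(0, 5) * 120.0
x2 = np.arange(2, 7) * 120.0

ds1 = xr.Dataset(
    {"v": (("time", "y", "x"), np.ones((1, 4, 5)))},
    coords={"time": [0], "y": y, "x": x1},
)
ds2 = xr.Dataset(
    {"v": (("time", "y", "x"), 2 * np.ones((1, 4, 5)))},
    coords={"time": [1], "y": y, "x": x2},
)

# join="outer" aligns x/y to the union grid, so each time slice lands at
# its true coordinates and cells outside a granule's extent become NaN.
combined = xr.concat([ds1, ds2], dim="time", join="outer")
print(dict(combined.sizes))  # x is the 7-cell union, not the first file's 5
```

As far as I can tell, MultiZarrToZarr instead copies chunk references by key, so it cannot do this sub-chunk remapping on its own, hence the question above about consolidating extents.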
