Tiff best practices currently? #549
Replies: 6 comments 1 reply
-
I have seen a more recent example where this sort of thing 'just works', though I have not tried to run that particular example. I'm on the latest versions from conda, but downgraded to zarr 2.17, with Python 3.11. Does anyone have a reference JSON from one of these 'working' examples?
-
```python
def fn_to_time(index, fs, var, fn):
    subst = fn.split('.tif')[0]
    return subst
```

seems like a totally fine way to make "coordinates" for the set of files, so that they become simple string values.
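As a minimal sketch (the file names here are hypothetical stand-ins), a callback with that signature just maps each input file to a string label, which is what makes the string-valued coordinate work:

```python
import os

# Same callback shape kerchunk's coo_map accepts:
# (index, fs, var, fn) -> coordinate value.
def fn_to_time(index, fs, var, fn):
    # Strip the .tif suffix so each file contributes a plain string value
    subst = fn.split('.tif')[0]
    return subst

# Hypothetical file names standing in for the real s3 objects:
files = ["s3://bucket/geology.tif", "s3://bucket/geophysics.tif"]
labels = [fn_to_time(i, None, "data", os.path.basename(f))
          for i, f in enumerate(files)]
print(labels)  # ['geology', 'geophysics']
```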
If you have a workflow that works with zarr2 but not zarr3, it would be good to put the whole thing up with example files, so that I can debug. There was recent work on making everything work with zarr3, so it might depend on the exact version of kerchunk you have; perhaps even try installing from current main.

I'm not aware of the specific case of many TIFFs with no numerical coordinate. Perhaps someone has one.

The error you describe implies you have an array rather than a group. As far as I can tell, tifffile should always produce groups, so I'm not sure how this might have happened.
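To illustrate the group-versus-array distinction in zarr v2 reference terms (the key names below are hypothetical): a reference set whose root is a group carries a top-level `.zgroup` entry, while a bare array carries a top-level `.zarray`.

```python
# Hypothetical minimal reference mappings (zarr v2 metadata keys):
group_refs = {".zgroup": '{"zarr_format": 2}', "var/.zarray": "{}"}
array_refs = {".zarray": '{"zarr_format": 2, "shape": [2]}'}

def root_kind(refs):
    # A top-level .zgroup means the root is a group; a top-level
    # .zarray means the root itself is an array.
    if ".zgroup" in refs:
        return "group"
    if ".zarray" in refs:
        return "array"
    return "unknown"

print(root_kind(group_refs))  # group
print(root_kind(array_refs))  # array
```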
-
Thanks Martin. Zarr3 was a different problem... I just installed from conda; I will grab version info later today and try the latest.
-
Here's an example file: https://gitlab.com/Richard.Scott1/raster-analysis-goals/-/blob/main/test_reference.tif?ref_type=heads

I just wanted to do a test that takes a list of rasters and makes them into a dataset without transforming - e.g. one might be called geology.tif, one might be geophysics.tif, that sort of idea. Another goal is more like a six-dimensional dataset: modelrun / random seed / x / y / modelname / modelsubset, or something like that in some order.
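The first goal can be sketched with plain numpy (the rasters below are hypothetical stand-ins for the real TIFFs): stack the per-file arrays along a new leading axis and keep the file names as the string-valued coordinate for that dimension.

```python
import numpy as np

# Hypothetical 2x2 rasters standing in for geology.tif / geophysics.tif
names = ["geology", "geophysics"]
rasters = [np.arange(4).reshape(2, 2), np.ones((2, 2))]

# Stack along a new leading axis; `names` plays the role of the
# string-valued coordinate for that dimension.
stacked = np.stack(rasters)
print(stacked.shape)  # (2, 2, 2)
```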
-
Kerchunk version installed was 0.2.7
-
I upgraded via pip to 0.2.8, which brought in zarr 3.0.6 I think. The mzz creation worked fine, and I think the combined.json file is ok, but I get an asynchronous-option error, which is possibly some setting I have wrong or have missed. I am just using the basic doc tutorial example.

```python
# python 3.11, kerchunk 0.2.8, zarr 3.0.6
import logging
from datetime import datetime
import os
import configparser
import contextlib
import json

import dask
import fsspec
import rioxarray
import s3fs
import xarray as xr
from distributed import Client
import kerchunk
from kerchunk.tiff import tiff_to_zarr
from kerchunk.combine import MultiZarrToZarr
import zarr
import numpy as np


def get_aws_credentials():
    parser = configparser.RawConfigParser()
    parser.read(os.path.expanduser('~/.aws/credentials'))
    credentials = parser.items('default')
    all_credentials = {key.upper(): value for key, value in [*credentials]}
    with contextlib.suppress(KeyError):
        all_credentials["AWS_REGION"] = all_credentials.pop("REGION")
    return all_credentials


creds = get_aws_credentials()
aws_credentials = {"key": creds['AWS_ACCESS_KEY_ID'], "secret": creds['AWS_SECRET_ACCESS_KEY']}

# Initialize an s3 filesystem
storage_options = dict(
    anon=False, key=creds['AWS_ACCESS_KEY_ID'], secret=creds['AWS_SECRET_ACCESS_KEY'],
    default_fill_cache=False, default_cache_type="none"
)
fs = s3fs.S3FileSystem(anon=False, key=creds['AWS_ACCESS_KEY_ID'], secret=creds['AWS_SECRET_ACCESS_KEY'])
fs_read = fsspec.filesystem("s3", anon=False, key=creds['AWS_ACCESS_KEY_ID'], secret=creds['AWS_SECRET_ACCESS_KEY'])

backend_args = {"consolidated": False,
                "storage_options": {"fo": "combined.json",
                                    "remote_protocol": "s3",
                                    "remote_options": storage_options}}
print(xr.open_dataset("reference://", engine="zarr", backend_kwargs=backend_args))
```
The error:

```
$ python test_kerchunk_mzz_open.py
Traceback (most recent call last):
  File "/home/ubuntu/data/model-framework/test_kerchunk_mzz_open.py", line 47, in <module>
    print(xr.open_dataset("reference://", engine="zarr", backend_kwargs=backend_args) )
  File "/home/ubuntu/anaconda3/envs/pangeo/lib/python3.11/site-packages/xarray/backends/api.py", line 687, in open_dataset
    backend_ds = backend.open_dataset(
  File "/home/ubuntu/anaconda3/envs/pangeo/lib/python3.11/site-packages/xarray/backends/zarr.py", line 1608, in open_dataset
    store = ZarrStore.open_group(
  File "/home/ubuntu/anaconda3/envs/pangeo/lib/python3.11/site-packages/xarray/backends/zarr.py", line 732, in open_group
    ) = _get_open_params(
  File "/home/ubuntu/anaconda3/envs/pangeo/lib/python3.11/site-packages/xarray/backends/zarr.py", line 1845, in _get_open_params
    zarr_group = zarr.open_group(store, **open_kwargs)
  File "/home/ubuntu/anaconda3/envs/pangeo/lib/python3.11/site-packages/zarr/_compat.py", line 43, in inner_f
    return f(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/pangeo/lib/python3.11/site-packages/zarr/api/synchronous.py", line 527, in open_group
    sync(
  File "/home/ubuntu/anaconda3/envs/pangeo/lib/python3.11/site-packages/zarr/core/sync.py", line 163, in sync
    raise return_result
  File "/home/ubuntu/anaconda3/envs/pangeo/lib/python3.11/site-packages/zarr/core/sync.py", line 119, in _runner
    return await coro
  File "/home/ubuntu/anaconda3/envs/pangeo/lib/python3.11/site-packages/zarr/api/asynchronous.py", line 806, in open_group
    store_path = await make_store_path(store, mode=mode, storage_options=storage_options, path=path)
  File "/home/ubuntu/anaconda3/envs/pangeo/lib/python3.11/site-packages/zarr/storage/_common.py", line 305, in make_store_path
    store = FsspecStore.from_url(
  File "/home/ubuntu/anaconda3/envs/pangeo/lib/python3.11/site-packages/zarr/storage/_fsspec.py", line 176, in from_url
    fs, path = url_to_fs(url, **opts)
  File "/home/ubuntu/anaconda3/envs/pangeo/lib/python3.11/site-packages/fsspec/core.py", line 415, in url_to_fs
    fs = filesystem(protocol, **inkwargs)
  File "/home/ubuntu/anaconda3/envs/pangeo/lib/python3.11/site-packages/fsspec/registry.py", line 310, in filesystem
    return cls(**storage_options)
  File "/home/ubuntu/anaconda3/envs/pangeo/lib/python3.11/site-packages/fsspec/spec.py", line 81, in __call__
    obj = super().__call__(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/pangeo/lib/python3.11/site-packages/fsspec/implementations/reference.py", line 770, in __init__
    raise ValueError(
ValueError: Reference-FS's target filesystem must have same value of asynchronous
```
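Possibly relevant: I have seen it suggested (an assumption on my part, not something verified in this thread) that with zarr 3 the reference filesystem's target must also be async, so the workaround is to add `"asynchronous": True` to `remote_options`. A sketch of the changed kwargs only:

```python
# Hypothetical variant of the backend_kwargs above: the only change is
# "asynchronous": True in remote_options, so the target filesystem
# matches the (async) reference filesystem zarr 3 creates.
remote_options = {"anon": False, "asynchronous": True}

backend_args = {
    "consolidated": False,
    "storage_options": {
        "fo": "combined.json",
        "remote_protocol": "s3",
        "remote_options": remote_options,
    },
}

# Then, as before:
# xr.open_dataset("reference://", engine="zarr", backend_kwargs=backend_args)
```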
-
Hi,
I noticed tiff_to_zarr and have tried it out. It generates references fine from a glob of tifs on an s3 path; these seem to be ok if dumped via the encoding or straight to JSON.

I tested one and it seems sensible, but there is no data when reading. I had that once before on an https/s3 protocol swap problem, but it is definitely s3 in the file.

I tried something like this (just trying to read a list and make the dimension the name, as there is nothing really temporal or depth-related in these, just a list of features, each with a chunk as a row when kerchunked).

Is the way to handle things doing something very bespoke per type of file?
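For the "no data when reading" symptom, a quick stdlib-only sanity check of the generated references helps (key names below are hypothetical; the layout follows kerchunk's version-1 JSON format, where chunk keys map to `[target_url, offset, length]`):

```python
# Hypothetical minimal reference set in kerchunk's version-1 JSON layout:
combined = {
    "version": 1,
    "refs": {
        ".zgroup": '{"zarr_format": 2}',
        "geology/.zarray": '{"shape": [2, 2]}',
        "geology/0.0": ["s3://bucket/geology.tif", 1024, 65536],
    },
}

def chunk_targets(refs):
    # Return (key, url) for every chunk entry, so you can confirm the
    # protocol really is s3:// and not https://.
    return [(k, v[0]) for k, v in refs["refs"].items() if isinstance(v, list)]

for key, url in chunk_targets(combined):
    assert url.startswith("s3://"), (key, url)
print(chunk_targets(combined))  # [('geology/0.0', 's3://bucket/geology.tif')]
```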