Tiff best practices currently? #549
Replies: 6 comments 1 reply
-
I have seen a more recent example where this sort of thing 'just works', though I have not tried to run that particular example. I'm on the latest versions from conda, but downgraded to zarr 2.17, with Python 3.11. Does anyone have a reference JSON from one of these 'working' examples?
-
```python
def fn_to_time(index, fs, var, fn):
    subst = fn.split('.tif')[0]
    return subst
```

seems like a totally fine way to make "coordinates" for the set of files, so that they become simple string values.
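As a minimal sketch (the file names here are hypothetical stand-ins), a callback with that signature just maps each input file to a string label, which is what makes the string-valued coordinate work:

```python
import os

# Same callback shape kerchunk's coo_map accepts:
# (index, fs, var, fn) -> coordinate value.
def fn_to_time(index, fs, var, fn):
    # Strip the .tif suffix so each file contributes a plain string value
    subst = fn.split('.tif')[0]
    return subst

# Hypothetical file names standing in for the real s3 objects:
files = ["s3://bucket/geology.tif", "s3://bucket/geophysics.tif"]
labels = [fn_to_time(i, None, "data", os.path.basename(f))
          for i, f in enumerate(files)]
print(labels)  # ['geology', 'geophysics']
```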
If you have a workflow that works with zarr2 but not zarr3, it would be good to put the whole thing up with example files, so that I can debug. There was recent work on making everything work with zarr3, so it might depend on the exact version of kerchunk you have; perhaps even try installing from current main.

I'm not aware of the specific case of many TIFFs with no numerical coordinate. Perhaps someone has one.

The error you describe implies you have an array rather than a group. As far as I can tell, tifffile should always produce groups, so I'm not sure how this might have happened.
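To illustrate the group-versus-array distinction in zarr v2 reference terms (the key names below are hypothetical): a reference set whose root is a group carries a top-level `.zgroup` entry, while a bare array carries a top-level `.zarray`.

```python
# Hypothetical minimal reference mappings (zarr v2 metadata keys):
group_refs = {".zgroup": '{"zarr_format": 2}', "var/.zarray": "{}"}
array_refs = {".zarray": '{"zarr_format": 2, "shape": [2]}'}

def root_kind(refs):
    # A top-level .zgroup means the root is a group; a top-level
    # .zarray means the root itself is an array.
    if ".zgroup" in refs:
        return "group"
    if ".zarray" in refs:
        return "array"
    return "unknown"

print(root_kind(group_refs))  # group
print(root_kind(array_refs))  # array
```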
-
Thanks Martin. Zarr3 was a different problem... I just installed from conda; I will grab version info later today and try the latest.
-
Here's an example file: https://gitlab.com/Richard.Scott1/raster-analysis-goals/-/blob/main/test_reference.tif?ref_type=heads

I just wanted to do a test that takes a list of rasters and makes them into a dataset without transforming - e.g. one might be called geology.tif, one might be geophysics.tif, that sort of idea. Another goal is more like a six-dimensional dataset: modelrun / random seed / x / y / modelname / modelsubset, or something like that in some order.
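The first goal can be sketched with plain numpy (the rasters below are hypothetical stand-ins for the real TIFFs): stack the per-file arrays along a new leading axis and keep the file names as the string-valued coordinate for that dimension.

```python
import numpy as np

# Hypothetical 2x2 rasters standing in for geology.tif / geophysics.tif
names = ["geology", "geophysics"]
rasters = [np.arange(4).reshape(2, 2), np.ones((2, 2))]

# Stack along a new leading axis; `names` plays the role of the
# string-valued coordinate for that dimension.
stacked = np.stack(rasters)
print(stacked.shape)  # (2, 2, 2)
```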
-
Kerchunk version installed was 0.2.7
-
I upgraded via pip to 0.2.8, which brought in zarr 3.0.6 I think. The mzz creation worked fine, and I think the combined.json file is ok, but I get an asynchronous-option error, which is possibly some setting I have wrong or have missed. I am just using the basic doc tutorial example.

```python
# python 3.11, kerchunk 0.2.8, zarr 3.0.6
import logging
from datetime import datetime
import os
import configparser
import contextlib
import json

import dask
import fsspec
import rioxarray
import s3fs
import xarray as xr
from distributed import Client
import kerchunk
from kerchunk.tiff import tiff_to_zarr
from kerchunk.combine import MultiZarrToZarr
import zarr
import numpy as np


def get_aws_credentials():
    parser = configparser.RawConfigParser()
    parser.read(os.path.expanduser('~/.aws/credentials'))
    credentials = parser.items('default')
    all_credentials = {key.upper(): value for key, value in [*credentials]}
    with contextlib.suppress(KeyError):
        all_credentials["AWS_REGION"] = all_credentials.pop("REGION")
    return all_credentials


creds = get_aws_credentials()
aws_credentials = {"key": creds['AWS_ACCESS_KEY_ID'], "secret": creds['AWS_SECRET_ACCESS_KEY']}

# Initialize an s3 filesystem
storage_options = dict(
    anon=False, key=creds['AWS_ACCESS_KEY_ID'], secret=creds['AWS_SECRET_ACCESS_KEY'],
    default_fill_cache=False, default_cache_type="none"
)
fs = s3fs.S3FileSystem(anon=False, key=creds['AWS_ACCESS_KEY_ID'], secret=creds['AWS_SECRET_ACCESS_KEY'])
fs_read = fsspec.filesystem("s3", anon=False, key=creds['AWS_ACCESS_KEY_ID'], secret=creds['AWS_SECRET_ACCESS_KEY'])

backend_args = {"consolidated": False,
                "storage_options": {"fo": "combined.json",
                                    "remote_protocol": "s3",
                                    "remote_options": storage_options}}
print(xr.open_dataset("reference://", engine="zarr", backend_kwargs=backend_args))
```
The error:

```
$ python test_kerchunk_mzz_open.py
Traceback (most recent call last):
  File "/home/ubuntu/data/model-framework/test_kerchunk_mzz_open.py", line 47, in <module>
    print(xr.open_dataset("reference://", engine="zarr", backend_kwargs=backend_args) )
  File "/home/ubuntu/anaconda3/envs/pangeo/lib/python3.11/site-packages/xarray/backends/api.py", line 687, in open_dataset
    backend_ds = backend.open_dataset(
  File "/home/ubuntu/anaconda3/envs/pangeo/lib/python3.11/site-packages/xarray/backends/zarr.py", line 1608, in open_dataset
    store = ZarrStore.open_group(
  File "/home/ubuntu/anaconda3/envs/pangeo/lib/python3.11/site-packages/xarray/backends/zarr.py", line 732, in open_group
    ) = _get_open_params(
  File "/home/ubuntu/anaconda3/envs/pangeo/lib/python3.11/site-packages/xarray/backends/zarr.py", line 1845, in _get_open_params
    zarr_group = zarr.open_group(store, **open_kwargs)
  File "/home/ubuntu/anaconda3/envs/pangeo/lib/python3.11/site-packages/zarr/_compat.py", line 43, in inner_f
    return f(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/pangeo/lib/python3.11/site-packages/zarr/api/synchronous.py", line 527, in open_group
    sync(
  File "/home/ubuntu/anaconda3/envs/pangeo/lib/python3.11/site-packages/zarr/core/sync.py", line 163, in sync
    raise return_result
  File "/home/ubuntu/anaconda3/envs/pangeo/lib/python3.11/site-packages/zarr/core/sync.py", line 119, in _runner
    return await coro
  File "/home/ubuntu/anaconda3/envs/pangeo/lib/python3.11/site-packages/zarr/api/asynchronous.py", line 806, in open_group
    store_path = await make_store_path(store, mode=mode, storage_options=storage_options, path=path)
  File "/home/ubuntu/anaconda3/envs/pangeo/lib/python3.11/site-packages/zarr/storage/_common.py", line 305, in make_store_path
    store = FsspecStore.from_url(
  File "/home/ubuntu/anaconda3/envs/pangeo/lib/python3.11/site-packages/zarr/storage/_fsspec.py", line 176, in from_url
    fs, path = url_to_fs(url, **opts)
  File "/home/ubuntu/anaconda3/envs/pangeo/lib/python3.11/site-packages/fsspec/core.py", line 415, in url_to_fs
    fs = filesystem(protocol, **inkwargs)
  File "/home/ubuntu/anaconda3/envs/pangeo/lib/python3.11/site-packages/fsspec/registry.py", line 310, in filesystem
    return cls(**storage_options)
  File "/home/ubuntu/anaconda3/envs/pangeo/lib/python3.11/site-packages/fsspec/spec.py", line 81, in __call__
    obj = super().__call__(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/pangeo/lib/python3.11/site-packages/fsspec/implementations/reference.py", line 770, in __init__
    raise ValueError(
ValueError: Reference-FS's target filesystem must have same value of asynchronous
```
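Possibly relevant: I have seen it suggested (an assumption on my part, not something verified in this thread) that with zarr 3 the reference filesystem's target must also be async, so the workaround is to add `"asynchronous": True` to `remote_options`. A sketch of the changed kwargs only:

```python
# Hypothetical variant of the backend_kwargs above: the only change is
# "asynchronous": True in remote_options, so the target filesystem
# matches the (async) reference filesystem zarr 3 creates.
remote_options = {"anon": False, "asynchronous": True}

backend_args = {
    "consolidated": False,
    "storage_options": {
        "fo": "combined.json",
        "remote_protocol": "s3",
        "remote_options": remote_options,
    },
}

# Then, as before:
# xr.open_dataset("reference://", engine="zarr", backend_kwargs=backend_args)
```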
-
Hi,
I noticed tiff_to_zarr and have tried it out. It generates references fine from a glob of tifs on an s3 path; these seem to be ok if dumped via the encoding or straight to JSON.

I tested one and it seems sensible, but there is no data when reading. I had that once before on an https/s3 protocol swap problem, but it is definitely s3 in the file.

I tried something like this (just trying to read a list and make the dimension the name, as there is nothing really temporal or depth-related in these, just a list of features, each with a chunk as a row when kerchunked).

Is the way to handle things doing something very bespoke per type of file?
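For the "no data when reading" symptom, a quick stdlib-only sanity check of the generated references helps (key names below are hypothetical; the layout follows kerchunk's version-1 JSON format, where chunk keys map to `[target_url, offset, length]`):

```python
# Hypothetical minimal reference set in kerchunk's version-1 JSON layout:
combined = {
    "version": 1,
    "refs": {
        ".zgroup": '{"zarr_format": 2}',
        "geology/.zarray": '{"shape": [2, 2]}',
        "geology/0.0": ["s3://bucket/geology.tif", 1024, 65536],
    },
}

def chunk_targets(refs):
    # Return (key, url) for every chunk entry, so you can confirm the
    # protocol really is s3:// and not https://.
    return [(k, v[0]) for k, v in refs["refs"].items() if isinstance(v, list)]

for key, url in chunk_targets(combined):
    assert url.startswith("s3://"), (key, url)
print(chunk_targets(combined))  # [('geology/0.0', 's3://bucket/geology.tif')]
```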