jbms
@jbms jbms commented Sep 10, 2023

No description provided.

@jbms
Author

jbms commented Sep 10, 2023

@normanrz Please take a look.

@jbms
Author

jbms commented Sep 10, 2023

@MSanKeys963 Looks like there is an issue with the docs build that is unrelated to this PR.

@jbms
Author

jbms commented Sep 10, 2023

@martindurant Would appreciate your perspective on this --- I imagine you might say that we should just use fsspec syntax instead, though.

@martindurant
Member

Well indeed, I could say "why invent another?", although translating between the `|` and `::` syntaxes ought to be straightforward. fsspec also cares about filesystem parameters that might be embedded in URLs, and about wildcards for globbing.
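
To make the `|` vs `::` comparison concrete, here is a minimal sketch of such a translation. It assumes every sub-URL has the form `scheme:rest`, that `|` never appears unescaped inside a sub-URL, and that only storage-level layers such as `zip:` are present (adapters like `zarr3:` have no fsspec counterpart). The pipeline lists sub-URLs starting from the base storage URL, while fsspec's chained URLs put the base last, so the order is reversed and `zip:a` is spelled `zip://a`. The function name `pipeline_to_fsspec` is hypothetical.

```python
def pipeline_to_fsspec(pipeline: str) -> str:
    """Translate a pipeline such as "gs://bucket/0.zip|zip:a" into an
    fsspec chained URL such as "zip://a::gs://bucket/0.zip".

    Hypothetical helper: assumes "|" only ever separates sub-URLs and that no
    adapter without an fsspec equivalent (e.g. "zarr3:") is present.
    """
    base, *layers = pipeline.split("|")
    parts = [base]
    for layer in layers:
        scheme, sep, path = layer.partition(":")
        if not sep:
            raise ValueError(f"sub-URL without a scheme: {layer!r}")
        # fsspec spells nested filesystems as "scheme://path".
        parts.append(f"{scheme}://{path}")
    # fsspec lists the base filesystem last, so reverse the order.
    return "::".join(reversed(parts))


print(pipeline_to_fsspec("gs://bucket/0.zip|zip:a"))
# zip://a::gs://bucket/0.zip
```

The reverse direction is similar, but fsspec-only features such as URL-embedded filesystem parameters and glob wildcards would have no equivalent in the pipeline syntax.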

@normanrz
Member

While standardizing a URL scheme has benefits on its own, I think the main benefit/motivation for this ZEP is the formalization of Zip stores. Essentially, to comply with this ZEP, implementations need to implement zip stores. Maybe that should be written out more explicitly?

@jbms
Author

jbms commented Sep 22, 2023

> While standardizing a URL scheme has benefits on its own, I think the main benefit/motivation for this ZEP is the formalization of Zip stores. Essentially, to comply with this ZEP, implementations need to implement zip stores. Maybe that should be written out more explicitly?

While this ZEP was prompted by our discussion about zip stores, my intention was that we standardize on the syntax for various protocols, but that implementations would choose which ones to support.

I think we could also push implementations to support zip format, but I'm not sure I want to tie that to this URL syntax proposal.
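
As an illustration of what a `zip:` sub-URL asks of an implementation, here is a minimal read-only sketch using Python's standard `zipfile` module. It assumes the outer resource has already been read into memory as bytes and that the path after `zip:` is a directory prefix inside the archive. The class name `ZipPrefixStore` is hypothetical, and the sketch is not taken from any existing implementation.

```python
import io
import zipfile
from typing import Optional


class ZipPrefixStore:
    """Hypothetical read-only key-value store for a "<base>|zip:<prefix>" sub-URL.

    Assumes the outer resource (e.g. gs://bucket/0.zip) is already in memory;
    a real store could instead read members lazily with ranged requests.
    """

    def __init__(self, archive: bytes, prefix: str = "") -> None:
        self._zip = zipfile.ZipFile(io.BytesIO(archive))
        # "zip:a" and "zip:a/" both address the directory "a/" inside the archive.
        self._prefix = prefix.strip("/")

    def get(self, key: str) -> Optional[bytes]:
        name = f"{self._prefix}/{key}" if self._prefix else key
        try:
            return self._zip.read(name)
        except KeyError:  # zipfile raises KeyError for missing members
            return None


# Build a tiny archive in memory to stand in for gs://bucket/0.zip.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a/zarr.json", '{"zarr_format": 3, "node_type": "group"}')

store = ZipPrefixStore(buf.getvalue(), "a")  # corresponds to ...|zip:a
print(store.get("zarr.json"))
```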

@normanrz
Member

@ap-- I think this might also be interesting for upath to implement.

@normanrz
Member

@bogovicj this might also be relevant for your OME transformations proposal.

@sanketverma1704
Member

sanketverma1704 commented Oct 25, 2023

> @MSanKeys963 Looks like there is an issue with the docs build that is unrelated to this PR.

@jbms: I have added #51 to fix the RTD build. Can you please update your PR?
(Seems like I'm unable to update your PR)

@bogovicj

bogovicj commented Nov 14, 2023

Thanks @jbms for putting this together! There are a few situations I came up with for which I'm not sure what the relative URL should be:

What does it look like to use `..:` to "go up" multiple levels?
Is this correct / valid?

Base URL: gs://bucket/0.zip|zip:a|zarr3:i
Relative URL: ..:..:1.zip|zip:b|zarr3:ii
Resolved URL: gs://bucket/1.zip|zip:b|zarr3:ii

Is it correct / valid to use `..` in the "path part" of a relative URL, after a `..:`?

Base URL: gs://bucket/0/a/i.zarr|zarr3:foo
Relative URL: ..:../b/i.zarr|zarr3:foo
Resolved URL: gs://bucket/0/b/i.zarr|zarr3:foo

If one needs to add an adapter in a relative way, how does one go about it?
For example:

Base URL: gs://bucket/0/a/i.zarr
Desired Resolved URL: gs://bucket/0/a/i.zarr|zarr3:foo

Which, if any, of these do you think should be used? Are any of these invalid?

  • .|zarr3:foo (clearest to me)
  • |zarr3:foo
  • zarr3:foo

@bogovicj

One more thing:

We've found it useful to be able to reference a particular part of the JSON metadata (such as the attributes)
with a URL. For example, for this zarr3 zarr.json:
{
    "zarr_format": 3,
    "node_type": "array",
    "shape": [10000, 1000],
    "dimension_names": ["rows", "columns"],
    "data_type": "float64",
    "chunk_grid": {
        "name": "regular",
        "configuration": {
            "chunk_shape": [1000, 100]
        }
    },
    "chunk_key_encoding": {
        "name": "default",
        "configuration": {
            "separator": "/"
        }
    },
    "codecs": [{
        "name": "gzip",
        "configuration": {
            "level": 1
        }
    }],
    "fill_value": "NaN",
    "attributes": {
        "foo": 42,
        "bar": "apples",
        "baz": [1, 2, 3, 4]
    }
}
  • /attributes/baz[0] points to 1
  • /shape points to [10000, 1000]
  • /chunk_grid/configuration points to { "chunk_shape": [1000, 100] }
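
To pin down the path syntax used in these examples, here is a minimal sketch that resolves a path such as `/attributes/baz[0]` against the parsed `zarr.json`. It is hypothetical: it assumes `/` separates object keys and `[n]` indexes into arrays, as in the examples above. The standard JSON Pointer syntax (RFC 6901) would instead write `/attributes/baz/0`. The function name `resolve_metadata_path` is made up for illustration.

```python
import json
import re

# One path segment: an optional object key followed by zero or more "[n]" indices,
# e.g. "attributes", "baz[0]", or "transform[1]".
_SEGMENT = re.compile(r"^([^\[\]]*)((?:\[\d+\])*)$")


def resolve_metadata_path(metadata, path):
    """Resolve a path like "/attributes/baz[0]" against parsed zarr.json metadata.

    Hypothetical syntax taken from the examples above; it is not part of
    ZEP 8 and differs from JSON Pointer (RFC 6901).
    """
    node = metadata
    for segment in path.strip("/").split("/"):
        if not segment:
            continue  # tolerate empty segments, e.g. the path "/"
        match = _SEGMENT.match(segment)
        if match is None:
            raise ValueError(f"invalid path segment: {segment!r}")
        key, indices = match.groups()
        if key:
            node = node[key]  # object member lookup
        for index in re.findall(r"\[(\d+)\]", indices):
            node = node[int(index)]  # array element lookup
    return node


zarr_json = json.loads("""
{
    "shape": [10000, 1000],
    "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": [1000, 100]}},
    "attributes": {"foo": 42, "bar": "apples", "baz": [1, 2, 3, 4]}
}
""")
print(resolve_metadata_path(zarr_json, "/attributes/baz[0]"))  # 1
```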

Could you envision adding an `attributes:` or `zarr.json:` adapter, or something similar, that enables this?

For example: gs://bucket/0.zip|zip:a|zarr3:i|zarr.json:attributes/foo

A specific use case: I often reuse and reference transformations. Since these are described by metadata (not arrays),
referencing the specific metadata is helpful.

For example, if this were adopted, something like this would not be uncommon in my workflows:

{
    "type" : "sequence",
    "transformations" : [
        { "url" : "..:/localTransformations|zarr.json:/transform[1]" },
        { "url" : "gs://bucket/path/to/templateTransformation.zarr|zarr3:sharedTransforms|zarr.json:/transform[0]" }
    ]
}

@jbms
Author

jbms commented Nov 14, 2023 via email

@jbms
Author

jbms commented Nov 15, 2023 via email

jbms added commits to google/neuroglancer that referenced this pull request (Jan 17 to Jan 19, 2025)
- New datasource URL syntax based on ZEP 8 proposal (zarr-developers/zeps#48)
- Support for ZIP archives
@sanketverma1704
Member

sanketverma1704 commented Jan 22, 2025

As noted at today's Zarr community meeting, @jbms has implemented this ZEP in Neuroglancer; see google/neuroglancer#696.

copybara-service bot pushed a commit to google/tensorstore that referenced this pull request May 7, 2025
This is in line with zarr-developers/zeps#48 and
the syntax supported by Neuroglancer.

Currently, zip is supported.  OCDBT support will be added in a
subsequent commit.

PiperOrigin-RevId: 755691199
Change-Id: Ia6cb84c12a986a7dd0ba65e41454fbe6d415aed0
@joshmoore
Member

@jbms: I tried pushing a merge of origin to fix the build, but the push was rejected. Could you give it a try?

@ianhi ianhi left a comment

I just did a thorough read of this to understand it and I have left some comments with a few typo fixes.

I also left comments on parts that took me a decent bit of work to understand, or that I don't fully understand, in the hope that it's a helpful perspective. I'd rate myself as a competent but not expert reader of a document like this.

@jbms
Author

jbms commented Sep 26, 2025

> I just did a thorough read of this to understand it and I have left some comments with a few typo fixes.
>
> I also left comments on parts that took me a decent bit of work to understand, or that I don't fully understand, in the hope that it's a helpful perspective. I'd rate myself as a competent but not expert reader of a document like this.

Thanks very much for your review. Based on your comments I made some significant revisions and would appreciate feedback.

Based on my revisions it occurs to me that this may be better as an independent standard, and the zarr spec could just recommend that implementations support it.

@ianhi ianhi left a comment

Thank you for the updates, I found it significantly easier to understand on this close reading. I've left a few more comments on the few remaining areas where I found myself confused.

Comment on lines +92 to +93
- `dataset`: An array, group, or other dataset with a defined format
e.g. a zarr array or group.

I still struggled a bit here with the general definition of "dataset". Is there a convention here that I'm missing? I understand zarr array or group, but when I hear "dataset" my brain jumps to the xarray dataset.

This definition is referred to multiple times below so any further clarification here would be helpful.

Comment on lines +497 to +507
A zarr attribute may be defined that specifies the location of some other
related array using the relative URL pipeline syntax.

The referencing array may be located at
`s3://bucket/path/to/dataset.zip|zip:path/within/zip/|zarr3:`. Using only a
relative path, it could specify the path of another array within
`s3://bucket/path/to/dataset.zip:zip:`, e.g. the relative path
`../another/array/` would refer to
`s3://bucket/path/to/dataset.zip|zip:path/another/array/`. To refer to
`s3://bucket/path/of/another.zip|zip:other/array/`, the relative URL
pipeline `..:../of/another.zip|zip:other/array/|zarr3:` can be used.

The indentation here conflicts with the backtick formatting (at least on GitHub).

The referencing array may be located at
`s3://bucket/path/to/dataset.zip|zip:path/within/zip/|zarr3:`. Using only a
relative path, it could specify the path of another array within
`s3://bucket/path/to/dataset.zip:zip:`, e.g. the relative path

Suggested change
`s3://bucket/path/to/dataset.zip:zip:`, e.g. the relative path
`s3://bucket/path/to/dataset.zip|zip:`, e.g. the relative path

I think?


- `gs://bucket/path/to/data|byte-range:1000-2000`

- `tiff:`, `jpeg:`, `png:`, `bmp:`, `avif:`, `webp:`

For both these and byte-range, I don't know how something like zarr-python is meant to handle this. Surely zarr can't be responsible for reading different image formats?

This feels like it gets to your point:

> Based on my revisions it occurs to me that this may be better as an independent standard,


The following syntaxes are supported:

- `icechunk:path/to/node/`

I think this implicitly refers to HEAD of a default branch. Is that the intent? I think we can be more explicit that this points to the latest commit on `main`, which is the branch created by default.


- `icechunk:` for [Icechunk](https://icechunk.io/en/latest/)

The base URL msut refer to a `directory` resource (which is expected

Suggested change
The base URL msut refer to a `directory` resource (which is expected
The base URL must refer to a `directory` resource (which is expected

Comment on lines +266 to +267
For the purpose of relative URLs, the path component does not include the
`@version/` prefix if present.

Suggested change
For the purpose of relative URLs, the path component does not include the
`@version/` prefix if present.
For the purpose of relative URLs, the path component does not include the
`@version/` prefix if present in the base URL.

?

Comment on lines +523 to +527
Note: An `absolute_path` overrides any existing *path* component of the
inner-most sub-URL of the base , but is still relative to the scheme and other
components of the inner-most sub-URL of the base URL pipeline that precede its
path component, if any. The specific scheme of the sub-URL defines what
portion, if any, constitutes the path component.

An example of an `absolute_path` would be very helpful. I don't fully understand how the scheme and other components collapse without the path.

small typo fix:

Suggested change
Note: An `absolute_path` overrides any existing *path* component of the
inner-most sub-URL of the base , but is still relative to the scheme and other
components of the inner-most sub-URL of the base URL pipeline that precede its
path component, if any. The specific scheme of the sub-URL defines what
portion, if any, constitutes the path component.
Note: An `absolute_path` overrides any existing *path* component of the
inner-most sub-URL of the base, but is still relative to the scheme and other
components of the inner-most sub-URL of the base URL pipeline that precede its
path component, if any. The specific scheme of the sub-URL defines what
portion, if any, constitutes the path component.
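
In the spirit of the requested example, here is a minimal sketch of the rule in the note, under two assumptions the quoted text leaves to each scheme: for a sub-URL of the form `scheme://authority/path`, the path is everything after the authority (as in RFC 3986); for a sub-URL like `zip:path/within/zip/`, the path is everything after the colon. Query strings, fragments, and trailing adapters such as `|zarr3:` are not handled, and the function name `apply_absolute_path` is hypothetical.

```python
def apply_absolute_path(base: str, absolute_path: str) -> str:
    """Replace the path of the inner-most sub-URL of `base` with `absolute_path`.

    Hypothetical sketch: the outer sub-URLs, and whatever precedes the path in
    the inner-most sub-URL (its scheme, plus authority if any), are kept.
    """
    if not absolute_path.startswith("/"):
        raise ValueError("absolute_path must start with '/'")
    *outer, inner = base.split("|")
    scheme, _, rest = inner.partition(":")
    if rest.startswith("//"):
        # Hierarchical form "scheme://authority/path": keep the authority (bucket).
        authority = rest[2:].split("/", 1)[0]
        new_inner = f"{scheme}://{authority}{absolute_path}"
    else:
        # Assumed form "scheme:path" (e.g. zip:): the whole remainder is the path.
        new_inner = f"{scheme}:{absolute_path[1:]}"
    return "|".join([*outer, new_inner])


print(apply_absolute_path("s3://bucket/path/to/dataset.zip", "/other/data.zip"))
# s3://bucket/other/data.zip
print(apply_absolute_path("s3://bucket/path/to/dataset.zip|zip:path/within/zip/", "/other/array/"))
# s3://bucket/path/to/dataset.zip|zip:other/array/
```

In the first call, the scheme `s3` and the bucket survive while the path is replaced; in the second, the whole outer `s3://...` sub-URL and the `zip` scheme survive.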

Comment on lines +561 to +562
The use of outer-to-inner order for the sub-URLs enables completion of both
paths and sub-URL schemes as the user types.

I found this very compelling as a reason
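
A small sketch of why this ordering helps completion: the sub-URL being typed is always last, so everything before the final `|` is already a complete pipeline that could be opened to list candidates. With fsspec's `::` chaining, by contrast, the base URL comes last, so an inner path such as `zip://a` is typed before the archive it lives in is known. The function `completion_target` is hypothetical and assumes `|` only separates sub-URLs.

```python
from typing import Tuple


def completion_target(partial: str) -> Tuple[str, str, str]:
    """Classify what a user is typing at the end of a partial URL pipeline.

    Returns (kind, context, text):
      kind    -- "scheme" if the last sub-URL has no ":" yet, otherwise "path"
      context -- the complete pipeline before the last "|" (empty for the base URL)
      text    -- the partially typed last sub-URL
    """
    context, _, text = partial.rpartition("|")
    kind = "path" if ":" in text else "scheme"
    return kind, context, text


print(completion_target("gs://bucket/0.zip|zi"))
# ('scheme', 'gs://bucket/0.zip', 'zi')
print(completion_target("gs://bucket/0.zip|zip:a/"))
# ('path', 'gs://bucket/0.zip', 'zip:a/')
```

For a "scheme" result a completer could offer names like `zip:` or `zarr3:`; for a "path" result it could open the context and list entries matching the typed prefix.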


If passed a URL that resolves to a `dataset` resource, returns an error.

- `open_file`: opens a file from a URL

Is this the use case for `byte-range:` and `tiff:`? Would I use `zarr.open_file("s3://some/path/image.tiff")` to get a file handle on that TIFF that I could then pass to tifffile?

9 participants