attempt to autodetect an s3 compatible URL #3496

ianhi · 2025-09-30T20:39:29Z

Addresses part of #3495 by attempting to autodetect if a URL is S3 compatible or not.
With this change the following "just works":

  import zarr
  z = zarr.open("https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.5/idr0062A/6001240_labels.zarr", storage_options={"anon": True})
  print(list(z.keys()))

This is equivalent to:

  zarr.open(
      "s3://idr/zarr/v0.5/idr0062A/6001240_labels.zarr",
      storage_options={
          "anon": True,
          "client_kwargs": {"endpoint_url": "https://uk1s3.embassy.ebi.ac.uk"}
      }
  )

but now zarr handles that splitting for the user.
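For readers following along, the splitting described above can be sketched roughly as follows. This is not the code from the PR; `split_path_style_url` is a hypothetical helper name, and the sketch assumes path-style addressing (bucket as the first path segment), which is what the example URL uses. Virtual-hosted-style URLs, where the bucket is part of the hostname, are not handled.

```python
from urllib.parse import urlsplit


def split_path_style_url(url: str) -> tuple[str, dict]:
    """Split a path-style HTTP(S) URL into an s3:// URL plus storage options.

    Hypothetical helper for illustration only. Assumes the first path
    segment is the bucket name.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"expected an http(s) URL, got {url!r}")

    # "/idr/zarr/v0.5/..." -> bucket "idr", key "zarr/v0.5/..."
    bucket, _, key = parts.path.lstrip("/").partition("/")
    if not bucket:
        raise ValueError(f"no bucket segment in {url!r}")

    endpoint = f"{parts.scheme}://{parts.netloc}"
    s3_url = f"s3://{bucket}/{key}" if key else f"s3://{bucket}"
    return s3_url, {"client_kwargs": {"endpoint_url": endpoint}}


s3_url, options = split_path_style_url(
    "https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.5/idr0062A/6001240_labels.zarr"
)
print(s3_url)
print(options)
```

The returned options would still need `"anon": True` merged in for anonymous access, as in the explicit call above.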

Add unit tests and/or doctests in docstrings
[NA?] Add docstrings and API docs for any new/modified user-facing classes and functions
New/modified features documented in docs/user-guide/*.md
Changes documented as a new file in changes/
GitHub Actions have all passed
Test coverage is 100% (Codecov passes)

d-v-b · 2025-09-30T20:48:40Z

src/zarr/storage/_common.py

+    is_likely_s3 = (
+        any(has_s3_token(part) for part in hostname_parts)
+        or "object-store" in hostname  # Less likely to have false positives
+        or "objectstore" in hostname
+        or "minio" in hostname_parts  # minio.example.com
+        or "ceph" in hostname_parts  # ceph.example.com
+        or "rgw" in hostname_parts  # rgw.example.com (Ceph RADOS Gateway)
+    )


how reliable is this?
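To make the reliability question concrete, here is a self-contained sketch of a hostname check shaped like the diff above. The body of `has_s3_token` is not shown in the excerpt, so the version below is a hypothetical stand-in, and the hostnames other than the EBI one are made up. The check only inspects the name, so it cannot tell what the host actually serves.

```python
def has_s3_token(part: str) -> bool:
    # Hypothetical stand-in: the PR's real helper is not shown in the excerpt.
    return part == "s3" or part.startswith("s3-") or part.endswith("s3")


def is_likely_s3(hostname: str) -> bool:
    """Guess S3 compatibility from the hostname alone (mirrors the diff's shape)."""
    hostname_parts = hostname.split(".")
    return (
        any(has_s3_token(part) for part in hostname_parts)
        or "object-store" in hostname
        or "objectstore" in hostname
        or "minio" in hostname_parts
        or "ceph" in hostname_parts
        or "rgw" in hostname_parts
    )


for host in (
    "uk1s3.embassy.ebi.ac.uk",  # the endpoint from this PR
    "minio.example.com",        # matches even if it only hosts a project website
    "data.example.org",         # does not match even if it fronts an S3 gateway
):
    print(host, is_likely_s3(host))
```

Both kinds of misclassification are possible because the decision never looks at the server's behavior.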

d-v-b · 2025-09-30T20:48:50Z

@martindurant and @kylebarron relevant to your interests.

martindurant · 2025-09-30T20:52:05Z

I don't understand: why would someone provide an HTTP URL for something that you need the S3 protocol for? The storage options only make sense to s3, not http, so the caller already knows.

d-v-b · 2025-09-30T20:58:19Z

I'm inclined against the changes in this PR -- "https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.5/idr0062A/6001240_labels.zarr" is unambiguously an https URL, and so it makes sense to use an HTTP-based storage backend to access it. For example, "https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.5/idr0062A/6001240_labels.zarr/zarr.json" works just fine. Directory listing does not work, but that's pretty common for zarr data over http.

ianhi · 2025-09-30T20:58:20Z

I ran into this today when trying to load data from: https://idr.github.io/ome-ngff-samples/

the "copy s3 url" button gives a one line URL: https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.5/idr0062A/6001240_labels.zarr

which I naively expected to work in zarr. So this PR is basically protection against that situation for other naive users possibly sourcing URLs that look like that.

Ideally zarr would just work and protect me from this, or at least it should have thrown an error, rather than silently failing by not returning any data but otherwise working.

d-v-b · 2025-09-30T20:59:08Z

the "copy s3 url" button gives a one line URL: https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.5/idr0062A/6001240_labels.zarr

IMO a button that says "copy s3 url" should give you an s3 url, not an http url

ianhi · 2025-09-30T21:01:02Z

IMO a button that says "copy s3 url" should give you an s3 url, not an http url

I agree! but I suspect this is not the only place this is happening and we can't fix it everywhere

Also is it possible to specify the endpoint url and bucket and path in one s3: url?

martindurant · 2025-09-30T21:02:21Z

Exactly what @d-v-b says. That URL is just wrong, it doesn't point to anything! If you poke it, the returned headers (of the 404) do contain an "x-amz-*" key, but that might be true of a real website that is hosted somewhere within AWS.
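A minimal sketch of the header check mentioned above, assuming the response headers have already been fetched into a plain mapping. `has_amz_headers` is a hypothetical name, and the sample header values are made up. As the comment says, a match is weak evidence at best.

```python
from collections.abc import Mapping


def has_amz_headers(headers: Mapping[str, str]) -> bool:
    """Return True if any header name starts with "x-amz-".

    HTTP header names are case-insensitive, so compare in lowercase.
    A True result does not prove the server speaks the S3 API: an
    ordinary website hosted within AWS may send such headers too.
    """
    return any(name.lower().startswith("x-amz-") for name in headers)


# Illustrative header sets (values are invented).
print(has_amz_headers({"Content-Type": "application/xml", "X-Amz-Request-Id": "abc123"}))
print(has_amz_headers({"Content-Type": "text/html", "Server": "nginx"}))
```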

ianhi · 2025-09-30T21:06:26Z

hmm. ok. So should zarr have raised an error for what I did here: #3495 when it loaded as an http store that didn't fully behave like one?

and am I right that there's no one-liner for that website to return because you can't specify an endpoint-url within the s3: syntax?

ianhi · 2025-09-30T23:02:12Z

I am updating the URLs over there: IDR/ome-ngff-samples#28

going to close this now as I hear the arguments against this change. I would still like to find a way to make this more transparent to users but this is clearly not the right approach.

joshmoore · 2025-10-01T07:14:15Z

and am I right that there's no one-liner for that website to return because you can't specify an endpoint-url within the s3: syntax?

That's my understanding.

attempt to autodetect an s3 compatible URL

11d374b

github-actions bot added the needs release notes Automatically applied to PRs which haven't added release notes label Sep 30, 2025

d-v-b reviewed Sep 30, 2025

ianhi mentioned this pull request Sep 30, 2025

Make URLs more explcitly S3 urls IDR/ome-ngff-samples#25

Open

ianhi closed this Sep 30, 2025

ianhi mentioned this pull request Oct 1, 2025

Flesh out the Download Data button IDR/ome-ngff-samples#28

Open
