-
-
Notifications
You must be signed in to change notification settings - Fork 364
attempt to autodetect an s3 compatible URL #3496
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
is_likely_s3 = ( | ||
any(has_s3_token(part) for part in hostname_parts) | ||
or "object-store" in hostname # Less likely to have false positives | ||
or "objectstore" in hostname | ||
or "minio" in hostname_parts # minio.example.com | ||
or "ceph" in hostname_parts # ceph.example.com | ||
or "rgw" in hostname_parts # rgw.example.com (Ceph RADOS Gateway) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how reliable is this?
@martindurant and @kylebarron relevant to your interests. |
I don't understand: why would someone provide an HTTP URL for something that you need the S3 protocol for? The storage options only make sense to s3, not http, so the caller already knows. |
I'm inclined against the changes in this PR -- |
I ran into this today when trying to load data from: https://idr.github.io/ome-ngff-samples/ the "copy s3 url" button gives a one line URL: https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.5/idr0062A/6001240_labels.zarr which I naively expected to work in zarr. So this PR is basically protection against that situation for other naive users possibly sourcing URLs that look like that. Ideally zarr would just work and protect me from this, or at least it should have thrown an error, rather silently failing by not returning any data but otherwise working |
IMO a button that says "copy s3 url" should give you an s3 url, not an http url |
I agree! but I suspect this is not the only place this is happening and we can't fix it everywhere Also is it possible to specify the endpoint url and bucket and path in one s3: url? |
Exactly what @d-v-b says. That URL is just wrong, it doesn't point to anything! If you poke it, the returned headers (of the 404) does contain an "x-amz-*" key, but that might be true from a real website that is hosted somewhere within AWS. |
hmm. ok. So should zarr have raised an error for what I did here: #3495 when it loaded as an http store that didn't fully behave like one? and am I right that there's no one liner for that website to return becuase you can't specify an endpoint-url within the |
I am updating the URLS over there: IDR/ome-ngff-samples#28 going to close this now as I hear the arguments against this change. I would sitll like to find a way to make this more transparent to users but this is clearly not the right approach. |
That's my understanding. |
Addresses part of #3495 by attempting to autodetect if a URL is S3 compatible or not.
With this change the following "just works"
This is equivalent to:
but now zarr handles that splitting for the user
docs/user-guide/*.md
changes/