Skip to content

Conversation

wang2bo2
Copy link

Parsing S3 access point URI, like s3://arn:aws:s3:us-west-2:1234:accesspoint/abc/123.jpg, causes problems in infer_storage_options for the :accesspoint was parsed as port number. This PR addresses that.

@martindurant
Copy link
Member

Whilst this appears to do the right thing, I wonder if you see any way that this can be isolated in s3fs rather than here?

@martindurant
Copy link
Member

ping @wang2bo2

@wang2bo2
Copy link
Author

wang2bo2 commented Nov 5, 2024

Yes, we could remove the use of infer_storage_options and duplicate the extraction of url_query here
https://github.com/fsspec/s3fs/blob/f3f63cbfbfe71a4355abd63cafd8c678c4a5a0af/s3fs/core.py#L394

In the current infer_storage_options implementation, anything after the last : would be excluded from getting into options["host"]

options["host"] = parsed_path.netloc.rsplit("@", 1)[-1].rsplit(":", 1)[0]
even if the non-int port error is caught and handled. So I thought it's good to fix that too.

@martindurant
Copy link
Member

I think it would make sense to look for "accesspoint" in s3fs, but still use to infer_storage_options in the common case. Perhaps the accesspoint branch could simply modify the URL into something that makes sense. An ARN description isn't very much like a URL anyway...

@martindurant
Copy link
Member

Closing in favour of fsspec/s3fs#912

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants