-
Notifications
You must be signed in to change notification settings - Fork 113
Respect protocol tuple in removal from full URIs
#512
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This change makes AzureBlobFileSystem consistent with other fsspec implementations that use the protocol class tuple when stripping the protocol from fully qualified URIs. It also unblocks consumers to be able to update this protocol tuple if out of the box adlfs does not respect a particular protocol (e.g., deprecated wasb://).
kevinjqliu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the PR! makes sense, let me pull in this change and test locally against apache/iceberg-python#1663
| mocker.patch.object( | ||
| AzureBlobFileSystem, | ||
| "protocol", | ||
| AzureBlobFileSystem.protocol + (unsupported_proto,), | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: if protocol here is meant to be overridden, lets not use mock here since thats specifically for testing.
Would be great to show how to override the protocol
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd prefer if we stick with using mock.patch.object, mainly if we modify directly the AzureBlobFileSystem class without restoring state, the protocol mutation lives on for all test cases that are ran after it. By using the mocker fixture, we guarantee that the test automatically undoes the patch.
Technically, we can get around this by adding a fixture that restores the original AzureBlobFileSystem.protocol on each test case, but that would mean additional scaffolding when there is already the mocker fixture and pytest's builtin in monkeypatch fixture.
Other option, I'm fine with is defining a subclass of AzureBlobFileSystem in the test body and override its protocol property in order to avoid lingering state.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sense! just a nit. i think just having some docs around overriding protocol would be good. the PR description is very helpful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can set the .protocol on the instance instead of the class. That's the only real use case for "overriding" that is likely to happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martindurant I actually went with the approach of just overriding the protocol on the instance originally, but I realized it would not work because _strip_protocol() is a class method and will only ever reference the AzureBlobFileSystem class and not the actual instance with the overridden protocol.
So unfortunately, being able to override the protocol is not as clean as I hoped in this comment: #493 (comment) in just being able to override the protocol property on an instance... Instead, as the code sits now, it either requires users to override the protocol through direct monkeypatching of the class or subclassing.
I did explore what it would take to be able to override protocol property on instances. However, it seemed it would require a fairly complicated given how prevalent _strip_protocol() is used in fsspec implementations and did not seem worth the effort. But maybe something we can consider in the future if we see other projects/users wanting instance level overrides?
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK yes, you are right I think.
kevinjqliu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
tested overriding AzureBlobFileSystem.protocol with the pyiceberg integration, kevinjqliu/iceberg-python#25
|
Thanks @kevinjqliu for confirming! @martindurant would you be able to do a review of this PR? @anjaliratnam-msft is offline this week so won't be able to review it this week. |
martindurant
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a couple of very minor comments. Happy to merge as-is if you don't wish to change anything.
| mocker.patch.object( | ||
| AzureBlobFileSystem, | ||
| "protocol", | ||
| AzureBlobFileSystem.protocol + (unsupported_proto,), | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can set the .protocol on the instance instead of the class. That's the only real use case for "overriding" that is likely to happen.
|
@martindurant thanks! I'll take a pass through the suggestions and address/comment them. I'll let you know when it is ready for another look. |
|
@martindurant the PR should be good to look at again. Commented back on one of the suggestions and pulled in the two other suggestions you had. |
Supersedes: #493 and follows suggestion from this comment: #493 (comment)
This change makes
AzureBlobFileSystemconsistent with other fsspec implementations that use theprotocolclass tuple when stripping the protocol from fully qualified URIs. It also unblocks consumers to be able to update thisprotocoltuple if out of the boxadlfsdoes not respect a particular protocol (e.g., deprecatedwasb://).Specifically, for pyiceberg and this PR to allow
wasb/wasbs: apache/iceberg-python#1663, it should be able to be updated to either update the tuple directly on the class e.g.:Or
AzureBlobFileSystemcan also be subclassed to add support for the protocols to avoid mutating theadlfsclass e.g.:cc @kevinjqliu to confirm this works for pyiceberg