Skip to content

Conversation

@kyleknap
Copy link
Collaborator

@kyleknap kyleknap commented Sep 2, 2025

Supersedes: #493 and follows suggestion from this comment: #493 (comment)

This change makes AzureBlobFileSystem consistent with other fsspec implementations that use the protocol class tuple when stripping the protocol from fully qualified URIs. It also unblocks consumers to be able to update this protocol tuple if out of the box adlfs does not respect a particular protocol (e.g., deprecated wasb://).

Specifically, for pyiceberg and this PR to allow wasb/wasbs: apache/iceberg-python#1663, it should be able to be updated to either update the tuple directly on the class e.g.:

>>> from adlfs import AzureBlobFileSystem
>>> AzureBlobFileSystem.protocol = AzureBlobFileSystem.protocol + ("wasb", "wasbs")
>>> fs = AzureBlobFileSystem(account_name="myaccount")
>>> fs._strip_protocol("wasb://container/file")
'container/file'
>>> fs.split_path("wasb://container/file")
('container', 'file', None)

Or AzureBlobFileSystem can also be subclassed to add support for the protocols to avoid mutating the adlfs class e.g.:

>>> from adlfs import AzureBlobFileSystem
>>> class WasbSupportedAzureBlobFileSystem(AzureBlobFileSystem):
...     protocol = AzureBlobFileSystem.protocol + ("wasb", "wasbs")

>>> fs = WasbSupportedAzureBlobFileSystem(account_name="myaccount")
>>> fs._strip_protocol("wasb://container/file")
'container/file'
>>> fs.split_path("wasb://container/file")
('container', 'file', None)

cc @kevinjqliu to confirm this works for pyiceberg

This change makes AzureBlobFileSystem consistent with other fsspec
implementations that use the protocol class tuple when stripping
the protocol from fully qualified URIs. It also unblocks consumers
to be able to update this protocol tuple if out of the box adlfs
does not respect a particular protocol (e.g., deprecated wasb://).
Copy link

@kevinjqliu kevinjqliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the PR! makes sense, let me pull in this change and test locally against apache/iceberg-python#1663

Comment on lines +1810 to +1814
mocker.patch.object(
AzureBlobFileSystem,
"protocol",
AzureBlobFileSystem.protocol + (unsupported_proto,),
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: if protocol here is meant to be overridden, lets not use mock here since thats specifically for testing.

Would be great to show how to override the protocol

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer if we stick with using mock.patch.object, mainly if we modify directly the AzureBlobFileSystem class without restoring state, the protocol mutation lives on for all test cases that are ran after it. By using the mocker fixture, we guarantee that the test automatically undoes the patch.

Technically, we can get around this by adding a fixture that restores the original AzureBlobFileSystem.protocol on each test case, but that would mean additional scaffolding when there is already the mocker fixture and pytest's builtin in monkeypatch fixture.

Other option, I'm fine with is defining a subclass of AzureBlobFileSystem in the test body and override its protocol property in order to avoid lingering state.

Thoughts?

Copy link

@kevinjqliu kevinjqliu Sep 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense! just a nit. i think just having some docs around overriding protocol would be good. the PR description is very helpful

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can set the .protocol on the instance instead of the class. That's the only real use case for "overriding" that is likely to happen.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martindurant I actually went with the approach of just overriding the protocol on the instance originally, but I realized it would not work because _strip_protocol() is a class method and will only ever reference the AzureBlobFileSystem class and not the actual instance with the overridden protocol.

So unfortunately, being able to override the protocol is not as clean as I hoped in this comment: #493 (comment) in just being able to override the protocol property on an instance... Instead, as the code sits now, it either requires users to override the protocol through direct monkeypatching of the class or subclassing.

I did explore what it would take to be able to override protocol property on instances. However, it seemed it would require a fairly complicated given how prevalent _strip_protocol() is used in fsspec implementations and did not seem worth the effort. But maybe something we can consider in the future if we see other projects/users wanting instance level overrides?

Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK yes, you are right I think.

Copy link

@kevinjqliu kevinjqliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

tested overriding AzureBlobFileSystem.protocol with the pyiceberg integration, kevinjqliu/iceberg-python#25

@kyleknap
Copy link
Collaborator Author

kyleknap commented Sep 3, 2025

Thanks @kevinjqliu for confirming! @martindurant would you be able to do a review of this PR? @anjaliratnam-msft is offline this week so won't be able to review it this week.

Copy link
Member

@martindurant martindurant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a couple of very minor comments. Happy to merge as-is if you don't wish to change anything.

Comment on lines +1810 to +1814
mocker.patch.object(
AzureBlobFileSystem,
"protocol",
AzureBlobFileSystem.protocol + (unsupported_proto,),
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can set the .protocol on the instance instead of the class. That's the only real use case for "overriding" that is likely to happen.

@kyleknap
Copy link
Collaborator Author

kyleknap commented Sep 3, 2025

@martindurant thanks! I'll take a pass through the suggestions and address/comment them. I'll let you know when it is ready for another look.

@kyleknap
Copy link
Collaborator Author

kyleknap commented Sep 3, 2025

@martindurant the PR should be good to look at again. Commented back on one of the suggestions and pulled in the two other suggestions you had.

@martindurant martindurant merged commit 9b9437f into fsspec:main Sep 4, 2025
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants