@NikitaMatskevich
Contributor

@NikitaMatskevich NikitaMatskevich commented Nov 3, 2025

Rationale for this change

Starting with version 20, PyArrow supports Azure filesystems.

Azure table locations are typically of this format: "abfss://<bucket_name>@<account_name>.<dfs|blob>.core.windows.net//

/<file_path>". When creating a PyArrowFile, we simply take the table location and append the table-relative path to it. The resulting path still contains the "@<account_name>.<dfs|blob>.core.windows.net" part, which the PyArrow library cannot read or write. This part has to be stripped from Azure URIs.
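
To make the stripping step concrete, here is a minimal sketch using only the standard library. The helper name `strip_azure_account_host` and the example location are hypothetical and not part of PyIceberg or PyArrow. It assumes the PyArrow Azure filesystem expects paths of the form "<bucket_name>/<path>", and it only handles the `abfs`/`abfss` schemes.

```python
from urllib.parse import urlparse

# Assumption: only these schemes carry the "@<account_name>...." part.
AZURE_SCHEMES = {"abfs", "abfss"}


def strip_azure_account_host(location: str) -> str:
    """Drop the "@<account_name>.<dfs|blob>.core.windows.net" part of an Azure URI.

    Hypothetical helper: returns "<bucket_name>/<path>" for abfs/abfss
    locations and leaves every other location untouched.
    """
    parsed = urlparse(location)
    if parsed.scheme not in AZURE_SCHEMES:
        return location
    # netloc is "<bucket_name>@<account_name>.<dfs|blob>.core.windows.net";
    # without an "@" (as with Azurite) the whole netloc is kept.
    container = parsed.netloc.partition("@")[0]
    return f"{container}{parsed.path}"


# Made-up location, for illustration only.
example = "abfss://bucket@account.dfs.core.windows.net/db/table/data/f.parquet"
print(strip_azure_account_host(example))
```

Because locations without an "@" pass through with only the scheme removed, the same helper would behave identically for Azurite-style URIs, which may be why the problem does not show up there.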

The proposed fix is just to start a conversation around the issue. I am not 100% sure how and where this should be fixed.

We know this issue does not occur with fsspec.

Are these changes tested?

This is hard to test, because it works fine with Azurite (unlike "real" Azure, Azurite URIs do not contain this part). Do you have an idea for an integration test?

@kevinjqliu
Contributor

hey @NikitaMatskevich maybe we should open an issue and move the discussion there :)

I'm not sure I understand the underlying issue and what is not working.
Here's the documentation for the abfss URI syntax: https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction-abfs-uri

Could you provide some more details?

@NikitaMatskevich
Contributor Author

NikitaMatskevich commented Nov 4, 2025

Hi @kevinjqliu, thanks for looking into it! I copied the description into issue #2698 and added a concrete example of what happens and why it is surely a bug.
