CHANGELOG.md (2 changes: 1 addition & 1 deletion)
@@ -5,7 +5,7 @@ Unreleased

* Respect `AzureBlobFileSystem.protocol` tuple when removing protocols from fully-qualified
paths provided to `AzureBlobFileSystem` methods.

* Added `AzureBlobFileSystem.rm_file()`

2025.8.0
--------
adlfs/spec.py (18 changes: 18 additions & 0 deletions)
@@ -1287,6 +1287,24 @@ async def _rm_files(

sync_wrapper(_rm_files)

async def _rm_file(self, path: str, **kwargs):
"""Delete a file.

Parameters
----------
path: str
File to delete.
"""
container_name, p, _ = self.split_path(path)
kyleknap (Collaborator) commented on Sep 2, 2025:

@martindurant question for you when you have the chance...

When using a version_aware file system, should rm_file delete versioned objects when the version is specified in the path (e.g., data/root/a/file.txt?versionid=<some-version-id>)? Azure Blob is similar to S3 in that, to delete a specific version of a blob, a version id must be provided to explicitly delete that version snapshot. It did not look like s3fs does this in its rm_file implementation, but I figured I'd ask.

A collaborator replied:

@anjaliratnam-msft I'm thinking we go ahead and support version ids that are supplied as part of the path for rm_file. Specifically, I'm leaning this way because:

  1. It is straightforward to plumb in. We just need to save the version_id from split_path and provide it to delete_blob, similar to how it's done in get_file() (see the sketch below).
  2. The version id is already retrieved from the path for many of the operations that act on a single path (e.g. info(), open()).
  3. If we don't support it, there's no real way to delete specific versions through adlfs, and users would have to fall back to using the SDK directly.

We should also make sure we add a test for deleting versioned blobs as well.
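A minimal sketch of that plumbing, not part of the PR diff: it assumes split_path keeps returning the version id as its third element (the current diff discards it as `_`) and that the azure-storage-blob delete_blob call accepts a version_id keyword.

```python
# Hypothetical sketch: keep the version id that split_path parses from
# "...?versionid=..." and pass it through to the SDK call.
container_name, p, version_id = self.split_path(path)
async with self.service_client.get_container_client(
    container=container_name
) as cc:
    # version_id=None deletes the current version, matching today's behavior.
    await cc.delete_blob(p, version_id=version_id)
```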

try:
async with self.service_client.get_container_client(
container=container_name
) as cc:
await cc.delete_blob(p)
except Exception as e:
raise RuntimeError(f"Failed to remove {path}: {e}") from e
A collaborator commented:

I'm not sure we should be throwing RuntimeError for any exception here. It is a fairly general exception that is hard to catch and act on programmatically. Instead, we should only remap exceptions to known built-in exceptions that we want to handle.

Looking through the adlfs codebase, a common pattern is to map ResourceNotFoundError to the builtin FileNotFoundError while leaving all other Azure client errors untouched. I think it makes sense to follow this pattern and reraise as FileNotFoundError, similar to here (a sketch follows below).

We should also make sure to add a test that it reraises as FileNotFoundError when the blob does not exist.
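A hedged sketch of that remapping, not part of the PR diff, following the ResourceNotFoundError-to-FileNotFoundError pattern described above; the errno-style FileNotFoundError construction is one reasonable choice, not necessarily the exact form used elsewhere in adlfs.

```python
import errno
import os

from azure.core.exceptions import ResourceNotFoundError

# Only the "blob does not exist" case is remapped to the builtin
# FileNotFoundError; all other SDK errors propagate unchanged.
try:
    async with self.service_client.get_container_client(
        container=container_name
    ) as cc:
        await cc.delete_blob(p)
except ResourceNotFoundError as e:
    raise FileNotFoundError(
        errno.ENOENT, os.strerror(errno.ENOENT), path
    ) from e
```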

self.invalidate_cache(self._parent(path))
A collaborator commented:

Poking more through the adlfs codebase, I'm thinking we will also need to invalidate the path itself. While some of the other methods do not seem to handle this (arguably something we should circle back to eventually), we can otherwise run into inconsistencies where the path still resides in the dircache. For example, if ls() is run on the file first, it still shows up as present after the delete:

```python
fs = get_fs()
upload(fs)
print(fs.ls(f"{CONTAINER_NAME}/small.bin"))
fs.rm_file(f"{CONTAINER_NAME}/small.bin")
print("Still cached:", fs.ls(f"{CONTAINER_NAME}/small.bin"))  # Should throw FileNotFoundError
```

We should make sure to add a test case for this as well (a sketch of the fix follows below).
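A minimal sketch of that extra invalidation, not part of the PR diff, assuming invalidate_cache accepts the path whose cached entry should be dropped (as the fsspec AbstractFileSystem interface suggests):

```python
# Invalidate both the parent listing (as the PR already does) and the
# deleted path itself, so a later ls() does not hit a stale dircache entry.
self.invalidate_cache(self._parent(path))
self.invalidate_cache(path)
```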


async def _separate_directory_markers_for_non_empty_directories(
self, file_paths: typing.Iterable[str]
) -> typing.Tuple[typing.List[str], typing.List[str]]:
adlfs/tests/test_spec.py (16 changes: 16 additions & 0 deletions)
@@ -2210,3 +2210,19 @@ def test_write_max_concurrency(storage, max_concurrency, blob_size, blocksize):
with fs.open(path, "rb") as f:
assert f.read() == data
fs.rm(container_name, recursive=True)


def test_rm_file(storage):
fs = AzureBlobFileSystem(
account_name=storage.account_name,
connection_string=CONN_STR,
)
path = "data/test_file.txt"
with fs.open(path, "wb") as f:
f.write(b"test content")

assert fs.exists(path)
fs.rm_file(path)
with pytest.raises(FileNotFoundError):
fs.ls(path)
assert not fs.exists(path)
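A hedged sketch of one of the additional tests the review asks for, not part of the PR diff; the test name and missing-blob path are hypothetical, and the assertion assumes the ResourceNotFoundError remapping suggested above is in place.

```python
def test_rm_file_missing_blob(storage):
    fs = AzureBlobFileSystem(
        account_name=storage.account_name,
        connection_string=CONN_STR,
    )
    # With the suggested remapping, deleting a nonexistent blob should
    # surface as the builtin FileNotFoundError rather than a RuntimeError.
    with pytest.raises(FileNotFoundError):
        fs.rm_file("data/does_not_exist.txt")
```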