Adding support for Azure remote file hosting #2411
Conversation
Note: I did some regression testing and found something odd with different versions of the Azure abfs Python integrations, related to multiple Azure tenant assignments, that makes the full implicit flow break. I'll be doing some more debugging to test whether it's in this code or a bug in the API. This still works with SAS tokens, keys, etc.
Added troubleshooting tips for the Azure log errors that occur when the implicit flow does not know how to select a credential. Adding the logic to correct for this would be complicated and largely unnecessary given the available fallback methods.
After diagnosing and testing a bunch of fixes, this isn't worth addressing here in my opinion. The failure occurs when users have multiple tenants or other settings that create multiple potential default credentials. Falling back to SAS is the best option, and a note has been added to the docs to that effect.
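For reference, a minimal sketch of the SAS fallback using fsspec/adlfs; the account name, token, and container path below are placeholders and not code from this PR:

```python
# Illustrative only: explicit SAS-token credentials for an Azure filesystem,
# bypassing the ambiguous implicit/default credential flow. Placeholder values.
import fsspec

fs = fsspec.filesystem(
    "az",                         # handled by adlfs when it is installed
    account_name="myaccount",     # placeholder storage account
    sas_token="?sv=...&sig=...",  # placeholder SAS token
)
print(fs.ls("az://my-container/logs"))  # hypothetical container path
```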
jjallaire left a comment:
Thanks! A few changes requested.
# Attempt to remove the test blob. Some Azure credentials (e.g. SAS without 'd' delete
# permission or managed identity with only Data Writer) can create but not delete. Treat
# that as writeable (we'll leave behind a tiny marker file that can be GC'd later).
But we'll keep leaving these behind over and over again every time we call is_writeable(). How about if we use the same uuid every time so worst case there is one file left in the bucket?
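A minimal sketch of that suggestion, assuming an fsspec-style filesystem handle; the marker name and function shape are illustrative, not the PR's code:

```python
# Sketch only: reuse one stable marker path so repeated writeability checks
# leave at most a single file behind when the credential can create but not delete.
WRITE_MARKER = ".inspect-write-check"  # hypothetical fixed name instead of a fresh uuid

def is_writeable(fs, log_dir: str) -> bool:
    marker = f"{log_dir.rstrip('/')}/{WRITE_MARKER}"
    try:
        fs.pipe_file(marker, b"")   # attempt the write
    except Exception:
        return False
    try:
        fs.rm(marker)               # best-effort cleanup; some Azure credentials
    except Exception:               # (e.g. SAS without delete) can't remove it
        pass
    return True
```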
# note: S3 doesn't give you a directory modification time
if "mtime" not in file.keys() and file["type"] == "file":
    file["mtime"] = self.fs.created(file).timestamp()
if "mtime" not in file.keys() and file["type"] == "file" and hasattr(self.fs, "created"):
Could you add a brief comment here explaining why we do the created check and why we ignore the error below
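One way the requested comment (and the guarded call) might read; the wording and the broad exception handler below are paraphrased from the diff above rather than taken from the PR:

```python
# Not every fsspec backend exposes created(), and S3 has no directory mtimes at
# all, so only attempt the lookup when the backend supports it...
if "mtime" not in file.keys() and file["type"] == "file" and hasattr(self.fs, "created"):
    try:
        file["mtime"] = self.fs.created(file).timestamp()
    except Exception:
        # ...and ignore per-entry failures (some Azure backends can still raise
        # for individual blobs); a missing mtime is preferable to a hard error.
        pass
```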
)
options = deepcopy(DEFAULT_FS_OPTIONS.get(scheme, {}))
# Azure Blob / DataLake (adlfs) credential injection (lazy so dependency is optional)
Could you break this into a helper function that exists alongside the DEFAULT_FS_OPTIONS. In fact, I think we should just expose a default_fs_options(scheme) function that does the lookup in the map and then falls back to calling the azure function as required.
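A rough sketch of that shape; `DEFAULT_FS_OPTIONS` is from the PR, while `default_fs_options` and `azure_default_fs_options` are names assumed here for illustration:

```python
from copy import deepcopy

DEFAULT_FS_OPTIONS: dict[str, dict] = {}  # populated with the real per-scheme defaults in the PR

def azure_default_fs_options() -> dict:
    # placeholder for the lazy adlfs credential injection factored out of the main flow
    return {}

def default_fs_options(scheme: str) -> dict:
    options = deepcopy(DEFAULT_FS_OPTIONS.get(scheme, {}))
    if scheme in ("az", "abfs", "abfss", "adl"):
        options = {**options, **azure_default_fs_options()}
    return options
```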
if not fs.exists(log_dir):
    fs.mkdir(log_dir, True)
log_dir = fs.info(log_dir).name
scheme = (log_dir.split("://", 1)[0].lower() if "://" in log_dir else "")
You can use scheme = urlparse(log_dir).scheme for this
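For illustration, with the standard library only:

```python
from urllib.parse import urlparse

urlparse("az://my-container/logs").scheme   # -> "az"
urlparse("/local/path/to/logs").scheme      # -> "" for plain local paths
```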
fs.mkdir(log_dir, True)
log_dir = fs.info(log_dir).name
scheme = (log_dir.split("://", 1)[0].lower() if "://" in log_dir else "")
try:
This should all be in a helper function so that reading it doesn't distract from the main flow
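One possible shape for such a helper, combining the suggestions above; the function name and return contract are assumptions, not code from the PR:

```python
from urllib.parse import urlparse

def resolve_log_dir(fs, log_dir: str) -> tuple[str, str]:
    """Ensure the log directory exists and return (normalized path, scheme)."""
    if not fs.exists(log_dir):
        fs.mkdir(log_dir, True)
    log_dir = fs.info(log_dir).name
    return log_dir, urlparse(log_dir).scheme
```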
def log_listing_response(logs: list[EvalLogInfo], log_dir: str) -> web.Response:
    # Cleanly handle path names, TODO: add support for other cloud providers with multiple access points
    def normalize_name(name: str) -> str:
Let's make this a helper function alongside the other azure-specific stuff you factor out from above. Generally I think we should have a _view/azure.py source file where all of this lives.
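A sketch of what the factored-out helper might look like in a `_view/azure.py` module; the endpoint handling below is an assumption based on Azure exposing both blob and dfs access points for the same account, not the PR's implementation:

```python
from urllib.parse import urlparse

def normalize_azure_name(name: str) -> str:
    """Collapse equivalent Azure access points so log names compare consistently."""
    parsed = urlparse(name)
    if parsed.scheme in ("az", "abfs", "abfss"):
        # "abfss://container@account.dfs.core.windows.net/path" and
        # "az://container/path" can refer to the same blob; keep container + path
        container = parsed.netloc.split("@", 1)[0]
        return f"{container}{parsed.path}"
    return name
```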
if fs.is_async():
    async with async_fileystem(log_dir, fs_options=fs_options) as async_fs:
        if await async_fs._exists(log_dir):
            # Attempt existence check with robust handling for Azure-style auth issues.
Same as above. helper in azure.py file
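Roughly, such a helper in `azure.py` could wrap the existence check and translate ambiguous-credential failures into a clearer message; the name and error text are illustrative only:

```python
async def azure_exists(async_fs, log_dir: str) -> bool:
    try:
        return await async_fs._exists(log_dir)
    except Exception as ex:
        # Azure default-credential chains can fail here when multiple tenants
        # yield ambiguous credentials; point users at the documented fallbacks.
        raise RuntimeError(
            f"Unable to check {log_dir}. If you are using implicit Azure "
            "credentials, consider a SAS token or account key instead."
        ) from ex
```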
files: list[dict[str, Any]] = []
async for _, _, filenames in async_fs._walk(log_dir, detail=True):
    files.extend(filenames.values())
try:
I think maybe we want to actually detect that we are using a provider that we know can't handle detail and just enter that codepath explicitly rather than relying on a type error. Alternatively, I'd be fine just interrogating the _walk method to see if it takes a detail parameter.
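A sketch of the second option (interrogating `_walk`); `inspect.signature` is standard library, and the helper name here is made up:

```python
import inspect

def walk_supports_detail(walk_method) -> bool:
    try:
        return "detail" in inspect.signature(walk_method).parameters
    except (TypeError, ValueError):
        # some callables (e.g. C-implemented ones) don't expose a signature
        return False
```

The listing code would then pass `detail=True` only when this returns true and fall back to per-file `info()` calls otherwise.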
Ack on requests - I'll be out for a week or so, so apologies in advance for the delay before I can get to these.
I opened a follow-up PR with the comments handled: my PR is based on the work in this PR. I addressed the open comments in the original PR and changed a few small parts.
This PR contains:
What is the current behavior? (You can also link to an open issue here)
The current system supports local files and the use of S3 (with some limitations).
What is the new behavior?
This adds support for multiple Azure file system connection points, and also adds checks for remote file system edge cases more generally.
Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)
This should not break anything, but confirmation from someone with a robust existing S3 setup would be welcome.
Other information:
I left flags for some of the edge case checks limited to Azure only, but they would likely benefit from being applied to other remote filesystems as well for resiliency. These areas are marked with TODOs for consideration.