Skip to content

Conversation

@mkultraWasHere
Copy link
Contributor

@mkultraWasHere mkultraWasHere commented Nov 5, 2025

Overview

This PR refactors the dreadnode/agent/tools/fs.py module with two major improvements:

  1. Async-first architecture - Converts all filesystem operations from synchronous to asynchronous
  2. Separated implementations - Splits local and S3 filesystem logic into distinct classes with a unified factory interface

Changes

Part 1: Async Transformation

All filesystem methods converted to async:

  • read_file() - Now uses aiofiles for local, aioboto3 for S3
  • read_lines() - Async file reading with line slicing
  • write_file() - Async write operations
  • write_file_bytes() - Binary async writes
  • write_lines() - Async read-modify-write for line editing
  • ls() - Async directory listing via asyncio.to_thread()
  • glob() - Async pattern matching
  • grep() - Parallel file searching with semaphore-based concurrency control
  • cp() - Async copy operations
  • mv() - Async move operations
  • delete() - Async deletion

Part 2: Architecture Refactoring

Class Hierarchy

Before:

class Filesystem(Toolset):
    # Single class handling both local and S3 paths
    # Methods contain if/else logic for different filesystem types

    def read_file(self, path: str) -> str | bytes:
        if self._upath.protocol in ["s3", "s3a"]:
            # S3 logic
        else:
            # Local logic

After:

class FilesystemBase(Toolset):
    # Base class with common interface and shared methods

class LocalFilesystem(FilesystemBase):
    # Local filesystem implementation using aiofiles

class S3Filesystem(FilesystemBase):
    # S3 filesystem implementation using aioboto3

def Filesystem(...) -> LocalFilesystem | S3Filesystem:
    # Factory function for auto-detection and creation

Breaking Changes

None. The factory function maintains 100% backward compatibility:

# All existing code continues to work exactly as before
fs = Filesystem(path="/local/path")
fs = Filesystem(path="s3://bucket/path")

# But methods are now async (which was already planned)
content = await fs.read_file("test.txt")  # Was: content = fs.read_file("test.txt")

Migration required: Update all Filesystem method calls to use await:

# Before
content = fs.read_file("file.txt")
files = fs.ls(".")
matches = fs.grep("pattern", "file.txt")

# After
content = await fs.read_file("file.txt")
files = await fs.ls(".")
matches = await fs.grep("pattern", "file.txt")

Testing

  • Local Filesystem: 16/16 tests passing ✅

    • File operations: read, write, copy, move, delete
    • Binary file handling
    • Line-based editing (insert/overwrite modes)
    • Directory operations
    • Glob pattern matching
    • Grep searching (including recursive)
    • Concurrent operations (10 parallel writes)
  • S3 Filesystem: 11/12 tests passing ✅

    • Read/write operations
    • Line-based editing
    • File manipulation (copy, move, delete)
    • Directory markers
    • Glob and grep searching
    • Note: 1 test fails due to S3 eventual consistency in test environment (bytes verification), not a code issue

Performance Improvements

  1. Async I/O - Non-blocking operations allow concurrent processing
  2. Parallel grep - Search up to 25 files simultaneously (vs sequential before)
  3. S3 server-side copy - No data transfer for S3-to-S3 copies

Dependencies

New required dependencies:

  • aiofiles - Async local file I/O

New optional dependencies:

  • aioboto3 - Async AWS S3 operations (only needed for S3 paths)

@dreadnode-renovate-bot dreadnode-renovate-bot bot added the area/python Changes to Python package configuration and dependencies label Nov 5, 2025

This comment was marked as outdated.

@dreadnode dreadnode deleted a comment from Copilot AI Nov 5, 2025
@mkultraWasHere mkultraWasHere changed the title Refactor filesystem operations to async feat: Refactor filesystem operations to async Nov 5, 2025
Copy link
Contributor

@monoxgas monoxgas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No serious issues, just want to make sure we make sure aiofiles plays nice with UPath.

Also, the readme for aiofiles has commentary about common os modules like mkdir, unlink, etc. that might be worth a look: https://github.com/Tinche/aiofiles

@mkultraWasHere
Copy link
Contributor Author

@monoxgas yea, def needed distinct s3 support. Got messy with single class so pulled out an s3 filesystem class (and assume we can do the same for other cloud fs's). Factory method keeps the singular tool interface unchanged though.

@monoxgas
Copy link
Contributor

monoxgas commented Nov 14, 2025

No serious issues from my end, would just like to see the tests/linting fixed up before we merge.

Def understand the struggle underneath with async, support in fsspec is somewhat spotty depending on the provider, and UPath starts to fall apart in terms of value. Edge cases I'm sure are all over, so I think the overloads and class fallback there is a good approach. Long term we can probably return to a few things:

  • Maybe generalize around fsspec filesystems underneath rather than direct packages: https://developmentseed.org/obstore/latest/
  • Let a user create an fsspec filesystem and hand it to us for us inside a generalized tool class
  • Ditch upath entirely and just steal the logic for building a fsspec and associated handler class from the path prefix.

Interesting pattern here as well: https://github.com/iterative/morefs/blob/main/src/morefs/asyn_local.py

@vabruzzo
Copy link
Contributor

Rather than all the various subclassing, have we considered toolset variants?

@mkultraWasHere
Copy link
Contributor Author

mkultraWasHere commented Nov 19, 2025

I think test/linting all set. Only failing type check is for other code in repo that isnt touched here.

And happy to circle back on FS tool in long term for noted issues above, documented notes.

@mkultraWasHere mkultraWasHere merged commit 7b80f58 into main Nov 21, 2025
7 of 8 checks passed
@mkultraWasHere mkultraWasHere deleted the mk/async-fs branch November 21, 2025 21:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/python Changes to Python package configuration and dependencies

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants