Optimize TestsCache.compute_file_hash

codeflash-ai[bot] · web-flow · commit 1445c385ec59 · 2025-10-10T18:29:34.000Z
The optimized code achieves a 52% speedup by replacing the traditional file reading approach with a more efficient buffered I/O pattern using `readinto()` and `memoryview`. 

**Key optimizations:**

1. **Pre-allocated buffer with `readinto()`**: Instead of `f.read(8192)` which allocates a new bytes object on each iteration, the code uses a single `bytearray(8192)` buffer and reads data directly into it with `f.readinto(mv)`. This eliminates repeated memory allocations.

2. **Memory view for zero-copy slicing**: The `memoryview(buf)` allows efficient slicing (`mv[:n]`) without copying data, reducing memory overhead when updating the hash with partial buffers.

3. **Direct `open()` with unbuffered I/O**: Using `open(path, "rb", buffering=0)` instead of `Path(path).open("rb")` avoids the Path object overhead and disables Python's internal buffering to prevent double-buffering since we're managing our own buffer.

**Performance impact**: The line profiler shows the critical file opening operation dropped from 83.4% to 62.2% of total time, while the new buffer operations (`readinto`, `memoryview`) are very efficient. This optimization is particularly effective for medium to large files where the reduced memory allocation overhead compounds across multiple read operations.

**Best use cases**: This optimization excels when computing hashes for files larger than the 8KB buffer size, where the memory allocation savings become significant, and when called frequently in batch operations.
diff --git a/codeflash/discovery/discover_unit_tests.py b/codeflash/discovery/discover_unit_tests.py
@@ -160,12 +160,14 @@ def get_function_to_test_map_for_file(
     @staticmethod
     def compute_file_hash(path: str | Path) -> str:
         h = hashlib.sha256(usedforsecurity=False)
-        with Path(path).open("rb") as f:
+        with open(path, "rb", buffering=0) as f:
+            buf = bytearray(8192)
+            mv = memoryview(buf)
             while True:
-                chunk = f.read(8192)
-                if not chunk:
+                n = f.readinto(mv)
+                if n == 0:
                     break
-                h.update(chunk)
+                h.update(mv[:n])
         return h.hexdigest()
 
     def close(self) -> None: