Commit 1445c38
authored
Optimize TestsCache.compute_file_hash
The optimized code achieves a 52% speedup by replacing the traditional file reading approach with a more efficient buffered I/O pattern using `readinto()` and `memoryview`.
**Key optimizations:**
1. **Pre-allocated buffer with `readinto()`**: Instead of `f.read(8192)` which allocates a new bytes object on each iteration, the code uses a single `bytearray(8192)` buffer and reads data directly into it with `f.readinto(mv)`. This eliminates repeated memory allocations.
2. **Memory view for zero-copy slicing**: The `memoryview(buf)` allows efficient slicing (`mv[:n]`) without copying data, reducing memory overhead when updating the hash with partial buffers.
3. **Direct `open()` with unbuffered I/O**: Using `open(path, "rb", buffering=0)` instead of `Path(path).open("rb")` avoids the Path object overhead and disables Python's internal buffering to prevent double-buffering since we're managing our own buffer.
**Performance impact**: The line profiler shows the critical file opening operation dropped from 83.4% to 62.2% of total time, while the new buffer operations (`readinto`, `memoryview`) are very efficient. This optimization is particularly effective for medium to large files where the reduced memory allocation overhead compounds across multiple read operations.
**Best use cases**: This optimization excels when computing hashes for files larger than the 8KB buffer size, where the memory allocation savings become significant, and when called frequently in batch operations.1 parent b210ba4 commit 1445c38
1 file changed
+6
-4
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
160 | 160 | | |
161 | 161 | | |
162 | 162 | | |
163 | | - | |
| 163 | + | |
| 164 | + | |
| 165 | + | |
164 | 166 | | |
165 | | - | |
166 | | - | |
| 167 | + | |
| 168 | + | |
167 | 169 | | |
168 | | - | |
| 170 | + | |
169 | 171 | | |
170 | 172 | | |
171 | 173 | | |
| |||
0 commit comments