Skip to content

Conversation

@morotti
Copy link
Contributor

@morotti morotti commented Mar 7, 2025

EDIT: nevermind, found another bug, need more time to fix

… read

the buffer is truncated once before returning.
the buffer should not be truncated between reads. there is no scenario where this is the right thing to do.

in the usual path, the buffer is set to the filesize + 1 bytes. the buffer should not be truncated until the full file is read.
for reference, trying to read the whole file in one go does not work because read() does not return large files in one go.
I see a first read() between 10 MB and 3000 MB on my ubuntu machines (large VM with 32 GB of memory). the first read with cold cache can be on the small end, repeated reads of the same file can get to the larger end.

fortunately, the python interpreter was optimized to realloc the bytes buffer in-place, without copy. behavior may vary with the operating system.
I see multi gigabytes buffers getting reallocated in microseconds on my ubuntu machine. the reallocation works and the performance impact of this bug is extremely minimal. behavior may vary with the operating system.

this code might have a sizable performance drop when the first read() is small.
thankfully it should be very rare in practice. (either a tiny read from the OS or the file size has changed since the file was opened).
the first read would truncate the buffer, then the buffer will grow gradually with a heuristic new_buffersize(). it grows in steps as little as 8 kilobytes.
it is inefficient to do many small reads of mere kilobytes to read the full file. the heuristic should be optimized but that is work for another commit.
thankfully the effect should be minimal. the filesystem cache and readahead should minimize the effect of tiny reads.
@morotti morotti closed this Mar 7, 2025
@morotti morotti deleted the io-do-not-truncate-buffer branch March 7, 2025 18:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants