Skip to content

Commit d58b344

Browse files
committed
PERF/BUG: FileIO_readall() should not truncate the buffer after every read
the buffer is truncated once before returning. the buffer should not be truncated between reads. there is no scenario where this is the right thing to do. in the usual path, the buffer is set to the filesize + 1 bytes. the buffer should not be truncated until the full file is read. for reference, trying to read the whole file in one go does not work because read() does not return large files in one go. I see a first read() between 10 MB and 3000 MB on my ubuntu machines (large VM with 32 GB of memory). the first read with cold cache can be on the small end, repeated reads of the same file can get to the larger end. fortunately, the python interpreter was optimized to realloc the bytes buffer in-place, without copy. behavior may vary with the operating system. I see multi gigabytes buffers getting reallocated in microseconds on my ubuntu machine. the reallocation works and the performance impact of this bug is extremely minimal. behavior may vary with the operating system. this code might have a sizable performance drop when the first read() is small. thankfully it should be very rare in practice. (either a tiny read from the OS or the file size has changed since the file was opened). the first read would truncate the buffer, then the buffer will grow gradually with a heuristic new_buffersize(). it grows in steps as little as 8 kilobytes. it is inefficient to do many small reads of mere kilobytes to read the full file. the heuristic should be optimized but that is work for another commit. thankfully the effect should be minimal. the filesystem cache and readahead should minimize the effect of tiny reads.
1 parent 12db452 commit d58b344

File tree

1 file changed

+0
-5
lines changed

1 file changed

+0
-5
lines changed

Modules/_io/fileio.c

Lines changed: 0 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -809,11 +809,6 @@ _io_FileIO_readall_impl(fileio *self)
809809
Py_DECREF(result);
810810
return NULL;
811811
}
812-
813-
if (PyBytes_GET_SIZE(result) < (Py_ssize_t)bufsize) {
814-
if (_PyBytes_Resize(&result, bufsize) < 0)
815-
return NULL;
816-
}
817812
}
818813

819814
n = _Py_read(self->fd,

0 commit comments

Comments
 (0)