Description
Bug report
Bug description:
I am trying to use tarfile to write very large archives, and the process is being killed by the OOM killer. I would expect (and want) to be able to use it in streaming mode, writing an unlimited number of files, as the standard tar command-line utility supports by default.
Reproduction case attached.
```python
import gc
import io
import os
import psutil
import tarfile

if __name__ == "__main__":
    t = tarfile.open("a.tar", mode="w")  # plain tar, no compression
    for i in range(1, 100_000_000):
        if i % 10_000 == 0:
            gc.collect()
            process = psutil.Process(os.getpid())
            mem_info = process.memory_info()
            mem = mem_info.rss
            print(f"Iteration {i}, memory usage: {mem}")
        bs = (" " * 1000 + str(i)).encode("utf8")
        with io.BytesIO(bs) as file:
            tarinfo = tarfile.TarInfo(name=f"cool_files/{i}.txt")
            tarinfo.size = len(bs)
            t.addfile(tarinfo, file)
```
Memory usage increases without bound because of this line in addfile():

```python
self.members.append(tarinfo)
```
I'm not sure what the use case for this line is. In write-only mode it does not seem useful; maybe it supports mixed read/write? But in general it does not seem correct to assume that all the TarInfo objects fit in memory.
Edit: as a workaround, I'm resetting t.members = [] manually.
CPython versions tested on:
3.13
Operating systems tested on:
Linux