
itertools.tee causes memory leak by not releasing cached items once consumed by all iterators #138765

@Atry

Bug report

Bug description:

The current CPython implementation of itertools.tee appears to leak memory. It caches items from the source iterator as they are consumed, but it does not release them from its internal cache even after every live branch iterator has advanced past them.
This contradicts the purpose of tee, which is to split an iterator memory-efficiently. The documentation warns that tee may require significant auxiliary storage (depending on how much temporary data needs to be stored), which implies that storage should be proportional to the lag between the slowest and fastest iterators. Instead, the current implementation's storage appears to be proportional to the total progress of the most advanced iterator, so the cache grows without bound for the lifetime of the tee object.
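For comparison, here is a minimal pure-Python sketch of the behavior the report expects. It assumes a design in which all branches share a singly linked list of [value, next] cells. The names lazy_tee, _branch, Tracked and source are hypothetical, and this is not how CPython's C implementation is written. Each branch holds only its own current cell, so once the slowest branch moves past a cell nothing references it any more and the cached value is freed. Retained storage therefore follows the lag between branches rather than total progress.

```python
import gc
import weakref


def lazy_tee(iterable, n=2):
    """Hypothetical sketch: split iterable into n iterators sharing a linked list.

    Cells have the form [value, next_cell]; a cell whose next_cell is None
    has not been filled from the source yet.
    """
    iterator = iter(iterable)
    head = [None, None]
    return tuple(_branch(iterator, head) for _ in range(n))


def _branch(iterator, link):
    while True:
        if link[1] is None:
            # This branch is the furthest ahead: pull the next value from the source.
            try:
                link[0] = next(iterator)
            except StopIteration:
                return
            link[1] = [None, None]
        # Move to the next cell. The old cell stays alive only while another
        # branch still points at it. Note that `value` keeps the most recently
        # yielded item alive until this branch is advanced again.
        value, link = link
        yield value


class Tracked:
    """Records a weak reference to every instance so liveness can be checked."""
    refs = []

    def __init__(self, value):
        self.value = value
        Tracked.refs.append(weakref.ref(self))


def source(count):
    for i in range(count):
        yield Tracked(i)  # not bound to a local, so the frame keeps no reference


it1, it2 = lazy_tee(source(3))
next(it1)
next(it1)  # it1 has consumed objects 0 and 1
del it1
next(it2)
next(it2)  # it2 has now moved past object 0 as well
gc.collect()
print("object 0:", "alive" if Tracked.refs[0]() is not None else "collected")
# → object 0: collected
```

This mirrors steps 1 to 4 of the reproduction below. Run it as a script: in an interactive session the _ variable would keep the last result of next() alive.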

import itertools
import weakref
import gc

class TrackableObject:
    """A simple class to track creation and garbage collection."""
    def __init__(self, value):
        self.value = value
        print(f"    [+] Created: {self!r}")

    def __repr__(self):
        return f"TrackableObject({self.value})"

    def __del__(self):
        print(f"    [-] Garbage Collected: {self!r}")

# Store weak references to track object lifecycle
tracked_refs = []

def source_generator():
    for i in range(3):
        obj = TrackableObject(i)
        tracked_refs.append(weakref.ref(obj))
        def del_and_return():
            nonlocal obj
            try:
                return obj
            finally:
                del obj
        yield del_and_return()

print("--- Step 1: Create two iterators from tee ---")
iter1, iter2 = itertools.tee(source_generator())
print("iter1 and iter2 created.\n")

print("--- Step 2: Advance iter1 and then delete it ---")
# iter1 consumes objects 0 and 1; tee caches them.
next(iter1)
next(iter1)
# iter1 is deleted. iter2 is now the only active iterator.
del iter1
print("iter1 advanced and deleted.\n")
print(f"Garbage collected {gc.collect(generation=2)} objects")

print("--- Step 3: Advance iter2 past object 0 ---")
# iter2 consumes objects 0 and 1 from the cache.
item0 = next(iter2)
print(f"iter2 consumed {item0!r}")
# Now, no active iterator needs object 0 anymore. It should be released from the cache.
del item0
next(iter2)
print(f"Garbage collected {gc.collect(generation=2)} objects")

print("--- Step 4: Check liveness of object 0 (Expected Collected) ---")
# The bug is here: object 0 is still alive because the tee cache holds it.
obj0_ref = tracked_refs[0]
status = "Alive" if obj0_ref() is not None else "Collected"
print(f"Status of TrackableObject(0): {status}\n")
print(f"Garbage collected {gc.collect(generation=2)} objects")


print("\n--- Step 6: Delete the final iterator to trigger cleanup ---")
del iter2
print(f"Garbage collected {gc.collect(generation=2)} objects")

print("--- Final Status Check ---")
for i, ref in enumerate(tracked_refs):
    status = "Alive" if ref() is not None else "Collected"
    print(f"Status of TrackableObject({i}): {status}")

Output:

--- Step 1: Create two iterators from tee ---
iter1 and iter2 created.

--- Step 2: Advance iter1 and then delete it ---
    [+] Created: TrackableObject(0)
    [+] Created: TrackableObject(1)
iter1 advanced and deleted.

Garbage collected 34 objects
--- Step 3: Advance iter2 past object 0 ---
iter2 consumed TrackableObject(0)
Garbage collected 0 objects
--- Step 4: Check liveness of object 0 (Expected Collected) ---
Status of TrackableObject(0): Alive

Garbage collected 0 objects

--- Step 6: Delete the final iterator to trigger cleanup ---
    [-] Garbage Collected: TrackableObject(0)
    [-] Garbage Collected: TrackableObject(1)
Garbage collected 0 objects
--- Final Status Check ---
Status of TrackableObject(0): Collected
Status of TrackableObject(1): Collected
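The reproduction only inspects the first two objects, so on its own it does not show whether retention keeps growing with total progress. A hedged sketch for checking that, using the real itertools.tee and a hypothetical helper live_after: it advances one branch by steps items and the other by steps - lag items, then counts how many items are still alive. The source yields each Item directly instead of binding it to a local, so the generator frame holds no reference to it; this serves the same purpose as del_and_return() in the reproduction. The counts it prints depend on the CPython version and are not reproduced here.

```python
import gc
import itertools
import weakref


class Item:
    """Payload whose liveness is observed through a weak reference."""


def live_after(steps, lag=0):
    """Hypothetical helper: count items still alive after advancing two tee branches.

    The leading branch is advanced `steps` items and the trailing branch
    `steps - lag` items, so `lag` items are legitimately still needed.
    """
    # Yield each Item directly so the source generator never keeps one alive.
    leader, trailer = itertools.tee(Item() for _ in itertools.count())
    refs = []
    for _ in range(steps):
        refs.append(weakref.ref(next(leader)))
    for _ in range(steps - lag):
        next(trailer)
    gc.collect()
    # Only weak references exist outside tee, so anything alive is held by tee.
    return sum(ref() is not None for ref in refs)


# If storage tracked only the lag, these counts would stay small as steps grows.
for steps in (10, 100, 1_000, 10_000):
    print(f"steps={steps:>6}  live items={live_after(steps)}")
```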

CPython versions tested on:

3.13

Operating systems tested on:

Linux

Labels: extension-modules (C modules in the Modules dir), pending (the issue will be closed if no feedback is provided), type-bug (an unexpected behavior, bug, or error)
