bytearray is not memory safe in free-threaded Python #127472

@robsdedude

Bug report

Bug description:

While working on improving the compatibility of PyO3's bytearray wrapper with free-threaded Python (3.13t), I ended up looking at the bytearray implementation and couldn't find any critical sections or other synchronization mechanisms. This led me to believe that using bytearrays concurrently (from pure Python code) might not be memory safe. I managed to write a reproducer:

import copy
import sys
import threading

SIZE = 1_000_000_000  # 1GB


print("Allocating initial arrays", flush=True)
_original = bytearray(42 for _ in range(SIZE))
_garbage = bytearray(13 for _ in range(SIZE // 4))


array = copy.copy(_original)


def new_array():
    return copy.copy(_original)


def worker1():
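    # Repeatedly grow the shared bytearray in place (which reallocates its
    # buffer), then rebind `array`, dropping the reference to the old object
    # while worker2 may still be reading from it.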
    global array
    while True:
        print("Extending array", flush=True)
        array.extend(array)
        print("Recreating array", flush=True)
        array = new_array()


def worker2():
    while True:
        expected = {0, 42}
        # Arguably, we shouldn't even see 0, but let's be lenient and assume
        # it might be zeroed memory not yet set to the actual value. In
        # reality, seeing 0 very likely indicates reading uninitialized memory.
        # When changing the program to also fail on 0, we can see a failure
        # much faster.
        for i in (0, SIZE - 1, -SIZE, -1):
            value = array[i]
            if value not in expected:
                print(
                    f"Array corrupted (observed array[{i}] = {value})",
                    file=sys.stderr,
                    flush=True,
                )
                return


def worker3():
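    # Churn the allocator: keep allocating and growing throwaway bytearrays
    # full of 13s, so freed buffers get reused and overwritten quickly.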
    print("Putting other stuff into the memory", flush=True)
    while True:
        foo = [copy.copy(_garbage) for _ in range(5)]
        for f in foo:
            f.extend(f)
        del foo


t1 = threading.Thread(target=worker1, daemon=True)
t2 = threading.Thread(target=worker2, daemon=True)
t3 = threading.Thread(target=worker3, daemon=True)

t1.start()
t2.start()
t3.start()

t2.join()

Obviously, this is a racy program, and I don't expect it to be useful on its own. But worker2 should never observe worker3's garbage, yet it does: a value of 13 can only come from `_garbage`, so worker2 must have read from a buffer that was already freed and reused for one of worker3's arrays.

The output I got:
Allocating initial arrays
Extending array
Putting other stuff into the memory
Recreating array
Extending array
Recreating array
Extending array
Recreating array
Extending array
Recreating array
Extending array
Recreating array
Extending array
Recreating array
Extending array
Recreating array
Extending array
Recreating array
Extending array
Recreating array
Extending array
Array corrupted (observed array[-1000000000] = 13)
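
For comparison, I'd expect the corruption to disappear when every access to the shared bytearray is serialized with a user-level lock. This is not part of the run above, just a sketch of what the locked variant would look like, reusing `array`, `new_array`, and `SIZE` from the reproducer:

import threading

lock = threading.Lock()


def locked_worker1():
    global array
    while True:
        with lock:
            array.extend(array)
        with lock:
            array = new_array()


def locked_worker2():
    while True:
        with lock:
            # Holding the lock means the buffer cannot be resized or freed
            # between these reads.
            values = [array[i] for i in (0, SIZE - 1, -SIZE, -1)]
        for value in values:
            if value not in {0, 42}:
                return

Under the GIL, the unlocked program is safe in practice because each individual bytearray operation runs atomically; the point of this report is that the free-threaded build no longer seems to provide that guarantee for bytearray.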

CPython versions tested on:

3.13

Operating systems tested on:

Linux
