
Python hang/deadlock when DataFrame.to_numpy() runs concurrently with multiple map_elements #26821

@hsahovic

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import time
import warnings
from threading import Event, Thread

import numpy as np
import polars as pl

# Silence the warning that map_elements with a Python lambda is inefficient.
warnings.filterwarnings("ignore", category=pl.exceptions.PolarsInefficientMapWarning)

SAFE_NUMPY = False  # set True to use workaround
N_ROWS = 100
RUN_SECONDS = 2

df = pl.DataFrame(
    {
        "a": [i for i in range(N_ROWS)],
        "b": [i % 2 for i in range(N_ROWS)],
    }
)
stop = Event()

def to_numpy_worker():
    while not stop.is_set():
        if SAFE_NUMPY:
            np.stack([df[col].to_numpy() for col in df.columns], axis=1)
        else:
            df.to_numpy()

def map_elements_worker():
    while not stop.is_set():
        df.with_columns(
            a_mapped=pl.col("a").map_elements(lambda x: x + 1, return_dtype=pl.Int64),
            b_mapped=pl.col("b").map_elements(lambda x: x * 2, return_dtype=pl.Int64),
        )

threads = [
    Thread(target=to_numpy_worker), Thread(target=map_elements_worker)
]

for t in threads:
    t.start()

time.sleep(RUN_SECONDS)
stop.set()

for t in threads:
    t.join(timeout=2.0)

print("alive_threads:", sum(t.is_alive() for t in threads))

Log output

async thread count: 4


Nothing is printed after that; the process just hangs.

Issue description

Running the attached minimal script hangs when SAFE_NUMPY = False.

The script starts two threads on the same DataFrame:

  • thread 1 loops on df.to_numpy()
  • thread 2 loops on df.with_columns(...map_elements(...))

After RUN_SECONDS seconds, the main thread sets the stop Event, calls join(timeout=2.0) on both threads, and should then print alive_threads.

Observed behavior on my side (Polars 1.38.1, Python 3.12.12, Linux): the process hangs and does not exit on its own.
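
To see where each thread is blocked when the hang occurs, one option is the standard library's faulthandler module, which can dump the Python tracebacks of all threads after a timeout. The sketch below is only a diagnostic aid; arm_hang_dump and disarm_hang_dump are hypothetical helper names, not part of the original script. Only Python frames are shown, so a thread blocked inside native code appears at the Python line that made the call.

```python
import faulthandler
import sys


def arm_hang_dump(timeout_s=10.0, file=None, exit_after=False):
    """Schedule a dump of every thread's traceback after timeout_s seconds.

    `file` must be a real file object backed by a file descriptor
    (defaults to stderr). With exit_after=True the process is terminated
    right after the dump, so a hung run does not have to be killed by hand.
    """
    target = sys.stderr if file is None else file
    faulthandler.dump_traceback_later(
        timeout_s, repeat=False, file=target, exit=exit_after
    )


def disarm_hang_dump():
    """Cancel the pending dump; call this when the script finishes normally."""
    faulthandler.cancel_dump_traceback_later()
```

In the repro, arm_hang_dump(RUN_SECONDS + 5, exit_after=True) could be called just before the threads are started, and disarm_hang_dump() after the final print.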

Control behavior:

  • If I switch to SAFE_NUMPY = True (using np.stack([df[col].to_numpy() for col in df.columns], axis=1)), the same script exits normally with alive_threads: 0.
  • The process also exits correctly if with_columns contains only one map_elements call.

This suggests a concurrency deadlock/hang involving DataFrame.to_numpy() and the concurrent execution of multiple map_elements calls.
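
A further experiment that may help narrow this down is to check whether the hang requires the two operations to overlap in time, by guarding both with one shared lock. The sketch below is an untested experiment and not a confirmed fix; serialize_with is a hypothetical helper introduced only for illustration.

```python
import threading
from functools import wraps


def serialize_with(lock):
    """Return a decorator that runs the wrapped function while holding `lock`.

    Functions decorated with the same lock can never run concurrently.
    """
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            with lock:
                return fn(*args, **kwargs)
        return wrapper
    return decorator
```

In the repro, this would mean moving df.to_numpy() and the df.with_columns(...) call into two small functions decorated with the same lock. If the script then exits normally, that would be consistent with the hang depending on the two calls running at the same time.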

Expected behavior

The script should terminate cleanly after stop.set(), and both threads should stop so that alive_threads is 0, regardless of whether conversion uses DataFrame.to_numpy() or per-column Series.to_numpy().
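
The expected behavior above can be turned into an automated check by running the script in a child process with a deadline and treating a timeout as a hang. This is a minimal sketch using only the standard library; run_with_deadline and the deadline value are assumptions for illustration, not part of the original report.

```python
import subprocess
import sys


def run_with_deadline(code, deadline_s):
    """Run `code` in a fresh Python interpreter.

    Returns (hung, stdout). `hung` is True if the child did not exit within
    deadline_s seconds; subprocess.run kills the child in that case.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=deadline_s,
        )
    except subprocess.TimeoutExpired:
        return True, ""
    return False, proc.stdout
```

Passing the repro script's source as code with a deadline comfortably above RUN_SECONDS plus the join timeouts would report hung as True for a run that hangs, and a passing run would contain alive_threads: 0 in the returned stdout.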

Installed versions

--------Version info---------
Polars:              1.38.1
Index type:          UInt32
Platform:            Linux-6.17.0-1006-aws-x86_64-with-glibc2.36
Python:              3.12.12 (main, Jan 13 2026, 06:06:33) [GCC 12.2.0]
Runtime:             rt32

----Optional dependencies----
Azure CLI            <not installed>
adbc_driver_manager  <not installed>
altair               <not installed>
azure.identity       <not installed>
boto3                1.42.42
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               2026.2.0
gevent               <not installed>
google.auth          2.48.0
great_tables         <not installed>
matplotlib           3.10.8
numpy                2.1.3
openpyxl             <not installed>
pandas               3.0.0
polars_cloud         <not installed>
pyarrow              23.0.0
pydantic             2.12.5
pyiceberg            <not installed>
sqlalchemy           <not installed>
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           <not installed>

Labels

  • A-interop-numpy (Area: interoperability with NumPy)
  • bug (Something isn't working)
  • needs triage (Awaiting prioritization by a maintainer)
  • python (Related to Python Polars)