Closed
Labels
- A-interop-numpy: Area: interoperability with NumPy
- bug: Something isn't working
- needs triage: Awaiting prioritization by a maintainer
- python: Related to Python Polars
Description
Checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of Polars.
Reproducible example
from threading import Event, Thread
import time
import warnings

import numpy as np
import polars as pl

warnings.filterwarnings("ignore", category=pl.exceptions.PolarsInefficientMapWarning)

SAFE_NUMPY = False  # set True to use workaround
N_ROWS = 100
RUN_SECONDS = 2

df = pl.DataFrame(
    {
        "a": [i for i in range(N_ROWS)],
        "b": [i % 2 for i in range(N_ROWS)],
    }
)

stop = Event()


def to_numpy_worker():
    while not stop.is_set():
        if SAFE_NUMPY:
            np.stack([df[col].to_numpy() for col in df.columns], axis=1)
        else:
            df.to_numpy()


def map_elements_worker():
    while not stop.is_set():
        df.with_columns(
            a_mapped=pl.col("a").map_elements(lambda x: x + 1, return_dtype=pl.Int64),
            b_mapped=pl.col("b").map_elements(lambda x: x * 2, return_dtype=pl.Int64),
        )


threads = [
    Thread(target=to_numpy_worker), Thread(target=map_elements_worker)
]
for t in threads:
    t.start()

time.sleep(RUN_SECONDS)
stop.set()

for t in threads:
    t.join(timeout=2.0)

print("alive_threads:", sum(t.is_alive() for t in threads))
Log output
async thread count: 4
Nothing after that. It just hangs.
Issue description
Running the attached minimal script hangs when SAFE_NUMPY = False.
The script starts two threads on the same DataFrame:
- thread 1 loops on df.to_numpy()
- thread 2 loops on df.with_columns(...map_elements(...))
After RUN_SECONDS, the main thread sets an Event and tries to join(timeout=2.0) both threads, then should print alive_threads.
Observed behavior on my side (Polars 1.38.1, Python 3.12.12, Linux): the process hangs and does not exit on its own.
Control behavior:
- If I switch to SAFE_NUMPY = True (using np.stack([df[col].to_numpy() for col in df.columns], axis=1)), the same script exits normally with alive_threads: 0.
This suggests a concurrency deadlock/hang involving DataFrame.to_numpy() and concurrent map_elements execution.
The process exits correctly if with_columns contains only one map_elements call.
Expected behavior
The script should terminate cleanly after stop.set(), and both threads should stop so that alive_threads is 0, regardless of whether conversion uses DataFrame.to_numpy() or per-column Series.to_numpy().
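To check this expectation automatically (for example in a regression test), the script can be run in a child interpreter with a timeout. This is only a sketch using the standard library: the helper name run_with_timeout is hypothetical, and the two one-line sources at the bottom are stand-ins for the real reproducer. A separate process is used because the log above shows nothing printed after the first line, so an in-process check may never get a chance to run.

```python
import subprocess
import sys


def run_with_timeout(source: str, timeout_s: float) -> str:
    """Run Python source in a child interpreter.

    Returns "exited" if the child finishes within timeout_s seconds,
    otherwise "hung". (Hypothetical helper, not part of Polars.)
    """
    try:
        subprocess.run(
            [sys.executable, "-c", source],
            timeout=timeout_s,
            check=False,  # a non-zero exit code still counts as "exited"
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before re-raising.
        return "hung"
    return "exited"


if __name__ == "__main__":
    # Stand-in sources; replace with the text of the reproducer script.
    print(run_with_timeout("pass", 30.0))
    print(run_with_timeout("import time; time.sleep(60)", 1.0))
```

With the reproducer's source passed in, a result of "hung" for SAFE_NUMPY = False and "exited" for SAFE_NUMPY = True would match the behavior described above.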
Installed versions
--------Version info---------
Polars: 1.38.1
Index type: UInt32
Platform: Linux-6.17.0-1006-aws-x86_64-with-glibc2.36
Python: 3.12.12 (main, Jan 13 2026, 06:06:33) [GCC 12.2.0]
Runtime: rt32
----Optional dependencies----
Azure CLI <not installed>
adbc_driver_manager <not installed>
altair <not installed>
azure.identity <not installed>
boto3 1.42.42
cloudpickle <not installed>
connectorx <not installed>
deltalake <not installed>
fastexcel <not installed>
fsspec 2026.2.0
gevent <not installed>
google.auth 2.48.0
great_tables <not installed>
matplotlib 3.10.8
numpy 2.1.3
openpyxl <not installed>
pandas 3.0.0
polars_cloud <not installed>
pyarrow 23.0.0
pydantic 2.12.5
pyiceberg <not installed>
sqlalchemy <not installed>
torch <not installed>
xlsx2csv <not installed>
xlsxwriter <not installed>