Commit 1f3a474

feat: add support for parallel kind processing with threads
Even with free-threaded Python 3.14, this is still a bit slower than multiprocessing on Linux, but it lets us start experimenting with threads, and may give users on macOS and Windows an immediate speed-up.
1 parent d2d2941 commit 1f3a474
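
The selection rule this commit describes (processes when `fork` is available, threads otherwise or on request) can be tried in isolation. The following is a minimal sketch, not the code in `generator.py`: `make_executor`, its `environ` parameter, and `square` are hypothetical names introduced here, and the `os.cpu_count` fallback is an addition for interpreters older than 3.13, where `os.process_cpu_count()` does not exist.

```python
import multiprocessing
import os
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor


def make_executor(environ=None) -> Executor:
    """Hypothetical helper: use a fork-based process pool if possible, else threads."""
    environ = os.environ if environ is None else environ
    has_fork = "fork" in multiprocessing.get_all_start_methods()
    if has_fork and not environ.get("TASKGRAPH_USE_THREADS"):
        return ProcessPoolExecutor(mp_context=multiprocessing.get_context("fork"))
    # Assumption: fall back to os.cpu_count() where os.process_cpu_count()
    # (added in Python 3.13) is missing. None lets the pool pick its default.
    cpu_count = getattr(os, "process_cpu_count", os.cpu_count)()
    return ThreadPoolExecutor(max_workers=cpu_count)


def square(n):
    return n * n


if __name__ == "__main__":
    # Force the thread path, as TASKGRAPH_USE_THREADS=1 would.
    with make_executor({"TASKGRAPH_USE_THREADS": "1"}) as executor:
        print(list(executor.map(square, range(5))))  # [0, 1, 4, 9, 16]
```

Because both pools implement the same `Executor` interface, the calling code does not need to know which one it received; only the environment variable and the platform's start methods decide.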

1 file changed: +12 -10 lines changed

src/taskgraph/generator.py

Lines changed: 12 additions & 10 deletions
@@ -10,6 +10,7 @@
 from concurrent.futures import (
     FIRST_COMPLETED,
     ProcessPoolExecutor,
+    ThreadPoolExecutor,
     wait,
 )
 from dataclasses import dataclass
@@ -317,10 +318,18 @@ def _load_tasks_parallel(self, kinds, kind_graph, parameters):
         futures = set()
         edges = set(kind_graph.edges)
 
-        with ProcessPoolExecutor(
-            mp_context=multiprocessing.get_context("fork")
-        ) as executor:
+        # use processes if available; this allows us to use multiple CPU cores
+        # we should revisit this default when free-threaded python is more
+        # stable and performant. in the meantime, allowing the usage of threads
+        # can still be helpful when `fork` multiprocessing is not available
+        # (like windows and mac), and gives users the option to try using
+        # free threaded python to speed things up
+        if "fork" in multiprocessing.get_all_start_methods() and not os.environ.get("TASKGRAPH_USE_THREADS"):
+            factory = lambda: ProcessPoolExecutor(mp_context=multiprocessing.get_context("fork"))
+        else:
+            factory = lambda: ThreadPoolExecutor(max_workers=os.process_cpu_count())
 
+        with factory() as executor:
             def submit_ready_kinds():
                 """Create the next batch of tasks for kinds without dependencies."""
                 nonlocal kinds, edges, futures
@@ -433,13 +442,6 @@ def _run(self):
         yield "kind_graph", kind_graph
 
         logger.info("Generating full task set")
-        # Current parallel generation relies on multiprocessing, and forking.
-        # This causes problems on Windows and macOS due to how new processes
-        # are created there, and how doing so reinitializes global variables
-        # that are modified earlier in graph generation, that doesn't get
-        # redone in the new processes. Ideally this would be fixed, or we
-        # would take another approach to parallel kind generation. In the
-        # meantime, it's not supported outside of Linux.
         if "fork" not in multiprocessing.get_all_start_methods() or os.environ.get(
             "TASKGRAPH_SERIAL"
         ):
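
The diff only shows the first lines of `submit_ready_kinds`, so the scheduling loop around it is not visible here. As a rough illustration of the general pattern (submit whatever has no unfinished dependencies, then `wait` with `FIRST_COMPLETED` and repeat), here is a sketch under stated assumptions. It is not taskgraph's implementation: `load_kind`, `run_in_dependency_order`, and the `(kind, dependency)` edge shape are all hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def load_kind(name):
    """Hypothetical stand-in for the real per-kind work."""
    return f"tasks for {name}"


def run_in_dependency_order(kinds, edges, executor):
    """Run load_kind for each kind once all of its dependencies have finished.

    Assumption: `edges` is a collection of (kind, dependency) pairs.
    Returns the kinds in the order their work completed.
    """
    pending = set(kinds)
    edges = set(edges)
    futures = {}  # maps each in-flight future to its kind name
    order = []

    def submit_ready_kinds():
        # A kind is blocked while any remaining edge names it as the dependent.
        blocked = {kind for kind, _dep in edges}
        for kind in sorted(pending - blocked):
            pending.discard(kind)
            futures[executor.submit(load_kind, kind)] = kind

    submit_ready_kinds()
    while futures:
        done, _ = wait(list(futures), return_when=FIRST_COMPLETED)
        for future in done:
            kind = futures.pop(future)
            future.result()  # re-raises any exception from the worker
            order.append(kind)
            # Remove edges into the finished kind so its dependents unblock.
            edges = {(k, dep) for k, dep in edges if dep != kind}
        submit_ready_kinds()

    if pending:
        raise ValueError(f"unresolvable dependencies for: {sorted(pending)}")
    return order


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as pool:
        chain = {("build", "docker-image"), ("test", "build")}
        print(run_in_dependency_order(["build", "test", "docker-image"], chain, pool))
```

The loop only touches the `Executor` interface, so the same function works with a thread pool or a process pool; with a process pool the submitted callable and its arguments must be picklable.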
