Enable parallelism again by Simn · Pull Request #12747 · HaxeFoundation/haxe

Simn · 2026-03-05T19:04:14Z

This was randomly disabled in some ninja commit last year and I never noticed it. We even added -D disable-parallelism to optionally disable it, so I'm a little confused by it's disabled by default. But either way, let's get this working properly.

Simn · 2026-03-05T19:06:11Z

@copilot Please survey all the Parallel occurrences in filters.ml and trace through the code to find potential data races. Also, we've experienced some memory leaks in the past, but it is unclear if that was our fault or an upstream problem.

Copilot · 2026-03-05T19:06:19Z

@Simn I've opened a new pull request, #12748, to work on those changes. Once the pull request is ready, I'll request review from you.

- Add Parallel.run_with_pool to wrap operations in Task.run (fixes Unhandled(Wait) exception from missing effect handler) - Update all call sites to use run_with_pool instead of bare Lazy.force - Enable parallel analyzer by default (remove opt-in gate) - Make apply_params_stack domain-local via Domain.DLS (fixes data race) - Make alloc_mid use Atomic (consistency with alloc_var) - Protect DCE CfMaybeUsed writes with field_marker_mutex (fixes flag race) Co-authored-by: Simn <634365+Simn@users.noreply.github.com>

Simn · 2026-03-05T22:09:10Z

@copilot I remembered (well, CI reminded me) that there was still a problem with domains and Eval. There's a comment about this in filters.ml:

	(* Note: We cannot have a thread pool up during the before/after_save callbacks because Eval's thread handling
	   currently does not get along with it. This is why we need a separate pool for this operation. *)

Unfortunately, I don't remember what the actual root cause was and why I concluded that there's an interference. Mixing threads and domain is probably not a particularly good idea, but at the same time I can't tell why it wouldn't work. Could you check if you can identify the problem?

And while you're there, I would like to have your assessment on whether or not it would be feasible to make Eval use domains instead of threads for its vthread.

Copilot · 2026-03-05T22:09:17Z

@Simn I've opened a new pull request, #12751, to work on those changes. Once the pull request is ready, I'll request review from you.

…er test Co-authored-by: Simn <634365+Simn@users.noreply.github.com>

Simn · 2026-03-06T15:37:00Z

@copilot This PR aims to enable parallelism for the filtering stage of the compiler again. We have found that there is a problem with the eval target in particular which we haven't been able to track down yet. Here are examples of failing CI runs:

https://github.com/HaxeFoundation/haxe/actions/runs/22768814450/job/66045252148?pr=12747
https://github.com/HaxeFoundation/haxe/actions/runs/22768814450/job/66044060318?pr=12747
https://github.com/HaxeFoundation/haxe/actions/runs/22768814450/job/66047042680?pr=12747

Note: the actual Eval run-time is not affected by parallelism and still runs single-threaded. Also, note that these particular failures never occured on the development branch, so they really do seem related to the changes here.

Copilot · 2026-03-06T15:37:07Z

@Simn I've opened a new pull request, #12757, to work on those changes. Once the pull request is ready, I'll request review from you.

…lti-domain safety (#12757) * Initial plan * Use run_in_new_pool for filter phases to avoid idle domains during eval execution The persistent pool (via run_with_pool) kept domain workers alive during eval interpretation and callbacks, causing subprocess/IPC failures in eval tests. Revert to run_in_new_pool which tears down the pool after each parallel section, matching the pattern that works on development. The sctx.pool and run_with_pool remain available for generators (e.g. genjvm) that use them during code generation. Also restore the EnableParallelAnalyzer guard since the analyzer was specifically identified as leaking when run in parallel (#12414). Co-authored-by: Simn <634365+Simn@users.noreply.github.com> * Revert filters.ml to base branch state Co-authored-by: Simn <634365+Simn@users.noreply.github.com> * Make process_stubs.c safe for OCaml 5 multi-domain programs Two changes to fix eval subprocess failures when domainslib pool domains are alive: 1. Add caml_enter_blocking_section/caml_leave_blocking_section to all blocking process operations (read, write, waitpid, WaitForSingleObject, ReadFile, WriteFile). Without these, the calling domain holds its runtime lock during blocking I/O, preventing stop-the-world GC from proceeding when worker domains need it. This causes GC stalls that lead to pipe I/O corruption and timing issues. 2. Replace fork()+exec() with posix_spawn() on Unix. OCaml 5's Unix.fork explicitly refuses to fork when domains are spawned (caml_domain_is_multicore check). process_stubs.c bypassed this safety check by calling fork() directly. In multi-threaded programs, fork() can deadlock when worker threads hold internal locks (malloc arenas, glibc internals). posix_spawn() avoids this by using clone(CLONE_VM|CLONE_VFORK) internally on Linux. Co-authored-by: Simn <634365+Simn@users.noreply.github.com> * Replace process_stubs.c with pure OCaml implementation Remove the C stubs entirely and implement Process using Unix.create_process, which is domain-safe under OCaml 5. This also refactors PipeThings.run_command to use Process.run instead of duplicating the pipe+spawn logic. The old C stubs had two issues in OCaml 5: 1. Missing caml_enter_blocking_section around blocking I/O 2. Direct fork() which is unsafe with multiple domains Unix.create_process handles both correctly (it uses posix_spawn internally on modern systems and properly releases the runtime lock). Co-authored-by: Simn <634365+Simn@users.noreply.github.com> * Fix error handling: convert Unix.Unix_error to Failure for eval compatibility The old C stubs used caml_failwith for all errors, which eval catches as Dynamic. Unix.create_process raises Unix.Unix_error instead, which eval couldn't catch. Convert all Unix errors to Failure to preserve backward compatibility. Co-authored-by: Simn <634365+Simn@users.noreply.github.com> * catch Unix_error in callers instead * don't turn exit status into exceptions * make Eof extend Exception * catch constructor exception too * back to running everything * hmm * Handle non-existent commands in Process.run without throwing When Unix.create_process fails (e.g. command not found), return a process with pre-closed pipes (reads return EOF) and exit code 127, matching the old fork+exec behavior where fork always succeeded and exec failure manifested as a non-zero exit code. Add misc test to verify non-existent commands don't throw exceptions. Co-authored-by: Simn <634365+Simn@users.noreply.github.com> * Assert output in process-nonexistent misc test, guard kill against pid=0 The test now prints stdout_empty, stderr_empty, and exit_code values and asserts them via compile.hxml.stdout (expected: empty pipes, code 127). Also add pid > 0 guard in Process.kill as defense-in-depth against sending signals to process group 0 for failed process objects. Co-authored-by: Simn <634365+Simn@users.noreply.github.com> * Preserve Unix.Unix_error info in all Process operations - run: writes error message to stderr pipe so callers can read it - read_stdout/read_stderr/write_stdin/close_stdin: include the Unix error details (function, arg, message) in Failure exceptions - Add unix_error_msg helper shared by all operations - Test checks stderr is non-empty (not a specific message string) for cross-platform compatibility Co-authored-by: Simn <634365+Simn@users.noreply.github.com> * reformat * avoid some code duplication --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: Simn <634365+Simn@users.noreply.github.com> Co-authored-by: Simon Krajewski <simon@haxe.org>

Simn · 2026-03-07T08:20:00Z

Looks like the damn process thing really was the root cause of these weird internal issues we've been seeing. I'm not sure if this also resolves the memory leak but I suppose there's only one way to find out.

Copilot AI mentioned this pull request Mar 5, 2026

Enable parallelism again, fix pool usage, fix data races #12748

Closed

why...

ade2f8c

Simn force-pushed the why-is-parallelism-disabled branch from 38c78d6 to ade2f8c Compare March 5, 2026 20:57

add sctx.pool

a7bee8b

Simn force-pushed the why-is-parallelism-disabled branch from b123bb1 to a7bee8b Compare March 5, 2026 21:03

Copilot AI and others added 2 commits March 5, 2026 22:34

move apply_params_stack to uctx

bfc9961

Copilot AI mentioned this pull request Mar 5, 2026

Restore run_in_new_pool for filter operations to fix Eval/domain interference #12751

Closed

Simn mentioned this pull request Mar 6, 2026

Use Domain instead of Thread for Eval #12752

Draft

Copilot AI mentioned this pull request Mar 6, 2026

Fix DomainMutex data race, thread ID generation, TLS counter race, native eval Semaphore/Condition, and CamlinternalLazy.Undefined for Domain-based eval #12753

Closed

Copilot AI added a commit that referenced this pull request Mar 6, 2026

Revert #12747 changes, keep only eval Domain changes, restore TestTim…

a6d8fe0

…er test Co-authored-by: Simn <634365+Simn@users.noreply.github.com>

Simn added 3 commits March 6, 2026 15:13

Merge branch 'development' into why-is-parallelism-disabled

0f0cc0b

investigate if we're actually looking at the problem

40b345d

revert

3ab32fa

Simn added a commit that referenced this pull request Mar 6, 2026

pull generic parallelism fixes from #12747

e91640d

Simn added 3 commits March 6, 2026 15:37

Merge branch 'development' into why-is-parallelism-disabled

18de1c8

focus

b914dde

focus more

3caa538

Copilot AI mentioned this pull request Mar 6, 2026

Replace process_stubs.c with pure OCaml implementation for OCaml 5 multi-domain safety #12757

Merged

Copilot AI and others added 2 commits March 7, 2026 06:55

Merge branch 'development' into why-is-parallelism-disabled

3cab824

Simn marked this pull request as ready for review March 7, 2026 07:41

Simn merged commit cabfd42 into development Mar 7, 2026
99 checks passed

Simn deleted the why-is-parallelism-disabled branch March 7, 2026 08:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Enable parallelism again#12747

Enable parallelism again#12747
Simn merged 12 commits intodevelopmentfrom
why-is-parallelism-disabled

Simn commented Mar 5, 2026

Uh oh!

Simn commented Mar 5, 2026

Uh oh!

Copilot AI commented Mar 5, 2026

Uh oh!

Simn commented Mar 5, 2026

Uh oh!

Copilot AI commented Mar 5, 2026

Uh oh!

Simn commented Mar 6, 2026

Uh oh!

Copilot AI commented Mar 6, 2026

Uh oh!

Simn commented Mar 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

Simn commented Mar 5, 2026

Uh oh!

Simn commented Mar 5, 2026

Uh oh!

Copilot AI commented Mar 5, 2026

Uh oh!

Simn commented Mar 5, 2026

Uh oh!

Copilot AI commented Mar 5, 2026

Uh oh!

Simn commented Mar 6, 2026

Uh oh!

Copilot AI commented Mar 6, 2026

Uh oh!

Simn commented Mar 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants