Conversation
@copilot Please survey all the …
(branch force-pushed from 38c78d6 to ade2f8c)
(branch force-pushed from b123bb1 to a7bee8b)
- Add Parallel.run_with_pool to wrap operations in Task.run (fixes Unhandled(Wait) exception from missing effect handler)
- Update all call sites to use run_with_pool instead of bare Lazy.force
- Enable parallel analyzer by default (remove opt-in gate)
- Make apply_params_stack domain-local via Domain.DLS (fixes data race)
- Make alloc_mid use Atomic (consistency with alloc_var)
- Protect DCE CfMaybeUsed writes with field_marker_mutex (fixes flag race)

Co-authored-by: Simn <634365+Simn@users.noreply.github.com>
@copilot I remembered (well, CI reminded me) that there was still a problem with domains and Eval. There's a comment about this in filters.ml: Unfortunately, I don't remember what the actual root cause was and why I concluded that there's interference. Mixing threads and domains is probably not a particularly good idea, but at the same time I can't tell why it wouldn't work. Could you check if you can identify the problem? And while you're there, I would like your assessment on whether or not it would be feasible to make Eval use domains instead of threads for its …
…er test

Co-authored-by: Simn <634365+Simn@users.noreply.github.com>
@copilot This PR aims to enable parallelism for the filtering stage of the compiler again. We have found that there is a problem with the eval target in particular which we haven't been able to track down yet. Here are examples of failing CI runs: https://github.com/HaxeFoundation/haxe/actions/runs/22768814450/job/66045252148?pr=12747 Note: the actual Eval run-time is not affected by parallelism and still runs single-threaded. Also, note that these particular failures never occurred on the development branch, so they really do seem related to the changes here.
…lti-domain safety (#12757)

* Initial plan

* Use run_in_new_pool for filter phases to avoid idle domains during eval execution

  The persistent pool (via run_with_pool) kept domain workers alive during eval interpretation and callbacks, causing subprocess/IPC failures in eval tests. Revert to run_in_new_pool, which tears down the pool after each parallel section, matching the pattern that works on development. The sctx.pool and run_with_pool remain available for generators (e.g. genjvm) that use them during code generation. Also restore the EnableParallelAnalyzer guard, since the analyzer was specifically identified as leaking when run in parallel (#12414).

  Co-authored-by: Simn <634365+Simn@users.noreply.github.com>

* Revert filters.ml to base branch state

  Co-authored-by: Simn <634365+Simn@users.noreply.github.com>

* Make process_stubs.c safe for OCaml 5 multi-domain programs

  Two changes to fix eval subprocess failures when domainslib pool domains are alive:

  1. Add caml_enter_blocking_section/caml_leave_blocking_section to all blocking process operations (read, write, waitpid, WaitForSingleObject, ReadFile, WriteFile). Without these, the calling domain holds its runtime lock during blocking I/O, preventing stop-the-world GC from proceeding when worker domains need it. This causes GC stalls that lead to pipe I/O corruption and timing issues.

  2. Replace fork()+exec() with posix_spawn() on Unix. OCaml 5's Unix.fork explicitly refuses to fork when domains are spawned (caml_domain_is_multicore check). process_stubs.c bypassed this safety check by calling fork() directly. In multi-threaded programs, fork() can deadlock when worker threads hold internal locks (malloc arenas, glibc internals). posix_spawn() avoids this by using clone(CLONE_VM|CLONE_VFORK) internally on Linux.
  Co-authored-by: Simn <634365+Simn@users.noreply.github.com>

* Replace process_stubs.c with pure OCaml implementation

  Remove the C stubs entirely and implement Process using Unix.create_process, which is domain-safe under OCaml 5. This also refactors PipeThings.run_command to use Process.run instead of duplicating the pipe+spawn logic.

  The old C stubs had two issues in OCaml 5:

  1. Missing caml_enter_blocking_section around blocking I/O
  2. Direct fork(), which is unsafe with multiple domains

  Unix.create_process handles both correctly (it uses posix_spawn internally on modern systems and properly releases the runtime lock).

  Co-authored-by: Simn <634365+Simn@users.noreply.github.com>

* Fix error handling: convert Unix.Unix_error to Failure for eval compatibility

  The old C stubs used caml_failwith for all errors, which eval catches as Dynamic. Unix.create_process raises Unix.Unix_error instead, which eval couldn't catch. Convert all Unix errors to Failure to preserve backward compatibility.

  Co-authored-by: Simn <634365+Simn@users.noreply.github.com>

* catch Unix_error in callers instead

* don't turn exit status into exceptions

* make Eof extend Exception

* catch constructor exception too

* back to running everything

* hmm

* Handle non-existent commands in Process.run without throwing

  When Unix.create_process fails (e.g. command not found), return a process with pre-closed pipes (reads return EOF) and exit code 127, matching the old fork+exec behavior where fork always succeeded and exec failure manifested as a non-zero exit code. Add a misc test to verify that non-existent commands don't throw exceptions.

  Co-authored-by: Simn <634365+Simn@users.noreply.github.com>

* Assert output in process-nonexistent misc test, guard kill against pid=0

  The test now prints stdout_empty, stderr_empty, and exit_code values and asserts them via compile.hxml.stdout (expected: empty pipes, code 127). Also add a pid > 0 guard in Process.kill as defense-in-depth against sending signals to process group 0 for failed process objects.

  Co-authored-by: Simn <634365+Simn@users.noreply.github.com>

* Preserve Unix.Unix_error info in all Process operations

  - run: writes the error message to the stderr pipe so callers can read it
  - read_stdout/read_stderr/write_stdin/close_stdin: include the Unix error details (function, arg, message) in Failure exceptions
  - Add a unix_error_msg helper shared by all operations
  - Test checks that stderr is non-empty (not a specific message string) for cross-platform compatibility

  Co-authored-by: Simn <634365+Simn@users.noreply.github.com>

* reformat

* avoid some code duplication

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Simn <634365+Simn@users.noreply.github.com>
Co-authored-by: Simon Krajewski <simon@haxe.org>
Looks like the damn process thing really was the root cause of these weird internal issues we've been seeing. I'm not sure if this also resolves the memory leak, but I suppose there's only one way to find out.
This was randomly disabled in some ninja commit last year and I never noticed it. We even added -D disable-parallelism to optionally disable it, so I'm a little confused why it's disabled by default. But either way, let's get this working properly.