
Worker processes crash on Manager proxy access after hub completes full exploration pass #246

@resoltico

Hi!

It seems there may be a bug in FuzzWorkerHub.start() (hypofuzz.py) that causes worker processes to crash with an unhandled exception when HypoFuzz finishes a full exploration pass — i.e., when it prints "Found a failing input for every test!" and exits.

What is observed

When all tests have been exhausted, HypoFuzz exits with code 1 and the worker subprocess (Process-2) prints an unhandled traceback. The exception class differs by Python version:

Python 3.13.8 — BrokenPipeError:

      File "…/hypofuzz/hypofuzz.py", line 694, in start
        worker_state["expected_lifetime"] = None
      File "…/multiprocessing/managers.py", line 830, in _callmethod
        conn.send((self._id, methodname, args, kwds))

    BrokenPipeError: [Errno 32] Broken pipe
    Found a failing input for every test!

Python 3.14.3 — FileNotFoundError:

      File "…/hypofuzz/hypofuzz.py", line 610, in start
        self._update_targets(self.shared_state["hub_state"]["nodeids"])
      File "…/multiprocessing/managers.py", line 832, in _callmethod
        kind, result = conn.recv()

      File "…/multiprocessing/managers.py", line 863, in _incref
        conn = self._Client(self._token.address, authkey=self._authkey)

    FileNotFoundError: [Errno 2] No such file or directory
    Found a failing input for every test!

The crash reproduces on both versions. The traceback always originates from _start_worker → FuzzWorker.start() accessing a Manager proxy object.

Why this might be happening

Looking at FuzzWorkerHub.start(), the poll loop breaks when all workers report empty valid_nodeids, and control then leaves the "with Manager() as manager:" block. Our reading of the code suggests that Manager.__exit__() fires at that point and closes the IPC socket, while the worker processes may still be running and accessing shared state through proxy objects. We suspect this timing gap is behind the crash, though we may be missing something and invite the maintainers to look more carefully.
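
To illustrate the suspected timing gap outside HypoFuzz, here is a minimal standard-library sketch. It is not HypoFuzz code: the names worker, leave_manager_early, and results are hypothetical, and the 0.3-second sleep merely stands in for the hub's poll loop. We also assume the "fork" start method to keep the sketch short; macOS defaults to "spawn", so the exact exception class may differ from the tracebacks above.

```python
import multiprocessing as mp
import time


def worker(shared, results):
    """Keep writing through the proxy until a write fails, then report why."""
    try:
        while True:
            shared["tick"] = time.time()  # each write is an IPC call to the manager
            time.sleep(0.05)
    except Exception as exc:  # e.g. BrokenPipeError, EOFError, FileNotFoundError
        results.put(type(exc).__name__)


def leave_manager_early():
    # Assumption: "fork" start method (not available on Windows).
    ctx = mp.get_context("fork")
    results = ctx.Queue()
    with ctx.Manager() as manager:
        shared = manager.dict()
        proc = ctx.Process(target=worker, args=(shared, results))
        proc.start()
        time.sleep(0.3)  # hypothetical stand-in for the hub's poll loop
    # The with block has exited, so the manager is shut down,
    # but the worker is still running and still using its proxy.
    name = results.get(timeout=10)
    proc.join(timeout=10)
    return name


if __name__ == "__main__":
    print("worker failed with:", leave_manager_early())
```

If our reading is right, the worker should report a connection-related exception shortly after the parent leaves the with block; which class it reports depends on whether the failure hits during send, receive, or reconnect.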

The difference in exception class between Python versions (write failure on 3.13 vs. connect failure on 3.14) may reflect a change in how Manager.__exit__() cleans up its socket between the two releases, but we are not certain of that either.
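
If the timing gap described above is indeed the cause, one possible mitigation would be to tell the workers to stop and join them before control leaves the with block. The sketch below only illustrates that idea under our assumptions; it is not a patch for HypoFuzz. The stop event, the worker function, and join_before_exit are hypothetical names, and how HypoFuzz actually signals its workers may differ. As before, the "fork" start method is assumed for brevity.

```python
import multiprocessing as mp
import time


def worker(shared, stop):
    """Write through the proxy until asked to stop."""
    while not stop.is_set():
        shared["tick"] = time.time()  # safe as long as the manager is alive
        time.sleep(0.05)


def join_before_exit():
    # Assumption: "fork" start method (not available on Windows).
    ctx = mp.get_context("fork")
    stop = ctx.Event()  # plain Event, independent of the manager
    with ctx.Manager() as manager:
        shared = manager.dict()
        proc = ctx.Process(target=worker, args=(shared, stop))
        proc.start()
        time.sleep(0.3)  # hypothetical stand-in for the hub's poll loop
        stop.set()  # ask the worker to finish its current iteration
        proc.join(timeout=10)  # wait while the manager is still running
    # Only now is the manager shut down; no worker is left to touch a proxy.
    return proc.exitcode


if __name__ == "__main__":
    print("worker exit code:", join_before_exit())
```

In this arrangement the worker exits with code 0, because its last proxy access completes before the manager goes away.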

Reproduction

Reproduces in ~65 seconds with a single always-failing test. Attached is a minimal two-file reproduction:

  • test_repro.py — a Hypothesis test that always raises, so HypoFuzz exhausts valid_nodeids on the first hub check cycle (after its 60-second sleep)
  • run_repro.sh — runs HypoFuzz against the test on both Python versions and reports the exception class observed on each

Environment

  • HypoFuzz: 25.11.01
  • Hypothesis: 6.148.7 (Python 3.13) / 6.151.9 (Python 3.14)
  • Python: 3.13.8 and 3.14.3
  • OS: macOS (Apple Silicon)

test_repro.py

run_repro.sh

Labels: bug
