Commit 2a21e1d

remyduthu and claude committed
feat(flaky-detection): Add pytest-xdist support to flaky detection system
Add full flaky detection support under pytest-xdist using controller-orchestrated pre-computed deadlines and xdist's built-in IPC.

Architecture:
- Controller fetches flaky detection context from API once, distributes it to workers via workerinput
- Each worker constructs a FlakyDetector from the serialized context, computes the same global budget, runs tests with reruns
- Workers send metrics back via workeroutput
- Controller aggregates metrics and generates the terminal report

Changes:
- Add FlakyDetector.from_context() classmethod for worker-side construction
- Add to_serializable_metrics() for xdist IPC (plain builtins only)
- Add make_report_from_aggregated() for controller-side reporting
- Add static deadline mode (_is_xdist flag) preserving dynamic for non-xdist
- Add MergifyCIInsights.load_flaky_detector_from_context()
- Add controller hooks (pytest_configure_node, pytest_testnodedown)
- Add worker hooks (workerinput loading, workeroutput export)
- Disable flaky detection under 'each' scheduling mode
- Add pytest-xdist dev dependency and integration tests

Fixes: MRGFY-6296

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Change-Id: I2811d2999ee3b51fb4649e2f60c33c200635435b
1 parent ecdb273 commit 2a21e1d

File tree

9 files changed (+2084 / -9 lines)


docs/superpowers/plans/2026-03-19-xdist-flaky-detection.md

Lines changed: 1213 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 166 additions & 0 deletions
@@ -0,0 +1,166 @@
# Add pytest-xdist Support to Flaky Detection

**Linear:** MRGFY-6296
**Status:** Approved
**Date:** 2026-03-19

## Problem

The flaky detection system does not support `pytest-xdist`:

1. `flaky_detector._test_metrics` lives in in-process memory, but xdist spawns separate worker processes.
2. `pytest_collection_finish` does not run on the controller under xdist.

## Decision Summary

- **Approach:** Controller-orchestrated with pre-computed per-test deadlines.
- **IPC:** xdist built-in `workerinput`/`workeroutput`.
- **Budget model:** Global budget, static per-test allocation under xdist. Dynamic deadlines preserved for non-xdist.
- **Scheduling:** Target `load` (default) mode. Other modes should not crash. Under `each` mode (every test runs on every worker), flaky detection is disabled to avoid duplicated budgets.
## Architecture

```
Controller                           Workers (gw0, gw1, ...)
────────────────────────────────     ────────────────────────────────
fetch flaky context from API
  ├─── workerinput ──────────►       receive context as plain dict
  │                                  build FlakyDetector (no API call)
  │                                  collect tests (same list)
  │                                  compute budget (same result)
  │                                  run tests + reruns
  │    ◄── workeroutput ─────────────┤
aggregate metrics
print terminal summary
```

All workers collect the same full test list (xdist verifies this). Budget computation is deterministic, so each worker independently arrives at the same global budget and per-test allocation. No mid-run coordination.
## Controller Responsibilities

### 1. Fetch context and distribute (`pytest_configure_node`)

- Fetch `_FlakyDetectionContext` from API **once** (cache it).
- Serialize as plain dict into `node.workerinput["flaky_detection_context"]`.
- Also set `node.workerinput["flaky_detection_mode"]`.

### 2. Collect worker metrics (`pytest_testnodedown`)

- Read `node.workeroutput["flaky_detection_metrics"]`.
- Merge into controller-side aggregated metrics dict.
- Workers run distinct tests under `load` scheduling, so no overlap.

### 3. Terminal summary (`pytest_terminal_summary`)

- Build report from aggregated metrics using same format as today.
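The merge step can be sketched with plain dict operations (a sketch only: `merge_worker_metrics` is a hypothetical free function; the real code merges into plugin state inside `pytest_testnodedown`):

```python
def merge_worker_metrics(aggregated: dict, worker_metrics: dict) -> None:
    # Under `load` scheduling each test runs on exactly one worker, so a
    # plain dict.update() cannot clobber another worker's entries.
    aggregated["test_metrics"].update(worker_metrics.get("test_metrics", {}))
    aggregated["over_length_tests"].extend(worker_metrics.get("over_length_tests", []))
    aggregated["debug_logs"].extend(worker_metrics.get("debug_logs", []))

agg = {"test_metrics": {}, "over_length_tests": [], "debug_logs": []}
merge_worker_metrics(agg, {"test_metrics": {"t::a": {"rerun_count": 2}}})
merge_worker_metrics(agg, {"test_metrics": {"t::b": {"rerun_count": 0}}})
```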
## Worker Responsibilities

### 1. Initialization

- Read `config.workerinput["flaky_detection_context"]` if present.
- Construct `FlakyDetector` via new `from_context()` classmethod (skips API call).

### 2. Session preparation (`pytest_collection_finish`)

- Call `prepare_for_session(session)` as today.

### 3. Test execution (`pytest_runtest_protocol`)

- Identical to current logic: initial run, set deadline, rerun loop.
- `set_test_deadline` uses static allocation: `total_budget / global_num_tests_to_process` where the denominator is the **global** count of tests to process (computed from the full collection, not from the worker's assigned subset). Workers don't know upfront which tests they'll run (xdist dispatches dynamically), but the per-test budget is the same regardless.
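The static split is simple enough to sketch (the helper name is hypothetical; the real branch lives inside `set_test_deadline`):

```python
from datetime import timedelta

def compute_static_deadline(
    total_budget: timedelta,
    global_num_tests_to_process: int,
) -> timedelta:
    # Every worker divides the same global budget by the same global test
    # count, so the per-test allocation is identical on gw0, gw1, ...
    # with no mid-run coordination.
    return total_budget / global_num_tests_to_process

# e.g. a 120 s budget over 40 tests gives each test 3 s, on every worker.
per_test = compute_static_deadline(timedelta(seconds=120), 40)
```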
### 4. Metrics export (`pytest_sessionfinish`)

- Serialize `_test_metrics`, `_over_length_tests`, `_debug_logs` into `config.workeroutput["flaky_detection_metrics"]`.
## Data Flow

### workerinput (controller -> worker)

```python
node.workerinput["flaky_detection_context"] = {
    "budget_ratio_for_new_tests": float,
    "budget_ratio_for_unhealthy_tests": float,
    "existing_test_names": list[str],
    "existing_tests_mean_duration_ms": int,
    "unhealthy_test_names": list[str],
    "max_test_execution_count": int,
    "max_test_name_length": int,
    "min_budget_duration_ms": int,
    "min_test_execution_count": int,
}
node.workerinput["flaky_detection_mode"] = "new" | "unhealthy"
```

### workeroutput (worker -> controller)

```python
config.workeroutput["flaky_detection_metrics"] = {
    "test_metrics": {
        "tests/test_foo.py::test_bar": {
            "rerun_count": int,
            "total_duration_ms": float,
            "initial_setup_duration_ms": float,
            "initial_call_duration_ms": float,
            "initial_teardown_duration_ms": float,
            "prevented_timeout": bool,
        },
    },
    "over_length_tests": list[str],
    "debug_logs": list[dict],
}
```

The three initial duration sub-fields are needed because `make_report` uses `initial_duration` (their sum) and `is_test_too_slow` compares it against remaining time. Serializing them separately preserves full fidelity.
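The worker-side flattening can be sketched as a plain `dataclasses.asdict` pass (a sketch only: `_TestMetrics` is a simplified stand-in for the real per-test record, and the empty lists stand in for state taken from the detector):

```python
import dataclasses
import typing

@dataclasses.dataclass
class _TestMetrics:
    # Simplified stand-in for the real per-test record.
    rerun_count: int
    total_duration_ms: float
    initial_setup_duration_ms: float
    initial_call_duration_ms: float
    initial_teardown_duration_ms: float
    prevented_timeout: bool

def to_serializable_metrics(
    test_metrics: typing.Dict[str, _TestMetrics],
) -> typing.Dict[str, typing.Any]:
    # Only plain builtins may cross the IPC boundary: workeroutput travels
    # over xdist's execnet channel, which serializes basic types only.
    return {
        "test_metrics": {
            name: dataclasses.asdict(m) for name, m in test_metrics.items()
        },
        "over_length_tests": [],  # taken from detector state in the real code
        "debug_logs": [],
    }

payload = to_serializable_metrics(
    {"tests/test_foo.py::test_bar": _TestMetrics(1, 42.0, 1.0, 40.0, 1.0, False)}
)
```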
## FlakyDetector Changes

### New classmethod

`FlakyDetector.from_context(context_dict, mode)` is a `@classmethod` that constructs a `FlakyDetector` from a serialized context dict, skipping `_fetch_context()`. It sets `token`, `url`, and `full_repository_name` to empty strings (the dataclass fields remain required, but these values are unused on workers). The `_context` field is populated directly from the dict.

On the controller side, `FlakyDetector` is **not** instantiated. The controller only holds the raw context dict (for `workerinput`) and aggregated metrics (from `workeroutput`). The report is generated via `make_report_from_aggregated`, which is a standalone function that operates on plain dicts.
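A minimal sketch of that construction path (the context here is trimmed to two fields for brevity; the real one carries all the keys listed in the Data Flow section):

```python
import dataclasses
import typing

@dataclasses.dataclass
class _FlakyDetectionContext:
    # Simplified stand-in for the real context dataclass.
    existing_test_names: typing.List[str]
    min_budget_duration_ms: int

@dataclasses.dataclass
class FlakyDetector:
    token: str
    url: str
    full_repository_name: str
    mode: str
    _context: typing.Optional[_FlakyDetectionContext] = None

    @classmethod
    def from_context(
        cls, context_dict: typing.Dict[str, typing.Any], mode: str
    ) -> "FlakyDetector":
        # Worker-side construction: no API call. Credentials are unused on
        # workers, so the required fields are filled with empty strings.
        detector = cls(token="", url="", full_repository_name="", mode=mode)
        detector._context = _FlakyDetectionContext(**context_dict)
        return detector

d = FlakyDetector.from_context(
    {"existing_test_names": ["test_a"], "min_budget_duration_ms": 1000},
    mode="new",
)
```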
### Deadline computation

- **Non-xdist (unchanged):** Dynamic `remaining_budget / remaining_tests`.
- **xdist:** Static `total_budget / num_tests_to_process`.

Branch via a single `if` in `set_test_deadline`.

### Report from aggregated data

`make_report_from_aggregated(context, mode, metrics, over_length_tests, debug_logs)` runs on the controller from deserialized worker data.
## Error Handling

- **Worker crash:** `workeroutput` may be missing. Controller skips that worker's data and shows partial report.
- **Context fetch fails:** No context sent to workers, workers skip flaky detection. Same as today.
- **No context in workerinput:** Worker skips flaky detection gracefully.
## Testing Strategy

### Unit tests

- `from_context()` construction from plain dict.
- Static deadline computation.
- `make_report_from_aggregated()` output from deserialized metrics.

### Integration tests

- `pytester` with `-n 2`: end-to-end flaky detection under xdist.
- Metrics aggregation across workers (check terminal summary).
- Budget respected across workers.
### Edge cases

- Single worker (`-n 1`).
- Worker crash: partial report, no controller crash.
- No tests to process.
- xdist not installed: no import errors.

### Regression

All existing non-xdist tests must keep passing unchanged.

pyproject.toml

Lines changed: 1 addition & 0 deletions
```diff
@@ -27,6 +27,7 @@ dev = [
     # https://github.com/pytest-dev/pytest-asyncio/issues/706#issuecomment-2082788963
     "pytest-asyncio>=0.24.0",
     "freezegun>=1.5.5",
+    "pytest-xdist>=3.0",
 ]

 [build-system]
```

pytest_mergify/__init__.py

Lines changed: 119 additions & 1 deletion
```diff
@@ -1,3 +1,4 @@
+import dataclasses
 import datetime
 import os
 import platform
@@ -33,6 +34,70 @@ def pytest_configure(self, config: _pytest.config.Config) -> None:
             kwargs["api_url"] = api_url
         self.mergify_ci = MergifyCIInsights(**kwargs)

+        # xdist controller state.
+        self._xdist_flaky_context: typing.Optional[typing.Dict[str, typing.Any]] = None
+        self._xdist_flaky_mode: typing.Optional[str] = None
+        self._xdist_aggregated_metrics: typing.Dict[str, typing.Any] = {
+            "test_metrics": {},
+            "over_length_tests": [],
+            "debug_logs": [],
+        }
+        self._xdist_available_budget_duration_ms: float = 0.0
+
+        # On xdist controller, reuse the already-loaded detector's context
+        # for distribution to workers. No extra API call needed since
+        # MergifyCIInsights.__post_init__ already calls _load_flaky_detector().
+        if _is_xdist_controller(config) and self.mergify_ci.flaky_detector:
+            self._xdist_flaky_context = dataclasses.asdict(
+                self.mergify_ci.flaky_detector._context
+            )
+            self._xdist_flaky_mode = self.mergify_ci.flaky_detector.mode
+
+        # xdist worker: load flaky detector from controller-provided context.
+        if _is_xdist_worker(config):
+            workerinput = getattr(config, "workerinput", {})
+            context = workerinput.get("flaky_detection_context")
+            mode = workerinput.get("flaky_detection_mode")
+            if context is not None and mode is not None:
+                self.mergify_ci.load_flaky_detector_from_context(context, mode)
+
+    def pytest_configure_node(self, node: typing.Any) -> None:
+        """xdist hook: distribute flaky detection context to workers."""
+        # Disable under 'each' mode to avoid duplicated budgets.
+        dist_mode = getattr(node.config.option, "dist", None)
+        if dist_mode == "each":
+            return
+
+        if self._xdist_flaky_context is not None:
+            node.workerinput["flaky_detection_context"] = self._xdist_flaky_context
+            node.workerinput["flaky_detection_mode"] = self._xdist_flaky_mode
+
+    def pytest_testnodedown(self, node: typing.Any, error: typing.Any) -> None:
+        """xdist hook: collect metrics from completed workers."""
+        workeroutput = getattr(node, "workeroutput", None)
+        if workeroutput is None:
+            return
+
+        worker_metrics = workeroutput.get("flaky_detection_metrics")
+        if worker_metrics is None:
+            return
+
+        # Merge test metrics (workers run distinct tests, no overlap).
+        self._xdist_aggregated_metrics["test_metrics"].update(
+            worker_metrics.get("test_metrics", {})
+        )
+        self._xdist_aggregated_metrics["over_length_tests"].extend(
+            worker_metrics.get("over_length_tests", [])
+        )
+        self._xdist_aggregated_metrics["debug_logs"].extend(
+            worker_metrics.get("debug_logs", [])
+        )
+
+        if "available_budget_duration_ms" in worker_metrics:
+            self._xdist_available_budget_duration_ms = worker_metrics[
+                "available_budget_duration_ms"
+            ]
+
     def pytest_terminal_summary(
         self, terminalreporter: _pytest.terminal.TerminalReporter
     ) -> None:
@@ -56,7 +121,37 @@ def pytest_terminal_summary(
             )
             return

-        if self.mergify_ci.flaky_detector:
+        if _is_xdist_controller(terminalreporter.config):
+            if self._xdist_flaky_context:
+                # Always show report (even if no test_metrics — shows "No new tests detected").
+                from pytest_mergify import flaky_detection
+
+                mode: typing.Literal["new", "unhealthy"] = (
+                    self._xdist_flaky_mode  # type: ignore[assignment]
+                    if self._xdist_flaky_mode in ("new", "unhealthy")
+                    else "new"
+                )
+                terminalreporter.write_line(
+                    flaky_detection.make_report_from_aggregated(
+                        context_dict=self._xdist_flaky_context,
+                        mode=mode,
+                        available_budget_duration_ms=self._xdist_available_budget_duration_ms,
+                        aggregated_metrics=self._xdist_aggregated_metrics,
+                    )
+                )
+            elif self.mergify_ci.flaky_detector_error_message:
+                terminalreporter.write_line(
+                    f"""⚠️ Flaky detection couldn't be enabled because of an error.
+
+Common issues:
+• Your 'MERGIFY_TOKEN' might not be set or could be invalid
+• There might be a network connectivity issue with the Mergify API
+
+📚 Documentation: https://docs.mergify.com/ci-insights/test-frameworks/pytest/
+🔍 Details: {self.mergify_ci.flaky_detector_error_message}""",
+                    yellow=True,
+                )
+        elif self.mergify_ci.flaky_detector:
             terminalreporter.write_line(self.mergify_ci.flaky_detector.make_report())
         elif self.mergify_ci.flaky_detector_error_message:
             terminalreporter.write_line(
@@ -147,6 +242,17 @@ def pytest_sessionfinish(
         self,
         session: _pytest.main.Session,
     ) -> typing.Generator[None, None, None]:
+        # xdist worker: export metrics via workeroutput (independent of tracer).
+        if _is_xdist_worker(session.config) and self.mergify_ci.flaky_detector:
+            workeroutput = getattr(session.config, "workeroutput", None)
+            if workeroutput is not None:
+                metrics = self.mergify_ci.flaky_detector.to_serializable_metrics()
+                metrics["available_budget_duration_ms"] = (
+                    self.mergify_ci.flaky_detector._available_budget_duration.total_seconds()
+                    * 1000
+                )
+                workeroutput["flaky_detection_metrics"] = metrics
+
         if not self.tracer:
             yield
             return
@@ -462,3 +568,15 @@ def _should_skip_item(item: _pytest.nodes.Item) -> bool:

     # nosemgrep: python.lang.security.audit.eval-detected.eval-detected
     return bool(eval(condition_code, globals_))
+
+
+def _is_xdist_controller(config: _pytest.config.Config) -> bool:
+    """Check if running as xdist controller (not a worker)."""
+    return config.pluginmanager.has_plugin("dsession") and not hasattr(
+        config, "workerinput"
+    )
+
+
+def _is_xdist_worker(config: _pytest.config.Config) -> bool:
+    """Check if running as xdist worker."""
+    return hasattr(config, "workerinput")
```

pytest_mergify/ci_insights.py

Lines changed: 16 additions & 0 deletions
```diff
@@ -195,6 +195,22 @@ def _load_flaky_detector(self) -> None:
                 f"Could not load flaky detector: {str(exception)}"
             )

+    def load_flaky_detector_from_context(
+        self,
+        context_dict: typing.Dict[str, typing.Any],
+        mode: typing.Literal["new", "unhealthy"],
+    ) -> None:
+        """Construct FlakyDetector from pre-fetched context (xdist worker path)."""
+        try:
+            self.flaky_detector = flaky_detection.FlakyDetector.from_context(
+                context_dict=context_dict,
+                mode=mode,
+            )
+        except Exception as exception:
+            self.flaky_detector_error_message = (
+                f"Could not load flaky detector: {str(exception)}"
+            )
+
     def mark_test_as_quarantined_if_needed(self, item: _pytest.nodes.Item) -> bool:
         """
         Returns `True` if the test was marked as quarantined, otherwise returns `False`.
```
