Skip to content

Commit 145f31c

Browse files
remyduthuclaude
andcommitted
feat(flaky-detection): Add pytest-xdist support to flaky detection system
Add full flaky detection support under pytest-xdist using controller- orchestrated pre-computed deadlines and xdist's built-in IPC. Architecture: - Controller fetches flaky detection context from API once, distributes it to workers via workerinput - Each worker constructs a FlakyDetector from the serialized context, computes the same global budget, runs tests with reruns - Workers send metrics back via workeroutput - Controller aggregates metrics and generates the terminal report Changes: - Add FlakyDetector.from_context() classmethod for worker-side construction - Add to_serializable_metrics() for xdist IPC (plain builtins only) - Add make_report_from_aggregated() for controller-side reporting - Add static deadline mode (_is_xdist flag) preserving dynamic for non-xdist - Add MergifyCIInsights.load_flaky_detector_from_context() - Add controller hooks (pytest_configure_node, pytest_testnodedown) - Add worker hooks (workerinput loading, workeroutput export) - Disable flaky detection under 'each' scheduling mode - Add pytest-xdist dev dependency and integration tests Fixes: MRGFY-6296 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Change-Id: I2811d2999ee3b51fb4649e2f60c33c200635435b
1 parent ecdb273 commit 145f31c

File tree

9 files changed

+2192
-140
lines changed

9 files changed

+2192
-140
lines changed

docs/superpowers/plans/2026-03-19-xdist-flaky-detection.md

Lines changed: 1213 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 166 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,166 @@
1+
# Add pytest-xdist Support to Flaky Detection
2+
3+
**Linear:** MRGFY-6296
4+
**Status:** Approved
5+
**Date:** 2026-03-19
6+
7+
## Problem
8+
9+
The flaky detection system does not support `pytest-xdist`:
10+
11+
1. `flaky_detector._test_metrics` lives in-process memory, but xdist spawns separate worker processes.
12+
2. `pytest_collection_finish` does not run on the controller under xdist.
13+
14+
## Decision Summary
15+
16+
- **Approach:** Controller-orchestrated with pre-computed per-test deadlines.
17+
- **IPC:** xdist built-in `workerinput`/`workeroutput`.
18+
- **Budget model:** Global budget, static per-test allocation under xdist. Dynamic deadlines preserved for non-xdist.
19+
- **Scheduling:** Target `load` (default) mode. Other modes should not crash. Under `each` mode (every test runs on every worker), flaky detection is disabled to avoid duplicated budgets.
20+
21+
## Architecture
22+
23+
```
24+
Controller Workers (gw0, gw1, ...)
25+
──────────────────────────────── ────────────────────────────────
26+
fetch flaky context from API
27+
28+
├─── workerinput ──────────► receive context as plain dict
29+
│ build FlakyDetector (no API call)
30+
│ collect tests (same list)
31+
│ compute budget (same result)
32+
│ run tests + reruns
33+
│ ◄── workeroutput ───────────┤
34+
aggregate metrics
35+
print terminal summary
36+
```
37+
38+
All workers collect the same full test list (xdist verifies this). Budget computation is deterministic, so each worker independently arrives at the same global budget and per-test allocation. No mid-run coordination.
39+
40+
## Controller Responsibilities
41+
42+
### 1. Fetch context and distribute (`pytest_configure_node`)
43+
44+
- Fetch `_FlakyDetectionContext` from API **once** (cache it).
45+
- Serialize as plain dict into `node.workerinput["flaky_detection_context"]`.
46+
- Also set `node.workerinput["flaky_detection_mode"]`.
47+
48+
### 2. Collect worker metrics (`pytest_testnodedown`)
49+
50+
- Read `node.workeroutput["flaky_detection_metrics"]`.
51+
- Merge into controller-side aggregated metrics dict.
52+
- Workers run distinct tests under `load` scheduling, so no overlap.
53+
54+
### 3. Terminal summary (`pytest_terminal_summary`)
55+
56+
- Build report from aggregated metrics using same format as today.
57+
58+
## Worker Responsibilities
59+
60+
### 1. Initialization
61+
62+
- Read `config.workerinput["flaky_detection_context"]` if present.
63+
- Construct `FlakyDetector` via new `from_context()` classmethod (skips API call).
64+
65+
### 2. Session preparation (`pytest_collection_finish`)
66+
67+
- Call `prepare_for_session(session)` as today.
68+
69+
### 3. Test execution (`pytest_runtest_protocol`)
70+
71+
- Identical to current logic: initial run, set deadline, rerun loop.
72+
- `set_test_deadline` uses static allocation: `total_budget / global_num_tests_to_process` where the denominator is the **global** count of tests to process (computed from the full collection, not from the worker's assigned subset). Workers don't know upfront which tests they'll run (xdist dispatches dynamically), but the per-test budget is the same regardless.
73+
74+
### 4. Metrics export (`pytest_sessionfinish`)
75+
76+
- Serialize `_test_metrics`, `_over_length_tests`, `_debug_logs` into `config.workeroutput["flaky_detection_metrics"]`.
77+
78+
## Data Flow
79+
80+
### workerinput (controller -> worker)
81+
82+
```python
83+
node.workerinput["flaky_detection_context"] = {
84+
"budget_ratio_for_new_tests": float,
85+
"budget_ratio_for_unhealthy_tests": float,
86+
"existing_test_names": list[str],
87+
"existing_tests_mean_duration_ms": int,
88+
"unhealthy_test_names": list[str],
89+
"max_test_execution_count": int,
90+
"max_test_name_length": int,
91+
"min_budget_duration_ms": int,
92+
"min_test_execution_count": int,
93+
}
94+
node.workerinput["flaky_detection_mode"] = "new" | "unhealthy"
95+
```
96+
97+
### workeroutput (worker -> controller)
98+
99+
```python
100+
config.workeroutput["flaky_detection_metrics"] = {
101+
"test_metrics": {
102+
"tests/test_foo.py::test_bar": {
103+
"rerun_count": int,
104+
"total_duration_ms": float,
105+
"initial_setup_duration_ms": float,
106+
"initial_call_duration_ms": float,
107+
"initial_teardown_duration_ms": float,
108+
"prevented_timeout": bool,
109+
},
110+
},
111+
"over_length_tests": list[str],
112+
"debug_logs": list[dict],
113+
}
114+
```
115+
116+
The three initial duration sub-fields are needed because `make_report` uses `initial_duration` (their sum) and `is_test_too_slow` compares it against remaining time. Serializing them separately preserves full fidelity.
117+
118+
## FlakyDetector Changes
119+
120+
### New classmethod
121+
122+
`FlakyDetector.from_context(context_dict, mode)` is a `@classmethod` that constructs a `FlakyDetector` from a serialized context dict, skipping `_fetch_context()`. It sets `token`, `url`, and `full_repository_name` to empty strings (the dataclass fields remain required, but these values are unused on workers). The `_context` field is populated directly from the dict.
123+
124+
On the controller side, `FlakyDetector` is **not** instantiated. The controller only holds the raw context dict (for `workerinput`) and aggregated metrics (from `workeroutput`). The report is generated via `make_report_from_aggregated`, which is a standalone function that operates on plain dicts.
125+
126+
### Deadline computation
127+
128+
- **Non-xdist (unchanged):** Dynamic `remaining_budget / remaining_tests`.
129+
- **xdist:** Static `total_budget / num_tests_to_process`.
130+
131+
Branch via a single `if` in `set_test_deadline`.
132+
133+
### Report from aggregated data
134+
135+
`make_report_from_aggregated(context, mode, metrics, over_length_tests, debug_logs)` runs on the controller from deserialized worker data.
136+
137+
## Error Handling
138+
139+
- **Worker crash:** `workeroutput` may be missing. Controller skips that worker's data and shows partial report.
140+
- **Context fetch fails:** No context sent to workers, workers skip flaky detection. Same as today.
141+
- **No context in workerinput:** Worker skips flaky detection gracefully.
142+
143+
## Testing Strategy
144+
145+
### Unit tests
146+
147+
- `from_context()` construction from plain dict.
148+
- Static deadline computation.
149+
- `make_report_from_aggregated()` output from deserialized metrics.
150+
151+
### Integration tests
152+
153+
- `pytester` with `-n 2`: end-to-end flaky detection under xdist.
154+
- Metrics aggregation across workers (check terminal summary).
155+
- Budget respected across workers.
156+
157+
### Edge cases
158+
159+
- Single worker (`-n 1`).
160+
- Worker crash: partial report, no controller crash.
161+
- No tests to process.
162+
- xdist not installed: no import errors.
163+
164+
### Regression
165+
166+
All existing non-xdist tests must keep passing unchanged.

pyproject.toml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -27,6 +27,7 @@ dev = [
2727
# https://github.com/pytest-dev/pytest-asyncio/issues/706#issuecomment-2082788963
2828
"pytest-asyncio>=0.24.0",
2929
"freezegun>=1.5.5",
30+
"pytest-xdist>=3.0",
3031
]
3132

3233
[build-system]

0 commit comments

Comments
 (0)