Skip to content

test(web): eliminate timeout flakiness in the web test suite#1597

Merged
cliffhall merged 1 commit into
1579-wave-1from
v2/1596-test-stability
Jul 2, 2026
Merged

test(web): eliminate timeout flakiness in the web test suite#1597
cliffhall merged 1 commit into
1579-wave-1from
v2/1596-test-stability

Conversation

@cliffhall

Copy link
Copy Markdown
Member

Closes #1596

What

Deterministically fixes the recurring 5s-timeout flakes in the clients/web suite that surfaced under v8-instrumented, concurrent load. Test-only — no product source changed. No test was skipped/disabled, no global timeout was inflated, and the per-file coverage gate (≥90 on all four dimensions) still holds for every touched file.

Per-test root cause + fix

Common root cause (modals/forms/screens): userEvent.setup() schedules a real setTimeout between keystrokes. Under CPU contention those event-loop yields balloon, so multi-field interactions accumulated past the 5s per-test timeout — exactly the "clustered at the default 5s timeout under load" symptom. Fix: userEvent.setup({ delay: null }) removes the per-keystroke real-timer dependence (keystrokes dispatch synchronously), making the suites load-independent.

  • ServerConfigModal, ServerImportConfigModal, ServerImportJsonModal, ResourcesScreen, InspectorView (localStorage/History): delay: null on every userEvent.setup(). The debounced validation in the import modals was already awaited via findBy/waitFor, so it needed no change beyond fast typing.
  • PromptArgumentsForm (completions): the tests slept on the wall clock just past the 300ms completion debounce (await new Promise(r => setTimeout(r, 400))), which races the debounce timer under load. Replaced every such sleep with awaited conditions (findBy/waitFor) on the real rendered outcome plus delay: null; they now resolve as soon as the debounce fires and the response settles, and never depend on machine speed. Negative assertions (stale-option clearing, cancelled debounce) rely on the component's synchronous state/timer teardown rather than a timed window, so they're deterministic. A scoped 3s waitFor timeout (commented) covers the legitimate debounce + async settle under heavy load.
  • inspectorClient integration ("should track stderr logs for stdio transport"): the child process's stderr is piped out-of-band from the tool's JSON-RPC response, so reading getStderrLogs() synchronously right after callTool resolved raced the stderr chunk. Wrapped the assertion in vi.waitFor so it polls until the line lands.

Verification

All green with zero intermittent failures:

  • npm run validate (web) — 191 files / 2438 tests.
  • npm run test:coverage (web) — run : 4 sequential + 2 concurrent under load (two full coverage runs in parallel, the issue's suggested load harness). Every run: 229 files / 3237 tests passed, coverage gate clean (0 threshold errors).
  • npm run test:integration (web) standalone — 38 files / 799 tests.

🤖 Generated with Claude Code

https://claude.ai/code/session_01S3fTN8H3R8YV4yUGvZjYnX

@cliffhall cliffhall added the v2 Issues and PRs for v2 label Jul 2, 2026
@cliffhall

Copy link
Copy Markdown
Member Author

@claude review

@claude

claude Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Claude finished @cliffhall's task in 2m 29s —— View job


Code Review

  • Read the diff against origin/v2/main
  • Analyze each test file change for correctness
  • Check the integration test fix
  • Post review feedback

I reviewed all 7 changed files against origin/v2/main. This is a clean, well-reasoned test-only PR — no product source is touched, nothing is skipped, and no global timeout is inflated. The root-cause analysis (real setTimeout between keystrokes from userEvent.setup() ballooning under CPU contention) is accurate, and delay: null is the correct, idiomatic fix. The PromptArgumentsForm rewrite from wall-clock sleeps to awaited conditions (findBy/waitFor) is the right direction and removes the machine-speed dependence. The vi.waitFor wrapper on the stderr integration assertion correctly addresses the out-of-band pipe race.

I have one substantive observation and a couple of minor notes. Nothing blocking.


🟡 The "cancels a pending debounce timer" test now passes trivially

PromptArgumentsForm.test.tsx (~L508–545). The revised test removes the await new Promise(r => setTimeout(r, 400)) and asserts synchronously:

await user.type(textInput, "h");   // schedules a 300ms debounce for "text"
await user.click(siblingInput);
await user.click(textInput);        // handleFocus clears the pending timer + fires immediately
onCompleteArgument.mockClear();
const textCalls = onCompleteArgument.mock.calls.filter(([n]) => n === "text");
expect(textCalls.length).toBe(0);

In the source, the keystroke debounce calls onCompleteArgument at fire time — inside the setTimeout callback (PromptArgumentsForm.tsx:159–164, fire at :101). Because typing is now synchronous (delay: null) and the assertion runs immediately, we are checking at t≈0, well before the 300ms debounce could ever fire. So the count is 0 whether or not clearTimeout in handleFocus (:175) actually works — a broken cancellation would still pass. The test name promises it verifies cancellation, but it no longer can.

(Note: the old assertion textCalls.every(([, v]) => v === "h") was itself effectively vacuous — a stale fire would also carry "h", so every stayed true. So this isn't a regression in real coverage, but it's a missed opportunity to make the test actually load-bearing.)

The deterministic way to keep this test meaningful is fake timers — advance past the debounce and assert nothing fired, no wall-clock dependence:

vi.useFakeTimers();
const user = userEvent.setup({ delay: null, advanceTimers: vi.advanceTimersByTime });
// … type "h", click away, click back, mockClear() …
await vi.advanceTimersByTimeAsync(400);   // debounce window elapses
expect(onCompleteArgument.mock.calls.filter(([n]) => n === "text").length).toBe(0);
vi.useRealTimers();

Fix this →

The same reasoning applies mildly to the negative assertions in "clears stale dropdown options" and "does not call onCompleteArgument when completions are unsupported" — but those genuinely test synchronous teardown / never-scheduled paths, so asserting immediately is correct and deterministic there. Only the cancellation test relies on a timer that would need to elapse.


Minor notes (non-blocking)

  • Stale comment in "captures sibling values at fire time" (~L353–381): the comment says the captured sibling "may be "e" or "es" depending on where the fire lands relative to the keystrokes." With delay: null both keystrokes are now synchronous and complete long before the 300ms debounce, so it will reliably be "es". The /^es?$/ regex still accepts both, so the test is correct and robust — just the comment's rationale is now slightly stale.
  • SETTLE = { timeout: 3000 } is a good scoped ceiling. Consider a one-line note that 3s > the 300ms debounce + settle but stays under the 5s per-test default, which is the whole point — but the existing top-of-file comment already covers this well.
  • The vi.waitFor in inspectorClient.test.ts uses { timeout: 5000, interval: 25 } — sensible for an out-of-band stderr chunk. Good.

Overall: 👍 The mechanical delay: null changes across ServerConfigModal, ServerImportConfigModal, ServerImportJsonModal, ResourcesScreen, and InspectorView are all safe (the import modals' debounced validation was already awaited via findBy/waitFor, confirmed no awaits were dropped). The one item above is worth tightening so the cancellation test earns its name, but it isn't a correctness risk in the product and doesn't block the flakiness fix.
· v2/1596-test-stability

@cliffhall

Copy link
Copy Markdown
Member Author

@claude review

@claude

claude Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Claude finished @cliffhall's task in 1m 16s —— View job


Code Review

  • Read the diff against origin/v2/main
  • Analyze each test file change for correctness
  • Verify the round-1 fix (debounce-cancellation test)
  • Check the integration test fix

This review covers both commits, including dd781206 which addresses the substantive point from round 1. This remains a clean, test-only PR — no product source touched, nothing skipped, no global timeout inflated. 👍


✅ Round-1 point resolved — the cancellation test is now load-bearing

PromptArgumentsForm.test.tsx "cancels a pending debounce timer when the input is re-focused" now uses vi.useFakeTimers() + fireEvent, schedules the debounce, re-focuses to trigger handleFocus's clearTimeout (PromptArgumentsForm.tsx:169–173), mockClear()s the focus-fire calls, then advances past the 300ms window and asserts textCalls.length === 0. Traced against the source: if the clearTimeout in handleFocus were removed, the timer would fire during advanceTimersByTimeAsync(400) and the assertion would fail — so the test now genuinely verifies cancellation, exactly as its name promises. The try/finally restoring real timers and the note about userEvent deadlocking under fake timers (hence fireEvent) are both correct and well-documented.

✅ The remaining PromptArgumentsForm rewrites are sound

  • "captures sibling values at fire time" — with delay: null the sibling "es" is typed synchronously after the "text" debounce is scheduled but before it fires; buildContext reads argumentValuesRef.current at fire time (:163), so the test still verifies fire-time capture. The /^es?$/ tolerance is harmless.
  • The negative assertions ("clears stale dropdown options", "unsupported") assert on synchronous teardown / never-scheduled paths, so asserting immediately is correct and deterministic — no fake timers needed there.
  • "aborts an in-flight request" correctly gates each user.type on a waitFor that the prior request is actually in flight before the next keystroke, replacing the racy setTimeout(350).

✅ Mechanical delay: null changes are pure

Confirmed via diff filtering that ServerConfigModal, ServerImportConfigModal, ServerImportJsonModal, ResourcesScreen, and InspectorView contain only userEvent.setup()userEvent.setup({ delay: null }) — no awaits dropped, no assertions altered. The import modals' debounced validation was already awaited via findBy/waitFor, so fast typing is safe.

✅ Integration stderr fix is correct

inspectorClient.test.ts wraps the getStderrLogs() assertion in vi.waitFor({ timeout: 5000, interval: 25 }) — the right fix for the out-of-band stderr pipe that can lag the JSON-RPC response. Real timers are in effect there, so vi.waitFor polls normally.


Verdict: No blocking issues, and nothing worth changing. The round-1 concern is fully addressed and the fix is verifiably load-bearing. Approving in spirit (I can't submit a formal GitHub approval). Nice, careful work.
· v2/1596-test-stability

@cliffhall

Copy link
Copy Markdown
Member Author

📋 Smoke test & AGENTS.md audit for this PR: #1596 (comment) (posted on the linked issue #1596).

@cliffhall cliffhall changed the base branch from v2/main to 1579-wave-1 July 2, 2026 16:46
@cliffhall cliffhall merged commit 747148f into 1579-wave-1 Jul 2, 2026
1 check passed
@cliffhall cliffhall deleted the v2/1596-test-stability branch July 2, 2026 16:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

v2 Issues and PRs for v2

Projects

None yet

Development

Successfully merging this pull request may close these issues.

test-stability: eliminate timeout flakiness in the web test suite (zero-flake requirement)

1 participant