| 1 | +# Bug: Flaky Integration Test Timeouts Under Parallel Execution |
| 2 | + |
| 3 | +**Project:** freundebuch2 |
| 4 | +**Type:** Bug |
| 5 | +**Related Epic:** None |
| 6 | +**Phase:** Phase 1.5 (Post-MVP Enhancement) |
| 7 | +**Priority:** Low |
| 8 | + |
| 9 | +--- |
| 10 | + |
| 11 | +## Summary |
| 12 | + |
| 13 | +Two integration tests in `apps/backend/tests/integration/auth-edge-cases.test.ts` intermittently exceed Vitest's default 5000 ms test timeout when the full integration test suite runs in parallel, but pass reliably when run in isolation. |
| 14 | + |
| 15 | +--- |
| 16 | + |
| 17 | +## Observed Behavior |
| 18 | + |
| 19 | +During the pre-push hook (`pnpm --filter backend test`), the following tests fail with `Error: Test timed out in 5000ms`: |
| 20 | + |
| 21 | +1. **"should handle concurrent sign-in attempts"** (line 460): signs up a user, then fires 10 concurrent sign-in requests. Each sign-in involves bcrypt password verification, which is CPU-intensive by design. |
| 22 | +2. **"should set session cookie on successful sign-in"** (line 524): signs up a user, then signs in once. The test itself is simple, but it is likely starved of resources when other suites run concurrently. |
| 23 | + |
| 24 | +When run in isolation (`pnpm vitest run tests/integration/auth-edge-cases.test.ts`), both pass comfortably: |
| 25 | +- Concurrent sign-in: ~2100ms |
| 26 | +- Session cookie: ~400ms |
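To put the isolated timings in perspective, the following back-of-the-envelope sketch computes how much slower each test must run before it hits the timeout. It uses only the approximate figures reported above; the function and constant names are illustrative and do not come from the codebase.

```typescript
// Default Vitest test timeout cited in this ticket.
const TIMEOUT_MS = 5000;

// Approximate isolated durations reported above.
const isolatedMs = {
  concurrentSignIn: 2100,
  sessionCookie: 400,
};

// Slowdown factor at which a test with the given isolated duration
// would first reach the timeout.
function slowdownToTimeout(
  isolatedDurationMs: number,
  timeoutMs: number = TIMEOUT_MS,
): number {
  return timeoutMs / isolatedDurationMs;
}

console.log(slowdownToTimeout(isolatedMs.concurrentSignIn).toFixed(2)); // "2.38"
console.log(slowdownToTimeout(isolatedMs.sessionCookie).toFixed(2)); // "12.50"
```

By this estimate the concurrent sign-in test has little headroom (roughly 2.4x), while the session cookie test would need to run about 12.5x slower than in isolation, which points to heavy contention rather than a marginal overrun.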
| 27 | + |
| 28 | +## Root Cause |
| 29 | + |
| 30 | +The backend Vitest config (`apps/backend/vitest.config.ts`) has no `fileParallelism`, `pool`, or `sequence` configuration. Vitest therefore runs test files in parallel across a pool of workers by default (child processes via `forks` since Vitest 1.0, worker threads via `threads` in earlier versions). Each integration test suite spins up its own Testcontainers PostgreSQL instance and Better Auth stack, so parallel execution creates significant resource contention: |
| 31 | + |
| 32 | +- Multiple PostgreSQL containers running simultaneously |
| 33 | +- Multiple bcrypt hash operations competing for CPU |
| 34 | +- Multiple Better Auth instances with their own connection pools |
| 35 | + |
| 36 | +The concurrent sign-in test amplifies this by making 10 simultaneous requests, each requiring bcrypt comparison. Under load, this pushes the total execution past the 5s timeout. |
| 37 | + |
| 38 | +--- |
| 39 | + |
| 40 | +## Possible Fixes |
| 41 | + |
| 42 | +### Option A: Run integration tests sequentially |
| 43 | + |
| 44 | +Set `fileParallelism: false` for integration tests so only one test file runs at a time. This eliminates resource contention at the cost of slower total test execution. |
| 45 | + |
| 46 | +```typescript |
| 47 | +// vitest.config.ts |
| 48 | +export default defineConfig({ |
| 49 | +  test: { |
| 50 | +    // ...existing config... |
| 51 | +    fileParallelism: false, |
| 52 | +  }, |
| 53 | +}); |
| 54 | +``` |
| 55 | + |
| 56 | +Alternatively, use a Vitest workspace or project configuration to run only the integration tests sequentially while keeping unit tests parallel. |
| 57 | + |
| 58 | +### Option B: Increase timeouts for resource-intensive tests |
| 59 | + |
| 60 | +Add explicit timeouts to the affected tests: |
| 61 | + |
| 62 | +```typescript |
| 63 | +it('should handle concurrent sign-in attempts', async () => { |
| 64 | +  // ... |
| 65 | +}, 15000); |
| 66 | + |
| 67 | +it('should set session cookie on successful sign-in', async () => { |
| 68 | +  // ... |
| 69 | +}, 10000); |
| 70 | +``` |
| 71 | + |
| 72 | +This is a band-aid — it papers over the contention rather than fixing it, and the thresholds are fragile (CI runners may need different values). |
| 73 | + |
| 74 | +### Option C: Reduce concurrency in the test itself |
| 75 | + |
| 76 | +Lower the concurrent sign-in count from 10 to 3–5, reducing the bcrypt workload: |
| 77 | + |
| 78 | +```typescript |
| 79 | +const requests = Array.from({ length: 3 }, () => |
| 80 | +  app.fetch(/* ... */), |
| 81 | +); |
| 82 | +``` |
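One way to keep Option C easy to tune is to lift the request count into a named constant. The sketch below is self-contained and makes several assumptions: `fakeSignIn` is a hypothetical stand-in for the elided `app.fetch(/* ... */)` call, and `CONCURRENT_SIGN_INS`, `fireConcurrentSignIns`, and `allSucceeded` are illustrative names, not identifiers from the test file.

```typescript
type SignInResult = { status: number };

// Hypothetical stand-in for the real sign-in request (elided in the ticket).
async function fakeSignIn(): Promise<SignInResult> {
  return { status: 200 };
}

// The ticket suggests 3 to 5 parallel sign-ins instead of 10.
const CONCURRENT_SIGN_INS = 3;

// Starts `count` sign-ins at once and returns the pending promises.
function fireConcurrentSignIns(
  count: number,
  signIn: () => Promise<SignInResult>,
): Promise<SignInResult>[] {
  // Array.from calls the mapper once per slot, so all requests start immediately.
  return Array.from({ length: count }, () => signIn());
}

// Resolves to true only if every sign-in returned HTTP 200.
async function allSucceeded(
  pending: Promise<SignInResult>[],
): Promise<boolean> {
  const results = await Promise.all(pending);
  return results.every((r) => r.status === 200);
}

allSucceeded(fireConcurrentSignIns(CONCURRENT_SIGN_INS, fakeSignIn)).then(
  (ok) => console.log(ok), // logs true with the stub above
);
```

Note that a lower count still exercises concurrent sign-in, but with less CPU-bound hashing per test run.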
| 83 | + |
| 84 | +### Option D: Share a single Testcontainers instance |
| 85 | + |
| 86 | +Refactor the test helpers to use a shared PostgreSQL container across all integration test suites (with schema isolation per suite). This would dramatically reduce container overhead and is the most thorough fix, but requires significant refactoring of the test setup. |
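The schema-isolation part of Option D could be sketched as follows. This is only an illustration under stated assumptions: `schemaNameForSuite` and `schemaSetupSql` are hypothetical helpers that do not exist in the codebase, and how the statements are executed depends on the database client the test helpers use.

```typescript
// Hypothetical helper: derive a per-suite PostgreSQL schema name from a test file path.
function schemaNameForSuite(testFilePath: string): string {
  const base = testFilePath
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '_') // keep only characters safe in an unquoted identifier
    .replace(/^_+|_+$/g, ''); // trim leading and trailing underscores
  // PostgreSQL identifiers are limited to 63 bytes; keep the tail of the path,
  // which is the part most likely to differ between suites.
  return `t_${base.slice(-61)}`;
}

// Hypothetical helper: statements a suite would run before its tests.
function schemaSetupSql(schema: string): string[] {
  return [
    `CREATE SCHEMA IF NOT EXISTS ${schema}`,
    // search_path is a per-session setting, so every pooled connection needs it.
    `SET search_path TO ${schema}`,
  ];
}

const schema = schemaNameForSuite(
  'apps/backend/tests/integration/auth-edge-cases.test.ts',
);
console.log(schema); // t_apps_backend_tests_integration_auth_edge_cases_test_ts
console.log(schemaSetupSql(schema));
```

Because `search_path` applies per session, a shared container with connection pooling would need the setting applied on each connection, which is part of the refactoring cost mentioned above.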
| 87 | + |
| 88 | +--- |
| 89 | + |
| 90 | +## Affected Files |
| 91 | + |
| 92 | +| File | Description | |
| 93 | +|------|-------------| |
| 94 | +| `apps/backend/tests/integration/auth-edge-cases.test.ts` | Contains the two flaky tests (lines 460, 524) | |
| 95 | +| `apps/backend/vitest.config.ts` | No parallelism or sequence configuration | |
| 96 | + |
| 97 | +--- |
| 98 | + |
| 99 | +## Acceptance Criteria |
| 100 | + |
| 101 | +- [ ] The full integration test suite (`pnpm --filter backend test`) passes reliably on repeated runs |
| 102 | +- [ ] The pre-push hook does not fail due to test timeouts under normal conditions |
| 103 | +- [ ] The chosen fix does not mask real test failures (e.g., by simply raising timeouts to 60s) |
| 104 | + |
| 105 | +--- |
| 106 | + |
| 107 | +## Out of Scope |
| 108 | + |
| 109 | +- Rewriting the integration test infrastructure |
| 110 | +- Changing Testcontainers configuration |
| 111 | +- CI pipeline changes (this is about local pre-push reliability) |
| 112 | + |
| 113 | +--- |
| 114 | + |
| 115 | +## Dependencies |
| 116 | + |
| 117 | +- None — this is a standalone test reliability improvement |