# Finding the Root Cause of E2E Test Flakes

## Introduction
When end-to-end tests fail intermittently, the underlying issues can stem from resource constraints, environment setup, or test design—among other possibilities. This guide helps developers and QA systematically diagnose and address E2E test flakiness, reducing the time spent on guesswork and repeated failures.

---

## 1. GitHub Runners' Hardware
GitHub provides **hosted runners** with specific CPU, memory, and disk allocations. If your tests require more resources than these runners can provide, you may encounter intermittent failures.

By default, we run tests on **`ubuntu-latest`**, as it is **free for public repositories** and the **most cost-effective option for private repositories**. However, this runner has limited resources, which can lead to intermittent failures in resource-intensive tests.

### 1.1 Available GitHub Runners
Below are some of the GitHub-hosted runners available in our organization:

| Runner Name | CPU | Memory | Disk |
|------------|-----|--------|------|
| `ubuntu-22.04-4cores-16GB` | 4 cores | 16 GB RAM | 150 GB SSD |
| `ubuntu-latest-4cores-16GB` | 4 cores | 16 GB RAM | 150 GB SSD |
| `ubuntu-22.04-8cores-32GB` | 8 cores | 32 GB RAM | 300 GB SSD |
| `ubuntu-latest-8cores-32GB` | 8 cores | 32 GB RAM | 300 GB SSD |
| `ubuntu-22.04-8cores-32GB-ARM` | 8 cores | 32 GB RAM | 300 GB SSD |
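
To confirm what a job actually received, it can help to print the machine's specs in a debug step at the start of the workflow. A minimal sketch using standard Linux tools, valid on any Ubuntu runner:

```sh
# Print the resources available on the current host; compare against the
# table above to verify the job landed on the runner you expected.
nproc     # number of CPU cores
free -h   # total and available memory
df -h /   # disk space on the root filesystem
```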

### 1.2 Tips for Low-Resource Environments
- **Profile your tests** to understand their CPU and memory usage (see the sketch after this list).
- **Optimize**: only spin up what you need.
- **If resources are insufficient**, consider redesigning your tests to run in smaller, independent chunks.
- **If needed**, you can configure CI workflows to use a higher-tier runner, but this comes at an additional cost.
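
For the profiling step above, if your E2E stack runs in Docker, one rough approach is to sample container usage while the suite runs. A sketch, where `make test_e2e` is a hypothetical placeholder for whatever command starts your suite:

```sh
# Start the suite in the background, then sample per-container CPU and
# memory every 5 seconds until it exits; peaks in the log show how close
# the tests come to the runner's limits.
make test_e2e &
TEST_PID=$!
while kill -0 "$TEST_PID" 2>/dev/null; do
  docker stats --no-stream \
    --format '{{.Name}}: CPU {{.CPUPerc}} MEM {{.MemUsage}}' >> docker-stats.log
  sleep 5
done
wait "$TEST_PID"
```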

---

## 2. Reproducing Flakes
Flaky tests don't fail on every run, so you need to execute them multiple times to isolate problems.

### 2.1 Repeat Runs
For E2E tests, run them 5–10 times consecutively to expose intermittent issues.
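
A minimal sketch of such a repeat run, where `make test_e2e` is a hypothetical placeholder for the command that runs your suite:

```sh
# Run the suite 10 times and count failures; a flaky test typically fails
# on some iterations and passes on others.
failures=0
for i in $(seq 1 10); do
  echo "--- run $i ---"
  make test_e2e || failures=$((failures + 1))
done
echo "failed $failures out of 10 runs"
```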

### 2.2 Flaky Unit Tests in the Core Repository
For unit tests in the core repository, you can use a dedicated command to detect flakiness in an updated test:

```sh
cd chainlink-core/
make run_flakeguard_validate_tests
```

## 3. Testing Locally Under CPU and Memory Constraints

If CPU throttling or resource contention is suspected, here's how you can approach testing under constrained resources:

1. **Spin up Docker containers locally with limited CPU or memory** (see the sketch after this list).
2. **Mimic GitHub's environment** (use the same OS, similar resource limits).
3. **Run E2E tests** repeatedly to see if flakiness correlates with resource usage.
4. **Review logs and metrics** for signs of CPU or memory starvation.
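
For step 1, Docker's standard resource flags are sufficient. A sketch, assuming the suite runs inside a container; `my-e2e-image` and `make test_e2e` are hypothetical placeholders:

```sh
# --cpus caps CPU time (roughly two cores' worth here); --memory is a hard
# cap, above which the container is OOM-killed. Matching these values to
# the runner table above approximates the CI environment.
docker run --rm --cpus=2 --memory=4g my-e2e-image:latest make test_e2e
```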

### 3.1 Setting Global Limits (Docker Desktop)
If you are using **Docker Desktop** on **macOS or Windows**, you can globally limit Docker's resource usage:

1. Open **Docker Desktop**.
2. Navigate to **Settings** → **Resources**.
3. Adjust the sliders for **CPUs** and **Memory**.
4. Click **Apply & Restart** to enforce the new limits.

This setting caps the **total** resources Docker can use on your machine, ensuring all containers run within the specified constraints.
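
To check that the new limits took effect, the daemon's view of its resources can be queried from a terminal (containers run inside Docker Desktop's VM, so these values reflect the sliders, not the host):

```sh
# NCPU and MemTotal are reported by the Docker daemon itself.
docker info --format 'CPUs: {{.NCPU}}  Memory: {{.MemTotal}} bytes'
```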

### 3.2 Observing Test Behavior Under Constraints
- **Run your E2E tests repeatedly** with different global resource settings.
- **Watch for flakiness**: if tests start failing more under tighter limits, suspect CPU throttling or memory starvation.
- **Examine logs/metrics** to pinpoint if insufficient resources are causing sporadic failures.

By setting global limits, you can simulate resource-constrained environments similar to CI/CD pipelines and detect potential performance bottlenecks in your tests.
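
Adjusting the global sliders between runs is manual, but the per-container flags shown earlier give a scriptable approximation: rerun the suite under progressively tighter CPU caps and note where failures begin. A sketch, reusing the hypothetical image and command from above:

```sh
# The tightest level that still passes reliably suggests how much CPU
# headroom the tests actually need.
for cpus in 8 4 2 1; do
  echo "=== running with --cpus=$cpus ==="
  docker run --rm --cpus="$cpus" --memory=8g my-e2e-image:latest make test_e2e \
    || echo "FAILED at --cpus=$cpus" >> constraint-results.log
done
```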

## 4. Common Pitfalls and “Gotchas”
1. **Resource Starvation**: Heavy tests on minimal hardware lead to timeouts or slow responses.
2. **External Dependencies**: Network latency, rate limits, or third-party service issues can cause sporadic failures.
3. **Shared State**: Race conditions arise if tests share databases or global variables in parallel runs.
4. **Timeouts**: Overly tight time limits can fail tests on slower environments.

## 5. Key Takeaways
Tackle flakiness systematically:
1. **Attempt local reproduction** (e.g., Docker + limited resources).
2. **Run multiple iterations** on GitHub runners.
3. **Analyze logs and metrics** to see if resource or concurrency issues exist.
4. **Escalate** to the infra team only after confirming the issue isn't in your own test code or setup.
