
Commit 3e80ab3

Merge pull request qualcomm-linux#145 from smuppand/stress-tools
Add stress-ng and stressapptest validation scripts
2 parents 2f6e1aa + e417ab6 commit 3e80ab3


4 files changed, +1575 -0 lines changed

Lines changed: 308 additions & 0 deletions
# Scheduler/Stress Validation: stress-ng Runner

This README explains how to use the stress-ng-based validation script (`run.sh`) to exercise CPU, memory, I/O, and scheduler paths on embedded Linux systems (Yocto, Debian/Ubuntu, RT and non-RT kernels, NUMA and non-NUMA). It also covers how to get stress-ng onto your target (native install, cross-compile, or sideload).

---

## What this test does

- Launches stress-ng stressors sized to the current machine (online CPUs, RAM, and free disk) so it does not overcommit small embedded boards.
- Pins a worker thread to each online CPU so that scheduler regressions are easy to spot.
- Applies fail criteria (maximum latency, OOM, I/O errors, non-zero stressor exits) and returns a non-zero exit code on failure for CI.
- Saves a short summary and optional detailed logs, and runs a dmesg scan through `functestlib.sh`.

---

## Requirements

- `stress-ng` binary on the target
- Standard tools: `awk`, `grep`, `sed`, `cut`, `tr`, `sleep`, `date`, `head`, `getconf`
- (Optional) `taskset` and `numactl` for CPU pinning/NUMA; `dd` for I/O prechecks
- The test framework's `init_env` and `functestlib.sh` (already handled by `run.sh`)

The runner reuses these helpers from `functestlib.sh`:

- `check_dependencies`
- `find_test_case_by_name`
- `log_info`, `log_warn`, `log_pass`, `log_fail`, `log_skip`, `log_error`
- `scan_dmesg_errors`

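The dependency check that produces the SKIP result can be illustrated with a minimal standalone sketch. It is not `check_dependencies` from `functestlib.sh`; the name `require_tools` is hypothetical, and the sketch assumes only a POSIX shell (`command -v` works in BusyBox `ash` as well as `bash`).

```shell
# Hypothetical stand-in for functestlib.sh's check_dependencies (not the real helper).
# Prints every missing tool and returns 2, the README's SKIP exit code, if any is absent.
require_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    if [ -n "$missing" ]; then
        echo "SKIP: missing tools:$missing" >&2
        return 2
    fi
    return 0
}

# Example: skip (exit 2) instead of failing when a required tool is absent.
# require_tools stress-ng awk grep sed cut tr sleep date head getconf || exit 2
```

Returning a distinct code lets CI tell "could not run" apart from "ran and failed".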
---

## Getting stress-ng

Project: https://github.com/ColinIanKing/stress-ng

### A) Native install (Debian/Ubuntu)

```
sudo apt-get update
sudo apt-get install -y stress-ng
```

### B) Cross-compile (Yocto)

Add stress-ng to your image, or build it as an SDK tool.

In your layer setup, make sure a stress-ng recipe is available (meta-openembedded has one in meta-oe on many branches).

Add it to the image:

```
IMAGE_INSTALL:append = " stress-ng"
```

Rebuild the image / SDK:

```
bitbake core-image-minimal
```

### C) Cross-compile (generic make)

stress-ng's Makefile takes the cross compiler from `CC`/`CXX`; `STATIC=1` requests a static build. On your host:

```
git clone https://github.com/ColinIanKing/stress-ng.git
cd stress-ng
STATIC=1 CC=aarch64-linux-gnu-gcc CXX=aarch64-linux-gnu-g++ make   # or your triplet
# the binary is ./stress-ng at the top of the source tree
```

Copy the binary to your target (see D below).

### D) Android / BusyBox targets (sideload)

Push a statically linked stress-ng:

```
adb push stress-ng /usr/local/bin/
adb shell chmod 755 /usr/local/bin/stress-ng
```

On stock Android, `/usr/local/bin` does not exist; push to a writable directory such as `/data/local/tmp` instead.

Or with SSH:

```
scp stress-ng root@TARGET:/usr/local/bin/
ssh root@TARGET chmod 755 /usr/local/bin/stress-ng
```

---

## run.sh quick start

From the test case directory (the script locates itself via `find_test_case_by_name`):

```
./run.sh
```

By default, it:

- Detects online CPUs, total RAM, and free disk space.
- Picks safe defaults: one worker thread per online CPU, memory workers sized to a small percentage of RAM, and I/O workers sized to free disk space.
- Runs for a bounded, configurable duration (by default two 60-second phases; see `--p1`/`--p2`).
- Fails on a non-zero stressor exit, OOM, a major I/O error, or dmesg anomalies.

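The detection and sizing steps above can be approximated with standard tools. This is a minimal sketch, not `run.sh`'s code: the function names are hypothetical, and it assumes Linux `/proc/meminfo`, `getconf _NPROCESSORS_ONLN`, and POSIX `df -P` output.

```shell
# Hypothetical sizing helpers (names are illustrative, not taken from run.sh).

# Number of online CPUs.
online_cpus() {
    getconf _NPROCESSORS_ONLN
}

# Total RAM in kB, from /proc/meminfo (Linux-specific).
mem_total_kb() {
    awk '/^MemTotal:/ { print $2 }' /proc/meminfo
}

# Available space in kB on the filesystem that holds directory $1.
# -P forces one line per filesystem, so field 4 is "Available".
free_disk_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# $2 percent of $1 kB, in bytes. Integer arithmetic rounds down,
# which errs on the side of allocating less.
pct_of_kb_in_bytes() {
    echo $(( $1 * $2 / 100 * 1024 ))
}
```

For example, with the documented `--mem-frac` default of 15, a runner could pass `--vm-bytes "$(pct_of_kb_in_bytes "$(mem_total_kb)" 15)"` to each stress-ng `vm` worker, and size I/O workers from `free_disk_kb` in the same way.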
## Usage

```
Usage: ./run.sh [--p1 <sec>] [--p2 <sec>] [--mem-frac <pct>] [--disk-frac <pct>]
                [--cpu-list <list>] [--temp-limit <degC>] [--stressng "<args>"]
                [--repeat <N>] [--help]
```

### Options

| Option | Description |
|---------------------|-------------|
| `--p1 <sec>` | Phase 1 duration in seconds (default: 60) |
| `--p2 <sec>` | Phase 2 duration in seconds (default: 60) |
| `--mem-frac <pct>` | Percentage of total memory per worker (default: 15) |
| `--disk-frac <pct>` | Percentage of free disk space per worker (default: 5) |
| `--cpu-list <list>` | CPUs to stress, as a comma-separated list and/or ranges (e.g., `0-3`) |
| `--temp-limit <degC>` | Maximum temperature threshold, in °C |
| `--stressng "<args>"` | Additional arguments passed to stress-ng |
| `--repeat <N>` | Repeat the entire test sequence N times (default: 1) |
| `--help` | Show this help message and exit |

> Note: the usage summary above lists the options this version of `run.sh` accepts. Examples 1–6 and some later sections also use flags that are not listed there (`--duration`, `--stressors`, `--logs`, `--mem-pct`, `--disk-pct`, `--max-latency-us`, `--no-affine`); run `./run.sh --help` to confirm which flags your copy supports.

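A `--cpu-list` value mixes single CPUs and ranges. One way to expand it in POSIX shell is sketched below; `expand_cpu_list` is a hypothetical helper, not part of `run.sh` or `functestlib.sh`, and it assumes ascending `lo-hi` ranges.

```shell
# Hypothetical helper (not part of run.sh): expand a --cpu-list value such as
# "0-3,6" into a space-separated list "0 1 2 3 6". Ranges must be ascending.
expand_cpu_list() {
    out=""
    # Split on commas; restore IFS right after the split.
    old_ifs=$IFS
    IFS=,
    set -- $1
    IFS=$old_ifs
    for item in "$@"; do
        case $item in
            *-*)
                lo=${item%%-*}
                hi=${item#*-}
                # Both ends must be plain non-negative integers.
                case $lo in ''|*[!0-9]*) echo "invalid range: '$item'" >&2; return 1 ;; esac
                case $hi in ''|*[!0-9]*) echo "invalid range: '$item'" >&2; return 1 ;; esac
                i=$lo
                while [ "$i" -le "$hi" ]; do
                    out="$out $i"
                    i=$((i + 1))
                done
                ;;
            ''|*[!0-9]*)
                echo "invalid CPU: '$item'" >&2
                return 1
                ;;
            *)
                out="$out $item"
                ;;
        esac
    done
    # Drop the leading space.
    echo "${out# }"
}
```

Tools such as `taskset -c` already accept the list syntax directly; expanding it is mainly useful when one worker is started per listed CPU.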
---

## Example invocations

1) Quick CPU & memory smoke (auto sizing, 5 min)

```
./run.sh --duration 300 --stressors cpu,vm
```

2) Full platform shake (CPU + VM + I/O; pinned per CPU)

```
./run.sh --duration 600 --stressors cpu,vm,io --logs
```

3) Limit footprint on small-RAM systems

```
./run.sh --duration 180 --stressors vm --mem-pct 5
```

4) Pin workers to a subset of CPUs

```
./run.sh --cpu-list 0-3 --duration 240 --stressors cpu
```

5) Exercise only I/O with conservative disk usage

```
./run.sh --stressors io --disk-pct 3 --duration 120
```

6) Mixed load with a latency guardrail (if the cyclic path is enabled)

```
./run.sh --stressors cpu,vm --max-latency-us 500 --duration 300
```

7) Run with default phases, repeated 3 times

```
./run.sh --repeat 3
```

8) Run on specific CPUs with a temperature limit

```
./run.sh --cpu-list 0-3 --temp-limit 80
```

9) Run a memory-intensive workload for 90 seconds per phase

```
./run.sh --mem-frac 30 --p1 90 --p2 90
```

10) Run stress-ng with a custom workload twice

```
./run.sh --repeat 2 --stressng "--cpu 4 --timeout 30 --verify"
```

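`--temp-limit` (example 8) implies a thermal check. One way to build such a guard is to read the sysfs thermal zones, which report millidegrees Celsius. This is a sketch under assumptions: the helper names are hypothetical, it compares the limit against the hottest zone, and it treats a board with no readable zones as passing; `run.sh` may pick zones or react differently.

```shell
# Hypothetical thermal guard (names are illustrative, not from run.sh).
# sysfs thermal zones report temperature in millidegrees Celsius.

# Print the hottest readable zone under $1 (default /sys/class/thermal) in whole deg C.
max_temp_c() {
    dir=${1:-/sys/class/thermal}
    max=""
    for f in "$dir"/thermal_zone*/temp; do
        [ -r "$f" ] || continue
        t=$(cat "$f" 2>/dev/null) || continue
        # Skip empty or non-numeric readings.
        case $t in ''|-|*[!0-9-]*) continue ;; esac
        t=$((t / 1000))
        if [ -z "$max" ] || [ "$t" -gt "$max" ]; then
            max=$t
        fi
    done
    [ -n "$max" ] || return 1
    echo "$max"
}

# Succeed while the hottest zone is at or below the limit $1 (deg C).
# With no readable zones there is nothing to enforce, so it also succeeds.
temp_ok() {
    limit=$1
    cur=$(max_temp_c "$2") || return 0
    [ "$cur" -le "$limit" ]
}
```

A runner could call `temp_ok 80 || exit 1` between phases; whether `run.sh` aborts, pauses, or only logs when the limit is hit is not stated in this README.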
---

## What the script checks/fails on

- Stressor exit codes: any non-zero exit → FAIL
- Killed by the OOM killer, or ENOMEM patterns in stress-ng output → FAIL
- I/O failures (EIO, read/write errors) → FAIL
- dmesg anomalies via `scan_dmesg_errors` → WARN or FAIL, as configured
- (Optional) latency threshold, if a short cyclic latency step is also run

Exit code:

- `0` = PASS (no failures, and at least one stressor ran)
- `1` = FAIL (functional failure or threshold exceeded)
- `2` = SKIP (dependencies missing)

Artifacts:

- `stress-ng-summary.log` (always)
- `stress-ng-*.log` files (with `--logs`)
- a `*.res` result file for the test harness

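The fail criteria and exit codes above can be sketched as a post-run check on a captured stress-ng log. This is an illustration under assumptions, not `run.sh`'s logic: `classify_run` and `write_res` are hypothetical, the grep patterns are guesses at typical OOM and I/O messages, and the one-line `.res` format is assumed.

```shell
# Hypothetical post-run check (not run.sh's actual logic). The grep patterns are
# assumptions about typical OOM and I/O error messages.
# Returns 0 (PASS) or 1 (FAIL), matching the documented exit codes.
classify_run() {
    rc=$1    # exit status of the stress-ng invocation
    log=$2   # file holding its captured stdout/stderr
    if [ "$rc" -ne 0 ]; then
        echo "FAIL: stress-ng exited with status $rc"
        return 1
    fi
    if grep -Eqi 'out of memory|oom-kill|ENOMEM' "$log"; then
        echo "FAIL: OOM/ENOMEM pattern in $log"
        return 1
    fi
    if grep -Eq 'EIO|I/O error|read error|write error' "$log"; then
        echo "FAIL: I/O error pattern in $log"
        return 1
    fi
    echo "PASS"
    return 0
}

# Hypothetical: record the verdict for the harness. The one-line
# "<name> <RESULT>" format is an assumption; match what your harness parses.
write_res() {
    echo "$1 $2" > "$3"
}
```

A caller would redirect stress-ng's output to a file, pass its exit status and that file to `classify_run`, then record PASS or FAIL with `write_res`. The sketch does not cover the "at least one stressor ran" condition or the dmesg scan.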
---

## Sizing & affinity logic (how it stays safe)

- CPU workers: at most the number of online CPUs (default: one worker per online CPU)
- Memory workers: each uses a small, capped percentage of total RAM, adjustable via `--mem-pct`
- I/O workers: each uses a small, capped percentage of free disk space, adjustable via `--disk-pct`
- Affinity: on by default (each worker pinned to a specific online CPU); disable with `--no-affine`
- NUMA: if `numactl` is available, the script prefers local-node binding where appropriate; otherwise it only sets CPU affinity.

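The affinity and NUMA preference above can be sketched as a helper that returns a command prefix for one worker. Assumptions: `affinity_prefix` and `node_of_cpu` are hypothetical names; the node lookup relies on the `cpuN/nodeM` sysfs links that NUMA-enabled kernels expose; `numactl --physcpubind`/`--membind` and `taskset -c` are the standard numactl and util-linux options.

```shell
# Hypothetical helpers (names are illustrative, not from run.sh).

# Print the NUMA node of CPU $1, using the cpuN/nodeM links that NUMA-enabled
# kernels create in sysfs; return 1 if there is none.
node_of_cpu() {
    for n in /sys/devices/system/cpu/cpu"$1"/node[0-9]*; do
        if [ -e "$n" ]; then
            echo "${n##*node}"
            return 0
        fi
    done
    return 1
}

# Print a command prefix that pins work to CPU $1: numactl with local-node
# memory when possible, otherwise plain taskset.
affinity_prefix() {
    cpu=$1
    if command -v numactl >/dev/null 2>&1 && node=$(node_of_cpu "$cpu"); then
        echo "numactl --physcpubind=$cpu --membind=$node"
    else
        echo "taskset -c $cpu"
    fi
}
```

Usage: `$(affinity_prefix 2) stress-ng --cpu 1 --timeout 60`, left unquoted so the prefix splits into words. stress-ng's own `--taskset` option is an alternative when only CPU pinning is needed.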
---

## Building stress-ng into your products

### Yocto (image integration)

Add to your image recipe or `local.conf`:

```
IMAGE_INSTALL:append = " stress-ng"
```

Rebuild and flash your image.

### Debian/Ubuntu rootfs

Bake it into your rootfs recipe, or install it at first boot with a provisioning script:

```
apt-get update && apt-get install -y stress-ng
```

### Sideload in CI

For CI smoke runs on development hardware:

```
scp stress-ng root@TARGET:/usr/local/bin/
ssh root@TARGET chmod 755 /usr/local/bin/stress-ng
```

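---

## Cross-compiling notes & tips

When cross-building for ARM64 with Linaro/GCC toolchains, pass the cross compiler through `CC`/`CXX`:

```
make clean
CC=aarch64-linux-gnu-gcc CXX=aarch64-linux-gnu-g++ make
file ./stress-ng   # confirm aarch64 ELF
```

Prefer a static build if your target is minimal:

```
STATIC=1 CC=aarch64-linux-gnu-gcc CXX=aarch64-linux-gnu-g++ make
```

Validate the build: inspect the binary with `file` on the host, then run `stress-ng --version` on the target after copying it. A cross-built binary usually will not run on the build host, and a dynamically linked one also needs matching shared libraries on the target.
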
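To confirm that a cross-build targets AArch64 on a host without `file`, one can read the ELF header directly: bytes 18–19 hold `e_machine`, and 183 (0xB7) is `EM_AARCH64`. A sketch with hypothetical helper names, assuming a little-endian ELF (the usual AArch64 Linux layout) and POSIX `od`:

```shell
# Hypothetical helpers: read the ELF e_machine field (bytes 18-19) with POSIX od.
# Assumes a little-endian ELF, the usual layout for AArch64 Linux binaries.
elf_machine() {
    # The first four bytes must be the ELF magic 0x7f 'E' 'L' 'F'.
    magic=$(od -An -t x1 -N 4 "$1" 2>/dev/null | tr -d ' \n')
    [ "$magic" = "7f454c46" ] || return 1
    # Read e_machine as two unsigned bytes, low byte first.
    set -- $(od -An -t u1 -j 18 -N 2 "$1")
    [ "$#" -eq 2 ] || return 1
    echo $(( $1 + 256 * $2 ))
}

# 183 (0xB7) is EM_AARCH64.
is_aarch64_elf() {
    [ "$(elf_machine "$1")" = "183" ]
}
```

Usage: `is_aarch64_elf ./stress-ng && echo "aarch64 build OK"`.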
---

## Troubleshooting

### “stress-ng: command not found”

stress-ng is not on PATH. Install it natively, or place it in `/usr/local/bin` and `chmod +x` it.

### Out-of-memory or system lockups

Lower `--mem-pct`, shorten `--duration`, and drop `io` on small or flash-backed media.

### I/O errors / read-only filesystems

Switch to a writable mount (e.g., `/tmp`) or lower `--disk-pct` to 1–2%.

### High kernel latency on PREEMPT_RT

Start with CPU-only tests, then introduce memory and I/O stressors gradually; use `--max-latency-us` as a gate.

### BusyBox environments

Make sure the script's dependencies exist (the runner checks them and SKIPs otherwise). Pre-install the missing tools or adjust the stress mix.

---

## Security & safety

- The script is destructive only within its I/O scratch area (e.g., under `/tmp/stress-ng-io`); it does not touch other files.
- It refuses to allocate RAM or disk beyond the configured caps.
- Even so, run it on development hardware or staging boards when possible.

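The scratch-area guarantee above can be approximated by creating a private directory under a writable base and removing it on exit. A sketch with a hypothetical `make_scratch` helper; the `stress-ng-io.XXXXXX` naming only mirrors the README's example path, and `--temp-path` is the stress-ng option that sets where stressors create their temporary files.

```shell
# Hypothetical helper (not from run.sh): create a private scratch directory
# under base $1 (default /tmp) and print its path.
make_scratch() {
    base=${1:-/tmp}
    # Refuse a base that is missing or not writable (e.g. a read-only rootfs).
    [ -d "$base" ] && [ -w "$base" ] || return 1
    mktemp -d "$base/stress-ng-io.XXXXXX"
}

# Usage in a runner (commented out so the snippet is safe to source):
# SCRATCH=$(make_scratch /tmp) || exit 2
# trap 'rm -rf "$SCRATCH"' EXIT
# trap 'exit 1' INT TERM   # turn signals into a normal exit so the EXIT trap runs
# stress-ng --hdd 1 --temp-path "$SCRATCH" --timeout 60
```

Keeping all I/O inside one `mktemp -d` directory means cleanup is a single `rm -rf` of a path the script created itself.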
---

## License

The test runner: BSD-3-Clause-Clear (Qualcomm Technologies, Inc. and/or its subsidiaries).

stress-ng is licensed upstream by its author; see its repository for details.

---

## Appendix: Useful stress-ng commands (manual)

See the available stressors with `stress-ng --stressors`. To run each CPU-class stressor in turn (one instance each, 10 s per stressor):

```
stress-ng --class cpu --sequential 1 --metrics-brief --timeout 10
```

Start one instance of every stressor in parallel, with aggressive options (dangerous on small boards):

```
stress-ng --aggressive --all 1 --timeout 60
```

Memory only:

```
stress-ng --vm 4 --vm-bytes 5% --vm-keep --timeout 120
```
