|
| 1 | +Scheduler/Stress Validation — stress-ng Runner |
| 2 | + |
| 3 | +This README explains how to use the stress-ng–based validation script (run.sh) we wrote to exercise CPU, memory, I/O, and scheduler paths on embedded Linux systems (Yocto, Debian/Ubuntu, RT & non-RT kernels, NUMA/non-NUMA). It also covers how to get stress-ng onto your target (cross-compile or sideload). |
| 4 | + |
| 5 | +--- |
| 6 | + |
| 7 | +What this test does |
| 8 | + |
| 9 | +Launches stress-ng stressors sized to the current machine (online CPUs, RAM, and free disk) so we don’t overcommit tiny embedded boards. |
| 10 | + |
| 11 | +Pins workers across all online CPUs so that scheduler regressions show up clearly. |
| 12 | + |
| 13 | +Applies fail criteria (max latency, OOM, I/O errors, stressor non-zero exits); returns non-zero exit code on failure for CI. |
| 14 | + |
| 15 | +Saves a short summary and optional detailed logs; runs a dmesg scan via your functestlib.sh. |
| 16 | + |
| 17 | +--- |
| 18 | + |
| 19 | +Requirements |
| 20 | + |
| 21 | +stress-ng binary on the target |
| 22 | + |
| 23 | +Standard tools: awk, grep, sed, cut, tr, sleep, date, head, getconf |
| 24 | + |
| 25 | +(Optional) taskset, numactl for CPU pinning/NUMA; dd for I/O prechecks |
| 26 | + |
| 27 | +Your test framework’s init_env and functestlib.sh (already handled by run.sh) |
| 28 | + |
| 29 | +The runner reuses helpers from your existing functestlib.sh: |
| 30 | + |
| 31 | +check_dependencies |
| 32 | + |
| 33 | +find_test_case_by_name |
| 34 | + |
| 35 | +log_info, log_warn, log_pass, log_fail, log_skip, log_error |
| 36 | + |
| 37 | +scan_dmesg_errors |
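
The helpers above could be combined into a dependency preflight roughly as follows. This is only a sketch: the stand-in definitions are hypothetical placeholders so the snippet runs outside the framework, `preflight` is an illustrative name, and the real helpers in functestlib.sh may have different signatures or behaviour (for example, exiting instead of returning).

```shell
#!/bin/sh
# Hypothetical stand-ins (assumptions): defined only when functestlib.sh
# has not already provided the real helpers.
if ! command -v log_info >/dev/null 2>&1; then
    log_info() { printf '[INFO] %s\n' "$*"; }
    log_skip() { printf '[SKIP] %s\n' "$*"; }
fi
if ! command -v check_dependencies >/dev/null 2>&1; then
    # Assumed behaviour: return non-zero if any named command is missing.
    check_dependencies() {
        for dep in "$@"; do
            command -v "$dep" >/dev/null 2>&1 || return 1
        done
        return 0
    }
fi

# preflight TOOL... -> 0 when every tool is present, 2 (SKIP) otherwise.
preflight() {
    if check_dependencies "$@"; then
        log_info "dependencies present: $*"
        return 0
    fi
    log_skip "missing one of: $*"
    return 2
}
```

A runner might call `preflight stress-ng awk grep sed` before starting any stressor and propagate the return value as its exit code.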
| 38 | + |
| 39 | +--- |
| 40 | + |
| 41 | +Getting stress-ng |
| 42 | + |
| 43 | +Project: https://github.com/ColinIanKing/stress-ng |
| 44 | + |
| 45 | +A) Native install (Debian/Ubuntu) |
| 46 | + |
| 47 | +sudo apt-get update |
| 48 | +sudo apt-get install -y stress-ng |
| 49 | + |
| 50 | +B) Cross-compile (Yocto) |
| 51 | + |
| 52 | +Add to your image or build it as an SDK tool: |
| 53 | + |
| 54 | +In your layer, ensure stress-ng is available (meta-openembedded has a recipe in meta-oe on many branches). |
| 55 | + |
| 56 | +Add to image: |
| 57 | + |
| 58 | +IMAGE_INSTALL:append = " stress-ng" |
| 59 | + |
| 60 | +Rebuild image / SDK: |
| 61 | + |
| 62 | +bitbake core-image-minimal |
| 63 | + |
| 64 | +C) Cross-compile (generic make) |
| 65 | + |
| 66 | +On your host: |
| 67 | + |
| 68 | +git clone https://github.com/ColinIanKing/stress-ng.git |
| 69 | +cd stress-ng |
| 70 | +make CC=aarch64-linux-gnu-gcc # or your toolchain's gcc |
| 71 | +# artifact is ./stress-ng (top of the source tree) |
| 72 | + |
| 73 | +Copy the binary to your target (see “Sideload” below). |
| 74 | + |
| 75 | +D) Android / BusyBox targets (sideload) |
| 76 | + |
| 77 | +Push a statically linked stress-ng: |
| 78 | + |
| 79 | +adb push stress-ng /usr/local/bin/ |
| 80 | +adb shell chmod 755 /usr/local/bin/stress-ng |
| 81 | + |
| 82 | +Or with SSH: |
| 83 | + |
| 84 | +scp stress-ng root@TARGET:/usr/local/bin/ |
| 85 | +ssh root@TARGET chmod 755 /usr/local/bin/stress-ng |
| 86 | + |
| 87 | +--- |
| 88 | + |
| 89 | +run.sh quick start |
| 90 | + |
| 91 | +From the test case directory (the script finds its own path via find_test_case_by_name): |
| 92 | + |
| 93 | +./run.sh |
| 94 | + |
| 95 | +By default, it: |
| 96 | + |
| 97 | +Detects online CPUs, total RAM, and free disk. |
| 98 | + |
| 99 | +Picks safe defaults: worker threads == online CPUs, memory workers sized to a small percentage of RAM, I/O workers sized to free space. |
| 100 | + |
| 101 | +Runs two timed phases, 60 seconds each by default (configurable via --p1 and --p2). |
| 102 | + |
| 103 | +Fails on stressor non-zero exit, OOM, major I/O error, or dmesg anomalies. |
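
The detection and sizing steps above can be sketched with the standard tools listed under Requirements. The function names are illustrative assumptions, not the names used inside run.sh, and the percentages come from the documented defaults.

```shell
#!/bin/sh
# pct_of VALUE PCT -> VALUE * PCT / 100 (integer arithmetic, rounds down)
pct_of() {
    echo $(( $1 * $2 / 100 ))
}

# Number of online CPUs.
online_cpus() {
    getconf _NPROCESSORS_ONLN
}

# Total RAM in KiB, read from /proc/meminfo.
mem_total_kb() {
    awk '/^MemTotal:/ { print $2 }' /proc/meminfo
}

# Free space in KiB on the filesystem that holds DIR (POSIX df format).
disk_free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Example (not executed here):
#   workers=$(online_cpus)
#   vm_kb=$(pct_of "$(mem_total_kb)" 15)       # 15% of RAM per memory worker
#   io_kb=$(pct_of "$(disk_free_kb /tmp)" 5)   # 5% of free disk per I/O worker
```

Integer division rounds down, so the computed sizes never exceed the requested percentage.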
| 104 | + |
| 105 | +## Usage |
| 106 | + |
| 107 | +``` |
| 108 | +Usage: ./run.sh [--p1 <sec>] [--p2 <sec>] [--mem-frac <pct>] [--disk-frac <pct>] |
| 109 | + [--cpu-list <list>] [--temp-limit <degC>] [--stressng "<args>"] |
| 110 | + [--repeat <N>] [--help] |
| 111 | +``` |
| 112 | + |
| 113 | +### Options |
| 114 | + |
| 115 | +| Option | Description | |
| 116 | +|---------------------|-------------| |
| 117 | +| `--p1 <sec>` | Phase 1 duration in seconds (default: 60) | |
| 118 | +| `--p2 <sec>` | Phase 2 duration in seconds (default: 60) | |
| 119 | +| `--mem-frac <pct>` | Percentage of total memory per worker (default: 15) | |
| 120 | +| `--disk-frac <pct>` | Percentage of free disk space per worker (default: 5) | |
| 121 | +| `--cpu-list <list>` | Comma-separated list or range of CPUs to stress | |
| 122 | +| `--temp-limit <degC>` | Maximum temperature threshold | |
| 123 | +| `--stressng "<args>"` | Additional arguments passed to stress-ng | |
| 124 | +| `--repeat <N>` | Repeat the entire test sequence N times (default: 1) | |
| 125 | +| `--help` | Show this help message and exit | |
| 126 | + |
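| 127 | +> Exact flags may differ depending on your version of run.sh; run ./run.sh --help to confirm. This README assumes a runner with auto-sizing, affinity, fail criteria, and reuse of functestlib.sh. Examples 1–6 below and the sizing and troubleshooting sections use flags (--duration, --stressors, --logs, --mem-pct, --disk-pct, --max-latency-us, --no-affine) that do not appear in the usage text above, so verify them before relying on those invocations. |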
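
For orientation, the options in the table could be parsed with a plain POSIX `while`/`case` loop like the one below. This is a sketch under assumptions: the variable names and error messages are illustrative, and only the flag names and defaults are taken from the usage text.

```shell
#!/bin/sh
# Defaults from the options table; variable names are illustrative.
P1=60; P2=60; MEM_FRAC=15; DISK_FRAC=5; REPEAT=1
CPU_LIST=""; TEMP_LIMIT=""; STRESSNG_ARGS=""

parse_args() {
    while [ $# -gt 0 ]; do
        opt=$1
        case "$opt" in
            --help)
                echo "Usage: ./run.sh [options]"; return 0 ;;
            --p1|--p2|--mem-frac|--disk-frac|--cpu-list|--temp-limit|--stressng|--repeat)
                # Every remaining option takes exactly one value.
                [ $# -ge 2 ] || { echo "missing value for $opt" >&2; return 1; }
                val=$2; shift 2 ;;
            *)
                echo "unknown option: $opt" >&2; return 1 ;;
        esac
        case "$opt" in
            --p1)         P1=$val ;;
            --p2)         P2=$val ;;
            --mem-frac)   MEM_FRAC=$val ;;
            --disk-frac)  DISK_FRAC=$val ;;
            --cpu-list)   CPU_LIST=$val ;;
            --temp-limit) TEMP_LIMIT=$val ;;
            --stressng)   STRESSNG_ARGS=$val ;;
            --repeat)     REPEAT=$val ;;
        esac
    done
}
```

Checking the argument count before `shift 2` matters on strict shells such as dash, where shifting past the end of the argument list aborts the script.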
| 128 | + |
| 129 | +--- |
| 130 | + |
| 131 | +Example invocations |
| 132 | + |
| 133 | +1) Quick CPU & memory smoke (auto sizing, 5 min) |
| 134 | + |
| 135 | +./run.sh --duration 300 --stressors cpu,vm |
| 136 | + |
| 137 | +2) Full platform shake (CPU+VM+I/O; pinned per-CPU) |
| 138 | + |
| 139 | +./run.sh --duration 600 --stressors cpu,vm,io --logs |
| 140 | + |
| 141 | +3) Limit footprint on small RAM systems |
| 142 | + |
| 143 | +./run.sh --duration 180 --stressors vm --mem-pct 5 |
| 144 | + |
| 145 | +4) Pin workers to a subset of CPUs |
| 146 | + |
| 147 | +./run.sh --cpu-list 0-3 --duration 240 --stressors cpu |
| 148 | + |
| 149 | +5) Exercise only I/O with conservative disk usage |
| 150 | + |
| 151 | +./run.sh --stressors io --disk-pct 3 --duration 120 |
| 152 | + |
| 153 | +6) Mixed with latency guardrail (if cyclic path is enabled) |
| 154 | + |
| 155 | +./run.sh --stressors cpu,vm --max-latency-us 500 --duration 300 |
| 156 | + |
| 157 | +7) Run with default phases, repeated 3 times |
| 158 | + |
| 159 | +./run.sh --repeat 3 |
| 160 | + |
| 161 | +8) Run on specific CPUs with temperature limit |
| 162 | + |
| 163 | +./run.sh --cpu-list 0-3 --temp-limit 80 |
| 164 | + |
| 165 | +9) Run memory-intensive workload for 90 seconds per phase |
| 166 | + |
| 167 | +./run.sh --mem-frac 30 --p1 90 --p2 90 |
| 168 | + |
| 169 | +10) Run stress-ng with a custom workload twice |
| 170 | + |
| 171 | +./run.sh --repeat 2 --stressng "--cpu 4 --timeout 30 --verify" |
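
A temperature guard such as the one behind --temp-limit could be built on the kernel's thermal zones, which report millidegrees Celsius under /sys/class/thermal. The sketch below is an assumption about how such a check might work, not a description of run.sh; the function names are hypothetical.

```shell
#!/bin/sh
# max_temp_c [BASE_DIR] -> hottest zone in whole degrees C.
# BASE_DIR defaults to /sys/class/thermal.
max_temp_c() {
    base=${1:-/sys/class/thermal}
    max=""
    for f in "$base"/thermal_zone*/temp; do
        [ -r "$f" ] || continue
        t=$(cat "$f" 2>/dev/null) || continue
        # Skip empty or non-numeric values (negative readings are
        # ignored in this sketch).
        case "$t" in ''|*[!0-9]*) continue ;; esac
        if [ -z "$max" ] || [ "$t" -gt "$max" ]; then max=$t; fi
    done
    [ -n "$max" ] || return 1
    echo $(( max / 1000 ))
}

# over_temp LIMIT_C [BASE_DIR] -> status 0 when the hottest zone
# is above LIMIT_C, non-zero otherwise (or when no zone is readable).
over_temp() {
    cur=$(max_temp_c "$2") || return 1
    [ "$cur" -gt "$1" ]
}
```

A runner could poll `over_temp "$TEMP_LIMIT"` between phases and stop early when it returns 0.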
| 172 | + |
| 173 | +--- |
| 174 | + |
| 175 | +What the script checks/fails on |
| 176 | + |
| 177 | +stressor exit codes (any non-zero → FAIL) |
| 178 | + |
| 179 | +Killed by OOM or ENOMEM patterns in stress-ng output → FAIL |
| 180 | + |
| 181 | +I/O failures (EIO, read/write errors) → FAIL |
| 182 | + |
| 183 | +dmesg anomalies via scan_dmesg_errors → WARN/FAIL as configured |
| 184 | + |
| 185 | +(Optional) latency threshold if you also run a small cyclic step |
| 186 | + |
| 187 | +Exit code: |
| 188 | + |
| 189 | +0 = PASS (no failures, at least one stressor ran) |
| 190 | + |
| 191 | +1 = FAIL (functional failure or threshold exceeded) |
| 192 | + |
| 193 | +2 = SKIP (dependencies missing) |
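
The checks and exit codes above can be sketched as a small post-run classifier. The grep patterns are illustrative guesses rather than the runner's actual list, and `check_run` is a hypothetical name.

```shell
#!/bin/sh
# check_run RC LOGFILE -> prints a verdict; returns 0 (PASS) or 1 (FAIL).
check_run() {
    rc=$1; log=$2
    # Any non-zero stress-ng exit code is a failure.
    [ "$rc" -eq 0 ] || { echo "FAIL: stress-ng exit code $rc"; return 1; }
    # Memory exhaustion patterns (case-insensitive).
    if grep -Eiq 'out of memory|oom-kill|ENOMEM' "$log"; then
        echo "FAIL: memory exhaustion reported"; return 1
    fi
    # I/O error patterns.
    if grep -Eq 'EIO|Input/output error' "$log"; then
        echo "FAIL: I/O error reported"; return 1
    fi
    echo "PASS"
}

# Example (not executed here):
#   stress-ng --cpu 2 --timeout 30 >run.log 2>&1; check_run $? run.log
```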
| 194 | + |
| 195 | +Artifacts: |
| 196 | + |
| 197 | +stress-ng-summary.log (always) |
| 198 | + |
| 199 | +stress-ng-*.log files (with --logs) |
| 200 | + |
| 201 | +*.res result file for your harness |
| 202 | + |
| 203 | +--- |
| 204 | + |
| 205 | +Sizing & affinity logic (how it stays safe) |
| 206 | + |
| 207 | +CPU workers: ≤ online CPUs (default: one worker per online CPU) |
| 208 | + |
| 209 | +Memory workers: uses a small percentage of total RAM (cap per worker), adjustable via --mem-pct |
| 210 | + |
| 211 | +I/O workers: uses a small percentage of free disk (cap per worker), adjustable via --disk-pct |
| 212 | + |
| 213 | +Affinity: default is on (each worker pinned to a specific online CPU); disable with --no-affine |
| 214 | + |
| 215 | +NUMA: if numactl exists, the script prefers local node binding where appropriate; otherwise it simply CPU-affines. |
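
As an illustration of the affinity logic, a CPU list such as `0-3,5` can be expanded into individual CPU numbers and each worker launched under `taskset`. This is a sketch; the function names are assumptions, and run.sh may pin workers differently (for example through stress-ng's own options or numactl).

```shell
#!/bin/sh
# expand_cpu_list "0-3,5" -> "0 1 2 3 5"
expand_cpu_list() {
    out=""
    for part in $(echo "$1" | tr ',' ' '); do
        case "$part" in
            *-*) lo=${part%-*}; hi=${part#*-} ;;   # range "a-b"
            *)   lo=$part; hi=$part ;;             # single CPU
        esac
        i=$lo
        while [ "$i" -le "$hi" ]; do
            out="${out:+$out }$i"
            i=$((i + 1))
        done
    done
    echo "$out"
}

# run_pinned CPU CMD... -> run CMD pinned to CPU when taskset exists,
# otherwise run it unpinned.
run_pinned() {
    cpu=$1; shift
    if command -v taskset >/dev/null 2>&1; then
        taskset -c "$cpu" "$@"
    else
        "$@"
    fi
}

# Example (not executed here):
#   for c in $(expand_cpu_list 0-3); do
#       run_pinned "$c" stress-ng --cpu 1 --timeout 60 &
#   done; wait
```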
| 216 | + |
| 217 | +--- |
| 218 | + |
| 219 | +Building stress-ng into your products |
| 220 | + |
| 221 | +Yocto (image integration) |
| 222 | + |
| 223 | +Add to your image recipe or local.conf: |
| 224 | + |
| 225 | +IMAGE_INSTALL:append = " stress-ng" |
| 226 | + |
| 227 | +Rebuild and flash your image. |
| 228 | + |
| 229 | +Debian/Ubuntu rootfs |
| 230 | + |
| 231 | +Bake into your rootfs recipe or install at first boot with a provisioning script: |
| 232 | + |
| 233 | +apt-get update && apt-get install -y stress-ng |
| 234 | + |
| 235 | +Sideload in CI |
| 236 | + |
| 237 | +For CI smoke on development hardware: |
| 238 | + |
| 239 | +scp stress-ng root@TARGET:/usr/local/bin/ |
| 240 | +ssh root@TARGET chmod 755 /usr/local/bin/stress-ng |
| 241 | + |
| 242 | +--- |
| 243 | + |
| 244 | +Cross-compiling notes & tips |
| 245 | + |
| 246 | +For ARM64 targets, using a Linaro/GCC cross toolchain on the build host: |
| 247 | + |
| 248 | +make CC=aarch64-linux-gnu-gcc |
| 249 | +file stress-ng # confirm aarch64 ELF |
| 250 | + |
| 251 | +Prefer a static build if your target is minimal: |
| 252 | + |
| 253 | +make STATIC=1 CC=aarch64-linux-gnu-gcc |
| 254 | + |
| 255 | +Validate the result: inspect the binary with file on the host (a cross-compiled binary will not run there without emulation), then run stress-ng --version on the target after copying it. |
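
The host-side check can be scripted by classifying the text that `file` prints. The helper below is a sketch with a hypothetical name; it matches the linkage phrases that file(1) normally reports for ELF binaries.

```shell
#!/bin/sh
# classify_linkage "<output of file -b BINARY>" -> static | dynamic | unknown
classify_linkage() {
    case "$1" in
        *"statically linked"*|*"static-pie linked"*) echo static ;;
        *"dynamically linked"*)                      echo dynamic ;;
        *)                                           echo unknown ;;
    esac
}

# Example (not executed here), on the build host:
#   classify_linkage "$(file -b ./stress-ng)"
```

A static result means the binary does not need the target's shared libraries, which is what minimal or BusyBox-based root filesystems usually require.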
| 256 | + |
| 257 | +--- |
| 258 | + |
| 259 | +Troubleshooting |
| 260 | + |
| 261 | +“stress-ng: command not found” |
| 262 | +Not on PATH. Install natively, or place it in /usr/local/bin and chmod +x. |
| 263 | + |
| 264 | +Out-of-memory or system lockups |
| 265 | +Lower --mem-pct, shorten --duration, drop io on small/flash media. |
| 266 | + |
| 267 | +I/O errors / read-only filesystems |
| 268 | +Switch to a writable mount (e.g., /tmp) or adjust --disk-pct down to 1–2%. |
| 269 | + |
| 270 | +High kernel latency on PREEMPT_RT |
| 271 | +Start with CPU-only tests, then introduce memory/I/O slowly; use --max-latency-us to gate. |
| 272 | + |
| 273 | +BusyBox environments |
| 274 | +Ensure the script’s dependencies exist (the runner checks for them and SKIPs otherwise). You can pre-install the missing tools or adjust the stress mix. |
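
For the read-only filesystem case, a runner can probe candidate scratch directories before starting I/O stressors. The sketch below uses a hypothetical helper name and simply returns the first directory in which a probe file can be created.

```shell
#!/bin/sh
# pick_scratch_dir DIR... -> prints the first writable directory,
# returns 1 when none of the candidates is writable.
pick_scratch_dir() {
    for d in "$@"; do
        [ -d "$d" ] || continue
        probe="$d/.stress-ng-probe.$$"
        # Creating a file is a more reliable test than [ -w ] on
        # read-only mounts.
        if ( : > "$probe" ) 2>/dev/null; then
            rm -f "$probe"
            echo "$d"
            return 0
        fi
    done
    return 1
}

# Example (not executed here):
#   scratch=$(pick_scratch_dir /tmp /var/tmp /run) || echo "no writable scratch dir"
```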
| 275 | + |
| 276 | +--- |
| 277 | + |
| 278 | +Security & safety |
| 279 | + |
| 280 | +This script is destructive only in its I/O scratch area (e.g., under /tmp/stress-ng-io); it won’t touch other files. |
| 281 | + |
| 282 | +It will refuse to over-allocate RAM/disk beyond configured caps. |
| 283 | + |
| 284 | +Still, run on development hardware or staging boards when possible. |
| 285 | + |
| 286 | +--- |
| 287 | + |
| 288 | +License |
| 289 | + |
| 290 | +The test runner: BSD-3-Clause-Clear (Qualcomm Technologies, Inc. and/or its subsidiaries). |
| 291 | + |
| 292 | +stress-ng is licensed upstream by its author; see its repository for details. |
| 293 | + |
| 294 | +--- |
| 295 | + |
| 296 | +Appendix: Useful stress-ng commands (manual) |
| 297 | + |
| 298 | +Run every CPU-class stressor in sequence, 10 seconds each (use stress-ng --stressors to list the available stressor names): |
| 299 | + |
| 300 | +stress-ng --class cpu --sequential 1 --metrics-brief --timeout 10 |
| 301 | + |
| 302 | +Run one instance of every stressor in parallel with aggressive settings (dangerous on small boards): |
| 303 | + |
| 304 | +stress-ng --aggressive --all 1 --timeout 60 |
| 305 | + |
| 306 | +Only memory: |
| 307 | + |
| 308 | +stress-ng --vm 4 --vm-bytes 5% --vm-keep --timeout 120 |