Commit 92a12cd
fstests_watchdog: add process-based fallback for test detection
When testing parallel writeback patches for over 5 days with generic/750
I noticed the following:
./scripts/workflows/fstests/fstests_watchdog.py hosts baseline
Hostname                 Test-name    Completion %  runtime(s)  last-runtime(s)  Stall-status  Kernel                Crash-status
pw2-xfs-reflink-4k       generic/750  0% (soak)     75290       0                OK            6.16.0-gbc97f3a7cc8f  OK
pw2-xfs-reflink-8k-4ks   generic/750  0% (soak)     75291       0                OK            6.16.0-gbc97f3a7cc8f  OK
pw2-xfs-reflink-16k-4ks  generic/750  0% (soak)     75290       0                OK            6.16.0-gbc97f3a7cc8f  OK
pw2-xfs-reflink-32k-4ks  generic/750  0% (soak)     75292       0                OK            6.16.0-gbc97f3a7cc8f  OK
pw2-xfs-reflink-64k-4ks  None         0%            0           0                OK            6.16.0-gbc97f3a7cc8f  OK
Journal-method          Soak-duration(s)
systemd-journal-remote  432000
But when I ssh to pw2-xfs-reflink-64k-4ks I can see that generic/750 is
running. The issue is that the test has been running for so long that the
kernel log line announcing the running test is no longer available.
When systemd journal and dmesg logs have rotated out test information
(which happens on long-running VMs), fall back to checking running
processes to detect which test is currently executing.
The fallback:
1. Uses SSH to check for 'check -s' processes on the host
2. Extracts the test name from the command line (last argument)
3. Gets the process runtime using 'ps -o etimes' to calculate duration
This ensures the watchdog can correctly identify running tests even
when all log messages have rotated out, preventing false "None" test
reports for actively running tests.
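The three fallback steps above can be sketched roughly as follows. This is a minimal illustration and not the code added by this commit: the function names `parse_running_test` and `detect_running_test` are hypothetical, and it assumes the remote host has a procps-style `ps` that supports `-eo etimes=,args=` and that the test name is the last argument of the `check -s` command line, as described above.

```python
import os
import subprocess


def parse_running_test(ps_output):
    """Return (test_name, elapsed_seconds) for the oldest 'check -s' process.

    Expects one process per line in the form '<etimes> <command line>'.
    Returns None when no matching process is found.
    """
    best = None
    for line in ps_output.splitlines():
        fields = line.split()
        # Need at least: elapsed seconds, a command, and one argument.
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        elapsed, args = int(fields[0]), fields[1:]
        # Match an argument named "check" immediately followed by "-s".
        is_check = any(
            os.path.basename(cur) == "check" and nxt == "-s"
            for cur, nxt in zip(args, args[1:])
        )
        if not is_check:
            continue
        # Assumption: the test name is the last argument on the command line.
        # Child processes can share the same command line, so keep the oldest.
        if best is None or elapsed > best[1]:
            best = (args[-1], elapsed)
    return best


def detect_running_test(host, timeout=30):
    """Run ps on the host over SSH and parse its output (hypothetical helper)."""
    cmd = ["ssh", host, "ps", "-eo", "etimes=,args="]
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return parse_running_test(result.stdout)
```

Keeping the parsing separate from the SSH call means the matching logic can be checked against captured `ps` output without access to a test VM.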
With this, I can now see what I expect:
./scripts/workflows/fstests/fstests_watchdog.py hosts baseline
Hostname                 Test-name    Completion %  runtime(s)  last-runtime(s)  Stall-status  Kernel                Crash-status
pw2-xfs-reflink-4k       generic/750  0% (soak)     76119       0                OK            6.16.0-gbc97f3a7cc8f  OK
pw2-xfs-reflink-8k-4ks   generic/750  0% (soak)     76119       0                OK            6.16.0-gbc97f3a7cc8f  OK
pw2-xfs-reflink-16k-4ks  generic/750  0% (soak)     76119       0                OK            6.16.0-gbc97f3a7cc8f  OK
pw2-xfs-reflink-32k-4ks  generic/750  0% (soak)     76120       0                OK            6.16.0-gbc97f3a7cc8f  OK
pw2-xfs-reflink-64k-4ks  generic/750  0% (soak)     76128       0                OK            6.16.0-gbc97f3a7cc8f  OK
Journal-method          Soak-duration(s)
systemd-journal-remote  432000
Generated-by: Claude AI
Signed-off-by: Luis Chamberlain <[email protected]>
1 parent de18d27 commit 92a12cd
1 file changed, +73 -0 lines changed (additions at new-file lines 56-128)