test: match OpenClaw argv in crash-loop PID probe #4049
📝 Walkthrough
Update to the E2E test's `gateway_pid()` fallback: expanded inline documentation and replacement of a `ps`/`awk` `comm`-based PID filter with an argv-based `pgrep` match.
Changes: Process ID Detection Improvement
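A quick way to see the false negative this change targets is the minimal sketch below. It is not code from the PR. It runs both filters over one hypothetical `ps -eo pid=,comm=,args=` line in which `comm` is `node` (as the PR description reports for hosted runners) and the args are a plain `openclaw`. The helper names `pid_by_comm` and `pid_by_args` are made up for illustration, and `pid_by_args` only approximates `pgrep -f`, which matches the full command line and, with `-o`, picks the oldest process rather than the first line.

```shell
#!/usr/bin/env bash
# Sketch only: compare a comm-based filter with an argv-based one.
# The sample line is hypothetical, shaped like `ps -eo pid=,comm=,args=` output.

# Old approach (same awk as the removed line): exact match on the comm column.
pid_by_comm() {
  awk '$2 == "openclaw" { print $1 }' | sort -n | head -n 1
}

# Argv approach: look for "openclaw" in the args columns (field 3 onward),
# roughly what `pgrep -f` matches against; prints the first matching PID.
pid_by_args() {
  awk '{ for (i = 3; i <= NF; i++) if ($i ~ /openclaw/) { print $1; exit } }'
}

sample='4242 node openclaw'
printf 'comm match: [%s]\n' "$(printf '%s\n' "$sample" | pid_by_comm)"   # prints: comm match: []
printf 'args match: [%s]\n' "$(printf '%s\n' "$sample" | pid_by_args)"   # prints: args match: [4242]
```

The `comm` filter finds nothing because field 2 is `node`, while the argv filter still finds the process, which mirrors the "healthy gateway, no PID" symptom described in the PR.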
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks | ✅ 5 passed
Selective E2E Results: ❌ Some jobs failed (Run: 26265650672)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/e2e/test-issue-2478-crash-loop-recovery.sh`:
- Around lines 121-122: The fallback pgrep call (pid="$(pgrep -fo '[o]penclaw'
... )") can return the launcher PID; change it to explicitly exclude the
launcher process when selecting a plain openclaw PID — for example, use pgrep
-af '[o]penclaw' (or equivalent) and filter out commands/args that contain
"launcher" before extracting the PID so the selected PID corresponds to a
non-launcher gateway process; keep the readiness grep against /tmp/gateway.log
and assign the filtered PID to the pid variable.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: a2548f6f-7b52-4779-8db8-679bcf4d5838
📒 Files selected for processing (1)
test/e2e/test-issue-2478-crash-loop-recovery.sh
 if [ -z "$pid" ] && grep -Eq "\[gateway\] (ready|http server listening)" /tmp/gateway.log 2>/dev/null; then
- pid="$(ps -eo pid=,comm=,args= 2>/dev/null | awk '$2 == "openclaw" { print $1 }' | sort -n | head -n 1)"
+ pid="$(pgrep -fo '[o]penclaw' 2>/dev/null || true)"
Don't select the oldest plain openclaw process here.
This fallback can return the launcher instead of the gateway: the file already assumes both can coexist, and the launcher is older. Because the readiness grep is against the whole gateway.log, that branch can also stay enabled after a later gateway crash and still hand back a stale non-gateway PID. That breaks the subsequent kill/respawn checks.
Suggested fix
-if [ -z "$pid" ] && grep -Eq "\[gateway\] (ready|http server listening)" /tmp/gateway.log 2>/dev/null; then
- pid="$(pgrep -fo '[o]penclaw' 2>/dev/null || true)"
+if [ -z "$pid" ] && grep -Eq "\[gateway\] (ready|http server listening)" /tmp/gateway.log 2>/dev/null; then
+ pid="$(pgrep -fn '[o]penclaw' 2>/dev/null || true)"
 fi
If plain `openclaw` can match more than just the launcher and the gateway in this environment, I'd explicitly exclude the launcher instead of relying only on age.
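One way to follow that suggestion, sketched under assumptions: list candidates with `pgrep -af` (PID plus full command line), drop any line that mentions `launcher`, and keep the first remaining PID. The helper names `filter_gateway_pid` and `gateway_pid_fallback` are hypothetical, and so is the assumption that the launcher's argv contains the word `launcher`; the real command lines on the runner should be checked before relying on it.

```shell
#!/usr/bin/env bash
# Sketch only. Hypothetical assumption: the launcher's command line contains
# the substring "launcher", and the gateway's does not.

# Reads "PID full-command-line" lines (as printed by `pgrep -af`) on stdin and
# prints the first PID whose line does not mention "launcher".
filter_gateway_pid() {
  awk '!/launcher/ { print $1; exit }'
}

# Live fallback: keeps the existing readiness gate on /tmp/gateway.log, then
# filters the launcher out of the argv-based candidates.
gateway_pid_fallback() {
  local pid=""
  if grep -Eq "\[gateway\] (ready|http server listening)" /tmp/gateway.log 2>/dev/null; then
    pid="$(pgrep -af '[o]penclaw' 2>/dev/null | filter_gateway_pid || true)"
  fi
  printf '%s\n' "$pid"
}
```

Splitting the filter from the `pgrep` call keeps the selection logic testable with canned input. The readiness gate is unchanged here, so this sketch removes the dependence on process age but does not by itself address the stale-log concern raised above.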
Summary
- … `openclaw` in argv, not only process `comm`
- … `gateway.log` showing ready/listening so unrelated OpenClaw processes do not satisfy the probe

Why
PR #4045 fixed explicit `openclaw gateway`/`openclaw-gateway` matching and stale diagnostics, but the rerun still showed the gateway as a plain `openclaw` argv process, while its `ps` `comm` value can be `node` on hosted runners. The log proves the gateway is healthy (`gateway ready`, `Phase: Ready`, healthy inference), so this remains a false negative in the PID matcher.

Validation
- `bash -n test/e2e/test-issue-2478-crash-loop-recovery.sh`
- `git diff --check origin/main..HEAD`

Summary by CodeRabbit