fix(e2e): handle top-level payloads in openclaw agent JSON output by hunglp6d · Pull Request #4030 · NVIDIA/NemoClaw

hunglp6d · 2026-05-22T01:10:15Z

Summary

Five nightly E2E tests fail because their inline Python extractors assume openclaw agent --json output nests payloads under result.payloads, but OpenClaw ≥ 2026.5.18 now returns payloads at the top level of the JSON envelope. The fix makes the extraction resilient to both schemas: doc.get("result") or doc falls back to the document root when there is no result wrapper.

Root cause

OpenClaw's agent JSON output changed from:

{"result": {"payloads": [{"text": "..."}]}}

to:

{"payloads": [{"text": "..."}]}

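The E2E test scripts hardcoded doc.get("result") or {}, which evaluates to {} under the new schema because the envelope no longer has a result key. The following .get("payloads") lookup therefore finds no payloads, the extracted reply is empty, and the assertion fails.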
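The failure mode can be reproduced in isolation. The following is a minimal sketch, not the scripts' actual extractor: the function name `extract_reply_old`, the newline join, and the sample "PONG" text are assumptions made for illustration. Only the `doc.get("result") or {}` expression and the two envelope shapes come from this PR.

```python
import json


def extract_reply_old(raw: str) -> str:
    """Pre-fix extraction (hypothetical helper): looks only under "result"."""
    doc = json.loads(raw)
    # With no "result" key, doc.get("result") is None, so this becomes {}.
    result = doc.get("result") or {}
    payloads = result.get("payloads") or []
    texts = [p.get("text", "") for p in payloads if isinstance(p, dict)]
    return "\n".join(texts).strip()


old_schema = '{"result": {"payloads": [{"text": "PONG"}]}}'
new_schema = '{"payloads": [{"text": "PONG"}]}'

print(repr(extract_reply_old(old_schema)))  # 'PONG'
print(repr(extract_reply_old(new_schema)))  # '' (the empty reply that fails the assertion)
```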

Changes

| File | Line | Change |
| --- | --- | --- |
| `test/e2e/test-bedrock-runtime-compatible-anthropic.sh` | 713 | `(doc.get("result") or {}).get("payloads")` → `(doc.get("result") or doc).get("payloads")` |
| `test/e2e/test-launchable-smoke.sh` | 531 | `doc.get('result') or {}` → `doc.get('result') or doc` |
| `test/e2e/test-messaging-compatible-endpoint.sh` | 534 | `doc.get('result') or {}` → `doc.get('result') or doc` |
| `test/e2e/test-openclaw-inference-switch.sh` | 284 | `doc.get("result") or {}` → `doc.get("result") or doc` |
| `test/e2e/test-sandbox-operations.sh` | 393 | `doc.get('result') or {}` → `doc.get('result') or doc` |

Nightly run

Run: https://github.com/NVIDIA/NemoClaw/actions/runs/26260886472
Failed jobs (5 of 9, this PR): sandbox-operations-e2e, bedrock-runtime-compatible-anthropic-e2e, messaging-compatible-endpoint-e2e, openclaw-inference-switch-e2e, launchable-smoke-e2e
Remaining 4 failures (unrelated, tracked separately): RCF patch compat, gateway PID pattern, Kimi trajectory semicolon, Slack API 404

Validation

The extraction pattern doc.get("result") or doc is backwards-compatible:

Old schema ({"result": {"payloads": [...]}}): doc.get("result") returns truthy dict → uses it → .get("payloads") works
New schema ({"payloads": [...]}): doc.get("result") returns None → falls back to doc → .get("payloads") works
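The two cases above can be checked with a short sketch of the post-fix expression. The helper name `extract_reply`, the newline join, and the sample payload text are assumptions for illustration; the real scripts embed their own inline Python, which this PR page does not show in full.

```python
import json


def extract_reply(raw: str) -> str:
    """Post-fix extraction (hypothetical helper): falls back to the document root."""
    doc = json.loads(raw)
    # A missing, null, or empty "result" is falsy, so the root document is used instead.
    result = doc.get("result") or doc
    payloads = result.get("payloads") or []
    texts = [p.get("text", "") for p in payloads if isinstance(p, dict)]
    return "\n".join(texts).strip()


cases = {
    "old nested schema": '{"result": {"payloads": [{"text": "42"}]}}',
    "new top-level schema": '{"payloads": [{"text": "42"}]}',
    "null result wrapper": '{"result": null, "payloads": [{"text": "42"}]}',
    "no payloads at all": '{}',
}

for name, raw in cases.items():
    print(f"{name}: {extract_reply(raw)!r}")
```

One property of the `or` fallback worth keeping in mind: it triggers only when `result` is falsy. An envelope that carried a non-empty `result` object without `payloads` alongside top-level `payloads` (a shape not described in this PR, using a made-up `status` key here) would still yield an empty reply, for example `{"result": {"status": "ok"}, "payloads": [...]}`.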

Custom E2E workflow validation was not run (PAT lacks workflow scope to push .github/workflows/ changes). The fix is a minimal, mechanically-verified single-expression change across 5 files.

Test plan

Verify nightly E2E passes after merge (all 5 affected jobs should go green)
Confirm the fix is backwards-compatible if OpenClaw reverts the schema change

Signed-off-by: Hung Le <hple@nvidia.com>

OpenClaw 2026.5.18 moved the `payloads` array from `result.payloads` to the top level of the `openclaw agent --json` output object. Five E2E tests extracted the reply via `doc.get("result") or {}` which now returns an empty dict, causing `result.get("payloads")` to yield no payloads and every agent-reply assertion to see an empty string.

Change the fallback to `doc.get("result") or doc` so the extraction works with both the old nested schema and the new flat schema.

Affected tests:
- test-bedrock-runtime-compatible-anthropic.sh
- test-messaging-compatible-endpoint.sh
- test-openclaw-inference-switch.sh
- test-launchable-smoke.sh
- test-sandbox-operations.sh

Signed-off-by: Hung Le <hple@nvidia.com>

copy-pr-bot · 2026-05-22T01:10:18Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai · 2026-05-22T01:10:22Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c9347852-b29a-4905-a4bb-bf67cf53c2ff

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.

github-actions · 2026-05-22T01:11:24Z

E2E Advisor Recommendation

Required E2E: None
Optional E2E: bedrock-runtime-compatible-anthropic-e2e, launchable-smoke-e2e, messaging-compatible-endpoint-e2e, openclaw-inference-switch-e2e, sandbox-operations-e2e

Dispatch hint: bedrock-runtime-compatible-anthropic-e2e,launchable-smoke-e2e,messaging-compatible-endpoint-e2e,openclaw-inference-switch-e2e,sandbox-operations-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

None. No merge-blocking E2E is required because this PR changes only existing E2E test harness assertions/parsers and cannot affect runtime user flows, sandbox behavior, credentials, security policy, or inference routing implementation.

Optional E2E

bedrock-runtime-compatible-anthropic-e2e (high): Optional validation of the modified Bedrock Runtime OpenClaw agent JSON parsing path under the existing hermetic Bedrock-compatible scenario.
launchable-smoke-e2e (high): Optional validation that the launchable install-flow smoke test still parses OpenClaw agent JSON correctly after the test-harness change.
messaging-compatible-endpoint-e2e (medium): Optional validation of the modified OpenClaw agent response parsing in the Telegram plus OpenAI-compatible endpoint E2E harness.
openclaw-inference-switch-e2e (high): Optional validation that the inference-switch E2E harness still extracts agent payload text correctly after switching routes.
sandbox-operations-e2e (high): Optional validation of the sandbox operations test's modified Connect & Chat JSON parsing assertion.

New E2E recommendations

None.

Dispatch hint

Workflow: .github/workflows/nightly-e2e.yaml
jobs input: bedrock-runtime-compatible-anthropic-e2e,launchable-smoke-e2e,messaging-compatible-endpoint-e2e,openclaw-inference-switch-e2e,sandbox-operations-e2e

github-actions · 2026-05-22T01:17:20Z

PR Review Advisor

Recommendation: blocked
Confidence: high
Analyzed HEAD: 6886ac3efcc16191922e1b27b08f40bbfb70ecd0
Findings: 1 blocker(s), 2 warning(s), 1 suggestion(s)

This is an automated advisory review. A human maintainer must make the final merge decision.

Limitations: This advisory review is based on trusted metadata and the provided diff; no scripts, tests, package-manager commands, or workflows were executed. Issue #4031 has no comments in the trusted context, so acceptance extraction is limited to the issue body. The raw nightly failure logs referenced by issue #4031 were not independently inspected here; clauses about raw payload output are therefore marked unknown where appropriate. The PR is draft and CodeRabbit skipped review, so there may be future review feedback not present in the current context. Open PR #3925 overlaps all changed files and may alter the final merge conflict/drift assessment.

Workflow run

Full advisor summary

PR Review Advisor

Base: origin/main
Head: HEAD
Analyzed SHA: 6886ac3efcc16191922e1b27b08f40bbfb70ecd0
Recommendation: blocked
Confidence: high

The five-line E2E parser compatibility change matches the linked issue, but the PR is still draft/mergeState BLOCKED and overlaps active work in PR #3925.

Gate status

CI: pass — Head SHA 6886ac3: required contexts checks, commit-lint, dco-check, check-hash, and changes completed successfully; no required failures reported.
Mergeability: fail — GitHub reports isDraft=true, reviewDecision=REVIEW_REQUIRED, and mergeStateStatus=BLOCKED for PR #4030 (fix(e2e): handle top-level payloads in openclaw agent JSON output).
Review threads: pass — GraphQL reviewThreads.nodes is empty. CodeRabbit did not produce review threads because review was skipped for draft state.
Risky code tested: warning — Risky areas detected by trusted context: credentials/inference/network. The patch changes only E2E test harness parsers, and the E2E Advisor marked live jobs optional rather than required, but the affected optional E2E jobs are not reported as passed for this head SHA.

🔴 Blockers

PR is not currently mergeable: The change appears mechanically scoped, but GitHub marks the PR as draft with mergeStateStatus=BLOCKED and reviewDecision=REVIEW_REQUIRED.
- Recommendation: Do not merge until the PR is ready for review, branch protection requirements are satisfied, and GitHub reports an unblocked merge state.
- Evidence: Trusted GitHub context: isDraft=true, reviewDecision=REVIEW_REQUIRED, mergeStateStatus=BLOCKED.

🟡 Warnings

Active open PR overlaps all changed files: Open PR #3925 ("chore: upgrade agent runtime dependencies") touches the same five E2E scripts. This creates drift/rebase risk because the current parser fix may need to be preserved across dependency-upgrade changes.
- Recommendation: Before merge, compare with PR #3925 and ensure whichever branch lands second retains the top-level payloads parser fallback in all five scripts.
- Evidence: Trusted openPrOverlaps lists PR #3925 with sameFiles equal to all five changed files.
Affected live E2E jobs are optional and not shown as passed for this head SHA: The E2E Advisor says no merge-blocking E2E is required because only E2E harness parser code changed, but it recommends the five affected jobs as optional validation. The PR body also leaves nightly E2E validation unchecked.
- Recommendation: If maintainers want direct proof that the nightly failures are fixed before merge, run the optional affected E2E jobs for head SHA 6886ac3 or explicitly accept that static review is sufficient for this test-only patch.
- Evidence: E2E Advisor optional jobs: bedrock-runtime-compatible-anthropic-e2e, launchable-smoke-e2e, messaging-compatible-endpoint-e2e, openclaw-inference-switch-e2e, sandbox-operations-e2e. PR body test plan still has unchecked items to verify nightly E2E passes after merge.
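One way to confirm that the fallback survives a rebase against an overlapping branch is a static scan for the pre-fix expression. The sketch below is an assumption-laden illustration, not part of this PR or the repository's tooling: the helper name `find_stale_extractors` and the regular expression are hypothetical, and the sample text simply mirrors the expressions quoted in the Changes table.

```python
import re

# Matches the pre-fix expression with either quote style:
#   doc.get("result") or {}    /    doc.get('result') or {}
OLD_PATTERN = re.compile(r"""doc\.get\((['"])result\1\)\s+or\s+\{\}""")


def find_stale_extractors(script_text: str) -> list[int]:
    """Return 1-based line numbers that still use the result-only lookup."""
    return [
        lineno
        for lineno, line in enumerate(script_text.splitlines(), start=1)
        if OLD_PATTERN.search(line)
    ]


sample = '''result = doc.get('result') or {}
payloads = (doc.get("result") or doc).get("payloads")
payloads = (doc.get("result") or {}).get("payloads")
'''

print(find_stale_extractors(sample))  # [1, 3]
```

In practice the function would be applied to the text of each of the five scripts under test/e2e/; an empty list for every file suggests the fallback is still in place.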

🔵 Suggestions

Comment still describes only result.payloads (test/e2e/test-sandbox-operations.sh): The TC-SBX-02 comment says the assertion is on result.payloads[].text, but the parser now intentionally supports either result.payloads or top-level payloads.
- Recommendation: Update the nearby comment to describe both supported OpenClaw JSON envelope shapes so future maintainers do not regress the fallback.
- Evidence: Diff changes result = doc.get('result') or {} to result = doc.get('result') or doc, while the existing comment still references result.payloads[].text.

Acceptance coverage

met — Nightly E2E failure: agent reply JSON extraction broken by OpenClaw schema change: The diff changes only the agent reply JSON extraction in five E2E scripts, replacing result-only lookup with a fallback to the document root.
met — Affected jobs (5): All five affected E2E scripts named by the issue are present in changedFiles.
met — sandbox-operations-e2e (job 77293833899): test/e2e/test-sandbox-operations.sh changes the inline parser from doc.get('result') or {} to doc.get('result') or doc.
met — bedrock-runtime-compatible-anthropic-e2e (job 77293833749): test/e2e/test-bedrock-runtime-compatible-anthropic.sh changes payload extraction from (doc.get("result") or {}).get("payloads") to (doc.get("result") or doc).get("payloads").
met — messaging-compatible-endpoint-e2e (job 77293833829): test/e2e/test-messaging-compatible-endpoint.sh changes the inline parser to fall back from missing result to doc.
met — openclaw-inference-switch-e2e (job 77293833863): test/e2e/test-openclaw-inference-switch.sh changes result = doc.get("result") or {} to result = doc.get("result") or doc.
met — launchable-smoke-e2e (job 77293833838): test/e2e/test-launchable-smoke.sh changes result = doc.get('result') or {} to result = doc.get('result') or doc.
met — Bug group: agent-reply-json-extract: Every changed hunk is in inline Python that extracts agent reply payload text from openclaw agent --json output.
partial — Failure class: config_error: The patch addresses test-harness schema handling rather than runtime inference code. No live rerun evidence for the failed nightly jobs is present for this head SHA.
met — Confidence: high: The change is mechanically identical across five parser sites and exactly matches the described schema difference.
met — OpenClaw ≥ 2026.5.18 changed its openclaw agent --json output from nesting payloads under {"result": {"payloads": [...]}} to returning them at the top level {"payloads": [...]}.: The new expression doc.get("result") or doc / doc.get('result') or doc supports both a nested result object and a top-level JSON envelope.
met — Five E2E test scripts use inline Python extractors that hardcode doc.get("result") or {}, which returns {} under the new schema — so .get("payloads") always yields [] → empty reply → assertion failure.: The diff removes this exact doc.get(...result...) or {} pattern in all five scripts and replaces it with a fallback to doc.
unknown — Raw agent output contains correct payload text (e.g., "42", "PONG") at top level: This is issue-provided failure-run evidence. The diff is consistent with it, but the current review context does not include the raw logs themselves.
met — Python extractor returns reply='' because it looks only in doc["result"]: The removed parser code used doc.get('result') or {} / doc.get("result") or {}, which produces no payloads when only top-level payloads exists.
unknown — Assertion fails on empty reply: The assertion paths still fail on empty reply, and the patch changes the parsing path that produces the reply, but this review did not execute the failing jobs.
met — PR #4030: The reviewed PR is #4030 (fix(e2e): handle top-level payloads in openclaw agent JSON output), and its diff contains the described one-line changes.
met — The one-line fix in each script changes doc.get("result") or {} → doc.get("result") or doc, which falls back to the document root when the result wrapper is absent.: All five hunks implement the fallback-to-root expression, with equivalent single-quoted variants in three scripts.
met — Backwards-compatible with the old schema.: When doc.get('result') returns a truthy result object, the parser still reads result.get('payloads'); when it is absent, it reads doc.get('payloads').
met — test/e2e/test-bedrock-runtime-compatible-anthropic.sh (line 713): Changed at the OpenClaw agent reply parser: (doc.get("result") or doc).get("payloads").
met — test/e2e/test-launchable-smoke.sh (line 531): Changed at the agent reply parser: result = doc.get('result') or doc.

Security review

pass — Category 1: Secrets and Credentials: No new hardcoded secrets, tokens, credential files, or credential logging were introduced. Existing fake/test token strings in unchanged context are not modified by this PR.
pass — Category 2: Input Validation and Data Sanitization: The only changed input handling is test-harness JSON envelope parsing for trusted openclaw agent --json output. The fallback broadens accepted schema from nested result.payloads to top-level payloads; it does not affect production input validation, command construction, URL parsing, SSRF checks, or shell quoting.
pass — Category 3: Authentication and Authorization: No authentication or authorization logic is changed. The tests continue to exercise existing inference and sandbox paths without modifying auth enforcement.
pass — Category 4: Dependencies and Third-Party Libraries: No dependency manifests, installers, package versions, registries, or third-party library usage are changed.
pass — Category 5: Error Handling and Logging: No new logging or error disclosure paths are introduced. Existing parser failures remain contained to E2E assertion failures with stderr suppressed in the inline parser invocations.
pass — Category 6: Cryptography and Data Protection: Not applicable — no cryptographic operations, key handling, encryption settings, or data-protection behavior are changed.
pass — Category 7: Configuration and Security Headers: No runtime configuration, container image, security header, CORS, Dockerfile, network policy, or sandbox policy is changed.
pass — Category 8: Security Testing: The patch preserves existing E2E security-sensitive assertions, including provider/transport error checks and hop-header leak checks in the surrounding tests, while fixing their ability to parse the new OpenClaw JSON envelope. Live affected E2E jobs are optional rather than required per E2E Advisor.
pass — Category 9: Holistic Security Posture: No production sandbox, SSRF, credential, blueprint, installer, or workflow trusted-code boundary is changed. The change improves test compatibility and does not weaken runtime least privilege or policy enforcement.

Test / E2E status

Test depth: unit_sufficient — The PR modifies only inline Python parsers inside existing E2E shell tests. Static CI and ShellCheck passed for the head SHA; the E2E Advisor found no required E2E because runtime user flows, sandbox behavior, credentials, security policy, and inference routing implementation are unchanged.
E2E Advisor: ok

✅ What looks good

The patch is minimal: five insertions and five deletions across the exact five failing E2E scripts.
The fallback expression is backwards-compatible with the old nested result.payloads envelope and supports the new top-level payloads envelope.
Required CI contexts passed for the reviewed head SHA.
The change does not modify production sandbox, network, credential, workflow, Dockerfile, installer, or policy code.
The existing tests retain fail-closed checks around provider/transport errors and security-sensitive routing assertions.

Review completeness

This advisory review is based on trusted metadata and the provided diff; no scripts, tests, package-manager commands, or workflows were executed.
Issue #4031 (nightly-e2e: agent reply JSON extraction broken by OpenClaw schema change (5 jobs)) has no comments in the trusted context, so acceptance extraction is limited to the issue body.
The raw nightly failure logs referenced by issue #4031 were not independently inspected here; clauses about raw payload output are therefore marked unknown where appropriate.
The PR is draft and CodeRabbit skipped review, so there may be future review feedback not present in the current context.
Open PR #3925 (chore: upgrade agent runtime dependencies) overlaps all changed files and may alter the final merge conflict/drift assessment.
Human maintainer review required: yes

wscurran · 2026-05-22T14:27:32Z

✨
Related open issues:

#4031 nightly-e2e: agent reply JSON extraction broken by OpenClaw schema change (5 jobs)

hunglp6d mentioned this pull request May 22, 2026

nightly-e2e: agent reply JSON extraction broken by OpenClaw schema change (5 jobs) #4031

Open

wscurran added the E2E and fix labels May 22, 2026

wscurran mentioned this pull request May 22, 2026

nightly-e2e: Kimi trajectory check rejects combined semicolon command #4034

Open
