
fix(e2e): handle top-level payloads in openclaw agent JSON output#4030

Draft
hunglp6d wants to merge 1 commit into
mainfrom
fix/nightly-e2e-agent-reply-json-extract-74c0246


@hunglp6d
Contributor

@hunglp6d hunglp6d commented May 22, 2026

Fixes #4031

Summary

Five nightly E2E tests fail because their inline Python extractors assume openclaw agent --json output nests payloads under result.payloads, but OpenClaw ≥ 2026.5.18 now returns payloads at the top level of the JSON envelope. The fix makes the extraction resilient to both schemas: doc.get("result") or doc falls back to the document root when there is no result wrapper.

Root cause

OpenClaw's agent JSON output changed from:

{"result": {"payloads": [{"text": "..."}]}}

to:

{"payloads": [{"text": "..."}]}

The E2E test scripts hardcoded doc.get("result") or {}, which yields {} under the new schema, so .get("payloads") never finds any payloads → empty reply → assertion failure.
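
The failure mode can be reproduced in isolation. The sketch below is a hypothetical stand-in for the inline extractors: the function name `extract_reply_old` and the way payload text is joined are assumptions for illustration, not the scripts' actual code. It shows the old fallback producing an empty reply for the new envelope.

```python
import json

def extract_reply_old(raw: str) -> str:
    """Old behavior: look for payloads only under the "result" wrapper."""
    doc = json.loads(raw)
    result = doc.get("result") or {}          # {} when "result" is absent
    payloads = result.get("payloads") or []   # nothing to find in {}
    # Joining the text fields is an assumption made for illustration.
    return "".join(p.get("text", "") for p in payloads)

old_schema = '{"result": {"payloads": [{"text": "PONG"}]}}'
new_schema = '{"payloads": [{"text": "PONG"}]}'

print(repr(extract_reply_old(old_schema)))  # 'PONG'
print(repr(extract_reply_old(new_schema)))  # ''
```

The second call returns an empty string even though the payload text is present in the document, which matches the empty-reply assertion failures described above.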

Changes

| File | Line | Change |
| --- | --- | --- |
| test/e2e/test-bedrock-runtime-compatible-anthropic.sh | 713 | `(doc.get("result") or {}).get("payloads")` → `(doc.get("result") or doc).get("payloads")` |
| test/e2e/test-launchable-smoke.sh | 531 | `doc.get('result') or {}` → `doc.get('result') or doc` |
| test/e2e/test-messaging-compatible-endpoint.sh | 534 | `doc.get('result') or {}` → `doc.get('result') or doc` |
| test/e2e/test-openclaw-inference-switch.sh | 284 | `doc.get("result") or {}` → `doc.get("result") or doc` |
| test/e2e/test-sandbox-operations.sh | 393 | `doc.get('result') or {}` → `doc.get('result') or doc` |

Nightly run

  • Run: https://github.com/NVIDIA/NemoClaw/actions/runs/26260886472
  • Failed jobs (5 of 9, this PR): sandbox-operations-e2e, bedrock-runtime-compatible-anthropic-e2e, messaging-compatible-endpoint-e2e, openclaw-inference-switch-e2e, launchable-smoke-e2e
  • Remaining 4 failures (unrelated, tracked separately): RCF patch compat, gateway PID pattern, Kimi trajectory semicolon, Slack API 404

Validation

The extraction pattern doc.get("result") or doc is backwards-compatible:

  • Old schema ({"result": {"payloads": [...]}}): doc.get("result") returns truthy dict → uses it → .get("payloads") works
  • New schema ({"payloads": [...]}): doc.get("result") returns None → falls back to doc → .get("payloads") works
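
Both cases can be checked with a few lines of Python. This is a minimal sketch, assuming the extractor joins the `text` field of each payload; the helper name `extract_reply` is hypothetical and does not appear in the test scripts.

```python
import json

def extract_reply(raw: str) -> str:
    """Read payload text from either the nested or the top-level envelope."""
    doc = json.loads(raw)
    # Use the "result" wrapper when it is present and truthy,
    # otherwise fall back to the document root.
    result = doc.get("result") or doc
    payloads = result.get("payloads") or []
    # Joining the text fields is an assumption made for illustration.
    return "".join(p.get("text", "") for p in payloads)

nested = json.dumps({"result": {"payloads": [{"text": "42"}]}})
flat = json.dumps({"payloads": [{"text": "42"}]})

assert extract_reply(nested) == "42"   # old schema
assert extract_reply(flat) == "42"     # new schema
```

Because `or` falls back on any falsy value, a missing, null, or empty `result` all route to the document root. A non-empty `result` that lacks `payloads` would still yield an empty reply, even if top-level `payloads` were also present.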

Custom E2E workflow validation was not run (PAT lacks workflow scope to push .github/workflows/ changes). The fix is a minimal, mechanically-verified single-expression change across 5 files.

Test plan

  • Verify nightly E2E passes after merge (all 5 affected jobs should go green)
  • Confirm the fix is backwards-compatible if OpenClaw reverts the schema change

Signed-off-by: Hung Le <hple@nvidia.com>

OpenClaw 2026.5.18 moved the `payloads` array from `result.payloads`
to the top level of the `openclaw agent --json` output object. Five
E2E tests extracted the reply via `doc.get("result") or {}` which now
returns an empty dict, causing `result.get("payloads")` to yield no
payloads and every agent-reply assertion to see an empty string.

Change the fallback to `doc.get("result") or doc` so the extraction
works with both the old nested schema and the new flat schema.

Affected tests:
  - test-bedrock-runtime-compatible-anthropic.sh
  - test-messaging-compatible-endpoint.sh
  - test-openclaw-inference-switch.sh
  - test-launchable-smoke.sh
  - test-sandbox-operations.sh

Signed-off-by: Hung Le <hple@nvidia.com>
@copy-pr-bot

copy-pr-bot Bot commented May 22, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai Bot commented May 22, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c9347852-b29a-4905-a4bb-bf67cf53c2ff

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Contributor

E2E Advisor Recommendation

Required E2E: None
Optional E2E: bedrock-runtime-compatible-anthropic-e2e, launchable-smoke-e2e, messaging-compatible-endpoint-e2e, openclaw-inference-switch-e2e, sandbox-operations-e2e

Dispatch hint: bedrock-runtime-compatible-anthropic-e2e,launchable-smoke-e2e,messaging-compatible-endpoint-e2e,openclaw-inference-switch-e2e,sandbox-operations-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • None. No merge-blocking E2E is required because this PR changes only existing E2E test harness assertions/parsers and cannot affect runtime user flows, sandbox behavior, credentials, security policy, or inference routing implementation.

Optional E2E

  • bedrock-runtime-compatible-anthropic-e2e (high): Optional validation of the modified Bedrock Runtime OpenClaw agent JSON parsing path under the existing hermetic Bedrock-compatible scenario.
  • launchable-smoke-e2e (high): Optional validation that the launchable install-flow smoke test still parses OpenClaw agent JSON correctly after the test-harness change.
  • messaging-compatible-endpoint-e2e (medium): Optional validation of the modified OpenClaw agent response parsing in the Telegram plus OpenAI-compatible endpoint E2E harness.
  • openclaw-inference-switch-e2e (high): Optional validation that the inference-switch E2E harness still extracts agent payload text correctly after switching routes.
  • sandbox-operations-e2e (high): Optional validation of the sandbox operations test's modified Connect & Chat JSON parsing assertion.

New E2E recommendations

  • None.

Dispatch hint

  • Workflow: .github/workflows/nightly-e2e.yaml
  • jobs input: bedrock-runtime-compatible-anthropic-e2e,launchable-smoke-e2e,messaging-compatible-endpoint-e2e,openclaw-inference-switch-e2e,sandbox-operations-e2e

@github-actions
Contributor

PR Review Advisor

Recommendation: blocked
Confidence: high
Analyzed HEAD: 6886ac3efcc16191922e1b27b08f40bbfb70ecd0
Findings: 1 blocker(s), 2 warning(s), 1 suggestion(s)

This is an automated advisory review. A human maintainer must make the final merge decision.

Limitations:

  • This advisory review is based on trusted metadata and the provided diff; no scripts, tests, package-manager commands, or workflows were executed.
  • Issue #4031 has no comments in the trusted context, so acceptance extraction is limited to the issue body.
  • The raw nightly failure logs referenced by issue #4031 were not independently inspected here; clauses about raw payload output are therefore marked unknown where appropriate.
  • The PR is draft and CodeRabbit skipped review, so there may be future review feedback not present in the current context.
  • Open PR #3925 overlaps all changed files and may alter the final merge conflict/drift assessment.

Workflow run

Full advisor summary

PR Review Advisor

Base: origin/main
Head: HEAD
Analyzed SHA: 6886ac3efcc16191922e1b27b08f40bbfb70ecd0
Recommendation: blocked
Confidence: high

The five-line E2E parser compatibility change matches the linked issue, but the PR is still draft/mergeState BLOCKED and overlaps active work in PR #3925.

Gate status

  • CI: pass — Head SHA 6886ac3: required contexts checks, commit-lint, dco-check, check-hash, and changes completed successfully; no required failures reported.
  • Mergeability: fail — GitHub reports isDraft=true, reviewDecision=REVIEW_REQUIRED, and mergeStateStatus=BLOCKED for PR #4030.
  • Review threads: pass — GraphQL reviewThreads.nodes is empty. CodeRabbit did not produce review threads because review was skipped for draft state.
  • Risky code tested: warning — Risky areas detected by trusted context: credentials/inference/network. The patch changes only E2E test harness parsers, and the E2E Advisor marked live jobs optional rather than required, but the affected optional E2E jobs are not reported as passed for this head SHA.

🔴 Blockers

  • PR is not currently mergeable: The change appears mechanically scoped, but GitHub marks the PR as draft with mergeStateStatus=BLOCKED and reviewDecision=REVIEW_REQUIRED.
    • Recommendation: Do not merge until the PR is ready for review, branch protection requirements are satisfied, and GitHub reports an unblocked merge state.
    • Evidence: Trusted GitHub context: isDraft=true, reviewDecision=REVIEW_REQUIRED, mergeStateStatus=BLOCKED.

🟡 Warnings

  • Active open PR overlaps all changed files: Open PR #3925, titled "chore: upgrade agent runtime dependencies", touches the same five E2E scripts. This creates drift/rebase risk because the current parser fix may need to be preserved across dependency-upgrade changes.
  • Affected live E2E jobs are optional and not shown as passed for this head SHA: The E2E Advisor says no merge-blocking E2E is required because only E2E harness parser code changed, but it recommends the five affected jobs as optional validation. The PR body also leaves nightly E2E validation unchecked.
    • Recommendation: If maintainers want direct proof that the nightly failures are fixed before merge, run the optional affected E2E jobs for head SHA 6886ac3 or explicitly accept that static review is sufficient for this test-only patch.
    • Evidence: E2E Advisor optional jobs: bedrock-runtime-compatible-anthropic-e2e, launchable-smoke-e2e, messaging-compatible-endpoint-e2e, openclaw-inference-switch-e2e, sandbox-operations-e2e. PR body test plan still has unchecked items to verify nightly E2E passes after merge.

🔵 Suggestions

  • Comment still describes only result.payloads (test/e2e/test-sandbox-operations.sh): The TC-SBX-02 comment says the assertion is on result.payloads[].text, but the parser now intentionally supports either result.payloads or top-level payloads.
    • Recommendation: Update the nearby comment to describe both supported OpenClaw JSON envelope shapes so future maintainers do not regress the fallback.
    • Evidence: Diff changes result = doc.get('result') or {} to result = doc.get('result') or doc, while the existing comment still references result.payloads[].text.
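
One possible wording for such a comment is sketched below on a hypothetical extractor. The real TC-SBX-02 code is not reproduced in this thread, so the function name `reply_text` and the surrounding lines are assumptions; only the fallback expression comes from the diff.

```python
def reply_text(doc: dict) -> str:
    # `openclaw agent --json` has used two envelope shapes:
    #   nested: {"result": {"payloads": [{"text": "..."}]}}
    #   flat:   {"payloads": [{"text": "..."}]}   (OpenClaw >= 2026.5.18)
    # Fall back to the document root so both shapes are accepted.
    result = doc.get('result') or doc
    # Joining the text fields is an assumption made for illustration.
    return "".join(p.get("text", "") for p in result.get("payloads") or [])
```

Keeping both shapes in the comment makes it harder for a later cleanup to revert the fallback to `or {}` by mistake.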

Acceptance coverage

  • met — Nightly E2E failure: agent reply JSON extraction broken by OpenClaw schema change: The diff changes only the agent reply JSON extraction in five E2E scripts, replacing result-only lookup with a fallback to the document root.
  • met — Affected jobs (5): All five affected E2E scripts named by the issue are present in changedFiles.
  • met — sandbox-operations-e2e (job 77293833899): test/e2e/test-sandbox-operations.sh changes the inline parser from doc.get('result') or {} to doc.get('result') or doc.
  • met — bedrock-runtime-compatible-anthropic-e2e (job 77293833749): test/e2e/test-bedrock-runtime-compatible-anthropic.sh changes payload extraction from (doc.get("result") or {}).get("payloads") to (doc.get("result") or doc).get("payloads").
  • met — messaging-compatible-endpoint-e2e (job 77293833829): test/e2e/test-messaging-compatible-endpoint.sh changes the inline parser to fall back from missing result to doc.
  • met — openclaw-inference-switch-e2e (job 77293833863): test/e2e/test-openclaw-inference-switch.sh changes result = doc.get("result") or {} to result = doc.get("result") or doc.
  • met — launchable-smoke-e2e (job 77293833838): test/e2e/test-launchable-smoke.sh changes result = doc.get('result') or {} to result = doc.get('result') or doc.
  • met — Bug group: agent-reply-json-extract: Every changed hunk is in inline Python that extracts agent reply payload text from openclaw agent --json output.
  • partial — Failure class: config_error: The patch addresses test-harness schema handling rather than runtime inference code. No live rerun evidence for the failed nightly jobs is present for this head SHA.
  • met — Confidence: high: The change is mechanically identical across five parser sites and exactly matches the described schema difference.
  • met — OpenClaw ≥ 2026.5.18 changed its openclaw agent --json output from nesting payloads under {"result": {"payloads": [...]}} to returning them at the top level {"payloads": [...]}.: The new expression doc.get("result") or doc / doc.get('result') or doc supports both a nested result object and a top-level JSON envelope.
  • met — Five E2E test scripts use inline Python extractors that hardcode doc.get("result") or {}, which returns {} under the new schema — so .get("payloads") always yields [] → empty reply → assertion failure.: The diff removes this exact doc.get(...result...) or {} pattern in all five scripts and replaces it with a fallback to doc.
  • unknown — Raw agent output contains correct payload text (e.g., "42", "PONG") at top level: This is issue-provided failure-run evidence. The diff is consistent with it, but the current review context does not include the raw logs themselves.
  • met — Python extractor returns reply='' because it looks only in doc["result"]: The removed parser code used doc.get('result') or {} / doc.get("result") or {}, which produces no payloads when only top-level payloads exists.
  • unknown — Assertion fails on empty reply: The assertion paths still fail on empty reply, and the patch changes the parsing path that produces the reply, but this review did not execute the failing jobs.
  • met — PR #4030 (fix(e2e): handle top-level payloads in openclaw agent JSON output): The reviewed PR is #4030 and its diff contains the described one-line changes.
  • met — The one-line fix in each script changes doc.get("result") or {} → doc.get("result") or doc, which falls back to the document root when the result wrapper is absent.: All five hunks implement the fallback-to-root expression, with equivalent single-quoted variants in three scripts.
  • met — Backwards-compatible with the old schema.: When doc.get('result') returns a truthy result object, the parser still reads result.get('payloads'); when it is absent, it reads doc.get('payloads').
  • met — test/e2e/test-bedrock-runtime-compatible-anthropic.sh (line 713): Changed at the OpenClaw agent reply parser: (doc.get("result") or doc).get("payloads").
  • met — test/e2e/test-launchable-smoke.sh (line 531): Changed at the agent reply parser: result = doc.get('result') or doc.

Security review

  • pass — Category 1: Secrets and Credentials: No new hardcoded secrets, tokens, credential files, or credential logging were introduced. Existing fake/test token strings in unchanged context are not modified by this PR.
  • pass — Category 2: Input Validation and Data Sanitization: The only changed input handling is test-harness JSON envelope parsing for trusted openclaw agent --json output. The fallback broadens accepted schema from nested result.payloads to top-level payloads; it does not affect production input validation, command construction, URL parsing, SSRF checks, or shell quoting.
  • pass — Category 3: Authentication and Authorization: No authentication or authorization logic is changed. The tests continue to exercise existing inference and sandbox paths without modifying auth enforcement.
  • pass — Category 4: Dependencies and Third-Party Libraries: No dependency manifests, installers, package versions, registries, or third-party library usage are changed.
  • pass — Category 5: Error Handling and Logging: No new logging or error disclosure paths are introduced. Existing parser failures remain contained to E2E assertion failures with stderr suppressed in the inline parser invocations.
  • pass — Category 6: Cryptography and Data Protection: Not applicable — no cryptographic operations, key handling, encryption settings, or data-protection behavior are changed.
  • pass — Category 7: Configuration and Security Headers: No runtime configuration, container image, security header, CORS, Dockerfile, network policy, or sandbox policy is changed.
  • pass — Category 8: Security Testing: The patch preserves existing E2E security-sensitive assertions, including provider/transport error checks and hop-header leak checks in the surrounding tests, while fixing their ability to parse the new OpenClaw JSON envelope. Live affected E2E jobs are optional rather than required per E2E Advisor.
  • pass — Category 9: Holistic Security Posture: No production sandbox, SSRF, credential, blueprint, installer, or workflow trusted-code boundary is changed. The change improves test compatibility and does not weaken runtime least privilege or policy enforcement.

Test / E2E status

  • Test depth: unit_sufficient — The PR modifies only inline Python parsers inside existing E2E shell tests. Static CI and ShellCheck passed for the head SHA; the E2E Advisor found no required E2E because runtime user flows, sandbox behavior, credentials, security policy, and inference routing implementation are unchanged.
  • E2E Advisor: ok

✅ What looks good

  • The patch is minimal: five insertions and five deletions across the exact five failing E2E scripts.
  • The fallback expression is backwards-compatible with the old nested result.payloads envelope and supports the new top-level payloads envelope.
  • Required CI contexts passed for the reviewed head SHA.
  • The change does not modify production sandbox, network, credential, workflow, Dockerfile, installer, or policy code.
  • The existing tests retain fail-closed checks around provider/transport errors and security-sensitive routing assertions.

Review completeness

@wscurran added the E2E and fix labels May 22, 2026

Labels

E2E End-to-end testing — Brev infrastructure, test cases, nightly failures, and coverage gaps fix

Projects

None yet

Development

Successfully merging this pull request may close these issues.

nightly-e2e: agent reply JSON extraction broken by OpenClaw schema change (5 jobs)

2 participants