Autonomous agent harness improvements for reliable full-stack builds by KBB99 · Pull Request #52 · anthropics/riv2025-long-horizon-coding-agent-demo

KBB99 · 2026-02-23T22:16:18Z

Summary

Harden the agent harness (prompts, entrypoint, CI/CD workflows) so the autonomous coding agent reliably builds full-stack apps end-to-end
Fixes discovered across issues #17 (Revert PRs #15 and #16) through #28 ([FEATURE] add fun tshirt sizing): build ordering, CI/CD failures, enhancement mode crash, backend test verification, and prompt guidance

Harness & Orchestration Changes

Phase-gate prompt redesign: Replace grading-based incentives with phase-completion gates that enforce build order (shared → infra → backend → frontend), fixing the agent's tendency to skip backend entirely
CDK-first workflow: Update prompts to enforce CDK deployment before backend implementation, ensuring infrastructure is live before the agent writes handlers
Enhancement mode fix: Ensure setup_session_prompts() always runs and --project is passed in enhancement mode, fixing a crash when agent-runtime already has code
Backend test verification: Add backend-verify.cjs and an alternative verification path in src/security.py so the agent can verify backend tests without requiring Playwright screenshots
Error feedback loop: Write build log output into the SSM error field so the agent can read failures and self-correct
Prompt guidance: Enforce NodejsFunction (not lambda.Function), tsx (not ts-node), and crypto.randomUUID() (not uuid package) to avoid CI/CD failures
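
The phase gates themselves live in the prompt text, not in code. As an illustration of the ordering rule only, here is a minimal sketch; the names `PHASES`, `next_allowed_phase`, and `may_start` are hypothetical and do not appear in the harness.

```python
from typing import Optional, Set

# Build order stated in this PR: shared -> infra -> backend -> frontend.
PHASES = ["shared", "infra", "backend", "frontend"]


def next_allowed_phase(completed: Set[str]) -> Optional[str]:
    """Return the first phase that is not yet complete, or None when all are done."""
    for phase in PHASES:
        if phase not in completed:
            return phase
    return None


def may_start(phase: str, completed: Set[str]) -> bool:
    """A phase may start only when every earlier phase is complete."""
    earlier = PHASES[: PHASES.index(phase)]
    return all(p in completed for p in earlier)
```

Under this rule a frontend-only build is rejected until `shared`, `infra`, and `backend` are all marked complete.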

CI/CD Workflow Fixes

Handle missing package-lock.json (fall back from npm ci to npm install)
Use NodejsFunction so CDK bundles at synth time, eliminating pre-built asset requirements
Resolve API URL with proper AWS credentials in deploy-preview
Skip deploy-preview gracefully when no frontend exists yet
Fix build ordering in deploy-infrastructure (shared before CDK)

Agent Runtime Improvements

Add background push loop for continuous commit pushing
Fix root user restriction and ARM64 AWS CLI in Dockerfile
Forward AWS credentials to Claude SDK subprocess
Add configurable BASE_BRANCH for workspace cloning
Add OTEL-visible logging (replace print() with logging)

Infrastructure

Add AgentCoreBackendTestPolicy to CDK stack for backend test verification (DynamoDB, CloudWatch, Lambda permissions scoped to canopy-*)
Add CLAUDE.md with project intelligence, quick start, and lessons learned

Files Changed (17 files, +1980 / -482)

| Area | Files |
| --- | --- |
| CI/CD | `deploy-infrastructure.yml`, `deploy-preview.yml` |
| Agent harness | `bedrock_entrypoint.py`, `claude_code.py`, `src/security.py`, `src/config.py` |
| Prompts | `prompts/system_prompt.txt`, `prompts/canopy/BUILD_PLAN.md`, `prompts/canopy/system_prompt.txt`, `prompts/canopy/EXAMPLE_TEST.txt` |
| Infra | `infrastructure/lib/claude-code-stack.ts` |
| Scaffold | `frontend-scaffold-template/backend-verify.cjs` |
| Config | `Dockerfile`, `Makefile`, `README.md`, `CLAUDE.md` |

Test Plan

Run make reset followed by a fresh issue trigger to verify the full pipeline
Confirm deploy-infrastructure succeeds on first attempt
Confirm deploy-preview produces a working CloudFront URL
Verify agent follows phase-gate ordering (shared → infra → backend → frontend)

🤖 Generated with Claude Code

Add infrastructure-as-code support so the agent can define serverless backends (Lambda + API Gateway + DynamoDB) via CDK, with CI/CD handling deployment and atomic SSM-based state signaling for async handoff.

- Install CDK CLI, AWS CLI v2, esbuild in Dockerfile
- Add CDK/AWS security validators (block deploy, allow synth + read-only)
- New deploy-infrastructure.yml workflow with atomic SSM JSON state
- Update deploy-preview.yml with workflow_call trigger + VITE_API_URL
- Add GitHubInfraDeployRole with scoped allowlist + explicit denies
- Add AgentCore read-only infra verification permissions
- Update prompts and BUILD_PLAN for serverless full-stack architecture
- Add infrastructure-aware prompt construction in agent harness
- Add API client utility and React Query to frontend scaffold

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Update all region references from us-west-2 to us-east-1
- Configure Bedrock model access (CLAUDE_CODE_USE_BEDROCK=1)
- Set AgentCore runtime ID (claude_code_reinvent-1eBYMO7kHw)
- Make CDK stack resilient to missing AgentCore role (conditional)
- Import existing VPC via context to avoid VPC limit
- Add account ID suffix to S3 bucket names for uniqueness
- Remove backup plan (use EFS automatic backups instead)
- Update GitHub repo to KBB99 fork

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The agent structures monorepo projects with frontend/ workspace, so the Vite build output lands at generated-app/frontend/dist/ instead of generated-app/dist/. Also fix vite config and BrowserRouter patching to check both locations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

workflow_dispatch triggers run from main which doesn't have the generated app. Checkout agent-runtime branch explicitly when not triggered by a push event. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add `make reset` target to wipe all agent state (branch, issues, SSM, S3, CloudFront) for clean restarts
- Restructure README.md with Quick Start, Creating a New Project, Resetting the Agent, and Configuration sections
- Create CLAUDE.md with architecture overview, PROJECT_NAME flow, CDK context variables, and common issues
- Update DEFAULT_MODEL to us.anthropic.claude-opus-4-6-v1 in Makefile and bedrock_entrypoint.py
- Add AgentCoreXRayPolicy to CDK stack granting xray:PutTraceSegments and xray:PutTelemetryRecords to fix OTEL trace export 403 errors

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…hemas

The Canopy agent was gravitating toward frontend scaffolding and never building backend/infra. Root cause: the BUILD_PLAN had no phasing and the API spec was just a route list with no request/response schemas.

Changes:

- Replace <api_specification> with <api_contract> containing full Zod schemas for every entity, inferred types, and a typed endpoint map
- Add shared/ workspace package as the single source of truth for types
- Enforce phased execution: shared contract → infra + backend → frontend
- Remove all Dexie/IndexedDB fallback references (replaced by API-first)
- Update monorepo structure, implementation order, and critical paths
- Add "Phased Execution" section to system_prompt.txt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The critical step between CDK deploy and agent launch — rebuilding and pushing the Docker image to ECR — was undocumented. Added a "Deploying Changes" section with the full deployment sequence and a table showing which changes require an image rebuild vs CDK deploy. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

When agent-runtime branch doesn't exist, the agent now clones from BASE_BRANCH (env var, default: main) instead of hardcoded main. This allows testing prompt/code changes on feature branches without merging. Usage: make launch BASE_BRANCH=kb/improved-harness Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
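
A minimal sketch of the branch selection described above. The function name `resolve_clone_branch` and its `runtime_branch_exists` parameter are hypothetical; only the `BASE_BRANCH` variable, its `main` default, and the `agent-runtime` branch come from this PR.

```python
import os
from typing import Mapping, Optional


def resolve_clone_branch(
    runtime_branch_exists: bool,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the branch to clone: agent-runtime if present, else BASE_BRANCH (default main)."""
    env = os.environ if env is None else env
    if runtime_branch_exists:
        return "agent-runtime"
    return env.get("BASE_BRANCH", "main")
```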

The SDK env dict only had CLAUDE_CODE_USE_BEDROCK and AWS_REGION, stripping IAM credentials needed for Bedrock auth. This caused "Invalid API key" errors on every call. Now forwards static creds and container credential endpoint env vars. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add permission_mode="bypassPermissions" to ClaudeAgentOptions for non-interactive container operation. Without this, the CLI defaults to prompting for interactive permission on each tool use, which fails in headless environments and may cause the "Invalid API key" errors we've been seeing.
- Remove redundant AWS credential forwarding from env dict. The SDK merges env with os.environ ({**os.environ, **env}), so the subprocess inherits all parent env vars automatically. Only Bedrock-specific overrides are needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
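
To make the merge semantics concrete, the sketch below applies the `{**os.environ, **env}` expression quoted above to sample dictionaries. The helper name `build_subprocess_env` and the sample values are hypothetical; this is not the SDK's code.

```python
import os
from typing import Dict, Mapping, Optional


def build_subprocess_env(
    overrides: Mapping[str, str],
    parent: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Merge overrides on top of the parent environment; overrides win on conflicts."""
    parent = os.environ if parent is None else parent
    return {**parent, **overrides}


merged = build_subprocess_env(
    {"CLAUDE_CODE_USE_BEDROCK": "1", "AWS_REGION": "us-east-1"},
    parent={"AWS_ACCESS_KEY_ID": "example-key", "AWS_REGION": "us-west-2"},
)
```

Keys present only in the parent (here the sample credential) survive the merge, which is why explicit forwarding is redundant when the merge is in place.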

Root cause of "Invalid API key" errors: The Docker container runs as root, and Claude CLI refuses --dangerously-skip-permissions (used by permission_mode="bypassPermissions") when running as root/sudo.

Fixes:

- Add non-root 'agent' user to Dockerfile and switch to it
- Add permission_mode="bypassPermissions" to ClaudeAgentOptions for non-interactive autonomous operation
- Fix AWS CLI install URL to auto-detect ARM64 vs x86_64 architecture
- Update CLAUDE.md to document --platform linux/arm64 for docker build

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
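
The Dockerfile presumably does the architecture detection in shell; the same decision can be sketched in Python as follows. The function name `aws_cli_url` is hypothetical. The two URLs are the AWS CLI v2 Linux installers published by AWS.

```python
import platform
from typing import Optional

AWS_CLI_URLS = {
    "x86_64": "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip",
    "aarch64": "https://awscli.amazonaws.com/awscli-exe-linux-aarch64.zip",
}


def aws_cli_url(machine: Optional[str] = None) -> str:
    """Map a machine string (as reported by uname -m) to the matching installer URL."""
    machine = (machine or platform.machine()).lower()
    if machine in ("arm64", "aarch64"):
        return AWS_CLI_URLS["aarch64"]
    if machine in ("x86_64", "amd64"):
        return AWS_CLI_URLS["x86_64"]
    raise ValueError(f"unsupported architecture: {machine}")
```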

The handler's async generator monitoring loop may not execute if the AgentCore framework stops consuming the generator after the initial streaming response. This left the post-commit hook as the only push mechanism, which can silently fail. Added a polling push loop directly in run_agent_background() that runs every PUSH_INTERVAL_SECONDS (default 300s) in the same thread as the agent subprocess. This ensures commits get pushed regardless of whether the handler's generator is consumed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
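
A minimal sketch of such a polling push loop, assuming the loop runs while the agent subprocess is alive. The names `push_while_running`, `is_running`, and `push` are hypothetical stand-ins, not the code in `run_agent_background()`; the 300-second default mirrors PUSH_INTERVAL_SECONDS.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger(__name__)


def push_while_running(
    is_running: Callable[[], bool],
    push: Callable[[], None],
    interval_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Push every interval while the agent is running; return the number of successful pushes."""
    pushes = 0
    while is_running():
        sleep(interval_seconds)
        try:
            push()
            pushes += 1
        except Exception as exc:
            # A failed push is logged and retried on the next tick rather than ending the loop.
            logger.warning("push failed: %s", exc)
    return pushes
```

In a real harness `is_running` could wrap `proc.poll() is None` and `push` could shell out to `git push`; both are left injectable here so the loop can be tested without a subprocess.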

Root cause of all agent failures (issues #5-#13): the update-runtime-env Makefile target was missing CLAUDE_CODE_USE_BEDROCK and AWS_REGION from the environment-variables JSON. Every time we ran `make update-runtime-env`, those critical vars were wiped, causing Claude CLI to try the Anthropic API (not Bedrock) and fail silently.

Also adds Python logging to bedrock_entrypoint.py so agent subprocess output is captured by OTEL auto-instrumentation and visible in CloudWatch. Previously all print() output was invisible.

Changes:

- Makefile: Add CLAUDE_CODE_USE_BEDROCK=1 and AWS_REGION to update-runtime-env
- bedrock_entrypoint.py: Replace print() with logging.getLogger() in critical paths, pipe subprocess stdout through logger for OTEL visibility

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
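
Piping subprocess output through a logger can be sketched as below. The function name `run_and_log` and the logger name `agent` are hypothetical; the actual code in `bedrock_entrypoint.py` may differ.

```python
import logging
import subprocess
import sys
from typing import List

logger = logging.getLogger("agent")


def run_and_log(cmd: List[str]) -> int:
    """Run cmd, forward each line of its combined stdout/stderr to the logger, return the exit code."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave stderr so failures show up in the same stream
        text=True,
    )
    assert proc.stdout is not None
    for line in proc.stdout:
        logger.info("%s", line.rstrip("\n"))
    return proc.wait()
```

For example, `run_and_log([sys.executable, "-c", "print('hello')"])` emits one INFO record, provided a handler at INFO level is configured.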

Adds sections covering: what's working (full pipeline), what's not (agent ignores phased execution), key config values, step-by-step quick start for running a new test, monitoring commands, and pitfalls discovered during debugging (env var wipe, print vs logging, ARM64, non-root requirement, commit timing). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ld order The agent was skipping shared/ and backend/ phases because the system prompt incentivized UI quality and 200+ tests upfront, causing it to build a frontend-only app. This replaces grading language with phase-gate evaluation, reduces test count from 200 to ~50 weighted toward backend/infra, adds the Phased Execution section to the canopy prompt, and simplifies test verification. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…100% frontend test bias

The agent wrote 220 tests that were ALL frontend UI tests because the security hook required a Playwright screenshot to mark any test as passing. With no equivalent verification path for backend tests, the agent rationally skipped non-frontend tests entirely.

Changes:

- Add backend-verify.cjs scaffold script for shared/infra/backend test verification
- Add alternative -result.txt verification path in security.py (alongside screenshot path)
- Allowlist AWS data-plane commands (DynamoDB scan/query, CloudWatch logs, Lambda invoke)
- Add IAM policy for canopy-* scoped DynamoDB, CloudWatch Logs, and Lambda access
- Add Backend Test Verification section and Resource Naming Convention to prompts
- Update initial/continuation messages with backend-verify.cjs guidance and VITE_API_URL wiring
- Block Write tool from forging -result.txt files (must use backend-verify.cjs)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
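
The two verification paths can be pictured as a single check that accepts either kind of evidence. This is a sketch under assumptions: the function name, the `evidence_dir` layout, and the `.png` screenshot naming are hypothetical; only the `-result.txt` suffix comes from this PR.

```python
from pathlib import Path


def has_verification_evidence(test_id: str, evidence_dir: Path) -> bool:
    """Accept a test as verified if a screenshot or a -result.txt file exists for it."""
    screenshot = evidence_dir / f"{test_id}.png"        # frontend path (assumed naming)
    result = evidence_dir / f"{test_id}-result.txt"     # backend path
    return screenshot.is_file() or result.is_file()
```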

The CDK deploy was attaching IAM policies (Secrets Manager, SSM, CloudWatch, etc.) to AmazonBedrockAgentCoreSDKRuntime instead of claude-code-agentcore-role, which is the role the container actually assumes. This caused the container to silently crash on startup when it couldn't access Secrets Manager for the GitHub token.

- Add AGENTCORE_ROLE_NAME and VPC_ID as Makefile variables with defaults
- Wire them into deploy-infra target so make deploy-infra just works
- Document the pitfall in CLAUDE.md lessons learned

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The workflow failed with "Cannot find asset at generated-app/backend/dist" because CDK references backend Lambda code as a bundled asset but the workflow never built the shared or backend packages before running tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The agent was writing CDK infrastructure and backend handlers simultaneously, then never checking if CI/CD deployed the stack. The frontend was never wired to the real API.

Now:

- Phase 2 split into 2a (CDK + stubs, commit) and 2b (wait for deploy, then implement real handlers)
- Agent must poll SSM deploy-state and wait for "succeeded" before writing full backend handlers or frontend API calls
- VITE_API_URL wiring is part of Phase 3
- CLAUDE.md updated with current state (issue #22 success, remaining issues)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
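
The wait-for-deploy step can be sketched as a polling loop over the JSON deploy state. The function `wait_for_deploy` and the injected `fetch_state` callable are hypothetical (in practice the agent reads the SSM parameter through the AWS CLI); the `status` and `error` fields and the `succeeded`/`failed` values follow the commit messages in this PR.

```python
import json
import time
from typing import Callable, Dict


def wait_for_deploy(
    fetch_state: Callable[[], str],
    timeout_seconds: float = 1800.0,
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, object]:
    """Poll the deploy state until it succeeds; raise on failure or timeout."""
    deadline = clock() + timeout_seconds
    while True:
        state = json.loads(fetch_state())
        status = state.get("status")
        if status == "succeeded":
            return state
        if status == "failed":
            # Surface the error text so the caller can act on it.
            raise RuntimeError(state.get("error", "deploy failed"))
        if clock() >= deadline:
            raise TimeoutError(f"deploy still {status!r} after {timeout_seconds}s")
        sleep(poll_seconds)
```

The explicit timeout is a design choice of this sketch: it keeps the caller from polling forever if the state never leaves an in-progress value.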

…r polished frontend Update CLAUDE.md to reflect prompt changes from 3744400 and note next step is a live agent test. Update grading in both system prompts to explicitly call out that localStorage fallback = incomplete and deployed API = highest score, reinforcing the CDK-first deployment flow. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…builds deploy-infrastructure: Replace hardcoded npm ci with lockfile-aware fallback (npm ci if lock exists, npm install otherwise) for infrastructure, shared, and backend install steps. The agent doesn't generate lockfiles. deploy-preview: Add shared package build step before frontend build (frontend imports from @canopy/shared). Add fallback to build frontend/ directly when root-level npm run build fails (Vite can't resolve index.html via workspace delegation). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
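
The lockfile-aware fallback is a shell step in the workflow; the decision it makes can be sketched in Python as follows. The function name `install_command` is hypothetical.

```python
from pathlib import Path
from typing import List


def install_command(package_dir: Path) -> List[str]:
    """Use npm ci when a lockfile exists, otherwise fall back to npm install."""
    if (package_dir / "package-lock.json").is_file():
        return ["npm", "ci"]
    return ["npm", "install"]
```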

…ructure The "Signal deploy failed" handler runs on failure() but previously had no AWS credentials when build/test steps failed (credentials were configured after synth). Now credentials are configured right after Node setup so the failure handler can always write status to SSM. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Remove separate shared/backend build steps from deploy-infrastructure workflow. CDK's NodejsFunction bundles with esbuild at synth time, resolving @canopy/shared imports automatically. Update prompts to explicitly require NodejsFunction (not lambda.Function with Code.fromAsset) and ensure shared/package.json has a build script. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Vite resolves workspace package imports directly via esbuild. No need to pre-build shared/ before the frontend build. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Early agent commits (shared/, infrastructure/) trigger deploy-preview but have no buildable frontend. Instead of failing, skip the deploy and let later commits with frontend code trigger a successful build. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…tion bundling The agent's Lambda handlers imported uuid which caused esbuild bundling failures in CI/CD (Docker fallback can't resolve node_modules). Prompts now instruct the agent to use Node.js built-in crypto.randomUUID() and configure NodejsFunction to only externalize @aws-sdk/*. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The agent polls SSM for deploy status but previously only saw "status":"failed" with no details. Now the deploy-infrastructure workflow captures build/test/synth/deploy output and includes the last 40 lines in the SSM error field on failure. Prompts updated to tell the agent to read the error field and fix the issue. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
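
Building the failure payload can be sketched like this. The function name `failure_state` is hypothetical and the workflow itself does this in shell; the 40-line tail and the `status`/`error` fields follow the description above.

```python
import json


def failure_state(log_text: str, max_lines: int = 40) -> str:
    """Return a JSON deploy state whose error field holds the last max_lines of the log."""
    tail = "\n".join(log_text.splitlines()[-max_lines:])
    return json.dumps({"status": "failed", "error": tail})
```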

ts-node has path resolution issues on GitHub Actions runners where npx picks up a cached version instead of the project-local one, causing "Cannot find module ./canopy.ts" errors during cdk deploy. tsx is a drop-in replacement that handles this reliably. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

When CDK deploy fails, the outputs file isn't created, causing a jq error. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ting projects

Enhancement mode (triggered when agent-runtime already has generated-app/) crashed with FileNotFoundError on system_prompt.txt because:

1. claude_code.py only ran setup_session_prompts() for fresh builds, so generated-app/prompts/ was never created in enhancement mode
2. bedrock_entrypoint.py didn't pass --project in enhancement mode, so prompts_dir resolved to the wrong directory

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The deploy-preview workflow was building the frontend with an empty VITE_API_URL because the "Resolve API URL" step ran before AWS credentials were configured. The raw IAM user creds lack cloudformation:DescribeStacks, so the call silently failed and Vite inlined shouldUseApi() as `return false`, causing the entire app to fall back to localStorage.

- Move "Configure AWS credentials" before "Resolve API URL" and "Build"
- Add cloudformation:DescribeStacks to the preview deploy IAM role

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add read-only commands so the agent can diagnose deployment issues instead of spinning in an SSM poll loop when deploys stall:

- cloudformation describe-stack-events
- cloudformation describe-stack-resources
- ssm get-parameters-by-path
- lambda get-function-url-config

Also update CLAUDE.md with issue #26-#28 postmortems and lessons learned.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The `| tee` pipes in install, test, synth, and deploy steps were swallowing non-zero exit codes — tee always exits 0, so cdk deploy failures didn't fail the step, which meant `failure()` was false and "Signal deploy failed" never fired. SSM stayed stuck at "deploying" and the agent polled forever. Adding `set -o pipefail` ensures pipeline exit codes propagate correctly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
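
The effect is easy to reproduce outside the workflow. The sketch below runs the same failing pipeline through bash with and without `set -o pipefail`; the helper name `pipeline_exit_code` is hypothetical, and the example assumes `bash` and `tee` are on the PATH.

```python
import subprocess


def pipeline_exit_code(script: str, pipefail: bool) -> int:
    """Run a shell pipeline under bash and return its exit code."""
    prefix = "set -o pipefail; " if pipefail else ""
    return subprocess.run(["bash", "-c", prefix + script]).returncode


# `false` fails, but without pipefail the pipeline reports tee's exit status.
without_pipefail = pipeline_exit_code("false | tee /dev/null", pipefail=False)
with_pipefail = pipeline_exit_code("false | tee /dev/null", pipefail=True)
```

Without the option the failing command is masked by `tee`; with it, the pipeline returns the non-zero status and a `failure()` condition can fire.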

The agent generates CDK code from scratch each run, so construct IDs were inconsistent (e.g. CanopyMainTable vs CanopyTable). CloudFormation sees different logical IDs as new resources and fails with "already exists" when the physical name matches an existing resource. Pin the exact construct IDs and route structure to match the deployed stack so incremental CDK updates work across runs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The CF stack persisted across resets because make reset only deleted the agent-runtime branch, not the deployed infrastructure. The next agent run would generate fresh CDK code with different construct IDs, causing "already exists" errors on deploy. Now runs cdk destroy first (with cloudformation delete-stack fallback), so each fresh run starts with a clean slate. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Not needed — make reset now destroys the CF stack, so each fresh run starts clean and the agent can use whatever construct IDs it wants. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

When a user starts a Claude Code session without a specific task, CLAUDE.md now instructs Claude to run an interactive onboarding flow: check prerequisites, then either deploy Canopy or walk through creating a new project with a BUILD_PLAN.md generated from a stack-agnostic template. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

zealoushacker and others added 30 commits January 30, 2026 07:33

Remove dead DemoViewerStack import from CDK app entry point

05d6fbe

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Fix f-string escaping for JSON braces in SSM instructions

58ead19

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
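
For context on the f-string fix: literal JSON braces inside a Python f-string have to be doubled, otherwise they are parsed as replacement fields. A minimal sketch follows; the function name and the instruction wording are hypothetical, not the text used in the harness.

```python
def ssm_instruction(status: str) -> str:
    """Embed a JSON snippet in an f-string; {{ and }} produce literal braces."""
    return f'Write {{"status": "{status}"}} to the deploy-state parameter.'
```

Calling `ssm_instruction("succeeded")` yields text containing `{"status": "succeeded"}` with single braces.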

Fix deploy-preview checkout to use agent-runtime branch

9732612


Remove separate shared build step from deploy-preview workflow

100fe10


KBB99 and others added 9 commits February 20, 2026 16:48

Fix deploy-infrastructure: check cdk-outputs.json exists before reading

4bf9c11


revert: remove pinned CDK construct IDs from canopy prompt

70c4c31


KBB99 wants to merge 39 commits into anthropics:main from KBB99:kb/improved-harness