Autonomous agent harness improvements for reliable full-stack builds #52
Open
KBB99 wants to merge 39 commits into anthropics:main from
Add infrastructure-as-code support so the agent can define serverless backends (Lambda + API Gateway + DynamoDB) via CDK, with CI/CD handling deployment and atomic SSM-based state signaling for async handoff.
- Install CDK CLI, AWS CLI v2, and esbuild in the Dockerfile
- Add CDK/AWS security validators (block deploy, allow synth and read-only commands)
- Add a new deploy-infrastructure.yml workflow with atomic SSM JSON state
- Update deploy-preview.yml with a workflow_call trigger and VITE_API_URL
- Add GitHubInfraDeployRole with a scoped allowlist and explicit denies
- Add AgentCore read-only infrastructure verification permissions
- Update prompts and BUILD_PLAN for the serverless full-stack architecture
- Add infrastructure-aware prompt construction in the agent harness
- Add an API client utility and React Query to the frontend scaffold
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
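The "block deploy, allow synth" rule for CDK could look roughly like this. It is a minimal sketch, not the validator in src/security.py: the function name `validate_cdk_command` and the exact allowlist are assumptions.

```python
import shlex
from typing import Tuple

# Assumed allowlist: CDK subcommands that only synthesize or inspect locally.
ALLOWED_CDK_SUBCOMMANDS = {"synth", "ls", "list"}
# Subcommands that change deployed infrastructure; these are left to CI/CD.
BLOCKED_CDK_SUBCOMMANDS = {"deploy", "destroy", "bootstrap"}


def validate_cdk_command(command: str) -> Tuple[bool, str]:
    """Return (allowed, reason) for a shell command that invokes the CDK CLI."""
    tokens = shlex.split(command)
    # Accept both `cdk ...` and `npx cdk ...`.
    if tokens and tokens[0] == "npx":
        tokens = tokens[1:]
    if not tokens or tokens[0] != "cdk":
        return False, "not a cdk command"
    # Naive: the first non-flag token is taken as the subcommand, so options
    # with a separate value placed before the subcommand are not handled.
    sub = next((t for t in tokens[1:] if not t.startswith("-")), None)
    if sub in BLOCKED_CDK_SUBCOMMANDS:
        return False, f"cdk {sub} is reserved for CI/CD"
    if sub in ALLOWED_CDK_SUBCOMMANDS:
        return True, "ok"
    return False, f"cdk subcommand {sub!r} is not allowlisted"
```

Rejecting unknown subcommands by default keeps the agent from discovering new ways to mutate infrastructure; deployment only happens through the workflow.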
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update all region references from us-west-2 to us-east-1
- Configure Bedrock model access (CLAUDE_CODE_USE_BEDROCK=1)
- Set AgentCore runtime ID (claude_code_reinvent-1eBYMO7kHw)
- Make the CDK stack resilient to a missing AgentCore role (conditional)
- Import the existing VPC via context to avoid the VPC limit
- Add an account ID suffix to S3 bucket names for uniqueness
- Remove the backup plan (use EFS automatic backups instead)
- Update the GitHub repo to the KBB99 fork
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The agent structures monorepo projects with a frontend/ workspace, so the Vite build output lands at generated-app/frontend/dist/ instead of generated-app/dist/. Also fix the Vite config and BrowserRouter patching to check both locations.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
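Checking both output locations can be sketched as follows, assuming a directory counts as a Vite build only if it contains index.html. `find_dist_dir` and `DIST_CANDIDATES` are hypothetical names, not the harness's actual code.

```python
from pathlib import Path
from typing import Optional

# Checked in order: monorepo layout (frontend/ workspace) first, then flat layout.
DIST_CANDIDATES = ("frontend/dist", "dist")


def find_dist_dir(app_root: Path) -> Optional[Path]:
    """Return the first Vite build output directory under app_root, or None."""
    for rel in DIST_CANDIDATES:
        candidate = app_root / rel
        # Assumption: a usable build always contains index.html.
        if (candidate / "index.html").is_file():
            return candidate
    return None
```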
workflow_dispatch triggers run from main, which doesn't have the generated app. Check out the agent-runtime branch explicitly when the workflow is not triggered by a push event.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add `make reset` target to wipe all agent state (branch, issues, SSM, S3, CloudFront) for clean restarts
- Restructure README.md with Quick Start, Creating a New Project, Resetting the Agent, and Configuration sections
- Create CLAUDE.md with architecture overview, PROJECT_NAME flow, CDK context variables, and common issues
- Update DEFAULT_MODEL to us.anthropic.claude-opus-4-6-v1 in Makefile and bedrock_entrypoint.py
- Add AgentCoreXRayPolicy to the CDK stack granting xray:PutTraceSegments and xray:PutTelemetryRecords to fix OTEL trace export 403 errors
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…hemas
The Canopy agent was gravitating toward frontend scaffolding and never building backend/infra. Root cause: the BUILD_PLAN had no phasing, and the API spec was just a route list with no request/response schemas.
Changes:
- Replace <api_specification> with <api_contract> containing full Zod schemas for every entity, inferred types, and a typed endpoint map
- Add shared/ workspace package as the single source of truth for types
- Enforce phased execution: shared contract → infra + backend → frontend
- Remove all Dexie/IndexedDB fallback references (replaced by API-first)
- Update monorepo structure, implementation order, and critical paths
- Add "Phased Execution" section to system_prompt.txt
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The critical step between CDK deploy and agent launch (rebuilding and pushing the Docker image to ECR) was undocumented. Added a "Deploying Changes" section with the full deployment sequence and a table showing which changes require an image rebuild vs. a CDK deploy.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When the agent-runtime branch doesn't exist, the agent now clones from BASE_BRANCH (env var, default: main) instead of a hardcoded main. This allows testing prompt/code changes on feature branches without merging.
Usage: `make launch BASE_BRANCH=kb/improved-harness`
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
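The branch-selection rule is small enough to state as code. `resolve_clone_branch` is a hypothetical helper; the list of remote branches is passed in rather than queried from git so the logic stands on its own.

```python
import os
from typing import Iterable, Mapping, Optional

AGENT_BRANCH = "agent-runtime"


def resolve_clone_branch(remote_branches: Iterable[str],
                         env: Optional[Mapping[str, str]] = None) -> str:
    """Resume agent-runtime if it exists; otherwise start from BASE_BRANCH (default main)."""
    env = os.environ if env is None else env
    if AGENT_BRANCH in set(remote_branches):
        return AGENT_BRANCH
    # An unset or empty BASE_BRANCH falls back to main.
    return env.get("BASE_BRANCH") or "main"
```

Note that an existing agent-runtime branch always wins, so BASE_BRANCH only matters after `make reset` has deleted it.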
The SDK env dict only had CLAUDE_CODE_USE_BEDROCK and AWS_REGION, stripping the IAM credentials needed for Bedrock auth. This caused "Invalid API key" errors on every call. It now forwards static credentials and the container credential endpoint env vars.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add permission_mode="bypassPermissions" to ClaudeAgentOptions for
non-interactive container operation. Without this, the CLI defaults
to prompting for interactive permission on each tool use, which fails
in headless environments and may cause the "Invalid API key" errors
we've been seeing.
- Remove redundant AWS credential forwarding from env dict. The SDK
merges env with os.environ ({**os.environ, **env}), so the subprocess
inherits all parent env vars automatically. Only Bedrock-specific
overrides are needed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
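The merge semantics described above (`{**os.environ, **env}`) can be illustrated in isolation. `build_subprocess_env` and `BEDROCK_OVERRIDES` are hypothetical names; the point is only that later keys win, so the override dict needs nothing beyond the Bedrock-specific values.

```python
import os
from typing import Dict, Mapping

# The only values the harness needs to override for Bedrock.
BEDROCK_OVERRIDES = {"CLAUDE_CODE_USE_BEDROCK": "1", "AWS_REGION": "us-east-1"}


def build_subprocess_env(overrides: Mapping[str, str],
                         parent: Mapping[str, str] = os.environ) -> Dict[str, str]:
    # Same precedence as {**os.environ, **env}: keys in `overrides` win, and
    # everything else (e.g. AWS credential variables) is inherited unchanged.
    return {**parent, **overrides}
```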
Root cause of "Invalid API key" errors: the Docker container runs as root, and the Claude CLI refuses --dangerously-skip-permissions (used by permission_mode="bypassPermissions") when running as root/sudo.
Fixes:
- Add a non-root 'agent' user to the Dockerfile and switch to it
- Add permission_mode="bypassPermissions" to ClaudeAgentOptions for non-interactive autonomous operation
- Fix the AWS CLI install URL to auto-detect ARM64 vs. x86_64 architecture
- Update CLAUDE.md to document --platform linux/arm64 for docker build
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
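Because the root-user refusal surfaced as a misleading auth error, a harness could fail fast with a clear message instead. A sketch assuming a POSIX container; `check_bypass_permissions_allowed` is hypothetical and not part of the Claude Agent SDK.

```python
import os
from typing import Optional


def check_bypass_permissions_allowed(euid: Optional[int] = None) -> None:
    """Raise early if bypassPermissions would be rejected because we run as root."""
    # os.geteuid() is POSIX-only; the agent container is Linux.
    euid = os.geteuid() if euid is None else euid
    if euid == 0:
        raise RuntimeError(
            "permission_mode='bypassPermissions' is refused when running as root; "
            "run the agent as a non-root user (e.g. `USER agent` in the Dockerfile)"
        )
```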
The handler's async generator monitoring loop may not execute if the AgentCore framework stops consuming the generator after the initial streaming response. This left the post-commit hook as the only push mechanism, which can fail silently.
Added a polling push loop directly in run_agent_background() that runs every PUSH_INTERVAL_SECONDS (default 300s) in the same thread as the agent subprocess. This ensures commits get pushed regardless of whether the handler's generator is consumed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
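One way to push periodically from the same thread that waits on the agent subprocess is to use `Popen.wait(timeout=...)` as the timer. This is a sketch, not the actual run_agent_background(); `run_with_periodic_push` and the `push` callable (for example a wrapper around `git push`) are assumptions.

```python
import logging
import subprocess
import time
from typing import Callable

logger = logging.getLogger("agent.push")


def _safe_push(push: Callable[[], None]) -> None:
    try:
        push()
    except Exception:
        # A failed push is logged and retried on the next interval.
        logger.exception("periodic push failed")


def run_with_periodic_push(proc: subprocess.Popen, push: Callable[[], None],
                           interval_seconds: float = 300.0) -> int:
    """Wait for proc in this thread, calling push() every interval_seconds."""
    next_push = time.monotonic() + interval_seconds
    while True:
        try:
            # Returns when the agent exits; times out when a push is due.
            proc.wait(timeout=max(0.0, next_push - time.monotonic()))
            break
        except subprocess.TimeoutExpired:
            _safe_push(push)
            next_push = time.monotonic() + interval_seconds
    _safe_push(push)  # final push for commits made just before exit
    return proc.returncode
```

Since nothing here depends on a consumer iterating a generator, the pushes keep happening even if the framework stops reading the streaming response.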
Root cause of all agent failures (issues #5-#13): the update-runtime-env Makefile target was missing CLAUDE_CODE_USE_BEDROCK and AWS_REGION from the environment-variables JSON. Every time we ran `make update-runtime-env`, those critical vars were wiped, causing the Claude CLI to try the Anthropic API (not Bedrock) and fail silently.
Also adds Python logging to bedrock_entrypoint.py so agent subprocess output is captured by OTEL auto-instrumentation and visible in CloudWatch. Previously, all print() output was invisible.
Changes:
- Makefile: Add CLAUDE_CODE_USE_BEDROCK=1 and AWS_REGION to update-runtime-env
- bedrock_entrypoint.py: Replace print() with logging.getLogger() in critical paths, pipe subprocess stdout through logger for OTEL visibility
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
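Piping subprocess output through the logging module instead of print() might look like this sketch. The logger name and the `run_logged` helper are illustrative, and whether OTEL picks up the records depends on the instrumentation configured in the container.

```python
import logging
import subprocess
from typing import Sequence

logger = logging.getLogger("agent.subprocess")


def run_logged(cmd: Sequence[str]) -> int:
    """Run cmd and forward each output line to the logger instead of print()."""
    proc = subprocess.Popen(
        list(cmd),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so errors reach the logs too
        text=True,
    )
    # stdout is a pipe here, so iterating yields one line at a time.
    for line in proc.stdout:
        logger.info("%s", line.rstrip("\n"))
    return proc.wait()
```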
Adds sections covering: what's working (full pipeline), what's not (agent ignores phased execution), key config values, a step-by-step quick start for running a new test, monitoring commands, and pitfalls discovered during debugging (env var wipe, print vs. logging, ARM64, non-root requirement, commit timing).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ld order
The agent was skipping the shared/ and backend/ phases because the system prompt incentivized UI quality and 200+ tests upfront, causing it to build a frontend-only app. This replaces grading language with phase-gate evaluation, reduces the test count from 200 to ~50 weighted toward backend/infra, adds the Phased Execution section to the canopy prompt, and simplifies test verification.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…100% frontend test bias
The agent wrote 220 tests that were ALL frontend UI tests because the security hook required a Playwright screenshot to mark any test as passing. With no equivalent verification path for backend tests, the agent rationally skipped non-frontend tests entirely.
Changes:
- Add backend-verify.cjs scaffold script for shared/infra/backend test verification
- Add alternative -result.txt verification path in security.py (alongside the screenshot path)
- Allowlist AWS data-plane commands (DynamoDB scan/query, CloudWatch logs, Lambda invoke)
- Add IAM policy for canopy-* scoped DynamoDB, CloudWatch Logs, and Lambda access
- Add Backend Test Verification section and Resource Naming Convention to prompts
- Update initial/continuation messages with backend-verify.cjs guidance and VITE_API_URL wiring
- Block the Write tool from forging -result.txt files (must use backend-verify.cjs)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
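The two verification paths and the Write guard could be modeled as below. Only the `-result.txt` suffix and the Write-tool block come from this commit; the `<test_id>.png` screenshot name, the `evidence_dir` layout, and both function names are assumptions.

```python
from pathlib import Path


def is_test_verified(test_id: str, evidence_dir: Path) -> bool:
    """A test may be marked passing with a screenshot OR a backend result file."""
    screenshot = evidence_dir / f"{test_id}.png"          # assumed naming
    result_file = evidence_dir / f"{test_id}-result.txt"  # produced by backend-verify.cjs
    return screenshot.is_file() or result_file.is_file()


def is_forged_result_write(tool_name: str, file_path: str) -> bool:
    """True if the Write tool targets a -result.txt file, which must come from backend-verify.cjs."""
    return tool_name == "Write" and file_path.endswith("-result.txt")
```

Giving backend tests an evidence file of equal standing removes the incentive to write only UI tests, while the Write guard keeps that evidence tied to an actual script run.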
The CDK deploy was attaching IAM policies (Secrets Manager, SSM, CloudWatch, etc.) to AmazonBedrockAgentCoreSDKRuntime instead of claude-code-agentcore-role, which is the role the container actually assumes. This caused the container to silently crash on startup when it couldn't access Secrets Manager for the GitHub token.
- Add AGENTCORE_ROLE_NAME and VPC_ID as Makefile variables with defaults
- Wire them into the deploy-infra target so `make deploy-infra` just works
- Document the pitfall in CLAUDE.md lessons learned
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The workflow failed with "Cannot find asset at generated-app/backend/dist" because CDK references backend Lambda code as a bundled asset, but the workflow never built the shared or backend packages before running tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The agent was writing CDK infrastructure and backend handlers simultaneously, then never checking whether CI/CD deployed the stack. The frontend was never wired to the real API. Now:
- Phase 2 is split into 2a (CDK + stubs, commit) and 2b (wait for deploy, then implement real handlers)
- The agent must poll SSM deploy-state and wait for "succeeded" before writing full backend handlers or frontend API calls
- VITE_API_URL wiring is part of Phase 3
- CLAUDE.md updated with current state (issue #22 success, remaining issues)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
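The poll-and-wait rule can be sketched as follows, assuming the deploy-state parameter holds JSON with a `status` field. `fetch_state` is injected; in practice it might wrap boto3's `ssm.get_parameter(Name=...)["Parameter"]["Value"]`, but the parameter name is not given here, and the timeout and interval defaults are arbitrary.

```python
import json
import time
from typing import Callable


def wait_for_deploy(fetch_state: Callable[[], str], timeout_s: float = 1800.0,
                    interval_s: float = 30.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll deploy-state JSON until it reports 'succeeded' (return) or 'failed' (raise)."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = json.loads(fetch_state())
        status = state.get("status")
        if status == "succeeded":
            return state
        if status == "failed":
            # On failure, an "error" field (if present) explains why.
            raise RuntimeError(f"deploy failed: {state.get('error', '(no error field)')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"deploy still {status!r} after {timeout_s:.0f}s")
        sleep(interval_s)
```

The explicit deadline matters: without it, a state stuck at "deploying" turns into an infinite poll.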
…r polished frontend
Update CLAUDE.md to reflect prompt changes from 3744400 and note that the next step is a live agent test. Update grading in both system prompts to explicitly call out that localStorage fallback = incomplete and deployed API = highest score, reinforcing the CDK-first deployment flow.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…builds
deploy-infrastructure: Replace hardcoded npm ci with a lockfile-aware fallback (npm ci if a lockfile exists, npm install otherwise) for the infrastructure, shared, and backend install steps. The agent doesn't generate lockfiles.
deploy-preview: Add a shared package build step before the frontend build (frontend imports from @canopy/shared). Add a fallback to build frontend/ directly when the root-level npm run build fails (Vite can't resolve index.html via workspace delegation).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
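The lockfile-aware fallback is a shell conditional in the workflow. Expressed as a Python sketch (the helper name is hypothetical):

```python
from pathlib import Path
from typing import List


def npm_install_command(package_dir: Path) -> List[str]:
    """Use `npm ci` only when a lockfile exists; otherwise fall back to `npm install`."""
    # npm ci refuses to run without package-lock.json or npm-shrinkwrap.json.
    lockfiles = ("package-lock.json", "npm-shrinkwrap.json")
    if any((package_dir / name).is_file() for name in lockfiles):
        return ["npm", "ci"]
    return ["npm", "install"]
```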
…ructure
The "Signal deploy failed" handler runs on failure() but previously had no AWS credentials when build/test steps failed (credentials were configured after synth). Now credentials are configured right after Node setup, so the failure handler can always write status to SSM.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove separate shared/backend build steps from the deploy-infrastructure workflow. CDK's NodejsFunction bundles with esbuild at synth time, resolving @canopy/shared imports automatically. Update prompts to explicitly require NodejsFunction (not lambda.Function with Code.fromAsset) and ensure shared/package.json has a build script.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Vite resolves workspace package imports directly via esbuild, so there is no need to pre-build shared/ before the frontend build.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Early agent commits (shared/, infrastructure/) trigger deploy-preview but have no buildable frontend. Instead of failing, skip the deploy and let later commits with frontend code trigger a successful build.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tion bundling
The agent's Lambda handlers imported uuid, which caused esbuild bundling failures in CI/CD (the Docker fallback can't resolve node_modules). Prompts now instruct the agent to use the Node.js built-in crypto.randomUUID() and to configure NodejsFunction to externalize only @aws-sdk/*.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The agent polls SSM for deploy status but previously saw only "status":"failed" with no details. Now the deploy-infrastructure workflow captures build/test/synth/deploy output and includes the last 40 lines in the SSM error field on failure. Prompts are updated to tell the agent to read the error field and fix the issue.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
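Building the failure state with the log tail can be sketched as below. The JSON shape beyond `status` and `error`, the helper name, and the extra character cap are assumptions; the cap is there because SSM standard-tier parameter values are limited to 4 KB.

```python
import json
from collections import deque


def failure_state_json(log_text: str, max_lines: int = 40, max_chars: int = 3000) -> str:
    """Build a deploy-state value whose error field holds the tail of the step log."""
    tail = "\n".join(deque(log_text.splitlines(), maxlen=max_lines))
    # Keep the end of the text if it is still long: SSM standard parameters
    # hold at most 4 KB, and json.dumps escaping adds some overhead.
    return json.dumps({"status": "failed", "error": tail[-max_chars:]})
```

Keeping the end of the log rather than the start is deliberate: build tools usually print the actual error last.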
ts-node has path resolution issues on GitHub Actions runners, where npx picks up a cached version instead of the project-local one, causing "Cannot find module ./canopy.ts" errors during cdk deploy. tsx is a drop-in replacement that handles this reliably.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When CDK deploy fails, the outputs file isn't created, causing a jq error.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ting projects
Enhancement mode (triggered when agent-runtime already has generated-app/) crashed with FileNotFoundError on system_prompt.txt because:
1. claude_code.py only ran setup_session_prompts() for fresh builds, so generated-app/prompts/ was never created in enhancement mode
2. bedrock_entrypoint.py didn't pass --project in enhancement mode, so prompts_dir resolved to the wrong directory
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The deploy-preview workflow was building the frontend with an empty VITE_API_URL because the "Resolve API URL" step ran before AWS credentials were configured. The raw IAM user creds lack cloudformation:DescribeStacks, so the call silently failed and Vite inlined shouldUseApi() as `return false`, causing the entire app to fall back to localStorage.
- Move "Configure AWS credentials" before "Resolve API URL" and "Build"
- Add cloudformation:DescribeStacks to the preview deploy IAM role
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
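Part of the problem was that the lookup failed silently. A sketch of a lookup that fails loudly instead, operating on the response shape returned by boto3's `cloudformation.describe_stacks(StackName=...)`. The helper name and the output key used in the usage note are hypothetical.

```python
from typing import Any, Mapping


def api_url_from_describe_stacks(response: Mapping[str, Any], output_key: str) -> str:
    """Pull one output from a DescribeStacks response; raise rather than return ''."""
    stacks = response.get("Stacks") or []
    if not stacks:
        raise LookupError("DescribeStacks returned no stacks")
    for output in stacks[0].get("Outputs", []):
        if output.get("OutputKey") == output_key and output.get("OutputValue"):
            return output["OutputValue"]
    # Failing here stops the build instead of inlining an empty VITE_API_URL.
    raise LookupError(f"stack output {output_key!r} is missing or empty")
```

For example, `api_url_from_describe_stacks(resp, "ApiUrl")` either returns a non-empty URL or aborts the step, so Vite never sees an empty value.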
Add read-only commands so the agent can diagnose deployment issues instead of spinning in an SSM poll loop when deploys stall:
- cloudformation describe-stack-events
- cloudformation describe-stack-resources
- ssm get-parameters-by-path
- lambda get-function-url-config
Also update CLAUDE.md with issue #26-#28 postmortems and lessons learned.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
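An allowlist check for these four commands could be sketched as follows. This covers only the commands added here, not the full allowlist in src/security.py; the function name and the metacharacter rejection are assumptions.

```python
import shlex

# Read-only (service, operation) pairs added in this commit.
DIAGNOSTIC_AWS_COMMANDS = {
    ("cloudformation", "describe-stack-events"),
    ("cloudformation", "describe-stack-resources"),
    ("ssm", "get-parameters-by-path"),
    ("lambda", "get-function-url-config"),
}
# Conservatively reject anything that could chain, substitute, or redirect.
SHELL_METACHARACTERS = set(";|&`$<>\n")


def is_allowed_aws_command(command: str) -> bool:
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:  # unbalanced quotes
        return False
    # Assumes the service and operation directly follow `aws` (no global flags first).
    if len(tokens) < 3 or tokens[0] != "aws":
        return False
    return (tokens[1], tokens[2]) in DIAGNOSTIC_AWS_COMMANDS
```

Matching on the (service, operation) pair rather than a string prefix keeps `describe-stack-events` allowed without accidentally admitting mutating operations in the same service.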
The `| tee` pipes in the install, test, synth, and deploy steps were swallowing non-zero exit codes. tee always exits 0, so cdk deploy failures didn't fail the step, which meant `failure()` was false and "Signal deploy failed" never fired. SSM stayed stuck at "deploying" and the agent polled forever. Adding `set -o pipefail` ensures pipeline exit codes propagate correctly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
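The behavior `set -o pipefail` restores, logging output while still reporting the command's own exit code, looks like this when written out in Python. `run_and_tee` is a hypothetical helper for illustration, not part of the workflow.

```python
import subprocess
import sys
from typing import Sequence


def run_and_tee(cmd: Sequence[str], log_path: str) -> int:
    """Mirror output to stdout and a log file, returning cmd's exit code (not tee's)."""
    with open(log_path, "w", encoding="utf-8") as log:
        proc = subprocess.Popen(list(cmd), stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT, text=True)
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
        return proc.wait()
```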
The agent generates CDK code from scratch each run, so construct IDs were inconsistent (e.g. CanopyMainTable vs. CanopyTable). CloudFormation treats different logical IDs as new resources and fails with "already exists" when the physical name matches an existing resource. Pin the exact construct IDs and route structure to match the deployed stack so incremental CDK updates work across runs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The CF stack persisted across resets because make reset only deleted the agent-runtime branch, not the deployed infrastructure. The next agent run would generate fresh CDK code with different construct IDs, causing "already exists" errors on deploy. `make reset` now runs cdk destroy first (with a cloudformation delete-stack fallback), so each fresh run starts with a clean slate.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
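The destroy-with-fallback order can be sketched in Python with the command runner injected, so the control flow can be exercised without AWS access. `destroy_stack` is hypothetical, the stack name is left to the caller, and waiting on `stack-delete-complete` after `delete-stack` is an addition beyond what the commit describes.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def _run(cmd: Sequence[str]) -> int:
    return subprocess.run(list(cmd)).returncode


def destroy_stack(stack_name: str, run: Runner = _run) -> str:
    """Tear down the stack with cdk destroy, falling back to CloudFormation directly."""
    if run(["npx", "cdk", "destroy", stack_name, "--force"]) == 0:
        return "cdk destroy"
    # delete-stack is asynchronous, so wait for completion before reporting success.
    if (run(["aws", "cloudformation", "delete-stack", "--stack-name", stack_name]) == 0
            and run(["aws", "cloudformation", "wait", "stack-delete-complete",
                     "--stack-name", stack_name]) == 0):
        return "delete-stack"
    raise RuntimeError(f"could not delete stack {stack_name!r}")
```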
Not needed: make reset now destroys the CF stack, so each fresh run starts clean and the agent can use whatever construct IDs it wants.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a user starts a Claude Code session without a specific task, CLAUDE.md now instructs Claude to run an interactive onboarding flow: check prerequisites, then either deploy Canopy or walk through creating a new project with a BUILD_PLAN.md generated from a stack-agnostic template.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Harness & Orchestration Changes
- `setup_session_prompts()` always runs and `--project` is passed in enhancement mode, fixing a crash when `agent-runtime` already has code
- …`backend-verify.cjs` and an alternative verification path in `src/security.py` so the agent can verify backend tests without requiring Playwright screenshots
- …`error` field so the agent can read failures and self-correct
- …`NodejsFunction` (not `lambda.Function`), `tsx` (not `ts-node`), and `crypto.randomUUID()` (not the `uuid` package) to avoid CI/CD failures
CI/CD Workflow Fixes
- …`package-lock.json` (fall back to `npm install` from `npm ci`)
- …`NodejsFunction` so CDK bundles at synth time, eliminating pre-built asset requirements
- …`deploy-preview`
- …`deploy-preview` gracefully when no frontend exists yet
- …`deploy-infrastructure` (shared before CDK)
Agent Runtime Improvements
- …`BASE_BRANCH` for workspace cloning
- …`print()` with `logging`)
Infrastructure
- …`AgentCoreBackendTestPolicy` to CDK stack for backend test verification (DynamoDB, CloudWatch, Lambda permissions scoped to `canopy-*`)
- …`CLAUDE.md` with project intelligence, quick start, and lessons learned
Files Changed (17 files, +1980 / -482)
- `deploy-infrastructure.yml`, `deploy-preview.yml`
- `bedrock_entrypoint.py`, `claude_code.py`, `src/security.py`, `src/config.py`
- `prompts/system_prompt.txt`, `prompts/canopy/BUILD_PLAN.md`, `prompts/canopy/system_prompt.txt`, `prompts/canopy/EXAMPLE_TEST.txt`
- `infrastructure/lib/claude-code-stack.ts`
- `frontend-scaffold-template/backend-verify.cjs`
- `Dockerfile`, `Makefile`, `README.md`, `CLAUDE.md`
Test Plan
- `make reset` followed by a fresh issue trigger to verify the full pipeline
🤖 Generated with Claude Code