This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
OpenClaw on AgentCore Runtime — a multi-channel AI messaging bot (Telegram, Slack) running as per-user serverless containers on AWS Bedrock AgentCore Runtime. Each user gets their own microVM with workspace persistence. A Router Lambda handles webhook ingestion from Telegram and Slack (text and images), resolves user identity via DynamoDB, and invokes per-user AgentCore sessions. Image uploads are stored in S3 and passed to Bedrock as multimodal content.
- Infrastructure: CDK v2 (Python), 7 stacks
- Runtime: Bedrock AgentCore Runtime (serverless ARM64 container, VPC mode, per-user sessions)
- Channel Ingestion: Router Lambda behind API Gateway HTTP API (Telegram webhook, Slack Events API, image uploads)
- Multimodal: Image upload support — photos downloaded by Router Lambda, stored in S3, fetched by proxy, sent to Bedrock as multimodal content
- Messaging: OpenClaw (Node.js) — headless mode, messages bridged via WebSocket
- Tools & Skills: Built-in tool groups (full profile) + 5 ClawHub skills + 5 custom skills (S3 user files, EventBridge cron, ClawHub manage, API keys, agentcore-browser) + 2 built-in shim tools (web_fetch, web_search)
- Scheduling: EventBridge Scheduler for recurring tasks — cron executor Lambda warms sessions and delivers responses to channels
- Per-User File Storage: S3-backed per-user file isolation via custom
s3-user-filesskill - Workspace Persistence: .openclaw/ directory synced to/from S3 per user
- AI Model: MiniMax M2.1 via Bedrock ConverseStream (configurable via
default_model_idincdk.json, defaultminimax.minimax-m2.1) - Identity: DynamoDB identity table (channel→user mapping, cross-channel binding) + Cognito User Pool
- Observability: CloudWatch dashboards + alarms, Bedrock invocation logging
- Token Monitoring: Lambda + DynamoDB (single-table) + CloudWatch custom metrics
- API Key Management: Dual-mode storage — native file-based (S3-synced) or AWS Secrets Manager (KMS-encrypted, CloudTrail-auditable) via
api-keysskill - Security: VPC endpoints, KMS CMK, Secrets Manager, cdk-nag.
SECURITY.mdis a thin policy pointer;docs/security.mdis the single source of truth for the full security architecture
Telegram webhook / Slack Events API
|
+-----------v-----------+
| Router Lambda | <-- API Gateway HTTP API, async self-invoke
| - User resolution | DynamoDB identity table
| - Session mgmt | Cross-channel binding
| - Channel dispatch |
+-----------+-----------+
|
+-----------v-----------+
| InvokeAgentRuntime | <-- Per-user session (runtimeSessionId)
| (session per user) |
+-----------+-----------+
|
+-----------v-----------+
| AgentCore Runtime | <-- Per-user microVM (ARM64, VPC mode)
| |
| agentcore-contract.js (8080) -- /ping (Healthy), /invocations
| -> boot: pre-fetch secrets from Secrets Manager
| -> first /invocations (parallel):
| 1. Start proxy (18790) + OpenClaw (18789) + restore .openclaw/
| 2. Wait for proxy only (~5s)
| 3. Lightweight agent handles messages immediately
| -> background: OpenClaw starts (~1-2 min)
| -> handoff: once OpenClaw ready, route via WebSocket bridge
| -> SIGTERM: save .openclaw/ to S3
| |
| lightweight-agent.js -- warm-up shim (proxy -> Bedrock, 17 tools: s3-user-files, eventbridge-cron, clawhub-manage, api-keys, web_fetch, web_search)
| agentcore-proxy.js (18790) -- OpenAI -> Bedrock ConverseStream
| OpenClaw Gateway (18789) -- headless, no channels
+-----------+-----------+
|
+-----------v-----------+
| Amazon Bedrock |
| ConverseStream API |
| MiniMax M2.1 |
+-----------------------+
+-----------------------+ +------------------------+
| S3 User Files | | S3 Workspace Sync |
| {namespace}/file.md | | {namespace}/.openclaw/ |
| Via s3-user-files | | Restored on init, |
| skill | | saved periodically |
+-----------------------+ +------------------------+
+------------------------------------------+
| S3 Image Uploads |
| {namespace}/_uploads/img_*.{jpeg,png,...} |
| Router Lambda uploads, proxy fetches |
| for Bedrock multimodal ConverseStream |
+------------------------------------------+
+------------------------------------------------------+
| EventBridge Scheduler (Cron Jobs) |
| |
| openclaw-cron schedule group |
| -> Cron Lambda (openclaw-cron-executor) |
| 1. Warm up user's AgentCore session |
| 2. Send cron message via AgentCore |
| 3. Deliver response to Telegram/Slack |
+------------------------------------------------------+
Supporting: VPC, KMS, Secrets Manager, Cognito,
CloudWatch, DynamoDB, CloudTrail
openclaw-on-agentcore/
app.py # CDK app entry point (7 stacks)
cdk.json # Configuration (model, budgets, sessions, cron)
requirements.txt # Python deps (aws-cdk-lib, cdk-nag)
stacks/
__init__.py # Shared helper (RetentionDays converter)
vpc_stack.py # VPC, subnets, NAT, 7 VPC endpoints, flow logs
security_stack.py # KMS CMK, Secrets Manager, Cognito, optional CloudTrail
agentcore_stack.py # Runtime, WorkloadIdentity, ECR, S3, IAM
router_stack.py # Router Lambda + API Gateway HTTP API + DynamoDB identity
observability_stack.py # Dashboards, alarms, Bedrock logging
token_monitoring_stack.py # Lambda processor, DynamoDB, token analytics
cron_stack.py # EventBridge Scheduler, Cron executor Lambda, IAM
bridge/
Dockerfile # Container image (node:22-slim, ARM64, clawhub skills)
entrypoint.sh # Startup: configure IPv4, start contract server
agentcore-contract.js # AgentCore HTTP contract with hybrid routing (shim + OpenClaw)
lightweight-agent.js # Warm-up agent shim (s3-user-files + eventbridge-cron + clawhub-manage + api-keys tools)
lightweight-agent.test.js # Lightweight agent unit tests (node:test, 110 tests)
agentcore-proxy.js # OpenAI -> Bedrock ConverseStream adapter + Identity + multimodal images
image-support.test.js # Image support unit tests (node:test)
content-extraction.test.js # Content block extraction tests (node:test)
workspace-sync.js # .openclaw/ directory S3 sync (restore/save/periodic)
scoped-credentials.js # Per-user STS session-scoped credentials (S3, Secrets Manager, DynamoDB)
scoped-credentials.test.js # Scoped credentials unit tests (node:test)
workspace-sync.test.js # Workspace sync credential tests (node:test)
force-ipv4.js # DNS patch for Node.js 22 IPv6 issue
skills/
s3-user-files/ # Custom per-user file storage skill (S3-backed)
SKILL.md # OpenClaw skill manifest
common.js # Shared utilities (sanitize, buildKey, validation)
read.js / write.js # Read/write files in user's S3 namespace
list.js / delete.js # List/delete files in user's S3 namespace
eventbridge-cron/ # Cron scheduling skill (EventBridge Scheduler)
SKILL.md # OpenClaw skill manifest
common.js # Shared utilities (schedule group, DynamoDB helpers)
create.js / update.js # Create/update EventBridge schedules
list.js / delete.js # List/delete schedules
clawhub-manage/ # ClawHub skill installer (install/uninstall/list)
SKILL.md # OpenClaw skill manifest
common.js # Skill name validation
install.js / uninstall.js # Install/uninstall ClawHub skills
list.js # List installed skills
api-keys/ # Dual-mode API key management (native + Secrets Manager)
SKILL.md # OpenClaw skill manifest
common.js # Shared validation (userId, keyName)
native.js / secret.js # Native file CRUD / Secrets Manager CRUD
retrieve.js # Unified lookup (SM first, native fallback)
migrate.js # Move keys between backends
agentcore-browser/ # Headless browser skill (optional, enable_browser=true)
SKILL.md # OpenClaw skill manifest
common.js # Session file reader, S3 upload helper
navigate.js # Navigate to URL, return title + content
screenshot.js # Capture PNG screenshot, upload to S3
interact.js # Click, type, scroll, wait on elements
lambda/
token_metrics/index.py # Bedrock log -> DynamoDB + CloudWatch metrics
router/index.py # Webhook router (Telegram + Slack, image uploads)
router/test_image_upload.py # Image upload unit tests (pytest)
router/test_content_extraction.py # Content block extraction tests (pytest)
router/test_markdown_html.py # Markdown-to-HTML conversion tests (pytest)
cron/index.py # Cron executor (warmup, invoke, deliver to channel)
scripts/
setup-telegram.sh # Telegram webhook + admin allowlist (one-step)
setup-slack.sh # Slack Event Subscriptions + admin allowlist
manage-allowlist.sh # Add/remove/list users in the allowlist
tests/
e2e/ # E2E tests (simulated Telegram webhooks + CloudWatch logs)
docs/
architecture.md # Detailed architecture diagrams
architecture-detailed.md # Technical deep-dive (sequence diagrams, container internals, data flows)
security.md # Complete security architecture (single source of truth — threat model, 10 defense layers, operations runbook)
| Stack | Key Resources | Dependencies |
|---|---|---|
| OpenClawVpc | VPC (2 AZ), subnets, NAT, 7 VPC endpoints, flow logs | None |
| OpenClawSecurity | KMS CMK, Secrets Manager (8 secrets incl. webhook validation + feishu), Cognito User Pool, optional CloudTrail | None |
| OpenClawAgentCore | Execution Role, Security Group, S3 bucket (Runtime/Endpoint managed by Starter Toolkit) | Vpc, Security |
| OpenClawRouter | Lambda, API Gateway HTTP API (explicit routes, throttling), DynamoDB identity table | AgentCore, Security |
| OpenClawObservability | Operations dashboard, alarms, SNS, Bedrock invocation logging | None |
| OpenClawTokenMonitoring | DynamoDB (single-table, 4 GSIs), Lambda processor, analytics dashboard | Observability |
| OpenClawCron | EventBridge Scheduler group, Cron executor Lambda, Scheduler IAM role | AgentCore, Router, Security |
Deployment uses a 3-phase hybrid model: CDK for infrastructure, Starter Toolkit for Runtime/container.
# Full deploy via script (recommended)
./scripts/deploy.sh # all 3 phases
./scripts/deploy.sh --phase1 # CDK foundation only
./scripts/deploy.sh --runtime-only # Starter Toolkit only
./scripts/deploy.sh --phase3 # CDK dependent stacks onlysource .venv/bin/activate
export CDK_DEFAULT_ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
export CDK_DEFAULT_REGION=us-west-2
cdk deploy OpenClawVpc OpenClawSecurity OpenClawAgentCore OpenClawObservability --require-approval never# Configure (first time only)
agentcore configure --name openclaw_agent --entrypoint bridge/agentcore-contract.js \
--execution-role <ROLE_ARN> --region us-west-2 --vpc \
--subnets <SUBNET_IDS> --security-groups <SG_ID> \
--deployment-type container --language typescript --non-interactive
# Deploy (builds Docker image locally or via CodeBuild, creates/updates Runtime)
agentcore deploy --agent openclaw_agent --local-build --auto-update-on-conflict \
--env "BEDROCK_MODEL_ID=global.anthropic.claude-opus-4-6-v1" \
--env "S3_USER_FILES_BUCKET=openclaw-user-files-..." ...
# Update cdk.json with runtime_id and runtime_endpoint_id from toolkit outputcdk deploy OpenClawRouter OpenClawCron OpenClawTokenMonitoring --require-approval nevercdk synth # synthesize + cdk-nag checks
cdk diff # preview changes
cdk destroy --all # tear down (does NOT destroy Starter Toolkit resources)
agentcore destroy --agent openclaw_agent # destroy Starter Toolkit resourcesThe setup script registers the webhook and adds you to the allowlist in one step:
./scripts/setup-telegram.shOr manually:
# Get Router API Gateway URL
API_URL=$(aws cloudformation describe-stacks \
--stack-name OpenClawRouter \
--query "Stacks[0].Outputs[?OutputKey=='ApiUrl'].OutputValue" \
--output text --region $CDK_DEFAULT_REGION)
# Get webhook secret (for Telegram request validation)
WEBHOOK_SECRET=$(aws secretsmanager get-secret-value \
--secret-id openclaw/webhook-secret \
--region $CDK_DEFAULT_REGION --query SecretString --output text)
# Set up Telegram webhook with secret_token for validation
TELEGRAM_TOKEN=$(aws secretsmanager get-secret-value \
--secret-id openclaw/channels/telegram \
--region $CDK_DEFAULT_REGION --query SecretString --output text)
curl "https://api.telegram.org/bot${TELEGRAM_TOKEN}/setWebhook?url=${API_URL}webhook/telegram&secret_token=${WEBHOOK_SECRET}"
# Add yourself to the allowlist (find your ID via @userinfobot on Telegram)
./scripts/manage-allowlist.sh add telegram:YOUR_TELEGRAM_USER_ID# Store Telegram bot token
aws secretsmanager update-secret \
--secret-id openclaw/channels/telegram \
--secret-string 'BOT_TOKEN' \
--region $CDK_DEFAULT_REGION
# Store Slack credentials (JSON: bot token + signing secret for HMAC validation)
aws secretsmanager update-secret \
--secret-id openclaw/channels/slack \
--secret-string '{"botToken":"xoxb-YOUR-BOT-TOKEN","signingSecret":"YOUR-SIGNING-SECRET"}' \
--region $CDK_DEFAULT_REGION./scripts/setup-slack.shThis displays the webhook URL for Slack Event Subscriptions, prompts for your Slack member ID, and adds you to the allowlist.
./scripts/setup-feishu.shThis displays the webhook URL, guides you through Feishu developer console setup (app creation, permissions, events, publishing), stores credentials in Secrets Manager, and adds you to the allowlist. Store credentials format: {"appId":"...","appSecret":"...","verificationToken":"...","encryptKey":"..."}
The project uses CDK + AgentCore Starter Toolkit hybrid deployment:
- CDK manages infrastructure (VPC, Lambda, DynamoDB, S3, etc.)
- Starter Toolkit (
agentcoreCLI) manages the AgentCore Runtime (container image, lifecycle config)
Config file: .bedrock_agentcore.yaml (in repo root, on deploy/starter-toolkit-hybrid branch)
# 1. Build image locally
docker build --platform linux/arm64 -t openclaw-bridge:v${TAG} bridge/
# 2. Push to ECR (starter toolkit ECR repo, NOT the CDK one)
ECR_REPO=<ACCOUNT_ID>.dkr.ecr.us-west-2.amazonaws.com/bedrock-agentcore-openclaw_agent
aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin $ECR_REPO
docker tag openclaw-bridge:v${TAG} ${ECR_REPO}:v${TAG}
docker push ${ECR_REPO}:v${TAG}
# 3. Update AgentCore Runtime to use new image
# CRITICAL: update-agent-runtime is a FULL REPLACE — any field you omit gets cleared!
# You MUST include --environment-variables every time, or all env vars will be wiped.
aws bedrock-agentcore-control update-agent-runtime \
--agent-runtime-id <RUNTIME_ID> \
--agent-runtime-artifact "{\"containerConfiguration\":{\"containerUri\":\"${ECR_REPO}:v${TAG}\"}}" \
--role-arn "arn:aws:iam::<ACCOUNT_ID>:role/openclaw-agentcore-execution-role-us-west-2" \
--network-configuration '{"networkMode":"VPC","networkModeConfig":{"securityGroups":["<SECURITY_GROUP_ID>"],"subnets":["<PRIVATE_SUBNET_1>","<PRIVATE_SUBNET_2>"]}}' \
--environment-variables '{
"AWS_REGION":"us-west-2",
"BEDROCK_AGENTCORE_MEMORY_ID":"<MEMORY_ID>",
"BEDROCK_AGENTCORE_MEMORY_NAME":"openclaw_agent_mem",
"BEDROCK_MODEL_ID":"global.anthropic.claude-opus-4-6-v1",
"COGNITO_CLIENT_ID":"<COGNITO_CLIENT_ID>",
"COGNITO_PASSWORD_SECRET_ID":"openclaw/cognito-password-secret",
"COGNITO_USER_POOL_ID":"<COGNITO_USER_POOL_ID>",
"CRON_LAMBDA_ARN":"arn:aws:lambda:us-west-2:<ACCOUNT_ID>:function:openclaw-cron-executor",
"CRON_LEAD_TIME_MINUTES":"5",
"EVENTBRIDGE_ROLE_ARN":"arn:aws:iam::<ACCOUNT_ID>:role/openclaw-cron-scheduler-role-us-west-2",
"EVENTBRIDGE_SCHEDULE_GROUP":"openclaw-cron",
"EXECUTION_ROLE_ARN":"arn:aws:iam::<ACCOUNT_ID>:role/openclaw-agentcore-execution-role-us-west-2",
"GATEWAY_TOKEN_SECRET_ID":"openclaw/gateway-token",
"IDENTITY_TABLE_NAME":"openclaw-identity",
"S3_USER_FILES_BUCKET":"openclaw-user-files-<ACCOUNT_ID>-us-west-2",
"SUBAGENT_BEDROCK_MODEL_ID":"global.anthropic.claude-opus-4-6-v1"
}' \
--region us-west-2
# 4. Verify update completed
aws bedrock-agentcore-control get-agent-runtime \
--agent-runtime-id <RUNTIME_ID> --region us-west-2 \
--query '{status:status,version:agentRuntimeVersion,image:agentRuntimeArtifact.containerConfiguration.containerUri}'
# 5. New sessions will use the new image automatically (per-user idle termination)# Check runtime status
agentcore status
# Deploy via CodeBuild (cloud build, no local Docker needed)
agentcore deploy -a openclaw_agent --auto-update-on-conflict --image-tag v${TAG}
# Deploy with local build (requires Docker)
agentcore deploy -a openclaw_agent --local-build --auto-update-on-conflict --image-tag v${TAG}
# Stop a runtime session
agentcore stop-session
# Invoke runtime for testing
agentcore invoke -a openclaw_agentRequired when deploying a new image and you need the user's next message to start a fresh session with the new version. Also useful for debugging.
# 1. Get the session ID from DynamoDB (look up by user's internal ID)
aws dynamodb query --table-name openclaw-identity --region us-west-2 \
--key-condition-expression "PK = :pk AND SK = :sk" \
--expression-attribute-values '{":pk":{"S":"USER#<internalUserId>"},":sk":{"S":"SESSION"}}' \
--query 'Items[0].sessionId.S' --output text
# 2. Stop the session via data plane API
aws bedrock-agentcore stop-runtime-session \
--agent-runtime-arn "arn:aws:bedrock-agentcore:us-west-2:<ACCOUNT_ID>:runtime/<RUNTIME_ID>" \
--runtime-session-id "<sessionId>" \
--region us-west-2The session receives SIGTERM, saves workspace to S3, and shuts down. The next message from that user triggers a new session with the latest image.
- Runtime ID:
<RUNTIME_ID> - Runtime ARN:
arn:aws:bedrock-agentcore:us-west-2:<ACCOUNT_ID>:runtime/<RUNTIME_ID> - ECR Repo:
<ACCOUNT_ID>.dkr.ecr.us-west-2.amazonaws.com/bedrock-agentcore-openclaw_agent - Execution Role:
arn:aws:iam::<ACCOUNT_ID>:role/openclaw-agentcore-execution-role-us-west-2 - Control Plane API:
aws bedrock-agentcore-control(get/update/list runtime, endpoints, sessions) - Data Plane API:
aws bedrock-agentcore(invoke-agent-runtime, stop-runtime-session)
# CDK no longer manages the AgentCore Runtime container image.
# Use CDK only for infrastructure changes (Lambda, DynamoDB, VPC, etc.)
source .venv/bin/activate && cdk deploy --all --require-approval never
⚠️ Always run./post-deploy.shafter any CDK deploy. This appliesauto-delete=notags to all OpenClaw resources to prevent automated cleanup processes from deleting production infrastructure. The script is gitignored (internal only) — if it doesn't exist, ask the user to restore it.
cd bridge && node --test proxy-identity.test.js # identity + workspace tests
cd bridge && node --test image-support.test.js # image upload + multimodal tests
cd bridge && node --test lightweight-agent.test.js # lightweight agent tools + buildToolArgs tests
cd bridge && node --test subagent-routing.test.js # subagent model routing + detection tests
cd bridge && node --test content-extraction.test.js # recursive content block extraction tests
cd bridge && node --test scoped-credentials.test.js # per-user STS credential scoping tests
cd bridge && node --test workspace-sync.test.js # workspace sync credential tests
cd bridge && node --test agentcore-browser.test.js # browser skill unit tests
cd bridge && node --test browser-lifecycle.test.js # browser session lifecycle tests
cd bridge/skills/s3-user-files && AWS_REGION=$CDK_DEFAULT_REGION node --test common.test.js # S3 skill testscd lambda/router && python -m pytest test_image_upload.py -v # image upload unit tests
cd lambda/router && python -m pytest test_content_extraction.py -v # content block extraction tests
cd lambda/router && python -m pytest test_markdown_html.py -v # markdown-to-HTML conversion tests
cd lambda/router && python -m pytest test_screenshot_handling.py -v # screenshot marker detection + delivery testscd tests/e2e && python -m pytest bot_test.py -v # simulated Telegram webhook tests (requires deployed stack)
cd tests/e2e && python -m pytest bot_test.py -v -k TestBrowserFeature # browser E2E tests (requires enable_browser=true)# Get runtime status (via Starter Toolkit)
agentcore status --agent openclaw_agent --verbose
# Get container-level status (openclawReady, proxyReady, uptime, logs)
agentcore invoke '{"action":"status"}' -a openclaw_agent
# Send test chat to trigger init + verify end-to-end
agentcore invoke '{"action":"chat","userId":"test","actorId":"test:123","channel":"test","message":"hello"}' -a openclaw_agent
# Stop a specific session (forces new container on next invocation)
# IMPORTANT: update-agent-runtime changes env vars but does NOT replace running containers.
# You must stop-session to force a fresh container with updated env vars.
agentcore stop-session -a openclaw_agent -s <SESSION_ID>
# Find session ID from Router Lambda logs
aws logs filter-log-events --log-group-name /openclaw/lambda/router --region $CDK_DEFAULT_REGION \
--start-time $(python3 -c "import time; print(int((time.time()-3600)*1000))") \
--filter-pattern "session=ses_" --query 'events[-1].message' --output text
# Update runtime config (env vars, image, subnets) — bypasses Starter Toolkit limitations
aws bedrock-agentcore-control update-agent-runtime \
--agent-runtime-id <RUNTIME_ID> \
--role-arn "arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):role/openclaw-agentcore-execution-role-$CDK_DEFAULT_REGION" \
--agent-runtime-artifact '{"containerConfiguration":{"containerUri":"<ECR_URI>:<TAG>"}}' \
--network-configuration '{"networkMode":"VPC","networkModeConfig":{"subnets":["<SUBNET1>","<SUBNET2>"],"securityGroups":["<SG>"]}}' \
--environment-variables '{...}' \
--region $CDK_DEFAULT_REGION
# Check DynamoDB identity table
aws dynamodb scan --table-name openclaw-identity --region $CDK_DEFAULT_REGION
# Check Router Lambda errors (last 5 min)
aws logs filter-log-events --log-group-name /openclaw/lambda/router --region $CDK_DEFAULT_REGION \
--start-time $(python3 -c "import time; print(int((time.time()-300)*1000))") \
--filter-pattern "ERROR" --query 'events[*].message' --output text
# Check ECR images
aws ecr describe-images --repository-name bedrock-agentcore-openclaw_agent --region $CDK_DEFAULT_REGION \
--query 'imageDetails[*].{tag:imageTags[0],size:imageSizeInBytes,pushed:imagePushedAt}' --output table# Build ARM64 image from current branch
cd bridge && sudo docker build --platform linux/arm64 -t openclaw-bridge:latest .
# Test locally before pushing
sudo docker run --rm -d --name test -p 8080:8080 -e AWS_REGION=$CDK_DEFAULT_REGION openclaw-bridge:latest
curl http://localhost:8080/ping # should return {"status":"Healthy",...}
sudo docker stop test
# Login + push to ECR
ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
aws ecr get-login-password --region $CDK_DEFAULT_REGION | sudo docker login --username AWS --password-stdin $ACCOUNT.dkr.ecr.$CDK_DEFAULT_REGION.amazonaws.com
sudo docker tag openclaw-bridge:latest $ACCOUNT.dkr.ecr.$CDK_DEFAULT_REGION.amazonaws.com/bedrock-agentcore-openclaw_agent:local-build
sudo docker push $ACCOUNT.dkr.ecr.$CDK_DEFAULT_REGION.amazonaws.com/bedrock-agentcore-openclaw_agent:local-build| Parameter | Default | Description |
|---|---|---|
account |
(empty) | AWS account ID. Falls back to CDK_DEFAULT_ACCOUNT |
region |
us-west-2 |
AWS region. Falls back to CDK_DEFAULT_REGION |
default_model_id |
global.anthropic.claude-opus-4-6-v1 |
Bedrock model ID. The global. prefix routes to any available region |
runtime_id |
(empty) | AgentCore Runtime ID from Starter Toolkit. Populated by deploy script after agentcore deploy |
runtime_endpoint_id |
(empty) | AgentCore Runtime Endpoint ID. Typically DEFAULT when using Starter Toolkit |
image_version |
1 |
Bridge container version tag. Bump to force container redeploy |
cloudwatch_log_retention_days |
30 |
Log retention |
daily_token_budget |
1000000 |
Token budget alarm threshold |
daily_cost_budget_usd |
5 |
Cost budget alarm threshold |
token_ttl_days |
90 |
DynamoDB TTL |
user_files_ttl_days |
365 |
S3 per-user file expiration |
session_idle_timeout |
1800 |
Per-user session idle timeout (seconds) |
session_max_lifetime |
28800 |
Per-user session max lifetime (seconds) |
workspace_sync_interval_seconds |
300 |
.openclaw/ S3 sync interval |
router_lambda_timeout_seconds |
300 |
Router Lambda timeout |
router_lambda_memory_mb |
256 |
Router Lambda memory |
registration_open |
false |
If true, any user can register. If false, only allowlisted users |
cron_lambda_timeout_seconds |
600 |
Cron executor Lambda timeout (must exceed warmup time) |
cron_lambda_memory_mb |
256 |
Cron executor Lambda memory |
enable_cloudtrail |
false |
Deploy a dedicated CloudTrail trail (S3 bucket + trail). Off by default — most accounts already have one |
cron_lead_time_minutes |
5 |
Minutes before schedule time to start warmup |
subagent_model_id |
(empty) | Bedrock model for sub-agents. Empty = use default_model_id |
enable_browser |
false |
Enable headless Chromium browser. Sets BROWSER_IDENTIFIER env var on container |
- entrypoint.sh: Configure Node.js IPv4 DNS patch, start contract server
- agentcore-contract.js (port 8080): Responds to
/pingwithHealthyimmediately - On first
/invocationswithaction: chatoraction: warmup(lazy init):- Fetch secrets from Secrets Manager (gateway token, Cognito secret)
- Create STS scoped credentials restricting S3 + Secrets Manager + DynamoDB to user's namespace
- Configure workspace-sync with scoped credentials
- Start
agentcore-proxy.js(port 18790) withUSER_ID/CHANNELenv vars - Start OpenClaw gateway (port 18789) with scoped credentials env (no container credentials)
- Restore
.openclaw/from S3 viaworkspace-sync.jsin background - Start credential refresh timer (45 min interval)
- If
BROWSER_IDENTIFIERset: create browser session via AgentCore Browser API, write session file to/tmp/agentcore-browser-session.json - Wait for proxy only (~5s)
- Warm-up phase (t=~10s to ~1-2min):
lightweight-agent.jshandles messages via proxy -> Bedrock (supports s3-user-files, eventbridge-cron, clawhub-manage, api-keys, web_fetch, web_search tools) - Handoff (~1-2min): OpenClaw becomes ready, all subsequent messages route via WebSocket bridge
- After handoff: Full OpenClaw features —
web_fetch,web_search(built-in), 5 ClawHub skills (Jina reader, deep-research-pro, etc.), sub-agent support, session management action: warmup: Triggers init only; returns{ready: true}when OpenClaw is ready (used by cron Lambda to pre-warm sessions)action: cron: Sends a cron message via the WebSocket bridge (same as chat but intended for scheduled tasks)action: status: Returns current init state ({openclawReady, proxyReady, uptime}) without triggering init- SIGTERM: Save
.openclaw/to S3, kill child processes, exit
Table: openclaw-identity (PAY_PER_REQUEST, TTL on ttl attribute)
| PK | SK | Purpose |
|---|---|---|
CHANNEL#telegram:123456789 |
PROFILE |
Channel→user lookup |
USER#user_abc123 |
PROFILE |
User profile |
USER#user_abc123 |
CHANNEL#telegram:123456789 |
User's bound channels |
USER#user_abc123 |
SESSION |
Current session |
BIND#ABC123 |
BIND |
Cross-channel bind code (10 min TTL) |
ALLOW#telegram:123456789 |
ALLOW |
User allowlist entry |
USER#user_abc123 |
CRON#schedule-name |
User's cron schedule metadata (expression, message, timezone, channel) |
Cross-channel binding: User says "link accounts" on Telegram → gets 6-char code → enters code on Slack → both channels route to same user/session.
When registration_open is false (default), only users with an ALLOW# record in DynamoDB can register. Existing users (already have a CHANNEL# record) are always allowed. Cross-channel binding bypasses the allowlist since it links to an already-approved user.
Unauthorized users who message the bot receive a rejection message that includes their channel ID (e.g. telegram:123456), so they can share it with the admin for onboarding.
After initial deployment, no users exist. The easiest path is the setup script, which registers the webhook and adds you to the allowlist in one step:
./scripts/setup-telegram.shAlternatively, if you don't know your Telegram user ID:
- Message the bot from Telegram
- The bot replies with a rejection message showing your ID, e.g.
telegram:123456 - Add yourself to the allowlist:
./scripts/manage-allowlist.sh add telegram:123456
- Message the bot again — you are now registered
When someone wants access to the bot:
- They message the bot and receive: "Your ID:
telegram:789012. Send this ID to the bot admin to request access." - The admin adds them:
./scripts/manage-allowlist.sh add telegram:789012
- The user messages the bot again — they are now registered
# Add a user to the allowlist
./scripts/manage-allowlist.sh add telegram:123456
# Remove a user
./scripts/manage-allowlist.sh remove telegram:123456
# List all allowed users
./scripts/manage-allowlist.sh listOnly the first channel identity needs to be allowlisted. When a user binds a second channel (e.g. Slack) via link, the new channel maps to their existing approved user — no separate allowlist entry needed.
- Hybrid deploy: Runtime/Endpoint/ECR managed by Starter Toolkit (
agentcore deploy), not CDK. CDK manages IAM Role, SG, S3, Lambda, etc. See./scripts/deploy.shfor the 3-phase flow - ARM64 required: Build with
--platform linux/arm64. This machine is ARM64 native — use--local-buildmode - Docker Hub rate limit: Dockerfile uses
public.ecr.aws/docker/library/node:22-slim(ECR Public Gallery) instead of Docker Hub to avoid anonymous pull rate limits in CodeBuild - IAM role names are region-suffixed:
openclaw-agentcore-execution-role-{region}andopenclaw-cron-scheduler-role-{region}to avoid cross-region conflicts (IAM roles are global) - Trust policy self-assume: Uses
AccountRootPrincipal()+ArnEqualscondition (notArnPrincipal) to avoid chicken-and-egg during role creation update-agent-runtimeis a FULL REPLACE: Omitting--environment-variableswipes ALL env vars. Always include the full env vars JSON in every update call. This is the most common deployment mistake — the container starts but init fails because secrets/config env vars are missing- Resource names: Must match
^[a-zA-Z][a-zA-Z0-9_]{0,47}$— underscores, not hyphens - Health check timing: Contract server on port 8080 must start within seconds
- Per-user sessions: Contract returns
Healthy(notHealthyBusy) — allows natural idle termination - Session recreation: InvokeAgentRuntime with terminated session creates new microVM; workspace restored on init
- VPC endpoints:
bedrock-agentcore-runtimeendpoint not available in all regions - Starter Toolkit source_path: Must point to
bridge/directory (contains Dockerfile and all COPY sources). If source_path is project root, COPY commands fail because paths are relative to bridge/
- Cross-region inference: Model
minimax.minimax-m2.1uses a global cross-region inference profile that routes to any available region — IAM usesarn:aws:bedrock:*::foundation-model/*and inference-profile wildcards - Inference profile ARN: Separate from foundation model —
arn:aws:bedrock:{region}:{account}:inference-profile/*
- IPv6 issue: Node.js 22 Happy Eyeballs fails in VPCs without IPv6 —
force-ipv4.jspatchesdns.lookup()to force IPv4 - NODE_OPTIONS:
--dns-result-order=ipv4first --no-network-family-autoselection -r /app/force-ipv4.js
logs.RetentionDaysis an enum — use helper instacks/__init__.py- Cross-stack cyclic deps: use string ARN params +
add_to_policy()instead ofgrant_*() - Empty
cdk.jsonaccount: falls back toCDK_DEFAULT_ACCOUNTenv var viaapp.py
- Startup takes ~1-2 minutes (plugin registration); lightweight agent shim handles messages during this time
- Correct start command:
openclaw gateway run --port 18789 --verbose(no--bind lan— localhost binding sufficient since both processes run in the same container) - Tool profile: Uses
"full"profile with a deny list. Do NOT use"basic"(undocumented, may disable web tools). Documented profiles:minimal,coding,messaging,full - Deny list:
["write", "edit", "apply_patch", "read", "browser", "canvas", "cron", "gateway"]— local writes use S3 skill,readblocked to prevent credential reads, no browser/UI in container, EventBridge replaces built-in cron.execis NOT denied — skills likeclawhub-manageneed it; scoped STS credentials limit blast radius - Sub-agent sandbox: Must be
"off"— no Docker inside AgentCore microVMs. MicroVMs already provide per-user isolation - Sub-agent model: Configurable via
SUBAGENT_BEDROCK_MODEL_IDenv var (fromsubagent_model_idin cdk.json). Empty = use same as main model. Subagents use a distinct model name (bedrock-agentcore-subagent) so the proxy can detect and count them separately skipBootstrapremoved: No longer a valid config key — OpenClaw rejects unknown keys and exits with code 1skills.allowBundled: Must be an array (e.g.,[]for none,["*"]for all), not a boolean. Set to[]for fast startup- ClawHub skill paths:
clawhub installinstalls to managed skills path — OpenClaw scans this automatically. Custom skills in/skills/loaded viaextraDirs - ClawHub VirusTotal flags: Some skills flagged for external API calls — use
--no-input --forcefor non-interactive Docker builds - 5 ClawHub skills installed: jina-reader, deep-research-pro, telegram-compose, transcript, task-decomposer (reduced from 8 — duckduckgo-search, hackernews, news-feed removed to optimize cold start; web search handled by lightweight agent's built-in web_search tool)
- Image updates: New sessions use new image automatically (no keepalive restart needed)
- WebSocket bridge protocol: Connect → auth (type:req, method:connect, protocol:3, auth:{token}) → agent.chat → streaming deltas → final
- OpenClaw 2026.3.2 WebSocket origin enforcement: OpenClaw enforces origin checks on all WebSocket connections that carry an
Originheader. ThewsNode.js library must use theoriginoption (notheaders.Origin) to set the header correctly for the HTTP upgrade request. Config:controlUi: { enabled: false, allowInsecureAuth: true, dangerouslyDisableDeviceAuth: true, allowedOrigins: ["*"] }. Without both theoriginoption on the client andallowedOriginsin config, connections fail with "Auth failed: origin not allowed" - Workspace sync overwrites config: The
.openclaw/S3 sync can overwriteopenclaw.jsonwith stale configs.openclaw.jsonis excluded from sync via SKIP_PATTERNS — config is always programmatically generated bywriteOpenClawConfig()
- Self-signup disabled — users auto-provisioned by proxy via
AdminCreateUser - Passwords:
HMAC-SHA256(secret, actorId).slice(0, 32)— deterministic, never stored - Usernames are channel-prefixed:
telegram:123456789 - JWT tokens cached per user with 60s early refresh
- API Gateway HTTP API: Only explicit routes exposed (
POST /webhook/telegram,POST /webhook/slack,GET /health). Rate limiting: burst 50, sustained 100 req/s - Webhook validation: Telegram uses
X-Telegram-Bot-Api-Secret-Tokenheader (set viasecret_tokenonsetWebhook). Slack usesX-Slack-SignatureHMAC-SHA256 with 5-minute replay window - Async dispatch: Self-invokes with
InvocationType=Eventfor actual processing; returns 200 immediately to webhook - Slack: Handles
url_verificationchallenge synchronously; ignores retries viax-slack-retry-numheader - Cold start latency: First message to a new user triggers microVM creation; lightweight agent responds in ~10-15s while OpenClaw starts in background (~1-2 min)
- Typing indicator + progress message: Telegram typing indicator sent every 4s while waiting; after 30s of waiting, a one-time progress message ("Working on your request...") is sent to both Telegram and Slack so users know the bot is still working during long subagent tasks
- Content block extraction:
_extract_text_from_content_blocks()recursively unwraps nested[{"type":"text","text":"..."}]JSON — subagent responses (deep-research-pro, task-decomposer) can wrap content multiple levels deep - Markdown-to-HTML conversion:
_markdown_to_telegram_html()converts markdown to Telegram-compatible HTML before sending. Handles bold, italic, strikethrough, code blocks, inline code, headers, links, blockquotes, horizontal rules, and markdown tables (rendered as monospace<pre>blocks with aligned columns). Usesparse_mode: "HTML"(not"Markdown"v1 which is too strict for AI-generated content) - Cross-channel binding: "link accounts" generates 6-char code in DynamoDB with 10-min TTL
- Image uploads: Telegram photos and Slack file attachments (JPEG, PNG, GIF, WebP, max 3.75 MB) are downloaded by the Router Lambda, uploaded to S3 under
{namespace}/_uploads/, and passed to AgentCore as a structured message{text, images[{s3Key, contentType}]} - Telegram captions:
message.get("text", "") or message.get("caption", "")— photos usecaption, nottext - Secret cache TTL: Secrets Manager values cached for 15 minutes (was indefinite). Rotated secrets reflected within 15 min without container restart
- Router Lambda downloads image from channel API (Telegram
getFile/ Slackurl_private_download), uploads to S3{namespace}/_uploads/img_{ts}_{hex}.{ext} - Contract server converts structured message to bridge text with
[OPENCLAW_IMAGES:[...]]marker appended - Proxy extracts marker via regex, fetches image bytes from S3 (with namespace validation to prevent cross-user reads), builds Bedrock multimodal content blocks (
{image: {format, source: {bytes}}}) - Supported types:
image/jpeg,image/png,image/gif,image/webp(max 3.75 MB per Bedrock limit) - Security: S3 key validated against user's namespace prefix + path traversal (
..) rejection. Format validated againstVALID_BEDROCK_FORMATSset - Slack prerequisite: Bot needs
files:readOAuth scope to download image files
- S3 prefix:
{namespace}/.openclaw/wherenamespace = actorId.replace(/:/g, "_") - Periodic saves: Every 5 min (configurable via
WORKSPACE_SYNC_INTERVAL_MS) - SIGTERM grace: 10s max for final save before exit (AgentCore gives 15s total)
- Skip patterns:
node_modules/,.cache/,*.log, files > 10MB - Same S3 bucket: Uses
S3_USER_FILES_BUCKET(shared with s3-user-files skill)
- Schedule group: All schedules created under
openclaw-crongroup in EventBridge Scheduler - Schedule naming:
openclaw-{namespace}-{shortId}(e.g.,openclaw-telegram_123456789-87a86927) - DynamoDB storage: Schedule metadata stored as
CRON#SK under the user's PK in the identity table - Cron executor Lambda: Warms up the user's AgentCore session (sends
action: warmup), then sends the cron message (sendsaction: cron), then delivers the response to the user's chat channel - Lead time: Cron Lambda invoked with
cron_lead_time_minutes(default 5 min) to allow session warmup before the scheduled time - Environment variables: Container receives
EVENTBRIDGE_SCHEDULE_GROUP,CRON_LAMBDA_ARN,EVENTBRIDGE_ROLE_ARN,IDENTITY_TABLE_NAME,CRON_LEAD_TIME_MINUTESfor the eventbridge-cron skill
- Opt-in: Only active when
enable_browser=trueincdk.jsonandBROWSER_IDENTIFIERis set as container env var - Session file:
/tmp/agentcore-browser-session.json— written by contract server on init, read by skill scripts - Session timeout: 1 hour (
BROWSER_SESSION_TIMEOUT_SECONDS = 3600). Session recreated automatically on expiry - Screenshot delivery: Screenshots uploaded to
{namespace}/_screenshots/in S3, embedded in response as[SCREENSHOT:key]marker. Router Lambda detects markers and delivers as photos to Telegram/Slack - Not available during warm-up: Browser skill requires full OpenClaw startup — the lightweight agent does not include browser tools
- Lifecycle:
startBrowserSession()called during init (parallel with proxy/OpenClaw start),stopBrowserSession()called on SIGTERM - Skill scripts:
navigate.js(CDP Page.navigate),screenshot.js(CDP Page.captureScreenshot → S3 upload),interact.js(click/type/scroll/wait via CDP)
- Priority order: (0)
USER_IDenv var (set by contract server) → (1)x-openclaw-actor-idheader → (2) OpenAIuserfield → (3) message envelope parsing → (4) messagenamefield → (5) fallbackdefault-user - Per-user sessions: Contract server sets
USER_IDenv var when starting proxy, so identity is always resolved from environment in per-user mode - S3-backed isolation: User files in
s3://openclaw-user-files-{account}-{region}/{namespace}/ - Namespace immutability: System-determined from channel identity, cannot be changed by user request
- actorId vs namespace: actorId uses colon format (
telegram:123456789), namespace uses underscore format (telegram_123456789). Skill scripts (s3-user-files, eventbridge-cron) expect namespace format. The lightweight agent'schat()converts viauserId.replace(/:/g, "_")before passing to tools. The proxy and workspace sync also use namespace format for S3 keys
- STS session-scoped credentials: On init, the contract server calls
STS:AssumeRoleon the execution role with a minimal session policy that restricts S3 access to{namespace}/*. Other services (DynamoDB, Scheduler, SecretsManager) useResource: "*"in the session policy — the execution role's own policy provides the actual resource-level restrictions. This design keeps the session policy under the AWS 2048-byte packed limit (long policies with per-resource Conditions easily exceed this) - Session policy size limit: AWS STS
AssumeRolesession policies have a 2048-byte packed limit. If exceeded,AssumeRolefails with "Packed policy consumes N% of allotted space". The current policy is ~668 bytes (well under limit). Adding Condition blocks (e.g.,dynamodb:LeadingKeys,s3:prefix) quickly blows past the limit — avoid them in session policies - Credential files: Scoped credentials written to
/tmp/scoped-creds/incredential_processformat. OpenClaw usesAWS_CONFIG_FILE+AWS_SDK_LOAD_CONFIG=1to pick them up - OpenClaw env isolation: OpenClaw spawned with explicit env that excludes
AWS_ACCESS_KEY_ID,AWS_SECRET_ACCESS_KEY,AWS_SESSION_TOKEN,AWS_CONTAINER_CREDENTIALS_RELATIVE_URI, andAWS_CONTAINER_CREDENTIALS_FULL_URI - Credential refresh: 45-minute interval timer re-assumes the role and updates credential files (STS self-assume max duration is 1 hour)
- Trust policy condition: STS self-assume trust requires sts:RoleSessionName matching "scoped-*" prefix, preventing unconditioned re-assumption
- Zero-access fallback: If
EXECUTION_ROLE_ARNis not set or STS fails, OpenClaw starts with zero AWS access (all credential env vars stripped). Tools will fail gracefully but no cross-user data access is possible - Proxy keeps full credentials: The proxy process is trusted code and retains full execution role credentials for Bedrock, Cognito, and S3 image access (with application-level namespace enforcement)
Always confirm which git branch you are on BEFORE making any code changes or deployments. If the user specifies a branch, switch to it first and verify with git branch --show-current. Never assume the current branch is correct.
Default deployment region is us-west-2. Hybrid deployment: CDK manages infrastructure (VPC, IAM, S3, Lambda, etc.), Starter Toolkit manages Runtime/Endpoint/ECR/Docker build. Use ./scripts/deploy.sh for the full 3-phase flow. After deploying, verify the old session/container is replaced — stale sessions can mask fixes.
Never push to any remote (GitHub, GitLab, or otherwise) without explicit user confirmation. Always ask before pushing.
When asked to create a plan, produce it concisely in ONE iteration. Do not endlessly revise or research unless asked. If the user says 'implement', move directly to code changes — do not re-plan. If a plan is approved, begin implementation immediately.
To add a new messaging channel (e.g., WhatsApp, Discord, LINE), follow the Feishu implementation as a reference:
- Add a new secret for the channel bot token/credentials
- Export the secret name for cross-stack reference
- Add credential fetching function (e.g.,
_get_feishu_credentials()) - Add webhook validation function (e.g.,
validate_feishu_webhook()) - Add message sending function (e.g.,
send_feishu_message()) - Add progress notification function (e.g.,
_feishu_progress_notify()) - Add main handler function (e.g.,
handle_feishu()) - Wire into the Lambda handler: sync path (webhook validation + async dispatch) and async path (message processing)
- Handle channel-specific features: event decryption (Feishu AES-256-CBC), signature verification, bot mention stripping (group chat), image download, etc.
- Add
POST /webhook/<channel>route - Pass the new secret name as Lambda environment variable
- Add response delivery function for the new channel (e.g.,
send_feishu_message())
- Interactive script: display webhook URL, prompt for credentials, store in Secrets Manager, add user to allowlist
- Webhook validation, event parsing, message sending, edge cases
- Webhook validation: Each channel has its own signature/token verification. Fail-closed (reject if validation fails)
- Async dispatch: Return 200 immediately to the webhook, self-invoke Lambda asynchronously for processing (prevents webhook timeouts)
- User ID format:
<channel>:<platform_user_id>(e.g.,feishu:ou_xxxx,telegram:123456) - Event encryption: Some platforms (Feishu) encrypt webhook events. Decrypt in the handler using platform-provided keys. Use system libcrypto (ctypes) for AES to avoid native dependency issues across Lambda architectures
- ECR repo naming: Starter Toolkit creates repos with
bedrock-agentcore-prefix (e.g.,bedrock-agentcore-openclaw_agent). CDK IAM policies must include this pattern — mismatch causes "initialization exceeded 120s" (misleading error, actually ECR permission denied) update-agent-runtimedoes NOT replace running containers: Env var changes only apply to NEW sessions. Alwaysagentcore stop-sessionafter updating runtime env vars- Starter Toolkit
--local-buildskips CodeBuild: Useful for pre-pushed images. Default mode always triggers CodeBuild which rebuilds and overwrites the image tag - Starter Toolkit VPC subnet changes: "Immutable" via
agentcore configure, but actually mutable via directaws bedrock-agentcore-control update-agent-runtimeAPI - CodeBuild Docker Hub rate limit: Dockerfile must use
public.ecr.aws/docker/library/node:22-sliminstead of Docker Hub
- Cross-region inference profiles work through VPC endpoints:
global.anthropic.claude-opus-4-6-v1works fine throughbedrock-runtimeVPC endpoint (despite initial suspicion otherwise) - Security group egress: TCP 443 only is sufficient — DNS uses VPC internal resolver (not affected by SG)
- Session ID is deterministic:
ses_{userId}_{hash}— same user always gets same session ID, sostop-sessionwith the correct ID is essential after config changes - Cold start timing: VPC mode ~30-60s for ENI creation + image pull. First message triggers init (proxy + OpenClaw startup)
This project uses git worktrees for parallel branch development:
# Current worktrees
git worktree list
# The deploy branch is checked out at ~/g-repo/openclaw-deploy (worktree)
# The main repo is at ~/g-repo/sample-host-openclaw-on-amazon-bedrock-agentcore
# When done with the deploy branch, merge to main and clean up:
cd ~/g-repo/sample-host-openclaw-on-amazon-bedrock-agentcore
git checkout main
git merge deploy/starter-toolkit-hybrid
git worktree remove ~/g-repo/openclaw-deploy # removes worktree directory
git branch -d deploy/starter-toolkit-hybrid # delete branch if fully merged
# Or keep the worktree for continued work — no cleanup neededThis is a Python/Node.js project (OpenClaw on AWS Bedrock AgentCore). Key components: Telegram bot, Slack Socket Mode, CDK infrastructure, Docker/ECR deployments, S3 workspace, per-user memory isolation. Subagents are OpenClaw-native running on the same AgentCore runtime — they are NOT separate Bedrock agents.