
ALB-based TLS termination: proxy data-plane through control plane#62

Merged
motatoes merged 1 commit into main from tls2 on Mar 13, 2026

@breardon2011
Contributor

Summary

  • Route all SDK traffic through a single entry point: the ALB, which forwards to the control plane
  • Control plane proxies data-plane requests (exec, files, PTY, agent) to workers over the internal VPC network
  • Workers are no longer publicly accessible: they serve plain HTTP on their private IP only, with no Caddy or TLS
  • SDK requires zero changes (already falls back to apiUrl when connectURL is empty)

Architecture

┌──────────────┐      HTTPS       ┌──────────────┐
│     SDK      │ ───────────────→ │  ALB (ACM)   │
└──────────────┘                  └──────┬───────┘
                                         │ HTTP
                                         ▼
                               ┌──────────────────┐
                               │  Control Plane   │
                               │     (:8080)      │
                               │                  │
                               │  Lifecycle API   │  ← create, delete, list, hibernate, wake
                               │  SandboxAPIProxy │  ← exec, files, pty, agent, timeout
                               └────────┬─────────┘
                                  gRPC  │  HTTP
                              (lifecycle)│(data-plane)
                                        ▼
                                ┌──────────────────┐
                                │  Worker (VPC)    │
                                │  :8080 + :9090   │
                                │  private IP only │
                                │                  │
                                │  ┌──┐┌──┐┌──┐    │
                                │  │VM││VM││VM│... │
                                │  └──┘└──┘└──┘    │
                                └──────────────────┘

Auth flow: the SDK authenticates with an API key → the control plane validates the key → the proxy mints a short-lived JWT (5 min) → the worker validates the JWT.

Changes

New: internal/proxy/sandbox_api_proxy.go

Generic HTTP/WebSocket proxy that forwards data-plane requests to the worker that owns the sandbox. Handles:

  • Sandbox lookup (PG session → worker ID → Redis registry → worker HTTP address)
  • Wake-on-request for hibernated sandboxes
  • Recovery when a worker disappears (checkpoint restore on another worker)
  • JWT minting for worker auth
  • WebSocket hijack for PTY and exec streaming sessions
  • Path rewriting (/api/sandboxes/... → /sandboxes/...)

Modified: internal/api/router.go

Data-plane routes (exec, files, pty, agent, timeout, token refresh) are now conditionally registered:

  • Server mode: routes go through SandboxAPIProxy → forwarded to workers
  • Combined/worker mode: routes use local handlers (unchanged)

Modified: internal/api/sandbox.go

  • connectURL removed from all server-mode API responses (createSandboxRemote, getSandboxRemote, listSandboxesRemote)
  • When connectURL is empty, the SDK automatically uses apiUrl (the control plane through ALB)
  • Removed the server-mode guard on setTimeout, since the proxy now routes timeout requests to the owning worker
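How an empty `connectURL` leads to the fallback can be sketched in Go. The struct, its JSON field names, the `omitempty` tag, and `encodeServerMode` are assumptions (the SDK itself is not necessarily written in Go); `effectiveBaseURL` only mirrors the SDK behavior described above.

```go
package main

import "encoding/json"

// sandboxResponse is a hypothetical slice of a server-mode API response.
type sandboxResponse struct {
	SandboxID  string `json:"sandboxID"`
	Status     string `json:"status"`
	ConnectURL string `json:"connectURL,omitempty"` // left empty in server mode, so omitted
}

// encodeServerMode marshals a response with no connectURL set.
func encodeServerMode(id, status string) ([]byte, error) {
	return json.Marshal(sandboxResponse{SandboxID: id, Status: status})
}

// effectiveBaseURL uses connectURL when present, otherwise apiURL
// (the control plane behind the ALB).
func effectiveBaseURL(resp sandboxResponse, apiURL string) string {
	if resp.ConnectURL != "" {
		return resp.ConnectURL
	}
	return apiURL
}
```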

Modified: internal/worker/http_server.go

  • Removed /caddy/check endpoint (Caddy on-demand TLS validation no longer needed)

Modified: deploy/ec2/setup-instance.sh

  • Removed Caddy installation (Go, xcaddy, Route53 module, config, service)
  • Worker identity uses private IP for OPENSANDBOX_HTTP_ADDR (was public IP)
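Since the registry address is now the only route to a worker, a worker could refuse to advertise a public IP. This is a hypothetical Go-side guard, not part of `setup-instance.sh`; the `http://ip:port` shape of `OPENSANDBOX_HTTP_ADDR` and the helper `workerHTTPAddr` are assumptions. `netip.Addr.IsPrivate` covers the RFC 1918 IPv4 ranges and IPv6 unique local addresses (fc00::/7).

```go
package main

import (
	"fmt"
	"net/netip"
)

// workerHTTPAddr formats a hypothetical OPENSANDBOX_HTTP_ADDR value from the
// instance's IP and rejects anything that is not a private address.
func workerHTTPAddr(ip string, port uint16) (string, error) {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return "", fmt.Errorf("invalid IP %q: %w", ip, err)
	}
	if !addr.IsPrivate() {
		return "", fmt.Errorf("refusing non-private worker address %s", addr)
	}
	// AddrPortFrom(...).String() brackets IPv6 addresses, e.g. [fd00::1]:8080.
	return "http://" + netip.AddrPortFrom(addr, port).String(), nil
}
```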

Deleted: deploy/ec2/Caddyfile, deploy/ec2/caddy.service

Modified: cmd/server/main.go

  • Creates SandboxAPIProxy when worker registry, DB store, and JWT issuer are available

What doesn't change

  • Terraform: the ALB already exists and targets the control plane, and workers are already in private subnets
  • SDK: already falls back to apiUrl when connectURL is empty
  • Worker HTTP server routes: handle requests exactly as before; they now receive them from the proxy instead of directly from clients
  • Redis heartbeats: only a config change (HTTP_ADDR is now the private IP)
  • gRPC: lifecycle operations (create, destroy, hibernate, wake) still use gRPC as before

🤖 Generated with Claude Code

@vercel

vercel bot commented Mar 12, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project      Deployment  Actions           Updated (UTC)
opensandbox  Ready       Preview, Comment  Mar 12, 2026 7:47pm


breardon2011 marked this pull request as ready for review March 12, 2026 20:00
motatoes merged commit ec5cf24 into main Mar 13, 2026
3 checks passed
@github-actions

Preview Environment Destroyed

The preview environment dev-pr-62 has been torn down.
All AWS resources for this environment have been cleaned up.
