Feat/k8s sandbox #3171
New Kubernetes runner lives at mesh-plugin-user-sandbox/runner/k8s and sits behind its own subpath export so docker/freestyle deploys never pull in @kubernetes/client-node. Opt in with MESH_SANDBOX_RUNNER=kubernetes; docker stays the dev default. Image side: bumps Bun to 1.3.11 so the daemon can read modern bun.lock (configVersion: 1), and adds a per-boot BOOT_ID to /health so the runner can detect container restarts (OOMKill, eviction) and re-bootstrap the workdir instead of stranding a live pod with an empty /app.
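The restart detection described above can be sketched roughly as follows. This is a minimal illustration, not the runner's actual code: the `HealthResponse` shape, the `bootId` field name, and the `BootTracker` class are hypothetical names chosen for the example. The idea is only that a changed per-boot id between two `/health` reads means the container restarted and the workdir needs to be bootstrapped again.

```typescript
// Hypothetical shape of the daemon's /health payload; only the per-boot id matters here.
interface HealthResponse {
  bootId: string;
}

// Remembers the last boot id seen for each sandbox handle.
class BootTracker {
  private readonly lastSeen = new Map<string, string>();

  // Returns true when the boot id differs from the previously observed one,
  // i.e. the container restarted (OOMKill, eviction) since the last check.
  observe(handle: string, health: HealthResponse): boolean {
    const previous = this.lastSeen.get(handle);
    this.lastSeen.set(handle, health.bootId);
    return previous !== undefined && previous !== health.bootId;
  }
}
```

A caller would poll `/health`, pass the parsed body to `observe`, and trigger a re-bootstrap whenever it returns `true`. The first observation for a handle never counts as a restart.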
- Bumped version of decocms to 2.274.0.
- Updated various dependencies including @ai-sdk/anthropic, @ai-sdk/gateway, @ai-sdk/google, @ai-sdk/openai, and @ai-sdk/provider-utils to their latest versions.
- Added new entries for @anthropic-ai/claude-agent-sdk and its platform-specific variants.

Heads up: PR #3178 lands a unified daemon codebase at The unified daemon already exposes Happy to pair on the rebase when #3178 is in; it should be small for the k8s side since the daemon contract is the same across runners.
VM_START used to block until clone+install finished (~30s on medium repos). The Terminal tab opens its SSE connection only after VM_START returns, so users saw the entire setup output dumped at once via `replayTo` instead of streaming live.

Run repo bootstrap in the background (Docker + K8s runners) and persist the handle BEFORE bootstrap so /api/sandbox/<handle> resolves while clone is still running. Bootstrap output streams through the daemon's log ring under a `setup` source so the Terminal tab can subscribe via SSE.

Daemon side: split bash output on any CR/LF run so git's progress reports surface as distinct log lines instead of accumulating until the trailing newline. Set CI=1 in dev-process env so Vite's interactive shortcut reader doesn't EOF and exit when stdio is ignored.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

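The CR/LF splitting mentioned for the daemon side could look something like the sketch below. It assumes output arrives as string chunks. The `LineSplitter` class is a hypothetical name for illustration, not the daemon's actual implementation. Any run of `\r` and/or `\n` ends a line, so git's carriage-return progress updates become separate log lines, and an unterminated tail is held until the next chunk.

```typescript
// Splits streamed process output into log lines on any run of CR/LF characters.
class LineSplitter {
  // Text received after the last terminator; may be completed by the next chunk.
  private pending = "";

  // Feed one chunk; returns the complete, non-empty lines it closed.
  push(chunk: string): string[] {
    const parts = (this.pending + chunk).split(/[\r\n]+/);
    // The last element is whatever follows the final terminator (possibly "").
    this.pending = parts.pop() ?? "";
    return parts.filter((line) => line.length > 0);
  }

  // Call when the process exits to emit any unterminated trailing text.
  flush(): string[] {
    const rest = this.pending;
    this.pending = "";
    return rest.length > 0 ? [rest] : [];
  }
}
```

Because a `\r\n` pair may straddle two chunks, empty segments are dropped rather than emitted as blank lines.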
Resolved conflicts to align with main's daemon-orchestrator model (PR #3175): the Bun.serve daemon now owns clone + install + dev-server boot, driven by env vars (DAEMON_TOKEN, DAEMON_BOOT_ID, CLONE_URL, BRANCH, GIT_USER_*, RUNTIME, PACKAGE_MANAGER) on container/pod start.

Key resolutions:
- Dropped sandbox-daemon.ts route (deleted on main; daemon access now via internal vm-tools that call SandboxRunner.proxyDaemonRequest).
- Dropped image/*.mjs files (deleted on main; replaced by Bun.serve TS daemon at packages/sandbox/daemon/).
- Refactored k8s runner to mirror docker's env contract: pass full env (CLONE_URL, BRANCH, RUNTIME, etc.) through SandboxClaim.spec.env so the daemon orchestrates setup itself. Removed bootstrapAndStart, bootstrapPromise, repoAttached fields, startDevServer/stopDevServer calls, and bootstrapRepo helper, all subsumed by daemon-side resume-on-restart and dev-autostart.
- Reverted docker runner background-bootstrap fields/methods (main's approach via daemon orchestrator covers the same SSE-log streaming goal more thoroughly).
- Moved packages/mesh-plugin-user-sandbox/server/runner/k8s/* → packages/sandbox/server/runner/k8s/* (main renamed the package).
- Updated lifecycle.ts k8s import to @decocms/sandbox/runner/k8s.
- Regenerated bun.lock via bun install.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

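To make the env contract above concrete, here is a rough sketch of how a runner might assemble the daemon's environment for a claim. The env var names come from the commit message. Everything else (`EnvVar`, `DaemonEnvInput`, `buildDaemonEnv`, and the assumption that `SandboxClaim.spec.env` accepts Kubernetes-style `{ name, value }` entries) is hypothetical and only for illustration.

```typescript
// Kubernetes-style env entry; assumed (not confirmed) to match SandboxClaim.spec.env.
interface EnvVar {
  name: string;
  value: string;
}

// Hypothetical input describing what the daemon should set up on boot.
interface DaemonEnvInput {
  daemonToken: string;
  bootId?: string;
  cloneUrl?: string;
  branch?: string;
  runtime?: string;
  packageManager?: string;
  // Any additional variables the caller wants passed through unchanged.
  extra?: Record<string, string>;
}

// Builds the env list, skipping values that are unset or empty.
function buildDaemonEnv(input: DaemonEnvInput): EnvVar[] {
  const candidates: Array<[string, string | undefined]> = [
    ["DAEMON_TOKEN", input.daemonToken],
    ["DAEMON_BOOT_ID", input.bootId],
    ["CLONE_URL", input.cloneUrl],
    ["BRANCH", input.branch],
    ["RUNTIME", input.runtime],
    ["PACKAGE_MANAGER", input.packageManager],
  ];
  const env: EnvVar[] = [];
  for (const [name, value] of candidates) {
    if (value !== undefined && value !== "") env.push({ name, value });
  }
  for (const [name, value] of Object.entries(input.extra ?? {})) {
    env.push({ name, value });
  }
  return env;
}
```

With this shape, the runner only declares what should exist in the pod, and the daemon decides how to clone, install, and start the dev server, which matches the orchestrator model the commit describes.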
… smoke.ts, and up.sh scripts
- Introduced .gitattributes to mark generated files for the Helm chart in agent-sandbox.
- Updated README to clarify the build process for the daemon bundle and image.
- Modified reload-image.sh and up.sh to build the daemon bundle before building the Docker image, ensuring the correct context is used.
- Adjusted import paths in smoke.ts to reflect the new package structure.

- Added stdout logging in the Broadcaster class to mirror SSE subscriber output for better visibility in `kubectl logs` and k9s.
- Refactored KubernetesSandboxRunner to streamline port-forwarding logic, removing the devForward property and ensuring that preview traffic is routed through the daemon port-forward.
- Updated comments for clarity on the daemon's role in handling traffic and the implications of the new structure.

- Introduced a WebSocket proxy to handle upgrades for Vite's HMR and other dev-server WebSocket connections, ensuring seamless communication through the daemon.
- Enhanced port discovery logic to dynamically identify listening ports of descendant processes, improving the accuracy of the dev server's operational context.
- Refactored the upstream probing mechanism to utilize candidate ports, allowing for more robust detection of the active development server.
- Updated the proxy handler to resolve the actual listening port dynamically, ensuring consistent routing of requests.
- Added comprehensive comments and documentation to clarify the new functionality and its implications for the daemon's operation.

- Added support for nodeSelector and tolerations in the sandbox values, allowing for better pod scheduling and resource management.
- Introduced hostUsers option to enable user namespace remapping, enhancing security by preventing container escapes to real node UIDs.
- Implemented readOnlyRootFilesystem configuration to improve security and stability, with provisions for necessary volume mounts.
- Updated agent-sandbox manifest to include PodSecurity admission labels, enforcing baseline security policies for the namespace.

These changes aim to improve the security and configurability of sandbox deployments.

- Introduced a new environment variable, MESH_SANDBOX_PREVIEW_URL_PATTERN, to allow the specification of a public URL pattern for sandbox previews.
- Updated Docker and Kubernetes sandbox runners to utilize the preview URL pattern, improving accessibility for users.
- Modified Helm chart values to include the new preview URL pattern configuration, ensuring proper deployment in production environments.

These changes aim to enhance the configurability and accessibility of sandbox previews in Kubernetes deployments.

- Introduced a sandbox preview reverse-proxy to route requests for `<handle>.preview.<base-domain>` to the corresponding sandbox daemon, improving accessibility for preview environments.
- Enhanced WebSocket handling to support upgrades and message processing for preview connections, ensuring seamless communication for development workflows.
- Updated the Helm chart to include configurations for the new preview URL pattern and related settings, facilitating deployment in Kubernetes environments.

These changes aim to improve the functionality and usability of sandbox previews in the development process.

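The host-based routing described above depends on extracting the sandbox handle from the request's `Host` header. A minimal sketch of that step is shown below. The function name `handleFromHost` and the rule that a handle is a single lowercase DNS label are assumptions made for the example, not details taken from the proxy's source.

```typescript
// Extracts <handle> from "<handle>.preview.<base-domain>", or returns null
// when the host does not match. Assumes handles are a single DNS label.
function handleFromHost(host: string, baseDomain: string): string | null {
  // Normalize case and drop an optional ":port" suffix.
  const hostname = host.toLowerCase().replace(/:\d+$/, "");
  const suffix = `.preview.${baseDomain.toLowerCase()}`;
  if (!hostname.endsWith(suffix)) return null;

  const handle = hostname.slice(0, -suffix.length);
  // Reject empty handles and nested subdomains such as "a.b".
  return /^[a-z0-9]([a-z0-9-]*[a-z0-9])?$/.test(handle) ? handle : null;
}
```

A proxy would call this on each request (and each WebSocket upgrade), look up the sandbox daemon for the returned handle, and respond with an error when the result is `null`.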
- Updated `values-kube-prometheus-stack.yaml` to use camelCase for `kubeStateMetrics` and added comments for clarity on subchart toggles.
- Modified `values-otel-collector.yaml` to change `serviceMonitor` to `podMonitor`, reflecting the new scraping strategy, and added comments regarding omitted metadata.
- Adjusted `sandbox-overview.json` to replace deprecated metrics expressions with updated ones for CPU and network utilization, ensuring accurate monitoring data.

These changes enhance the clarity and functionality of the monitoring setup in the Kubernetes sandbox environment.

- Bumped versions for `decocms` to 2.281.2 and `@decocms/runtime` to 1.6.0 in `bun.lock`.
- Added `@opentelemetry/api` as a new dependency in the sandbox package.
- Introduced new monitoring configurations for Kubernetes, including updated values for Prometheus and OpenTelemetry collector, and added a new dashboard for sandbox overview.
- Improved WebSocket handling in the sandbox to manage pending frames and prevent memory exhaustion.

These changes aim to enhance the observability and performance of the sandbox environment while ensuring accurate monitoring and dependency management.

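One way to bound pending WebSocket frames, as mentioned above, is a small queue that refuses new frames once a count or byte limit is reached. The sketch below is illustrative only. The `PendingFrames` class, its limits, and the policy of rejecting when full (instead of dropping old frames) are assumptions, not the sandbox's actual behavior.

```typescript
// Buffers frames received before the upstream socket is open, with hard caps
// on both frame count and total UTF-8 bytes to avoid unbounded memory growth.
class PendingFrames {
  private readonly frames: string[] = [];
  private bytes = 0;
  private readonly maxFrames: number;
  private readonly maxBytes: number;

  constructor(maxFrames: number, maxBytes: number) {
    this.maxFrames = maxFrames;
    this.maxBytes = maxBytes;
  }

  // Returns false when accepting the frame would exceed a cap; the caller
  // would typically close the connection in that case.
  enqueue(frame: string): boolean {
    const size = new TextEncoder().encode(frame).length;
    if (this.frames.length + 1 > this.maxFrames || this.bytes + size > this.maxBytes) {
      return false;
    }
    this.frames.push(frame);
    this.bytes += size;
    return true;
  }

  // Sends every buffered frame in order, empties the buffer, and returns the count.
  drain(send: (frame: string) => void): number {
    const count = this.frames.length;
    for (const frame of this.frames) send(frame);
    this.frames.length = 0;
    this.bytes = 0;
    return count;
  }
}
```

Once the upstream connection opens, the proxy would call `drain` with the upstream socket's send function and then forward frames directly.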
…ecture
- Configured `nodeSelector` in `values.yaml` to specify `kubernetes.io/arch: amd64`, ensuring compatibility with amd64 node groups.
- Added comments to guide users on overriding this setting for arm64 clusters, enhancing clarity for deployment configurations.

These changes improve the deployment flexibility for different architecture environments.

…anagement. This cleanup helps streamline the project structure.
- Introduced a GitHub Actions workflow to build and push the Studio Sandbox Docker image upon changes to the `packages/sandbox` directory.
- Updated environment variable references from `MESH_SANDBOX_PREVIEW_URL_PATTERN` to `STUDIO_SANDBOX_PREVIEW_URL_PATTERN` across multiple files to reflect the new naming convention.
- Adjusted related test cases and configurations to ensure consistency with the new sandbox naming scheme.

These changes enhance the deployment process and improve clarity in the codebase regarding the Studio Sandbox environment.

- Updated environment variable references and type definitions to replace `MESH_SANDBOX_RUNNER` and `kubernetes` with `STUDIO_SANDBOX_RUNNER` and `agent-sandbox` across multiple files.
- Adjusted comments and documentation to reflect the new naming convention for the agent-sandbox runner.
- Enhanced test cases and configurations to ensure consistency with the updated sandbox runner implementation.

These changes improve clarity and maintainability in the codebase regarding the sandbox environment.

What is this contribution about?
Screenshots/Demonstration
How to Test
Migration Notes
Review Checklist
Summary by cubic
Adds an agent-sandbox runner (Kubernetes via kubernetes-sigs/agent-sandbox) and a preview subdomain reverse-proxy that routes `*.preview.<domain>` to each sandbox daemon. Also ships Helm support, local kind scripts, monitoring, UI/SDK updates, improved daemon proxying (WS + dynamic port discovery), logging, and a release workflow. Runner kind/env are now `agent-sandbox` and `STUDIO_SANDBOX_RUNNER`.

New Features
- `@decocms/sandbox/runner/agent-sandbox` (opt in with `STUDIO_SANDBOX_RUNNER=agent-sandbox`). Uses Bun TLS `fetch` with kubeconfig, per-tenant pod labels (org/user), per-claim `DAEMON_TOKEN`, readiness watch, and a single port-forward to the daemon that also carries preview traffic; optional public previews via `previewUrlPattern`. Rehydrates on daemon `bootId` and lazy-loads `@kubernetes/client-node`.
- `<handle>.preview.<base-domain>` to the matching sandbox daemon on port 9000 and upgrades WS; preview-admin paths are blocked. WS buffering caps pending frames.
- `STUDIO_SANDBOX_PREVIEW_URL_PATTERN` enables public preview URLs; wired through Helm (`sandbox.agentSandbox.previewUrlPattern`/`previewGateway`) and `configMap.meshConfig`.
- `"agent-sandbox"`; mesh dynamically imports `@decocms/sandbox/runner/agent-sandbox` only when selected.
- `charts/agent-sandbox` subchart (vendors operator + CRDs), shared `SandboxTemplate`, optional `SandboxWarmPool`, mesh RBAC, and a `NetworkPolicy`. New knobs: `nodeSelector` (default `amd64`, override for `arm64`), `tolerations`, `hostUsers`, `readOnlyRootFilesystem`, plus optional wildcard Gateway + Certificate for previews. Operator namespace gets PodSecurity labels. Vendored files marked generated via `.gitattributes`.
- `deploy/k8s-sandbox/local/*` kind scripts (`up.sh`/`down.sh`/`reload-image.sh`) and an end-to-end `smoke.ts`. Optional monitoring stack (kube-prometheus-stack + OTel collector) and a Grafana dashboard with updated CPU/network metrics.
- `kubectl logs`.
- `mesh-sandbox` image to `ghcr.io`; `@opentelemetry/api` and `@kubernetes/client-node` added to `@decocms/sandbox`.

Migration
- `sandbox.agentSandbox.enabled=true` in Helm. The chart sets `STUDIO_SANDBOX_RUNNER=agent-sandbox` if not provided.
- For public previews, set `STUDIO_SANDBOX_PREVIEW_URL_PATTERN` and configure `sandbox.agentSandbox.previewGateway.*`.
- For local kind, use `deploy/k8s-sandbox/local/up.sh` and `reload-image.sh`; remove with `down.sh`.
- On `arm64`, override the default `nodeSelector`.

Written for commit 34ada69. Summary will update on new commits.