Feat/k8s sandbox by pedrofrxncx · Pull Request #3171 · decocms/studio

pedrofrxncx · 2026-04-24T06:11:00Z

Summary by cubic

Adds an agent-sandbox runner (Kubernetes via kubernetes-sigs/agent-sandbox) and a preview subdomain reverse-proxy that routes *.preview.<domain> to each sandbox daemon. Also ships Helm support, local kind scripts, monitoring, UI/SDK updates, improved daemon proxying (WS + dynamic port discovery), logging, and a release workflow. The runner kind and its environment variable are now agent-sandbox and STUDIO_SANDBOX_RUNNER (previously kubernetes and MESH_SANDBOX_RUNNER).

New Features
- Agent-sandbox runner: @decocms/sandbox/runner/agent-sandbox (opt in with STUDIO_SANDBOX_RUNNER=agent-sandbox). Uses Bun TLS fetch with kubeconfig, per-tenant pod labels (org/user), per-claim DAEMON_TOKEN, readiness watch, and a single port-forward to the daemon that also carries preview traffic; optional public previews via previewUrlPattern. Rehydrates on daemon bootId and lazy-loads @kubernetes/client-node.
- Preview networking: reverse-proxy routes <handle>.preview.<base-domain> to the matching sandbox daemon on port 9000 and upgrades WS; preview-admin paths are blocked. WS buffering caps pending frames.
- Config: STUDIO_SANDBOX_PREVIEW_URL_PATTERN enables public preview URLs; wired through Helm (sandbox.agentSandbox.previewUrlPattern/previewGateway) and configMap.meshConfig.
- Mesh/SDK/UI: widened unions to include "agent-sandbox"; mesh dynamically imports @decocms/sandbox/runner/agent-sandbox only when selected.
- Helm: adds charts/agent-sandbox subchart (vendors operator + CRDs), shared SandboxTemplate, optional SandboxWarmPool, mesh RBAC, and a NetworkPolicy. New knobs: nodeSelector (default amd64, override for arm64), tolerations, hostUsers, readOnlyRootFilesystem, plus optional wildcard Gateway + Certificate for previews. Operator namespace gets PodSecurity labels. Vendored files marked generated via .gitattributes.
- Local dev: deploy/k8s-sandbox/local/* kind scripts (up.sh/down.sh/reload-image.sh) and an end-to-end smoke.ts. Optional monitoring stack (kube-prometheus-stack + OTel collector) and a Grafana dashboard with updated CPU/network metrics.
- Daemon: WebSocket reverse-proxy for HMR, dynamic descendant port discovery, smarter probe scoring, and SSE logs teed to stdout for kubectl logs.
- CI/Deps: new workflow builds/pushes the mesh-sandbox image to ghcr.io; @opentelemetry/api and @kubernetes/client-node added to @decocms/sandbox.
Migration
- Default remains Docker; no changes required.
- To try agent-sandbox: set sandbox.agentSandbox.enabled=true in Helm. The chart sets STUDIO_SANDBOX_RUNNER=agent-sandbox if not provided. For public previews, set STUDIO_SANDBOX_PREVIEW_URL_PATTERN and configure sandbox.agentSandbox.previewGateway.*. For local kind, use deploy/k8s-sandbox/local/up.sh and reload-image.sh; remove with down.sh. On arm64, override the default nodeSelector.
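
The opt-in behaviour described above (Docker by default, `STUDIO_SANDBOX_RUNNER=agent-sandbox` to switch, and the runner module imported only when selected) could look roughly like the sketch below. The `SandboxRunner` shape, the `RunnerLoader` type, and both function names are hypothetical and not taken from the PR; in the real code each loader would presumably wrap a dynamic `import()` of a subpath such as `@decocms/sandbox/runner/agent-sandbox`.

```typescript
// Runner kinds named in this PR; the union itself is an illustrative assumption.
type RunnerKind = "docker" | "freestyle" | "agent-sandbox";

// Hypothetical minimal runner shape, for illustration only.
interface SandboxRunner {
  kind: RunnerKind;
}

// Stands in for a lazy `import()` of a runner subpath export.
type RunnerLoader = () => Promise<SandboxRunner>;

const RUNNER_KINDS: readonly RunnerKind[] = ["docker", "freestyle", "agent-sandbox"];

// Reads STUDIO_SANDBOX_RUNNER and falls back to Docker when it is unset.
function resolveRunnerKind(env: Record<string, string | undefined>): RunnerKind {
  const raw = env.STUDIO_SANDBOX_RUNNER;
  if (raw === undefined || raw === "") return "docker";
  const match = RUNNER_KINDS.find((kind) => kind === raw);
  if (match === undefined) {
    throw new Error(`Unknown STUDIO_SANDBOX_RUNNER: ${raw}`);
  }
  return match;
}

// Only the loader for the selected kind is invoked, so the other runners'
// dependencies (for example @kubernetes/client-node) are never loaded.
async function loadRunner(
  env: Record<string, string | undefined>,
  loaders: Record<RunnerKind, RunnerLoader>,
): Promise<SandboxRunner> {
  return loaders[resolveRunnerKind(env)]();
}
```

Keeping the loaders in a map keyed by kind means a Docker-only deployment never evaluates the Kubernetes client, which matches the stated goal of the subpath export.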

Written for commit 34ada69. Summary will update on new commits. Review in cubic.

github-actions · 2026-04-24T06:11:10Z

Release Options

Suggested: Patch (2.283.3) — default (no conventional commit prefix detected)

React with an emoji to override the release type:

| Reaction | Type | Next Version |
| --- | --- | --- |
| 👍 | Prerelease | `2.283.3-alpha.1` |
| 🎉 | Patch | `2.283.3` |
| ❤️ | Minor | `2.284.0` |
| 🚀 | Major | `3.0.0` |

Current version: 2.283.2

Note: If multiple reactions exist, the smallest bump wins. If no reactions, the suggested bump is used (default: patch).
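
The "smallest bump wins" rule can be sketched as follows. This is an illustration only, not the workflow's actual code: the function and constant names are hypothetical, and the ordering (prerelease < patch < minor < major) is an assumption read off the table order.

```typescript
type Bump = "prerelease" | "patch" | "minor" | "major";

// Assumed ordering from smallest to largest bump.
const BUMP_ORDER: readonly Bump[] = ["prerelease", "patch", "minor", "major"];

// Reaction emoji mapped to release types, as listed in the bot comment.
const REACTION_TO_BUMP: Partial<Record<string, Bump>> = {
  "👍": "prerelease",
  "🎉": "patch",
  "❤️": "minor",
  "🚀": "major",
};

// Returns the smallest bump among recognised reactions, or the suggested
// bump when no recognised reaction is present.
function resolveBump(reactions: readonly string[], suggested: Bump = "patch"): Bump {
  const bumps = reactions
    .map((reaction) => REACTION_TO_BUMP[reaction])
    .filter((bump): bump is Bump => bump !== undefined);
  if (bumps.length === 0) return suggested;
  return bumps.reduce((smallest, bump) =>
    BUMP_ORDER.indexOf(bump) < BUMP_ORDER.indexOf(smallest) ? bump : smallest,
  );
}
```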

github-actions · 2026-04-24T06:11:10Z

🧪 Benchmark

Should we run the Virtual MCP strategy benchmark for this PR?

React with 👍 to run the benchmark.

| Reaction | Action |
| --- | --- |
| 👍 | Run quick benchmark (10 & 128 tools) |

Benchmark will run on the next push after you react.

New Kubernetes runner lives at mesh-plugin-user-sandbox/runner/k8s and sits behind its own subpath export so docker/freestyle deploys never pull in @kubernetes/client-node. Opt in with MESH_SANDBOX_RUNNER=kubernetes; docker stays the dev default. Image side: bumps Bun to 1.3.11 so the daemon can read modern bun.lock (configVersion: 1), and adds a per-boot BOOT_ID to /health so the runner can detect container restarts (OOMKill, eviction) and re-bootstrap the workdir instead of stranding a live pod with an empty /app.
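
A minimal sketch of the restart detection described above, assuming the runner polls `/health` and receives a JSON body with a `bootId` field (the field name used in the review comment below). The `BootTracker` class and `HealthResponse` type are hypothetical names; the real runner may track this differently.

```typescript
// Assumed shape of the daemon's /health payload; only bootId is relevant here.
interface HealthResponse {
  bootId: string;
}

// Remembers the last boot id seen for one sandbox and reports when it changes.
class BootTracker {
  private lastBootId: string | undefined = undefined;

  // Returns true when the daemon reports a different boot id than the one
  // previously observed, which suggests the container restarted (OOMKill,
  // eviction) and the workdir should be bootstrapped again.
  observe(health: HealthResponse): boolean {
    const restarted =
      this.lastBootId !== undefined && this.lastBootId !== health.bootId;
    this.lastBootId = health.bootId;
    return restarted;
  }
}
```

The first observation never counts as a restart, since there is nothing to compare against yet.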

- Bumped version of decocms to 2.274.0.
- Updated various dependencies including @ai-sdk/anthropic, @ai-sdk/gateway, @ai-sdk/google, @ai-sdk/openai, and @ai-sdk/provider-utils to their latest versions.
- Added new entries for @anthropic-ai/claude-agent-sdk and its platform-specific variants.

tlgimenes · 2026-04-24T19:36:00Z

Heads up — PR #3178 lands a unified daemon codebase at packages/sandbox/daemon/ shared by freestyle and docker, shipped as a single daemon/dist/daemon.js bundle. Once it merges, your K8s runner can COPY (or mount) the same bundle instead of duplicating image/daemon.mjs.

The unified daemon already exposes /health with bootId for restart detection, so your runner can drop that bit and consume it from the shared surface. Paths are /_decopilot_vm/* with base64-wrapped bodies everywhere.

Happy to pair on the rebase when #3178 is in — should be small for the k8s side since the daemon contract is the same across runners.

VM_START used to block until clone+install finished (~30s on medium repos). The Terminal tab opens its SSE connection only after VM_START returns, so users saw the entire setup output dumped at once via `replayTo` instead of streaming live.

Run repo bootstrap in the background (Docker + K8s runners) and persist the handle BEFORE bootstrap so /api/sandbox/<handle> resolves while clone is still running. Bootstrap output streams through the daemon's log ring under a `setup` source so the Terminal tab can subscribe via SSE.

Daemon side: split bash output on any CR/LF run so git's progress reports surface as distinct log lines instead of accumulating until the trailing newline. Set CI=1 in dev-process env so Vite's interactive shortcut reader doesn't EOF and exit when stdio is ignored.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
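
The CR/LF splitting mentioned in this commit can be illustrated with a small stateful splitter. This is a sketch under assumptions: the `LineSplitter` class is hypothetical and the daemon's actual implementation is not shown in this PR page.

```typescript
// Splits streamed process output into log lines on any run of CR and/or LF,
// so carriage-return progress updates (as git emits) become separate lines.
class LineSplitter {
  private buffer = "";

  // Accepts the next chunk of output and returns every complete line in it.
  // Text after the last separator is held back until more output arrives.
  push(chunk: string): string[] {
    const parts = (this.buffer + chunk).split(/[\r\n]+/);
    this.buffer = parts.pop() ?? "";
    return parts.filter((line) => line.length > 0);
  }

  // Returns whatever is still buffered, for use when the process exits.
  flush(): string[] {
    const rest = this.buffer;
    this.buffer = "";
    return rest.length > 0 ? [rest] : [];
  }
}
```

Splitting on a run of separators rather than a single `\n` means a `\r\n` pair produces one line break instead of an extra empty line.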

Resolved conflicts to align with main's daemon-orchestrator model (PR #3175): the Bun.serve daemon now owns clone + install + dev-server boot, driven by env vars (DAEMON_TOKEN, DAEMON_BOOT_ID, CLONE_URL, BRANCH, GIT_USER_*, RUNTIME, PACKAGE_MANAGER) on container/pod start.

Key resolutions:
- Dropped sandbox-daemon.ts route (deleted on main; daemon access now via internal vm-tools that call SandboxRunner.proxyDaemonRequest).
- Dropped image/*.mjs files (deleted on main; replaced by Bun.serve TS daemon at packages/sandbox/daemon/).
- Refactored k8s runner to mirror docker's env contract: pass full env (CLONE_URL, BRANCH, RUNTIME, etc.) through SandboxClaim.spec.env so the daemon orchestrates setup itself. Removed bootstrapAndStart, bootstrapPromise, repoAttached fields, startDevServer/stopDevServer calls, and bootstrapRepo helper; all are subsumed by daemon-side resume-on-restart and dev-autostart.
- Reverted docker runner background-bootstrap fields/methods (main's approach via daemon orchestrator covers the same SSE-log streaming goal more thoroughly).
- Moved packages/mesh-plugin-user-sandbox/server/runner/k8s/* → packages/sandbox/server/runner/k8s/* (main renamed the package).
- Updated lifecycle.ts k8s import to @decocms/sandbox/runner/k8s.
- Regenerated bun.lock via bun install.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… smoke.ts, and up.sh scripts
- Introduced .gitattributes to mark generated files for the Helm chart in agent-sandbox.
- Updated README to clarify the build process for the daemon bundle and image.
- Modified reload-image.sh and up.sh to build the daemon bundle before building the Docker image, ensuring the correct context is used.
- Adjusted import paths in smoke.ts to reflect the new package structure.

- Added stdout logging in the Broadcaster class to mirror SSE subscriber output for better visibility in `kubectl logs` and k9s.
- Refactored KubernetesSandboxRunner to streamline port-forwarding logic, removing the devForward property and ensuring that preview traffic is routed through the daemon port-forward.
- Updated comments for clarity on the daemon's role in handling traffic and the implications of the new structure.

- Introduced a WebSocket proxy to handle upgrades for Vite's HMR and other dev-server WebSocket connections, ensuring seamless communication through the daemon.
- Enhanced port discovery logic to dynamically identify listening ports of descendant processes, improving the accuracy of the dev server's operational context.
- Refactored the upstream probing mechanism to utilize candidate ports, allowing for more robust detection of the active development server.
- Updated the proxy handler to resolve the actual listening port dynamically, ensuring consistent routing of requests.
- Added comprehensive comments and documentation to clarify the new functionality and its implications for the daemon's operation.

- Added support for nodeSelector and tolerations in the sandbox values, allowing for better pod scheduling and resource management.
- Introduced hostUsers option to enable user namespace remapping, enhancing security by preventing container escapes to real node UIDs.
- Implemented readOnlyRootFilesystem configuration to improve security and stability, with provisions for necessary volume mounts.
- Updated agent-sandbox manifest to include PodSecurity admission labels, enforcing baseline security policies for the namespace.

These changes aim to improve the security and configurability of sandbox deployments.


- Introduced a new environment variable, MESH_SANDBOX_PREVIEW_URL_PATTERN, to allow the specification of a public URL pattern for sandbox previews.
- Updated Docker and Kubernetes sandbox runners to utilize the preview URL pattern, improving accessibility for users.
- Modified Helm chart values to include the new preview URL pattern configuration, ensuring proper deployment in production environments.

These changes aim to enhance the configurability and accessibility of sandbox previews in Kubernetes deployments.

- Introduced a sandbox preview reverse-proxy to route requests for `<handle>.preview.<base-domain>` to the corresponding sandbox daemon, improving accessibility for preview environments.
- Enhanced WebSocket handling to support upgrades and message processing for preview connections, ensuring seamless communication for development workflows.
- Updated the Helm chart to include configurations for the new preview URL pattern and related settings, facilitating deployment in Kubernetes environments.

These changes aim to improve the functionality and usability of sandbox previews in the development process.
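
Routing by `<handle>.preview.<base-domain>` starts with extracting the handle from the request's Host header. The sketch below shows one way that step might work; `parsePreviewHandle` is a hypothetical helper, and restricting the handle to a single DNS label is an assumption rather than something the PR states.

```typescript
// Extracts the sandbox handle from a Host header of the form
// <handle>.preview.<base-domain>, or returns null when the host does not match.
function parsePreviewHandle(host: string, baseDomain: string): string | null {
  // Host headers may carry a port; drop it before matching.
  const hostname = host.toLowerCase().replace(/:\d+$/, "");
  const suffix = `.preview.${baseDomain.toLowerCase()}`;
  if (!hostname.endsWith(suffix)) return null;

  const handle = hostname.slice(0, -suffix.length);
  // Assumption: a handle is exactly one DNS label (letters, digits, hyphens),
  // so nested subdomains and empty handles are rejected.
  return /^[a-z0-9]([a-z0-9-]*[a-z0-9])?$/.test(handle) ? handle : null;
}
```

A proxy would then look up the sandbox for the returned handle and forward the request (or WebSocket upgrade) to that sandbox's daemon, returning an error response when the result is `null`.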

- Updated `values-kube-prometheus-stack.yaml` to use camelCase for `kubeStateMetrics` and added comments for clarity on subchart toggles.
- Modified `values-otel-collector.yaml` to change `serviceMonitor` to `podMonitor`, reflecting the new scraping strategy, and added comments regarding omitted metadata.
- Adjusted `sandbox-overview.json` to replace deprecated metrics expressions with updated ones for CPU and network utilization, ensuring accurate monitoring data.

These changes enhance the clarity and functionality of the monitoring setup in the Kubernetes sandbox environment.

- Bumped versions for `decocms` to 2.281.2 and `@decocms/runtime` to 1.6.0 in `bun.lock`.
- Added `@opentelemetry/api` as a new dependency in the sandbox package.
- Introduced new monitoring configurations for Kubernetes, including updated values for Prometheus and OpenTelemetry collector, and added a new dashboard for sandbox overview.
- Improved WebSocket handling in the sandbox to manage pending frames and prevent memory exhaustion.

These changes aim to enhance the observability and performance of the sandbox environment while ensuring accurate monitoring and dependency management.
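
One way to cap pending WebSocket frames is a bounded buffer that holds client frames until the upstream socket opens and refuses new frames once a limit is reached. The sketch below is illustrative only: the `PendingFrames` class, its limits, and the choice to reject on overflow (rather than drop the oldest frame) are assumptions, not the PR's implementation.

```typescript
// Buffers text frames received before the upstream connection is ready,
// bounded by both a frame count and a total character count.
class PendingFrames {
  private readonly frames: string[] = [];
  private chars = 0;
  private readonly maxFrames: number;
  private readonly maxChars: number;

  constructor(maxFrames: number, maxChars: number) {
    this.maxFrames = maxFrames;
    this.maxChars = maxChars;
  }

  // Returns false when accepting the frame would exceed either limit;
  // the caller is expected to close the connection in that case.
  push(frame: string): boolean {
    if (
      this.frames.length >= this.maxFrames ||
      this.chars + frame.length > this.maxChars
    ) {
      return false;
    }
    this.frames.push(frame);
    this.chars += frame.length;
    return true;
  }

  // Sends every buffered frame in arrival order and empties the buffer.
  drain(send: (frame: string) => void): void {
    for (const frame of this.frames) send(frame);
    this.frames.length = 0;
    this.chars = 0;
  }
}
```

Bounding the buffer means a client that sends frames faster than the upstream connects cannot grow the proxy's memory without limit.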


…ecture
- Configured `nodeSelector` in `values.yaml` to specify `kubernetes.io/arch: amd64`, ensuring compatibility with amd64 node groups.
- Added comments to guide users on overriding this setting for arm64 clusters, enhancing clarity for deployment configurations.

These changes improve the deployment flexibility for different architecture environments.

…anagement. This cleanup helps streamline the project structure.

- Introduced a GitHub Actions workflow to build and push the Studio Sandbox Docker image upon changes to the `packages/sandbox` directory.
- Updated environment variable references from `MESH_SANDBOX_PREVIEW_URL_PATTERN` to `STUDIO_SANDBOX_PREVIEW_URL_PATTERN` across multiple files to reflect the new naming convention.
- Adjusted related test cases and configurations to ensure consistency with the new sandbox naming scheme.

These changes enhance the deployment process and improve clarity in the codebase regarding the Studio Sandbox environment.

- Updated environment variable references and type definitions to replace `MESH_SANDBOX_RUNNER` and `kubernetes` with `STUDIO_SANDBOX_RUNNER` and `agent-sandbox` across multiple files.
- Adjusted comments and documentation to reflect the new naming convention for the agent-sandbox runner.
- Enhanced test cases and configurations to ensure consistency with the updated sandbox runner implementation.

These changes improve clarity and maintainability in the codebase regarding the sandbox environment.

pedrofrxncx force-pushed the feat/k8s-sandbox branch 2 times, most recently from 03254b1 to 37ffa22, on April 24, 2026 06:17

pedrofrxncx mentioned this pull request Apr 24, 2026

refactor(sandbox): move freestyle runner and docker helpers into mesh-plugin-user-sandbox #3172

Merged

4 tasks

pedrofrxncx force-pushed the feat/k8s-sandbox branch 2 times, most recently from acb4692 to b87313a, on April 24, 2026 15:48

pedrofrxncx added 2 commits April 24, 2026 14:32

pedrofrxncx force-pushed the feat/k8s-sandbox branch from 85702d1 to 4c63497 on April 25, 2026 01:29

pedrofrxncx and others added 16 commits April 27, 2026 21:46

rm

c49a352

Merge branch 'main' of https://github.com/decocms/studio into feat/k8s-sandbox

f7c2bd7

Merge branch 'main' of https://github.com/decocms/studio into feat/k8s-sandbox

738608a

Remove scheduled_tasks.lock file as it is no longer needed for task management. This cleanup helps streamline the project structure.

e38bee6

pedrofrxncx wants to merge 19 commits into main from feat/k8s-sandbox

2 participants
