Commit d353357
hypeman build (#53)
* poc: add build system for source-to-image builds
Proof of concept for a secure build system that runs rootless BuildKit
inside ephemeral Cloud Hypervisor microVMs for multi-tenant isolation.
Components:
- lib/builds/: Core build system (queue, storage, manager, cache)
- lib/builds/builder_agent/: Guest binary for running BuildKit
- lib/builds/templates/: Dockerfile generation for Node.js/Python
- lib/builds/images/: Builder image Dockerfiles
API endpoints:
- POST /v1/builds: Submit build job
- GET /v1/builds: List builds
- GET /v1/builds/{id}: Get build details
- DELETE /v1/builds/{id}: Cancel build
- GET /v1/builds/{id}/logs: Stream logs (SSE)
* fix: complete build system E2E functionality
- Start vsock handler when build manager starts for builder VM communication
- Create config volume with build.json mounted at /config in builder VMs
- Mount source volume as read-write for generated Dockerfile writes
- Fix builder image Dockerfile: copy buildkit-runc as /usr/bin/runc
- Mount cgroups (v2 with v1 fallback) in microVM init script for runc
- Configure insecure registry flag in builder agent for HTTP registry push
- Add auth bypass for internal VM network (10.102.x.x) registry pushes
- Update README with comprehensive E2E testing guide and troubleshooting
* docs: add builder images guide
Comprehensive documentation for creating, building, and testing
builder images including required components, OCI format build
process, and troubleshooting common issues.
* docs: add build system roadmap and security model
Includes planned phases for cache optimization, security hardening,
additional runtimes, and observability. Documents threat model and
open design questions.
* modify plans
* fix(builds): correct vsock communication pattern for Cloud Hypervisor
- Update builder_agent to LISTEN on vsock port 5001 instead of dialing out
- Update manager to connect TO builder VM's vsock socket with CH handshake
- Simplify vsock_handler to only contain message types
Cloud Hypervisor's vsock implementation requires host to dial guest,
not the other way around. This matches the pattern used by exec_agent.
Build results (digest, provenance, logs) now properly returned via vsock.
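
The host-dials-guest pattern described above can be sketched as follows. This is a minimal illustration, assuming the text handshake Cloud Hypervisor exposes on its host-side Unix socket (the host writes `CONNECT <port>` and waits for a line starting with `OK`). The names `chHandshake`, `fakeVMM`, and `demoHandshake` are hypothetical, and an in-memory pipe stands in for the real socket.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"strings"
)

// readLine reads one '\n'-terminated line a byte at a time, so no bytes that
// belong to the later message stream are swallowed by a buffered reader.
func readLine(r io.Reader) (string, error) {
	var line []byte
	buf := make([]byte, 1)
	for {
		if _, err := io.ReadFull(r, buf); err != nil {
			return "", err
		}
		if buf[0] == '\n' {
			return string(line), nil
		}
		line = append(line, buf[0])
	}
}

// chHandshake asks the VMM to connect this stream to a guest vsock port.
// After it returns nil, the connection carries the guest agent's messages.
func chHandshake(conn net.Conn, port uint32) error {
	if _, err := fmt.Fprintf(conn, "CONNECT %d\n", port); err != nil {
		return fmt.Errorf("send CONNECT: %w", err)
	}
	reply, err := readLine(conn)
	if err != nil {
		return fmt.Errorf("read handshake reply: %w", err)
	}
	if !strings.HasPrefix(reply, "OK ") {
		return fmt.Errorf("unexpected handshake reply %q", reply)
	}
	return nil
}

// fakeVMM plays the VMM side of the handshake (test double only).
// The number after OK is an arbitrary placeholder.
func fakeVMM(conn net.Conn) {
	defer conn.Close()
	line, err := readLine(conn)
	if err != nil || !strings.HasPrefix(line, "CONNECT ") {
		return
	}
	fmt.Fprintf(conn, "OK 1073741824\n")
}

// demoHandshake wires chHandshake to fakeVMM over an in-memory pipe.
func demoHandshake(port uint32) error {
	host, vmm := net.Pipe()
	defer host.Close()
	go fakeVMM(vmm)
	return chHandshake(host, port)
}

func main() {
	if err := demoHandshake(5001); err != nil {
		fmt.Println("handshake failed:", err)
		return
	}
	fmt.Println("connected to guest port 5001")
}
```

In a real manager the connection would come from dialing the VM's vsock Unix socket rather than from `net.Pipe`.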
* fix(e2e): update e2e-build-test.sh for current config
- Fix API port to 8083 (was 8080)
- Update builder image to hirokernel/builder-nodejs20:latest
- Add explicit Dockerfile to test source
- Fix log functions to output to stderr (avoid mixing with return values)
- Add environment variables documentation
* chore(e2e): update test script for generic builder system
- Remove deprecated runtime parameter from build submission
- Require Dockerfile in source tarball (fail early if missing)
- Update builder image reference to hypeman/builder:latest
- Update comments to reflect generic builder approach
* feat(builds): implement generic builder with registry token auth
- Replace runtime-specific builders (nodejs20, python312) with generic builder
- Users now provide their own Dockerfile instead of auto-generation
- Add JWT-based registry token authentication for builder VMs
  - Tokens scoped to specific build and cache repositories
  - 30-minute expiry for security
  - Support both Bearer and Basic auth (for BuildKit compatibility)
- Update builder agent to configure registry auth from token
- Fix auth middleware to handle Basic auth for registry paths
- Update API to make 'runtime' optional (deprecated)
- Add comprehensive documentation for building OCI-format images
- Delete deprecated: templates/, base/, nodejs20/, python312/ Dockerfiles
Breaking changes:
- Dockerfile is now required (in source tarball or as API parameter)
- Builder image must be built with 'docker buildx --output type=registry,oci-mediatypes=true'
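
To make the token scoping above concrete, here is a rough sketch of a repository-scoped, 30-minute token built with the standard library only. It assumes HMAC-SHA256 signing and a flat claim layout; the claim names `repos`, `scope`, and `build_id` and the `builder-` subject prefix come from this PR's commit messages, while the signing algorithm, the `push` scope value, and every function name here are assumptions. The real registry_token.go may look quite different.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
	"time"
)

// registryClaims is an assumed payload layout for a builder token.
type registryClaims struct {
	Subject string   `json:"sub"`
	BuildID string   `json:"build_id"`
	Repos   []string `json:"repos"`
	Scope   string   `json:"scope"`
	Expiry  int64    `json:"exp"`
}

var b64 = base64.RawURLEncoding

// sign returns the base64url HMAC-SHA256 signature of the unsigned token.
func sign(secret []byte, unsigned string) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(unsigned))
	return b64.EncodeToString(mac.Sum(nil))
}

// issueToken builds a compact header.payload.signature token.
func issueToken(secret []byte, c registryClaims) (string, error) {
	payload, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	header := b64.EncodeToString([]byte(`{"alg":"HS256","typ":"JWT"}`))
	unsigned := header + "." + b64.EncodeToString(payload)
	return unsigned + "." + sign(secret, unsigned), nil
}

// verifyToken checks signature, expiry, and that repo is within scope.
// It always verifies with HMAC-SHA256 and ignores the header's alg field.
func verifyToken(secret []byte, token, repo string, now time.Time) error {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return errors.New("malformed token")
	}
	want := sign(secret, parts[0]+"."+parts[1])
	if !hmac.Equal([]byte(want), []byte(parts[2])) {
		return errors.New("bad signature")
	}
	payload, err := b64.DecodeString(parts[1])
	if err != nil {
		return err
	}
	var c registryClaims
	if err := json.Unmarshal(payload, &c); err != nil {
		return err
	}
	if now.Unix() >= c.Expiry {
		return errors.New("token expired")
	}
	for _, r := range c.Repos {
		if r == repo {
			return nil
		}
	}
	return fmt.Errorf("repository %q not covered by token", repo)
}

// issueAndVerify issues a 30-minute token for build abc123 and verifies it
// against repo at a point minutesLater after issuance.
func issueAndVerify(repo string, minutesLater int) error {
	secret := []byte("demo-secret")
	now := time.Unix(1700000000, 0)
	tok, err := issueToken(secret, registryClaims{
		Subject: "builder-abc123",
		BuildID: "abc123",
		Repos:   []string{"builds/abc123", "cache/tenant-x"},
		Scope:   "push",
		Expiry:  now.Add(30 * time.Minute).Unix(),
	})
	if err != nil {
		return err
	}
	later := now.Add(time.Duration(minutesLater) * time.Minute)
	return verifyToken(secret, tok, repo, later)
}

func main() {
	fmt.Println(issueAndVerify("builds/abc123", 5))  // in scope, not expired
	fmt.Println(issueAndVerify("builds/other", 5))   // repository outside scope
	fmt.Println(issueAndVerify("builds/abc123", 31)) // past the 30-minute expiry
}
```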
* docs(builds): update README with registry token auth and generic builder
- Update architecture diagram to show JWT token auth instead of insecure
- Add Registry Token System section documenting registry_token.go
- Add Metrics section documenting metrics.go
- Update cache example to remove outdated runtime references
- Add Registry Authentication section explaining token-based auth
- Update Security Model to include registry auth
- Fix Build and Push section to use docker buildx with OCI format
- Update E2E test example to use generic builder image
- Update troubleshooting for 401 errors with token-based auth
- Update config.json example to show registry_token and dockerfile fields
- Remove references to deleted templates package
* chore: remove trailing newlines
* fix(images): support both Docker v2 and OCI v1 manifest formats
Use go-containerregistry instead of umoci's casext for manifest parsing,
which handles both Docker v2 and OCI v1 formats automatically.
Changes:
- extractOCIMetadata: Use layout.Path and Image.ConfigFile() from
  go-containerregistry, which abstracts away manifest format differences
- unpackLayers: Get manifest via go-containerregistry, then convert to
  OCI v1 format for umoci's layer unpacker
- Add imageByAnnotation() helper to find images in OCI layout by tag
- Add convertToOCIManifest() to convert go-containerregistry manifest
  to OCI v1.Manifest for umoci compatibility
- Update documentation to remove OCI format requirement
This allows users to build images with standard 'docker build' without
needing 'docker buildx --output type=registry,oci-mediatypes=true'.
* fix: ensure volume cleanup succeeds after build timeout
- Use context.Background() for volume deletion in defers instead of the
  build timeout context, matching the pattern used for instance cleanup
- Fix error variable shadowing in setup config volume error path (copyErr)
- Improve multipart form field parsing with proper error handling using
  io.ReadAll instead of bytes.Buffer with ignored errors
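
The cleanup fix in the first bullet can be illustrated with a small sketch. It assumes a hypothetical `deleteVolume` helper that, like most context-aware calls, refuses to run on a context that has already finished; `runBuild` and its parameters are likewise made up for the example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// deleteVolume is a hypothetical stand-in for the real volume deletion call.
// It fails immediately if the context it is given has already ended.
func deleteVolume(ctx context.Context, id string) error {
	if err := ctx.Err(); err != nil {
		return fmt.Errorf("delete volume %s: %w", id, err)
	}
	return nil
}

// runBuild simulates a build that runs until its timeout fires, then cleans
// up in a defer. With freshCleanupCtx=false the defer reuses the expired
// build context; with true it uses context.Background().
func runBuild(timeoutMs int, freshCleanupCtx bool) (buildErr, cleanupErr error) {
	timeout := time.Duration(timeoutMs) * time.Millisecond
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	defer func() {
		cleanupCtx := ctx
		if freshCleanupCtx {
			cleanupCtx = context.Background()
		}
		cleanupErr = deleteVolume(cleanupCtx, "source-vol")
	}()

	<-ctx.Done() // the "build" never finishes before the deadline
	return ctx.Err(), nil
}

func main() {
	buildErr, cleanupErr := runBuild(10, false)
	fmt.Println("timed out:", errors.Is(buildErr, context.DeadlineExceeded))
	fmt.Println("cleanup with build ctx:", cleanupErr)

	_, cleanupErr = runBuild(10, true)
	fmt.Println("cleanup with Background:", cleanupErr)
}
```

A production version might wrap `context.Background()` in its own short timeout so cleanup cannot hang forever; whether the project does so is not stated in this commit.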
* cursor comment
* fix: security and reliability improvements for build system
- Reject registry tokens from API authentication (defense-in-depth)
  - Check for repos, scope, build_id claims
  - Reject tokens with builder- subject prefix
  - Add comprehensive test coverage
- Fix indefinite blocking in waitForResult
  - Use goroutine + select for context-aware cancellation
  - Close connection to unblock decoder on timeout
  - Prevents resource leaks from unresponsive builders
- Fix Makefile builder targets
  - Replace non-existent nodejs20/python312 targets
  - Single build-builder target for generic builder
  - build-builders as backwards-compatible alias
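
The waitForResult fix listed above follows a common Go pattern: run the blocking decode in a goroutine, select on the context, and close the connection to unblock the decoder on cancellation. The sketch below shows the shape of that pattern under assumptions: the `buildResult` fields and the `demoWait` helper are hypothetical, and `net.Pipe` stands in for the vsock connection.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net"
	"time"
)

// buildResult is an assumed shape for the message the builder sends back.
type buildResult struct {
	Success bool   `json:"success"`
	Digest  string `json:"digest"`
}

// waitForResult decodes one result from conn but gives up when ctx ends.
func waitForResult(ctx context.Context, conn net.Conn) (*buildResult, error) {
	type outcome struct {
		res *buildResult
		err error
	}
	// Buffered so the decoding goroutine can always send and exit,
	// even if nobody is receiving any more.
	done := make(chan outcome, 1)
	go func() {
		var r buildResult
		err := json.NewDecoder(conn).Decode(&r)
		done <- outcome{&r, err}
	}()

	select {
	case o := <-done:
		if o.err != nil {
			return nil, o.err
		}
		return o.res, nil
	case <-ctx.Done():
		// Closing the connection makes the blocked Decode return,
		// so the goroutine does not leak.
		conn.Close()
		return nil, ctx.Err()
	}
}

// demoWait runs waitForResult against an in-memory peer that either
// responds with a result or stays silent.
func demoWait(respond bool, timeoutMs int) (string, error) {
	host, agent := net.Pipe()
	defer host.Close()
	defer agent.Close()
	if respond {
		go json.NewEncoder(agent).Encode(buildResult{Success: true, Digest: "sha256:demo"})
	}
	timeout := time.Duration(timeoutMs) * time.Millisecond
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	res, err := waitForResult(ctx, host)
	if err != nil {
		return "", err
	}
	return res.Digest, nil
}

func main() {
	fmt.Println(demoWait(true, 1000)) // responsive builder
	fmt.Println(demoWait(false, 20))  // unresponsive builder hits the deadline
}
```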
* Remove deprecated runtime code and add security fixes
Build System Cleanup:
- Remove deprecated RuntimeNodeJS20/RuntimePython312 constants
- Remove Runtime field from Build, CreateBuildRequest, BuildConfig
- Remove ToolchainVersion from BuildProvenance
- Update OpenAPI spec: remove runtime field, rename /builds/{id}/logs to /builds/{id}/events
- Add typed BuildEvent schema for SSE streaming
- Remove unused deref function from builds.go
- Update documentation (PLAN.md, README.md) to reflect generic builder
Security Fixes:
- Fix IP spoofing vulnerability: isInternalVMRequest now only trusts r.RemoteAddr
- Add registry token rejection to OapiAuthenticationFunc for defense-in-depth
Testing:
- Add comprehensive build manager unit tests with mocked dependencies
- Enhance E2E test script to run VM with built image after successful build
- Add --skip-run flag to E2E test for build-only testing
- Fix test race conditions by testing storage/queue directly
Documentation:
- Add lib/builds/TODO.md tracking remaining issues
- Mark completed security fixes and improvements
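
For the IP spoofing fix under "Security Fixes" above, a minimal sketch of a check that trusts only `r.RemoteAddr` might look like this. It assumes the internal VM network is 10.102.0.0/16 (the commit messages only say 10.102.x.x) and that the spoofing vector was a client-supplied header such as `X-Forwarded-For`; the actual `isInternalVMRequest` body is not shown in this commit, and `checkAddr` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
)

// internalVMNet is the assumed CIDR for the internal VM network.
var internalVMNet = mustParseCIDR("10.102.0.0/16")

func mustParseCIDR(s string) *net.IPNet {
	_, n, err := net.ParseCIDR(s)
	if err != nil {
		panic(err)
	}
	return n
}

// isInternalVMRequest looks only at the TCP peer address. Headers are
// ignored on purpose because any client can set them to arbitrary values.
func isInternalVMRequest(r *http.Request) bool {
	host, _, err := net.SplitHostPort(r.RemoteAddr)
	if err != nil {
		return false
	}
	ip := net.ParseIP(host)
	return ip != nil && internalVMNet.Contains(ip)
}

// checkAddr builds a request with the given peer address and an optional
// X-Forwarded-For header, then runs the check.
func checkAddr(remoteAddr, forwardedFor string) bool {
	r := &http.Request{RemoteAddr: remoteAddr, Header: http.Header{}}
	if forwardedFor != "" {
		r.Header.Set("X-Forwarded-For", forwardedFor)
	}
	return isInternalVMRequest(r)
}

func main() {
	fmt.Println(checkAddr("10.102.0.5:41234", ""))            // real internal peer
	fmt.Println(checkAddr("203.0.113.9:5555", "10.102.0.5")) // spoofed header only
}
```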
* Fix E2E test to use /events endpoint instead of /logs
* Fix E2E test and add OCI media types to builder output
- Added oci-mediatypes=true to BuildKit output in builder agent.
  This ensures built images use OCI format, which is required for
  Hypeman's image conversion (umoci expects OCI, not Docker format)
- Improved E2E script image import status checking
  - Use exact name matching instead of build ID filtering
  - Better error messages when import fails with media type hint
  - Export imported image name for use in instance creation
- The VM run test requires the updated builder image to be deployed.
  Use --skip-run flag until the builder image is published
* Fix E2E script output handling for image import
* feat(builds): implement SSE streaming for build events
- Add BuildEvent type with log, status, and heartbeat event types
- Add StreamBuildEvents method to Manager interface
- Implement status subscription system for real-time status updates
- Implement log streaming using tail -f for follow mode
- Add heartbeat events every 30 seconds in follow mode
- Update GetBuildEvents API handler with proper SSE response
- Add unit tests for StreamBuildEvents (5 test cases)
- Update TODO.md to mark SSE streaming as completed
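
As an illustration of the event stream described above, the sketch below frames events in the Server-Sent Events wire format (a `data:` line followed by a blank line). The three event types come from the commit message; the `BuildEvent` field names and the helper functions are assumptions, not the project's actual schema.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
)

// BuildEvent is an assumed shape; Type is "log", "status", or "heartbeat".
type BuildEvent struct {
	Type    string `json:"type"`
	Content string `json:"content,omitempty"`
	Status  string `json:"status,omitempty"`
}

// formatSSE renders one event as a single SSE frame.
func formatSSE(ev BuildEvent) (string, error) {
	payload, err := json.Marshal(ev)
	if err != nil {
		return "", err
	}
	return "data: " + string(payload) + "\n\n", nil
}

// writeEvent writes a frame and flushes it when the writer supports
// flushing, so the client sees each event as soon as it is produced.
func writeEvent(w io.Writer, ev BuildEvent) error {
	frame, err := formatSSE(ev)
	if err != nil {
		return err
	}
	if _, err := io.WriteString(w, frame); err != nil {
		return err
	}
	if f, ok := w.(http.Flusher); ok {
		f.Flush()
	}
	return nil
}

func main() {
	events := []BuildEvent{
		{Type: "status", Status: "building"},
		{Type: "log", Content: "Step 1/3"},
		{Type: "heartbeat"},
	}
	for _, ev := range events {
		if err := writeEvent(os.Stdout, ev); err != nil {
			fmt.Fprintln(os.Stderr, "write event:", err)
			return
		}
	}
}
```

In an HTTP handler the writer would be the `http.ResponseWriter`, with `Content-Type: text/event-stream` set before the first frame.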
* feat(builds): implement build secrets via vsock
- Add SecretIDs field to VsockMessage for secrets request
- Add SecretsVsockPort constant (5002) for future extensibility
- Update waitForResult to handle get_secrets requests from agent
- Implement host_ready message to trigger secrets exchange
- Builder agent requests secrets on host_ready, waits for response
- Write secrets to /run/secrets/{id} for BuildKit consumption
- Add FileSecretProvider for reading secrets from filesystem
  - Path traversal protection in FileSecretProvider
  - Unit tests for FileSecretProvider (8 test cases)
- Update TODO.md to mark build secrets as completed
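
One way a filesystem-backed provider can guard against path traversal is sketched below. The type name matches the commit message, but the method names, the validation rules, and the trailing-newline trimming are assumptions about how such a provider might work, not the project's implementation. Symlinks inside the secrets directory are not handled here.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

var errInvalidSecretID = errors.New("invalid secret id")

// FileSecretProvider reads each secret from a file named after its ID.
type FileSecretProvider struct{ dir string }

// pathFor maps a secret ID to a file path, rejecting any ID that could
// escape the secrets directory.
func (p FileSecretProvider) pathFor(id string) (string, error) {
	if id == "" || id == "." || id == ".." || strings.ContainsAny(id, `/\`) {
		return "", errInvalidSecretID
	}
	full := filepath.Join(p.dir, id)
	// Second line of defense: the joined path must still be a direct child.
	if rel, err := filepath.Rel(p.dir, full); err != nil || rel != id {
		return "", errInvalidSecretID
	}
	return full, nil
}

// GetSecret returns the secret value with trailing newlines removed.
func (p FileSecretProvider) GetSecret(id string) (string, error) {
	path, err := p.pathFor(id)
	if err != nil {
		return "", err
	}
	data, err := os.ReadFile(path)
	if err != nil {
		return "", err
	}
	return strings.TrimRight(string(data), "\n"), nil
}

// demoSecrets writes one secret to a temp dir, checks that a traversal
// attempt is rejected, and returns the secret's value.
func demoSecrets() (string, error) {
	dir, err := os.MkdirTemp("", "secrets")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	if err := os.WriteFile(filepath.Join(dir, "npm_token"), []byte("s3cr3t\n"), 0o600); err != nil {
		return "", err
	}
	p := FileSecretProvider{dir: dir}
	if _, err := p.GetSecret("../npm_token"); !errors.Is(err, errInvalidSecretID) {
		return "", fmt.Errorf("traversal was not rejected: %v", err)
	}
	return p.GetSecret("npm_token")
}

func main() {
	fmt.Println(demoSecrets())
}
```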
* feat(config): add BUILD_SECRETS_DIR configuration
- Add BuildSecretsDir to Config struct
- Load from BUILD_SECRETS_DIR environment variable
- Update ProvideBuildManager to use FileSecretProvider when configured
- Log when build secrets are enabled
* fix(builds): fix vsock protocol deadlock and add secrets API support
- Fix protocol deadlock: agent now proactively sends build_result when complete
  instead of waiting for host to request it (was causing builds to hang forever)
- Add secrets field to /builds POST API endpoint
- Add INFO logging for vsock communication debugging
- Regenerate oapi.go with secrets field support
The build agent would receive host_ready, handle secrets, and then loop waiting
for more messages. But it never sent anything back, and the host was also
waiting. Now the agent spawns a goroutine after host_ready to wait for build
completion and send the result automatically.
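
The deadlock fix described above can be sketched roughly as follows. The message type names `host_ready` and `build_result` come from the commit message; the struct fields, function names, and the omission of the secrets exchange are simplifications, and `net.Pipe` stands in for the vsock connection.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
)

// vsockMessage is a simplified, assumed message envelope.
type vsockMessage struct {
	Type   string `json:"type"`
	Digest string `json:"digest,omitempty"`
}

// runAgent keeps reading host messages. After host_ready it starts a
// goroutine that pushes build_result as soon as the build finishes, so the
// host never has to ask for it.
func runAgent(conn net.Conn, buildDone <-chan string) {
	dec := json.NewDecoder(conn)
	enc := json.NewEncoder(conn)
	for {
		var msg vsockMessage
		if err := dec.Decode(&msg); err != nil {
			return // connection closed
		}
		if msg.Type == "host_ready" {
			go func() {
				digest := <-buildDone
				enc.Encode(vsockMessage{Type: "build_result", Digest: digest})
			}()
		}
	}
}

// runHost announces readiness, then waits for the agent's build_result.
func runHost(conn net.Conn) (string, error) {
	if err := json.NewEncoder(conn).Encode(vsockMessage{Type: "host_ready"}); err != nil {
		return "", err
	}
	dec := json.NewDecoder(conn)
	for {
		var msg vsockMessage
		if err := dec.Decode(&msg); err != nil {
			return "", err
		}
		if msg.Type == "build_result" {
			return msg.Digest, nil
		}
	}
}

// demoExchange runs both sides over an in-memory pipe with a build that
// has already completed.
func demoExchange() (string, error) {
	host, agent := net.Pipe()
	defer host.Close()
	defer agent.Close()

	buildDone := make(chan string, 1)
	buildDone <- "sha256:demo"
	go runAgent(agent, buildDone)
	return runHost(host)
}

func main() {
	fmt.Println(demoExchange())
}
```

Without the goroutine in `runAgent`, both sides would sit in `Decode` waiting for the other to speak, which is the hang the commit describes.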
* docs: update TODO with vsock protocol fix details
* docs: document cgroup requirement for BuildKit secrets
The secrets API flow is fully implemented and working:
- Host receives secrets from API
- Host sends secrets to builder agent via vsock
- Agent writes secrets to /run/secrets/
- BuildKit receives --secret flags
However, BuildKit's runc requires cgroup mounts when --secret
flags are present, which the current microVM doesn't have.
This is an infrastructure issue to be fixed separately.
* docs: add detailed cgroup analysis for BuildKit secrets
Documents the root cause (missing /sys/fs/cgroup mount in VM init),
two proposed solutions (Option A: all VMs, Option B: builder-only),
and security analysis for team discussion.
* feat(builds): add guest-agent to builder VMs for exec debugging
- Update builder Dockerfile to build and include guest-agent binary
- Only copy proto files (not client.go) to avoid host-side dependencies
- Start guest-agent in builder-agent main() before build starts
- Guest-agent listens on vsock port 2222 for exec requests
Note: Testing blocked by cgroup issue (builds fail before we can exec).
Once cgroups are enabled, exec into builder VMs will work.
* fix(e2e): fix state comparison and image name matching in E2E test
- Convert state to lowercase for comparison (API returns 'Running' not 'running')
- Use build ID matching for imported images (API normalizes registry names)
E2E test now passes for full build + VM run flow.
* fix(registry): preserve registry host in image names when triggering conversion
Previously, when the builder pushed to 10.102.0.1:8083/builds/xxx, the registry
would extract only the path (/builds/xxx) and normalize it to docker.io/builds/xxx.
This was confusing because:
- docker.io implies Docker Hub, but these are local builds
- Could conflict with real Docker Hub images
- Lost the original registry URL
The fix includes the request's Host header when building the full repository path,
so images are now stored as 10.102.0.1:8083/builds/xxx as expected.
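
A minimal sketch of the idea, assuming the repository path is available separately from the request: prefix it with the request's Host so the stored name keeps the registry address. The function names are hypothetical, and the real handler's path extraction is not shown in this commit.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// fullRepository prefixes the repository path with the request's Host so
// that a push to 10.102.0.1:8083/builds/xxx is stored under that exact name.
func fullRepository(r *http.Request, repoPath string) string {
	repoPath = strings.Trim(repoPath, "/")
	if r.Host == "" {
		return repoPath
	}
	return r.Host + "/" + repoPath
}

// repositoryFor is a small helper for trying the function without a server.
func repositoryFor(host, repoPath string) string {
	return fullRepository(&http.Request{Host: host}, repoPath)
}

func main() {
	fmt.Println(repositoryFor("10.102.0.1:8083", "/builds/xxx"))
}
```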
* docs: clean up TODO.md - remove completed tasks
Removed completed items:
- IP spoofing vulnerability fix
- Registry token scope leakage fix
- Vsock read deadline handling
- SSE streaming implementation
- Build secrets via vsock
- E2E test enhancement
- Build manager unit tests
- Guest agent on builder VMs
- Runtime/toolchain cleanup
Remaining tasks:
- Enable cgroups for BuildKit secrets (blocked on team discussion)
- Builder image tooling
- Keep failed builders for debugging
* docs: remove 'Keep Failed Builders' from TODO and delete PLAN.md
* fix(tests): update registry tests to use full host in image names
After the registry fix to preserve host in image names, tests need to
include the serverHost when looking up images.
Also fixes TestRegistryPushAndConvert timeout issue.
Note: Some tests may still fail due to Docker Hub rate limiting.
* feat(init): add cgroup2 mount for BuildKit/runc support
Mount cgroup2 filesystem at /sys/fs/cgroup during VM init and bind-mount
it to the new root. This enables runc (used by BuildKit) to work properly
for builds with secrets.
Security notes:
- cgroup v2 has no release_agent escape vector (unlike v1)
- VMs are already isolated by Cloud Hypervisor (hardware boundary)
- This is non-fatal if the kernel doesn't support cgroup2
To activate, rebuild initrd: make init && make initrd
* docs: update TODO.md with verified cgroup2 implementation
* chore: clean up TODO after cgroup2 implementation verified
* fix(builds): restore missing BuildEvent type definition
The BuildEvent type and event type constants were accidentally removed
from types.go, causing build failures in lib/providers.
* test: add SKIP_DOCKER_HUB_TESTS env var to skip rate-limited tests
When SKIP_DOCKER_HUB_TESTS=1 is set, tests that require pulling images
from Docker Hub are skipped. This allows CI to run without being blocked
by Docker Hub rate limiting.
Affected tests:
- lib/images: TestCreateImage*, TestListImages, TestDeleteImage, TestLayerCaching
- lib/instances: TestBasicEndToEnd, TestStandbyAndRestore, TestExecConcurrent,
  TestCreateInstanceWithNetwork, TestQEMU*, TestVolume*, TestOverlayDisk*,
  TestAggregateLimits_EnforcedAtRuntime
- lib/system: TestEnsureSystemFiles
- cmd/api/api: TestCreateImage*, TestCreateInstance*, TestInstanceLifecycle,
  TestRegistry*, TestCp*, TestExec*
- integration: TestSystemdMode
* Revert "test: add SKIP_DOCKER_HUB_TESTS env var to skip rate-limited tests"
This reverts commit f09e53d.
* fix: address cursor bot review comments from PR #53
- Fix closure capturing loop variable in recoverPendingBuilds (add meta := meta shadow)
- Fix premature loop exit in StreamBuildEvents when receiving non-terminal status events
- Fix race condition where cancelled builds could be overwritten to 'failed' status
- Fix nil panic when secretProvider is not configured (use NoOpSecretProvider fallback)
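
The first fix above (the `meta := meta` shadow) addresses a classic Go pitfall: before Go 1.22, a `for` loop reused one variable across iterations, so goroutines started in the loop could all observe the last element. The sketch below shows the shadowing pattern; `buildMeta` and the body of the goroutine are hypothetical placeholders for the real recovery logic.

```go
package main

import (
	"fmt"
	"sync"
)

// buildMeta is a hypothetical stand-in for the stored build metadata.
type buildMeta struct{ ID string }

// recoverPendingBuilds starts one goroutine per pending build and records
// which build each goroutine handled.
func recoverPendingBuilds(pending []buildMeta) []string {
	started := make([]string, len(pending))
	var wg sync.WaitGroup
	for i, meta := range pending {
		// Shadow the loop variables so each goroutine captures its own copy.
		// This is required before Go 1.22 and harmless afterwards.
		i, meta := i, meta
		wg.Add(1)
		go func() {
			defer wg.Done()
			started[i] = meta.ID
		}()
	}
	wg.Wait()
	return started
}

func main() {
	fmt.Println(recoverPendingBuilds([]buildMeta{{ID: "a"}, {ID: "b"}, {ID: "c"}}))
}
```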
* fix: make TestCreateImage_Idempotent resilient to timing variations
The second CreateImage call can return either 'pending' (still processing)
or 'ready' (already completed) depending on CI speed and caching.
The key idempotency invariant is that the digest is the same, not the status.
* moar
* moar
1 parent 58df3eb commit d353357
File tree
39 files changed (+8708, -346 lines)
- cmd/api
- api
- config
- lib
- builds
- builder_agent
- images
- generic
- images
- middleware
- oapi
- paths
- providers
- registry
- system/init
- scripts