Commit d353357: hypeman build (#53)
* poc: add build system for source-to-image builds

  Proof of concept for a secure build system that runs rootless BuildKit inside
  ephemeral Cloud Hypervisor microVMs for multi-tenant isolation.

  Components:
  - lib/builds/: Core build system (queue, storage, manager, cache)
  - lib/builds/builder_agent/: Guest binary for running BuildKit
  - lib/builds/templates/: Dockerfile generation for Node.js/Python
  - lib/builds/images/: Builder image Dockerfiles

  API endpoints:
  - POST /v1/builds: Submit build job
  - GET /v1/builds: List builds
  - GET /v1/builds/{id}: Get build details
  - DELETE /v1/builds/{id}: Cancel build
  - GET /v1/builds/{id}/logs: Stream logs (SSE)

* fix: complete build system E2E functionality

  - Start vsock handler when the build manager starts, for builder VM communication
  - Create config volume with build.json mounted at /config in builder VMs
  - Mount source volume as read-write for generated Dockerfile writes
  - Fix builder image Dockerfile: copy buildkit-runc as /usr/bin/runc
  - Mount cgroups (v2 with v1 fallback) in the microVM init script for runc
  - Configure insecure registry flag in the builder agent for HTTP registry push
  - Add auth bypass for internal VM network (10.102.x.x) registry pushes
  - Update README with comprehensive E2E testing guide and troubleshooting

* docs: add builder images guide

  Comprehensive documentation for creating, building, and testing builder
  images, including required components, the OCI-format build process, and
  troubleshooting common issues.

* docs: add build system roadmap and security model

  Includes planned phases for cache optimization, security hardening,
  additional runtimes, and observability. Documents the threat model and open
  design questions.
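The endpoints above take multipart form data. Below is a minimal client-side sketch in Go that assembles the POST /v1/builds request using the form field names this PR's handler parses (`source`, `dockerfile`, `timeout_seconds`); the server URL, tarball bytes, and helper name are placeholders, not part of the PR.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
)

// newCreateBuildRequest assembles the multipart form POST /v1/builds expects:
// a required "source" tarball plus optional "dockerfile" and "timeout_seconds"
// fields. Field names match the handler in cmd/api/api/builds.go; everything
// else here is illustrative.
func newCreateBuildRequest(serverURL string, source []byte, dockerfile string) (*http.Request, error) {
	var buf bytes.Buffer
	mw := multipart.NewWriter(&buf)

	// Required: source tarball bytes.
	fw, err := mw.CreateFormFile("source", "source.tar.gz")
	if err != nil {
		return nil, err
	}
	if _, err := fw.Write(source); err != nil {
		return nil, err
	}

	// Optional: Dockerfile content, when it is not inside the tarball.
	if dockerfile != "" {
		if err := mw.WriteField("dockerfile", dockerfile); err != nil {
			return nil, err
		}
	}
	// Optional: per-build timeout in seconds.
	if err := mw.WriteField("timeout_seconds", "600"); err != nil {
		return nil, err
	}
	if err := mw.Close(); err != nil {
		return nil, err
	}

	req, err := http.NewRequest(http.MethodPost, serverURL+"/v1/builds", &buf)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", mw.FormDataContentType())
	return req, nil
}

func main() {
	req, err := newCreateBuildRequest("http://127.0.0.1:8083", []byte("fake-tarball"), "FROM alpine\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path)
}
```

The request would then be sent with any `http.Client`; a 202 response carries the created build's JSON.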
* modify plans

* fix(builds): correct vsock communication pattern for Cloud Hypervisor

  - Update builder_agent to LISTEN on vsock port 5001 instead of dialing out
  - Update manager to connect TO the builder VM's vsock socket with the CH handshake
  - Simplify vsock_handler to only contain message types

  Cloud Hypervisor's vsock implementation requires the host to dial the guest,
  not the other way around. This matches the pattern used by exec_agent.
  Build results (digest, provenance, logs) are now properly returned via vsock.

* fix(e2e): update e2e-build-test.sh for current config

  - Fix API port to 8083 (was 8080)
  - Update builder image to hirokernel/builder-nodejs20:latest
  - Add explicit Dockerfile to test source
  - Fix log functions to output to stderr (avoid mixing with return values)
  - Add environment variables documentation

* chore(e2e): update test script for generic builder system

  - Remove deprecated runtime parameter from build submission
  - Require Dockerfile in source tarball (fail early if missing)
  - Update builder image reference to hypeman/builder:latest
  - Update comments to reflect the generic builder approach

* feat(builds): implement generic builder with registry token auth

  - Replace runtime-specific builders (nodejs20, python312) with a generic builder
  - Users now provide their own Dockerfile instead of auto-generation
  - Add JWT-based registry token authentication for builder VMs
    - Tokens scoped to specific build and cache repositories
    - 30-minute expiry for security
    - Support both Bearer and Basic auth (for BuildKit compatibility)
  - Update builder agent to configure registry auth from the token
  - Fix auth middleware to handle Basic auth for registry paths
  - Update API to make 'runtime' optional (deprecated)
  - Add comprehensive documentation for building OCI-format images
  - Delete deprecated: templates/, base/, nodejs20/, python312/ Dockerfiles

  Breaking changes:
  - Dockerfile is now required (in source tarball or as API parameter)
  - Builder image must be built with
    'docker buildx --output type=registry,oci-mediatypes=true'

* docs(builds): update README with registry token auth and generic builder

  - Update architecture diagram to show JWT token auth instead of insecure
  - Add Registry Token System section documenting registry_token.go
  - Add Metrics section documenting metrics.go
  - Update cache example to remove outdated runtime references
  - Add Registry Authentication section explaining token-based auth
  - Update Security Model to include registry auth
  - Fix Build and Push section to use docker buildx with OCI format
  - Update E2E test example to use the generic builder image
  - Update troubleshooting for 401 errors with token-based auth
  - Update config.json example to show registry_token and dockerfile fields
  - Remove references to the deleted templates package

* chore: remove trailing newlines

* fix(images): support both Docker v2 and OCI v1 manifest formats

  Use go-containerregistry instead of umoci's casext for manifest parsing,
  which handles both Docker v2 and OCI v1 formats automatically.

  Changes:
  - extractOCIMetadata: Use layout.Path and Image.ConfigFile() from
    go-containerregistry, which abstracts away manifest format differences
  - unpackLayers: Get the manifest via go-containerregistry, then convert it to
    OCI v1 format for umoci's layer unpacker
  - Add imageByAnnotation() helper to find images in an OCI layout by tag
  - Add convertToOCIManifest() to convert a go-containerregistry manifest to an
    OCI v1.Manifest for umoci compatibility
  - Update documentation to remove the OCI format requirement

  This allows users to build images with standard 'docker build' without
  needing 'docker buildx --output type=registry,oci-mediatypes=true'.
* fix: ensure volume cleanup succeeds after build timeout

  - Use context.Background() for volume deletion in defers instead of the
    build timeout context, matching the pattern used for instance cleanup
  - Fix error variable shadowing in the setup config volume error path (copyErr)
  - Improve multipart form field parsing with proper error handling, using
    io.ReadAll instead of bytes.Buffer with ignored errors

* cursor comment

* fix: security and reliability improvements for build system

  - Reject registry tokens from API authentication (defense-in-depth)
    - Check for repos, scope, build_id claims
    - Reject tokens with the builder- subject prefix
    - Add comprehensive test coverage
  - Fix indefinite blocking in waitForResult
    - Use goroutine + select for context-aware cancellation
    - Close the connection to unblock the decoder on timeout
    - Prevents resource leaks from unresponsive builders
  - Fix Makefile builder targets
    - Replace non-existent nodejs20/python312 targets
    - Single build-builder target for the generic builder
    - build-builders kept as a backwards-compatible alias

* Remove deprecated runtime code and add security fixes

  Build System Cleanup:
  - Remove deprecated RuntimeNodeJS20/RuntimePython312 constants
  - Remove Runtime field from Build, CreateBuildRequest, BuildConfig
  - Remove ToolchainVersion from BuildProvenance
  - Update OpenAPI spec: remove runtime field, rename /builds/{id}/logs to
    /builds/{id}/events
  - Add typed BuildEvent schema for SSE streaming
  - Remove unused deref function from builds.go
  - Update documentation (PLAN.md, README.md) to reflect the generic builder

  Security Fixes:
  - Fix IP spoofing vulnerability: isInternalVMRequest now only trusts r.RemoteAddr
  - Add registry token rejection to OapiAuthenticationFunc for defense-in-depth

  Testing:
  - Add comprehensive build manager unit tests with mocked dependencies
  - Enhance the E2E test script to run a VM with the built image after a
    successful build
  - Add --skip-run flag to the E2E test for build-only testing
  - Fix test race conditions by testing storage/queue directly

  Documentation:
  - Add lib/builds/TODO.md tracking remaining issues
  - Mark completed security fixes and improvements

* Fix E2E test to use /events endpoint instead of /logs

* Fix E2E test and add OCI media types to builder output

  - Added oci-mediatypes=true to the BuildKit output in the builder agent.
    This ensures built images use OCI format, which is required for Hypeman's
    image conversion (umoci expects OCI, not Docker format)
  - Improved E2E script image import status checking
    - Use exact name matching instead of build ID filtering
    - Better error messages when import fails, with a media type hint
    - Export the imported image name for use in instance creation
  - The VM run test requires the updated builder image to be deployed;
    use the --skip-run flag until the builder image is published

* Fix E2E script output handling for image import

* feat(builds): implement SSE streaming for build events

  - Add BuildEvent type with log, status, and heartbeat event types
  - Add StreamBuildEvents method to the Manager interface
  - Implement a status subscription system for real-time status updates
  - Implement log streaming using tail -f for follow mode
  - Add heartbeat events every 30 seconds in follow mode
  - Update the GetBuildEvents API handler with a proper SSE response
  - Add unit tests for StreamBuildEvents (5 test cases)
  - Update TODO.md to mark SSE streaming as completed

* feat(builds): implement build secrets via vsock

  - Add SecretIDs field to VsockMessage for the secrets request
  - Add SecretsVsockPort constant (5002) for future extensibility
  - Update waitForResult to handle get_secrets requests from the agent
  - Implement host_ready message to trigger the secrets exchange
  - Builder agent requests secrets on host_ready, waits for the response
  - Write secrets to /run/secrets/{id} for BuildKit consumption
  - Add FileSecretProvider for reading secrets from the filesystem
  - Path traversal protection in FileSecretProvider
  - Unit tests for FileSecretProvider (8 test cases)
  - Update TODO.md to mark build secrets as completed

* feat(config): add BUILD_SECRETS_DIR configuration

  - Add BuildSecretsDir to the Config struct
  - Load from the BUILD_SECRETS_DIR environment variable
  - Update ProvideBuildManager to use FileSecretProvider when configured
  - Log when build secrets are enabled

* fix(builds): fix vsock protocol deadlock and add secrets API support

  - Fix protocol deadlock: the agent now proactively sends build_result when
    complete instead of waiting for the host to request it (this was causing
    builds to hang forever)
  - Add secrets field to the /builds POST API endpoint
  - Add INFO logging for vsock communication debugging
  - Regenerate oapi.go with secrets field support

  The build agent would receive host_ready, handle secrets, and then loop
  waiting for more messages. But it never sent anything back, and the host
  was also waiting. Now the agent spawns a goroutine after host_ready to wait
  for build completion and send the result automatically.

* docs: update TODO with vsock protocol fix details

* docs: document cgroup requirement for BuildKit secrets

  The secrets API flow is fully implemented and working:
  - Host receives secrets from the API
  - Host sends secrets to the builder agent via vsock
  - Agent writes secrets to /run/secrets/
  - BuildKit receives --secret flags

  However, BuildKit's runc requires cgroup mounts when --secret flags are
  present, which the current microVM doesn't have. This is an infrastructure
  issue to be fixed separately.

* docs: add detailed cgroup analysis for BuildKit secrets

  Documents the root cause (missing /sys/fs/cgroup mount in VM init), two
  proposed solutions (Option A: all VMs, Option B: builder-only), and a
  security analysis for team discussion.
* feat(builds): add guest-agent to builder VMs for exec debugging

  - Update the builder Dockerfile to build and include the guest-agent binary
  - Only copy proto files (not client.go) to avoid host-side dependencies
  - Start guest-agent in builder-agent main() before the build starts
  - Guest-agent listens on vsock port 2222 for exec requests

  Note: Testing is blocked by the cgroup issue (builds fail before we can
  exec). Once cgroups are enabled, exec into builder VMs will work.

* fix(e2e): fix state comparison and image name matching in E2E test

  - Convert state to lowercase for comparison (API returns 'Running', not 'running')
  - Use build ID matching for imported images (API normalizes registry names)

  The E2E test now passes for the full build + VM run flow.

* fix(registry): preserve registry host in image names when triggering conversion

  Previously, when the builder pushed to 10.102.0.1:8083/builds/xxx, the
  registry would extract only the path (/builds/xxx) and normalize it to
  docker.io/builds/xxx. This was confusing because:
  - docker.io implies Docker Hub, but these are local builds
  - It could conflict with real Docker Hub images
  - It lost the original registry URL

  The fix includes the request's Host header when building the full repository
  path, so images are now stored as 10.102.0.1:8083/builds/xxx as expected.
* docs: clean up TODO.md - remove completed tasks

  Removed completed items:
  - IP spoofing vulnerability fix
  - Registry token scope leakage fix
  - Vsock read deadline handling
  - SSE streaming implementation
  - Build secrets via vsock
  - E2E test enhancement
  - Build manager unit tests
  - Guest agent on builder VMs
  - Runtime/toolchain cleanup

  Remaining tasks:
  - Enable cgroups for BuildKit secrets (blocked on team discussion)
  - Builder image tooling
  - Keep failed builders for debugging

* docs: remove 'Keep Failed Builders' from TODO and delete PLAN.md

* fix(tests): update registry tests to use full host in image names

  After the registry fix to preserve the host in image names, tests need to
  include the serverHost when looking up images. Also fixes the
  TestRegistryPushAndConvert timeout issue.

  Note: Some tests may still fail due to Docker Hub rate limiting.

* feat(init): add cgroup2 mount for BuildKit/runc support

  Mount the cgroup2 filesystem at /sys/fs/cgroup during VM init and bind-mount
  it into the new root. This enables runc (used by BuildKit) to work properly
  for builds with secrets.

  Security notes:
  - cgroup v2 has no release_agent escape vector (unlike v1)
  - VMs are already isolated by Cloud Hypervisor (hardware boundary)
  - The mount is non-fatal if the kernel doesn't support cgroup2

  To activate, rebuild the initrd: make init && make initrd

* docs: update TODO.md with verified cgroup2 implementation

* chore: clean up TODO after cgroup2 implementation verified

* fix(builds): restore missing BuildEvent type definition

  The BuildEvent type and event type constants were accidentally removed from
  types.go, causing build failures in lib/providers.

* test: add SKIP_DOCKER_HUB_TESTS env var to skip rate-limited tests

  When SKIP_DOCKER_HUB_TESTS=1 is set, tests that require pulling images from
  Docker Hub are skipped. This allows CI to run without being blocked by
  Docker Hub rate limiting.
  Affected tests:
  - lib/images: TestCreateImage*, TestListImages, TestDeleteImage, TestLayerCaching
  - lib/instances: TestBasicEndToEnd, TestStandbyAndRestore, TestExecConcurrent,
    TestCreateInstanceWithNetwork, TestQEMU*, TestVolume*, TestOverlayDisk*,
    TestAggregateLimits_EnforcedAtRuntime
  - lib/system: TestEnsureSystemFiles
  - cmd/api/api: TestCreateImage*, TestCreateInstance*, TestInstanceLifecycle,
    TestRegistry*, TestCp*, TestExec*
  - integration: TestSystemdMode

* Revert "test: add SKIP_DOCKER_HUB_TESTS env var to skip rate-limited tests"

  This reverts commit f09e53d.

* fix: address cursor bot review comments from PR #53

  - Fix closure capturing the loop variable in recoverPendingBuilds (add a
    meta := meta shadow)
  - Fix premature loop exit in StreamBuildEvents when receiving non-terminal
    status events
  - Fix a race condition where cancelled builds could be overwritten to
    'failed' status
  - Fix a nil panic when secretProvider is not configured (use a
    NoOpSecretProvider fallback)

* fix: make TestCreateImage_Idempotent resilient to timing variations

  The second CreateImage call can return either 'pending' (still processing)
  or 'ready' (already completed), depending on CI speed and caching. The key
  idempotency invariant is that the digest is the same, not the status.

* moar

* moar
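The recoverPendingBuilds fix is the classic Go closure-over-loop-variable pitfall. A self-contained illustration, with invented names; the real code lives in lib/builds.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// buildMeta stands in for the metadata loaded for each pending build.
type buildMeta struct{ ID string }

// recoverPending shows the bug class the review fix addressed: before Go
// 1.22, every goroutine launched in the loop shared one loop variable, so
// slow goroutines could all observe the last element. The `meta := meta`
// shadow gives each iteration its own copy.
func recoverPending(pending []buildMeta) []string {
	var mu sync.Mutex
	var wg sync.WaitGroup
	var resumed []string
	for _, meta := range pending {
		meta := meta // shadow: capture this iteration's value, not the shared variable
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			resumed = append(resumed, meta.ID)
			mu.Unlock()
		}()
	}
	wg.Wait()
	sort.Strings(resumed)
	return resumed
}

func main() {
	fmt.Println(recoverPending([]buildMeta{{"a"}, {"b"}, {"c"}}))
}
```

Go 1.22 changed `for` loops so each iteration gets a fresh variable, but the explicit shadow documents the intent and stays correct on older toolchains.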
1 parent (58df3eb) · commit d353357

39 files changed: +8708 −346 lines

Makefile

Lines changed: 11 additions & 0 deletions
@@ -206,6 +206,17 @@ test: ensure-ch-binaries ensure-caddy-binaries build-embedded
 gen-jwt: $(GODOTENV)
 	@$(GODOTENV) -f .env go run ./cmd/gen-jwt -user-id $${USER_ID:-test-user}
 
+# Build the generic builder image for builds
+build-builder:
+	docker build -t hypeman/builder:latest -f lib/builds/images/generic/Dockerfile .
+
+# Alias for backwards compatibility
+build-builders: build-builder
+
+# Run E2E build system test (requires server running: make dev)
+e2e-build-test:
+	@./scripts/e2e-build-test.sh
+
 # Clean generated files and binaries
 clean:
 	rm -rf $(BIN_DIR)

cmd/api/api/api.go

Lines changed: 4 additions & 0 deletions
@@ -2,6 +2,7 @@ package api
 
 import (
 	"github.com/onkernel/hypeman/cmd/api/config"
+	"github.com/onkernel/hypeman/lib/builds"
 	"github.com/onkernel/hypeman/lib/devices"
 	"github.com/onkernel/hypeman/lib/images"
 	"github.com/onkernel/hypeman/lib/ingress"
@@ -21,6 +22,7 @@ type ApiService struct {
 	NetworkManager  network.Manager
 	DeviceManager   devices.Manager
 	IngressManager  ingress.Manager
+	BuildManager    builds.Manager
 	ResourceManager *resources.Manager
 }
 
@@ -35,6 +37,7 @@ func New(
 	networkManager network.Manager,
 	deviceManager devices.Manager,
 	ingressManager ingress.Manager,
+	buildManager builds.Manager,
 	resourceManager *resources.Manager,
 ) *ApiService {
 	return &ApiService{
@@ -45,6 +48,7 @@ func New(
 		NetworkManager:  networkManager,
 		DeviceManager:   deviceManager,
 		IngressManager:  ingressManager,
+		BuildManager:    buildManager,
 		ResourceManager: resourceManager,
 	}
 }

cmd/api/api/builds.go

Lines changed: 313 additions & 0 deletions
@@ -0,0 +1,313 @@ (new file)
package api

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"strconv"

	"github.com/onkernel/hypeman/lib/builds"
	"github.com/onkernel/hypeman/lib/logger"
	"github.com/onkernel/hypeman/lib/oapi"
)

// ListBuilds returns all builds
func (s *ApiService) ListBuilds(ctx context.Context, request oapi.ListBuildsRequestObject) (oapi.ListBuildsResponseObject, error) {
	log := logger.FromContext(ctx)

	domainBuilds, err := s.BuildManager.ListBuilds(ctx)
	if err != nil {
		log.ErrorContext(ctx, "failed to list builds", "error", err)
		return oapi.ListBuilds500JSONResponse{
			Code:    "internal_error",
			Message: "failed to list builds",
		}, nil
	}

	oapiBuilds := make([]oapi.Build, len(domainBuilds))
	for i, b := range domainBuilds {
		oapiBuilds[i] = buildToOAPI(b)
	}

	return oapi.ListBuilds200JSONResponse(oapiBuilds), nil
}

// CreateBuild creates a new build job
func (s *ApiService) CreateBuild(ctx context.Context, request oapi.CreateBuildRequestObject) (oapi.CreateBuildResponseObject, error) {
	log := logger.FromContext(ctx)

	// Parse multipart form fields
	var sourceData []byte
	var baseImageDigest, cacheScope, dockerfile string
	var timeoutSeconds int
	var secrets []builds.SecretRef

	for {
		part, err := request.Body.NextPart()
		if err == io.EOF {
			break
		}
		if err != nil {
			return oapi.CreateBuild400JSONResponse{
				Code:    "invalid_request",
				Message: "failed to parse multipart form",
			}, nil
		}

		switch part.FormName() {
		case "source":
			sourceData, err = io.ReadAll(part)
			if err != nil {
				return oapi.CreateBuild400JSONResponse{
					Code:    "invalid_source",
					Message: "failed to read source data",
				}, nil
			}
		case "base_image_digest":
			data, err := io.ReadAll(part)
			if err != nil {
				return oapi.CreateBuild400JSONResponse{
					Code:    "invalid_request",
					Message: "failed to read base_image_digest field",
				}, nil
			}
			baseImageDigest = string(data)
		case "cache_scope":
			data, err := io.ReadAll(part)
			if err != nil {
				return oapi.CreateBuild400JSONResponse{
					Code:    "invalid_request",
					Message: "failed to read cache_scope field",
				}, nil
			}
			cacheScope = string(data)
		case "dockerfile":
			data, err := io.ReadAll(part)
			if err != nil {
				return oapi.CreateBuild400JSONResponse{
					Code:    "invalid_request",
					Message: "failed to read dockerfile field",
				}, nil
			}
			dockerfile = string(data)
		case "timeout_seconds":
			data, err := io.ReadAll(part)
			if err != nil {
				return oapi.CreateBuild400JSONResponse{
					Code:    "invalid_request",
					Message: "failed to read timeout_seconds field",
				}, nil
			}
			if v, err := strconv.Atoi(string(data)); err == nil {
				timeoutSeconds = v
			}
		case "secrets":
			data, err := io.ReadAll(part)
			if err != nil {
				return oapi.CreateBuild400JSONResponse{
					Code:    "invalid_request",
					Message: "failed to read secrets field",
				}, nil
			}
			if err := json.Unmarshal(data, &secrets); err != nil {
				return oapi.CreateBuild400JSONResponse{
					Code:    "invalid_request",
					Message: "secrets must be a JSON array of {\"id\": \"...\", \"env_var\": \"...\"} objects",
				}, nil
			}
		}
		part.Close()
	}

	if len(sourceData) == 0 {
		return oapi.CreateBuild400JSONResponse{
			Code:    "invalid_request",
			Message: "source is required",
		}, nil
	}

	// Note: Dockerfile validation happens in the builder agent.
	// It will check if Dockerfile is in the source tarball or provided via dockerfile parameter.

	// Build domain request
	domainReq := builds.CreateBuildRequest{
		BaseImageDigest: baseImageDigest,
		CacheScope:      cacheScope,
		Dockerfile:      dockerfile,
		Secrets:         secrets,
	}

	// Apply timeout if provided
	if timeoutSeconds > 0 {
		domainReq.BuildPolicy = &builds.BuildPolicy{
			TimeoutSeconds: timeoutSeconds,
		}
	}

	build, err := s.BuildManager.CreateBuild(ctx, domainReq, sourceData)
	if err != nil {
		switch {
		case errors.Is(err, builds.ErrDockerfileRequired):
			return oapi.CreateBuild400JSONResponse{
				Code:    "dockerfile_required",
				Message: err.Error(),
			}, nil
		case errors.Is(err, builds.ErrInvalidSource):
			return oapi.CreateBuild400JSONResponse{
				Code:    "invalid_source",
				Message: err.Error(),
			}, nil
		default:
			log.ErrorContext(ctx, "failed to create build", "error", err)
			return oapi.CreateBuild500JSONResponse{
				Code:    "internal_error",
				Message: "failed to create build",
			}, nil
		}
	}

	return oapi.CreateBuild202JSONResponse(buildToOAPI(build)), nil
}

// GetBuild gets build details
func (s *ApiService) GetBuild(ctx context.Context, request oapi.GetBuildRequestObject) (oapi.GetBuildResponseObject, error) {
	log := logger.FromContext(ctx)

	build, err := s.BuildManager.GetBuild(ctx, request.Id)
	if err != nil {
		if errors.Is(err, builds.ErrNotFound) {
			return oapi.GetBuild404JSONResponse{
				Code:    "not_found",
				Message: "build not found",
			}, nil
		}
		log.ErrorContext(ctx, "failed to get build", "error", err, "id", request.Id)
		return oapi.GetBuild500JSONResponse{
			Code:    "internal_error",
			Message: "failed to get build",
		}, nil
	}

	return oapi.GetBuild200JSONResponse(buildToOAPI(build)), nil
}

// CancelBuild cancels a build
func (s *ApiService) CancelBuild(ctx context.Context, request oapi.CancelBuildRequestObject) (oapi.CancelBuildResponseObject, error) {
	log := logger.FromContext(ctx)

	err := s.BuildManager.CancelBuild(ctx, request.Id)
	if err != nil {
		switch {
		case errors.Is(err, builds.ErrNotFound):
			return oapi.CancelBuild404JSONResponse{
				Code:    "not_found",
				Message: "build not found",
			}, nil
		case errors.Is(err, builds.ErrBuildInProgress):
			return oapi.CancelBuild409JSONResponse{
				Code:    "conflict",
				Message: "build already in progress",
			}, nil
		default:
			log.ErrorContext(ctx, "failed to cancel build", "error", err, "id", request.Id)
			return oapi.CancelBuild500JSONResponse{
				Code:    "internal_error",
				Message: "failed to cancel build",
			}, nil
		}
	}

	return oapi.CancelBuild204Response{}, nil
}

// GetBuildEvents streams build events via SSE
// With follow=false (default), streams existing logs then closes
// With follow=true, continues streaming until build completes
func (s *ApiService) GetBuildEvents(ctx context.Context, request oapi.GetBuildEventsRequestObject) (oapi.GetBuildEventsResponseObject, error) {
	log := logger.FromContext(ctx)

	// Parse follow parameter (default false)
	follow := false
	if request.Params.Follow != nil {
		follow = *request.Params.Follow
	}

	eventChan, err := s.BuildManager.StreamBuildEvents(ctx, request.Id, follow)
	if err != nil {
		if errors.Is(err, builds.ErrNotFound) {
			return oapi.GetBuildEvents404JSONResponse{
				Code:    "not_found",
				Message: "build not found",
			}, nil
		}
		log.ErrorContext(ctx, "failed to stream build events", "error", err, "id", request.Id)
		return oapi.GetBuildEvents500JSONResponse{
			Code:    "internal_error",
			Message: "failed to stream build events",
		}, nil
	}

	return buildEventsStreamResponse{eventChan: eventChan}, nil
}

// buildEventsStreamResponse implements oapi.GetBuildEventsResponseObject with proper SSE streaming
type buildEventsStreamResponse struct {
	eventChan <-chan builds.BuildEvent
}

func (r buildEventsStreamResponse) VisitGetBuildEventsResponse(w http.ResponseWriter) error {
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")
	w.Header().Set("Connection", "keep-alive")
	w.Header().Set("X-Accel-Buffering", "no") // Disable nginx buffering
	w.WriteHeader(200)

	flusher, ok := w.(http.Flusher)
	if !ok {
		return fmt.Errorf("streaming not supported")
	}

	for event := range r.eventChan {
		jsonEvent, err := json.Marshal(event)
		if err != nil {
			continue
		}
		fmt.Fprintf(w, "data: %s\n\n", jsonEvent)
		flusher.Flush()
	}
	return nil
}

// buildToOAPI converts a domain Build to OAPI Build
func buildToOAPI(b *builds.Build) oapi.Build {
	oapiBuild := oapi.Build{
		Id:            b.ID,
		Status:        oapi.BuildStatus(b.Status),
		QueuePosition: b.QueuePosition,
		ImageDigest:   b.ImageDigest,
		ImageRef:      b.ImageRef,
		Error:         b.Error,
		CreatedAt:     b.CreatedAt,
		StartedAt:     b.StartedAt,
		CompletedAt:   b.CompletedAt,
		DurationMs:    b.DurationMS,
	}

	if b.Provenance != nil {
		oapiBuild.Provenance = &oapi.BuildProvenance{
			BaseImageDigest: &b.Provenance.BaseImageDigest,
			SourceHash:      &b.Provenance.SourceHash,
			BuildkitVersion: &b.Provenance.BuildkitVersion,
			Timestamp:       &b.Provenance.Timestamp,
		}
		if len(b.Provenance.LockfileHashes) > 0 {
			oapiBuild.Provenance.LockfileHashes = &b.Provenance.LockfileHashes
		}
	}

	return oapiBuild
}

cmd/api/api/images_test.go

Lines changed: 11 additions & 3 deletions
@@ -225,9 +225,17 @@ func TestCreateImage_Idempotent(t *testing.T) {
 		t.Fatal("Build failed - this is the root cause of test failures")
 	}
 
-	require.Equal(t, oapi.ImageStatus(images.StatusPending), img2.Status)
-	require.NotNil(t, img2.QueuePosition, "should have queue position")
-	require.Equal(t, 1, *img2.QueuePosition, "should still be at position 1")
+	// Status can be "pending" (still processing) or "ready" (already completed in fast CI)
+	// The key idempotency invariant is that the digest is the same (verified above)
+	require.Contains(t, []oapi.ImageStatus{
+		oapi.ImageStatus(images.StatusPending),
+		oapi.ImageStatus(images.StatusReady),
+	}, img2.Status, "status should be pending or ready")
+
+	// If still pending, should have queue position
+	if img2.Status == oapi.ImageStatus(images.StatusPending) {
+		require.NotNil(t, img2.QueuePosition, "should have queue position when pending")
+	}
 
 	// Construct digest reference: repository@digest
 	// Extract repository from imageName (strip tag part)
