Commit 8ff9678

docs: add build system roadmap and security model
Includes planned phases for cache optimization, security hardening, additional runtimes, and observability. Documents threat model and open design questions.
1 parent 55d96ab commit 8ff9678

1 file changed: +181 -0 lines changed

lib/builds/PLAN.md

# Build System Roadmap

## Current State (v0.1)

- ✅ Source-to-image builds in isolated microVMs
- ✅ BuildKit-based builds with daemonless execution
- ✅ Tenant-isolated registry caching
- ✅ Node.js 20 and Python 3.12 runtimes
- ✅ Vsock communication for build results
- ✅ Cgroup mounting for container runtime support

## Planned Improvements

### Phase 1: Cache Optimization

**Goal**: Reduce build times by sharing common base layers across tenants.

#### Multi-tier Cache Strategy

```
Import order (first match wins):
  1. shared/{runtime}/base        ← Pre-warmed with OS + runtime layers (read-only)
  2. {tenant}/{runtime}/{hash}    ← Tenant-specific dependency layers

Export to:
  → {tenant}/{runtime}/{hash}     ← Only tenant-specific layers
```
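
The path layout above could be computed along these lines. This is a minimal sketch, not the project's implementation: the `CacheRefs` type, the `cacheRefs` helper, and the registry host are hypothetical names introduced for illustration; only the `shared/{runtime}/base` and `{tenant}/{runtime}/{hash}` layout comes from the roadmap.

```go
package main

import "fmt"

// CacheRefs holds the registry cache references for one build.
// Hypothetical type; only the path layout comes from the roadmap.
type CacheRefs struct {
	Imports []string // consulted in this order
	Export  string   // the only ref this build writes to
}

// cacheRefs builds the import and export refs for a tenant build.
// registry is an assumed registry host, e.g. "registry.internal:5000".
func cacheRefs(registry, tenant, runtime, hash string) CacheRefs {
	shared := fmt.Sprintf("%s/shared/%s/base", registry, runtime)
	scoped := fmt.Sprintf("%s/%s/%s/%s", registry, tenant, runtime, hash)
	return CacheRefs{
		Imports: []string{shared, scoped}, // shared base first, then tenant scope
		Export:  scoped,                   // never the shared scope
	}
}

func main() {
	refs := cacheRefs("registry.internal:5000", "acme", "nodejs20", "abc123")
	for i, ref := range refs.Imports {
		fmt.Printf("import %d: %s\n", i+1, ref)
	}
	fmt.Println("export:", refs.Export)
}
```

One consequence of this layout worth noting: a tenant identifier equal to `shared` would map onto the operator-controlled scope, so tenant IDs would presumably need to be validated against that reserved name.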

#### Benefits
- **Fast builds**: Common layers (apt packages, Node.js binary, etc.) are shared
- **Tenant isolation**: Application dependencies remain isolated
- **No cross-tenant poisoning**: Tenants can only write to their own scope
- **Controlled shared cache**: Only operators can update the shared base cache

#### Implementation Tasks
- [ ] Update `cache.go` with `ImportCacheArgs() []string` returning multiple args
- [ ] Update `builder_agent/main.go` to handle multiple `--import-cache` flags
- [ ] Add CLI/API endpoint for pre-warming shared cache
- [ ] Create cron job or webhook to refresh shared cache on base image updates
- [ ] Document cache warming process in README
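
The first two tasks could look roughly like the sketch below. It assumes the agent parses its arguments with Go's standard `flag` package and that each cache source is passed in the `type=registry,ref=...` form; `multiFlag`, `importCacheArgs`, and `parseImportCache` are hypothetical names, not the existing code in `cache.go` or `builder_agent/main.go`.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// multiFlag collects every occurrence of a repeatable flag.
// It satisfies flag.Value via String and Set.
type multiFlag []string

func (m *multiFlag) String() string { return strings.Join(*m, ",") }

func (m *multiFlag) Set(v string) error {
	*m = append(*m, v)
	return nil
}

// importCacheArgs emits one --import-cache pair per ref, preserving order.
func importCacheArgs(refs []string) []string {
	args := make([]string, 0, 2*len(refs))
	for _, ref := range refs {
		args = append(args, "--import-cache", "type=registry,ref="+ref)
	}
	return args
}

// parseImportCache is the agent-side counterpart: it accepts the flag
// any number of times and returns the values in the order given.
func parseImportCache(argv []string) ([]string, error) {
	fs := flag.NewFlagSet("builder-agent", flag.ContinueOnError)
	var imports multiFlag
	fs.Var(&imports, "import-cache", "cache source (repeatable)")
	if err := fs.Parse(argv); err != nil {
		return nil, err
	}
	return imports, nil
}

func main() {
	args := importCacheArgs([]string{
		"reg/shared/nodejs20/base",
		"reg/acme/nodejs20/abc123",
	})
	got, err := parseImportCache(args)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(got), "cache sources:", got)
}
```

Returning a flat `[]string` keeps the host side and the agent side symmetric: whatever `importCacheArgs` produces can be appended directly to the agent's argument list.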

### Phase 2: Security Hardening

#### Secret Management
- [ ] Implement vsock-based secret injection (secrets never written to disk)
- [ ] Add secret scoping per build (which secrets a build can access)
- [ ] Audit logging for secret access during builds
- [ ] Integration with external secret managers (Vault, AWS Secrets Manager)
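
Scoping and audit logging could be combined in a single lookup step, as in the sketch below. It leaves out the vsock transport and any external secret manager; `auditEntry`, `resolveSecrets`, and the in-memory `store` map are hypothetical stand-ins used only to show the shape of the check.

```go
package main

import "fmt"

// auditEntry records one secret request. It holds the secret's name only,
// never its value, so the audit log itself cannot leak secrets.
type auditEntry struct {
	BuildID string
	Secret  string
	Granted bool
}

// resolveSecrets returns the requested secrets that are both in the
// build's scope and present in the store, plus one audit entry per request.
func resolveSecrets(buildID string, requested []string,
	scope map[string]bool, store map[string]string) (map[string]string, []auditEntry) {

	granted := make(map[string]string)
	audit := make([]auditEntry, 0, len(requested))
	for _, name := range requested {
		value, exists := store[name]
		ok := scope[name] && exists
		if ok {
			granted[name] = value
		}
		audit = append(audit, auditEntry{BuildID: buildID, Secret: name, Granted: ok})
	}
	return granted, audit
}

func main() {
	store := map[string]string{"NPM_TOKEN": "example-value", "DB_URL": "example-url"}
	scope := map[string]bool{"NPM_TOKEN": true} // this build may read NPM_TOKEN only

	_, audit := resolveSecrets("build-1", []string{"NPM_TOKEN", "DB_URL"}, scope, store)
	for _, e := range audit {
		fmt.Printf("build=%s secret=%s granted=%v\n", e.BuildID, e.Secret, e.Granted)
	}
}
```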

#### Network Policy
- [ ] Implement domain allowlist for `egress` mode
- [ ] Add `isolated` mode (no network access during build phase)
- [ ] Rate limiting on registry pushes to prevent abuse
- [ ] DNS filtering for allowed domains
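
A domain allowlist check might be as small as the following sketch. The matching rules are an assumption, since the roadmap does not define them: an entry matches a host exactly, and an entry starting with `*.` matches any subdomain but not the apex. The `allowed` function is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// allowed reports whether host is permitted by the allowlist.
// Exact entries match the host itself; "*.example.com" matches any
// subdomain of example.com but not example.com.
func allowed(host string, allowlist []string) bool {
	// Normalize case and drop the trailing dot of a fully qualified name.
	host = strings.TrimSuffix(strings.ToLower(host), ".")
	for _, entry := range allowlist {
		entry = strings.ToLower(entry)
		if strings.HasPrefix(entry, "*.") {
			suffix := entry[1:] // ".example.com"
			if len(host) > len(suffix) && strings.HasSuffix(host, suffix) {
				return true
			}
			continue
		}
		if host == entry {
			return true
		}
	}
	return false
}

func main() {
	list := []string{"pypi.org", "*.npmjs.org"}
	for _, h := range []string{"registry.npmjs.org", "pypi.org", "evilnpmjs.org"} {
		fmt.Printf("%-20s allowed=%v\n", h, allowed(h, list))
	}
}
```

Matching on the leading dot of the suffix is what keeps a lookalike such as `evilnpmjs.org` from passing a `*.npmjs.org` entry.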

#### Build Provenance & Supply Chain Security
- [ ] Sign build provenance with Sigstore/cosign
- [ ] SLSA Level 2 compliance (authenticated build process)
- [ ] SBOM (Software Bill of Materials) generation during builds
- [ ] Vulnerability scanning of built images before push

### Phase 3: Additional Runtimes

| Runtime | Package Managers | Priority |
|---------|-----------------|----------|
| Go 1.22+ | go mod | High |
| Ruby 3.3+ | bundler, gem | Medium |
| Rust | cargo | Medium |
| Java 21+ | Maven, Gradle | Medium |
| PHP 8.3+ | composer | Low |
| Custom Dockerfile | N/A | High |

#### Custom Dockerfile Support
- [ ] Allow users to provide their own Dockerfile
- [ ] Security review: sandbox custom Dockerfiles more strictly
- [ ] Validate Dockerfile doesn't use dangerous instructions
- [ ] Consider read-only base image allowlist
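
The base image allowlist idea could be prototyped with a line-based scan of `FROM` instructions, as sketched below. This is deliberately simplified and is not a Dockerfile parser: it does not resolve `ARG` substitution or line continuations, although both end up rejected because the unresolved token is not in the allowlist. `disallowedBases` is a hypothetical helper.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// disallowedBases returns every FROM image that is neither in the
// allowlist nor the name of an earlier build stage.
func disallowedBases(dockerfile string, allow map[string]bool) []string {
	stages := map[string]bool{} // names declared with "FROM ... AS name"
	var bad []string

	sc := bufio.NewScanner(strings.NewReader(dockerfile))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 || !strings.EqualFold(fields[0], "FROM") {
			continue
		}
		// Skip options such as --platform=linux/amd64.
		i := 1
		for i < len(fields) && strings.HasPrefix(fields[i], "--") {
			i++
		}
		if i == len(fields) {
			continue
		}
		image := fields[i]
		if !allow[image] && !stages[image] {
			bad = append(bad, image)
		}
		if i+2 < len(fields) && strings.EqualFold(fields[i+1], "AS") {
			stages[fields[i+2]] = true
		}
	}
	return bad
}

func main() {
	allow := map[string]bool{"node:20-slim": true}
	df := "FROM node:20-slim AS build\nRUN npm ci\nFROM build\nFROM ubuntu:latest\n"
	fmt.Println("rejected:", disallowedBases(df, allow))
}
```

A production check would more likely work from a parsed Dockerfile AST and compare image digests rather than tags, since tags are mutable.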

### Phase 4: Performance & Observability

#### Metrics (Prometheus)
- [ ] `hypeman_build_duration_seconds` - histogram by runtime, status
- [ ] `hypeman_build_cache_hits_total` - counter for cache hits/misses
- [ ] `hypeman_build_queue_wait_seconds` - time spent in queue
- [ ] `hypeman_build_vm_boot_seconds` - microVM boot time
- [ ] `hypeman_build_push_duration_seconds` - registry push time

#### Logging Improvements
- [ ] Structured JSON logs from builder agent
- [ ] Log streaming during build (not just after completion)
- [ ] Build log retention policy

#### Distributed Builds
- [ ] Build worker pool across multiple hosts
- [ ] Load balancing for build queue (consistent hashing by tenant?)
- [ ] Horizontal scaling of build capacity
- [ ] Worker health checks and automatic failover
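
To make the "consistent hashing by tenant" option concrete, here is a minimal hash-ring sketch using only the standard library. The `ring` type, the worker names, and the choice of FNV-1a with 50 virtual nodes per worker are illustrative assumptions, not a decided design.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// ring maps tenants to workers via consistent hashing with virtual nodes.
type ring struct {
	points []uint32          // sorted positions on the ring
	owner  map[uint32]string // position -> worker
}

func hashKey(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// newRing places `replicas` virtual nodes per worker on the ring.
func newRing(workers []string, replicas int) *ring {
	r := &ring{owner: map[uint32]string{}}
	for _, w := range workers {
		for i := 0; i < replicas; i++ {
			p := hashKey(fmt.Sprintf("%s#%d", w, i))
			if _, taken := r.owner[p]; taken {
				continue // position collision: keep the first owner
			}
			r.points = append(r.points, p)
			r.owner[p] = w
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// workerFor returns the owner of the first ring position at or after
// the tenant's hash, wrapping around at the end.
func (r *ring) workerFor(tenant string) string {
	if len(r.points) == 0 {
		return ""
	}
	h := hashKey(tenant)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0
	}
	return r.owner[r.points[i]]
}

func main() {
	r := newRing([]string{"worker-1", "worker-2", "worker-3"}, 50)
	for _, t := range []string{"acme", "globex", "initech"} {
		fmt.Printf("%s -> %s\n", t, r.workerFor(t))
	}
}
```

The appeal for a build queue is cache locality: a tenant keeps landing on the same worker, and removing a worker only reassigns the tenants that worker owned.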

## Security Model

### Threat Model

| Threat | Mitigation | Status |
|--------|------------|--------|
| Container escape to host | MicroVM isolation (separate kernel) | ✅ Implemented |
| Cross-tenant cache poisoning | Tenant-scoped cache paths | ✅ Implemented |
| Host kernel exploit | Separate kernel per VM | ✅ Implemented |
| Malicious dependency exfiltration | Network isolation (egress control) | 🔄 Partial |
| Secret theft during build | Vsock-only secret injection | 📋 Planned |
| Registry credential theft | Per-build short-lived tokens | 📋 Planned |
| Resource exhaustion (DoS) | VM resource limits | ✅ Implemented |
| Build log information leak | Tenant-scoped log access | ✅ Implemented |

### Security Boundaries

```
┌─────────────────────────────────────────────────────────────┐
│                         Host System                         │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                      Hypeman API                      │  │
│  │  - JWT authentication                                 │  │
│  │  - Tenant isolation at API level                      │  │
│  └───────────────────────────────────────────────────────┘  │
│                              │                              │
│  ┌───────────────────────────┼───────────────────────────┐  │
│  │          MicroVM Boundary (Cloud Hypervisor)          │  │
│  │  ┌─────────────────────────────────────────────────┐  │  │
│  │  │                   Builder VM                    │  │  │
│  │  │  - Separate kernel                              │  │  │
│  │  │  - Ephemeral (destroyed after build)            │  │  │
│  │  │  - Limited network (egress only to registry)    │  │  │
│  │  │  - No access to other tenants' data             │  │  │
│  │  │  ┌───────────────────────────────────────────┐  │  │  │
│  │  │  │            BuildKit (rootless)            │  │  │  │
│  │  │  │  - User namespace isolation               │  │  │  │
│  │  │  │  - No real root privileges                │  │  │  │
│  │  │  └───────────────────────────────────────────┘  │  │  │
│  │  └─────────────────────────────────────────────────┘  │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
```

### Not Protected (By Design)

These are inherent to the build process and cannot be fully mitigated:

1. **Malicious code execution during package install** - `npm install` and `pip install` execute arbitrary code by design
2. **Supply chain attacks on upstream packages** - Typosquatting, compromised maintainers, etc.
3. **Tenant poisoning their own cache** - A tenant can push malicious layers to their own cache scope
4. **Information leakage via build output** - Malicious deps can encode secrets in build artifacts

## Open Questions

1. **Custom Dockerfiles**: Should we support user-provided Dockerfiles?
   - Pro: Flexibility for advanced users
   - Con: Larger attack surface, harder to secure
   - Possible middle ground: Allowlist of base images

2. **Cache TTL Policy**: How long should tenant caches be retained?
   - Options: 7 days, 30 days, size-based eviction, never (until explicit delete)
   - Consider: Storage costs vs build speed

3. **Build Artifact Signing**: Required for all builds or opt-in?
   - Required: Better security posture, SLSA compliance
   - Opt-in: Less friction for getting started

4. **Multi-arch Builds**: Worth the complexity?
   - Use case: Deploy same image to ARM and x86
   - Complexity: Requires QEMU or cross-compilation support

5. **Build Concurrency Limits**: Per-tenant or global?
   - Per-tenant: Fair sharing, prevents noisy neighbor
   - Global: Simpler, but one tenant could starve others
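
For question 5, the two options are not mutually exclusive. The sketch below shows one hypothetical way to enforce both a per-tenant cap and a global cap in the same admission check; the `limiter` type and the specific limits are assumptions for illustration, and the question itself remains open.

```go
package main

import (
	"fmt"
	"sync"
)

// limiter admits a build only if both the tenant's cap and the
// global cap have headroom.
type limiter struct {
	mu        sync.Mutex
	perTenant int
	global    int
	running   map[string]int // active builds per tenant
	total     int            // active builds overall
}

func newLimiter(perTenant, global int) *limiter {
	return &limiter{perTenant: perTenant, global: global, running: map[string]int{}}
}

// tryAcquire reserves a slot and reports whether the build may start.
func (l *limiter) tryAcquire(tenant string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.total >= l.global || l.running[tenant] >= l.perTenant {
		return false
	}
	l.running[tenant]++
	l.total++
	return true
}

// release frees a slot previously reserved for the tenant.
func (l *limiter) release(tenant string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.running[tenant] == 0 {
		return // nothing to release
	}
	l.running[tenant]--
	l.total--
	if l.running[tenant] == 0 {
		delete(l.running, tenant)
	}
}

func main() {
	l := newLimiter(2, 3)
	for _, t := range []string{"acme", "acme", "acme", "globex", "globex"} {
		fmt.Printf("%s admitted=%v\n", t, l.tryAcquire(t))
	}
}
```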

## References

- [BuildKit GitHub](https://github.com/moby/buildkit)
- [Rootless Containers](https://rootlesscontaine.rs/)
- [SLSA Framework](https://slsa.dev/)
- [Sigstore](https://www.sigstore.dev/)
- [Cloud Hypervisor](https://github.com/cloud-hypervisor/cloud-hypervisor)
