Unify BuilderNet mkosi recipe#360

Merged
canercidam merged 23 commits into fryd/mkosi-playground from caner/buildernet-recipe
Feb 11, 2026

@canercidam
Member

@canercidam canercidam commented Feb 10, 2026

This PR tries to unify the BuilderNet recipe by basing itself on Fryd's PR (#302). It may contain changes from PRs that are still waiting to be merged and will shrink in size as they are merged.

@canercidam canercidam changed the base branch from fryd/mkosi-playground to main February 10, 2026 02:52
@canercidam canercidam changed the base branch from main to fryd/mkosi-playground February 10, 2026 02:52
@flashbots flashbots deleted a comment from claude bot Feb 10, 2026
@claude
Contributor

claude bot commented Feb 10, 2026

Claude finished @canercidam's task in 3m 10s


PR #360 Review

  • Gather context and read changed files
  • Analyze code quality, bugs, and security
  • Post inline comments for specific issues
  • Post overall review summary

Summary

This PR restructures custom recipes to a group/variant/playground.yaml directory convention, internalizes buildernet recipe overrides, moves post-hook execution into LocalRunner.Run(), and adds FindService/RemoveService/ReplaceArgs methods on Component/Service.

Issues Found

Bug — --subscribe-all-subnets in BuilderNet recipe (inline comment)

recipe_buildernet.go:64-67 uses ReplaceArgs with "--subscribe-all-subnets": "true", but:

  1. --subscribe-all-subnets was removed from LighthouseBeaconNode.Apply() in components.go (this PR), so ReplaceArgs won't find it.
  2. --subscribe-all-subnets is a boolean flag in Lighthouse (no value argument), so using ReplaceArgs which replaces args[i+1] would corrupt the next argument even if the flag were present.

You likely need beacon.WithArgs("--subscribe-all-subnets") to add it, and only use ReplaceArgs for --target-peers.
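
To make the failure mode concrete, here is a minimal sketch. The `Service` type is a pared-down stand-in, `ReplaceArgs` copies the method quoted later in this review, and `WithArgs` is assumed to simply append flags; the argument lists are hypothetical.

```go
package main

import "fmt"

// Service is a pared-down stand-in for the playground's Service type;
// only the Args field matters for this illustration.
type Service struct {
	Args []string
}

// ReplaceArgs mirrors the method quoted in this review: for every flag found
// in Args, the element that follows it is overwritten with the new value.
func (s *Service) ReplaceArgs(replacements map[string]string) *Service {
	for i := 0; i < len(s.Args); i++ {
		if newValue, ok := replacements[s.Args[i]]; ok && i+1 < len(s.Args) {
			s.Args[i+1] = newValue
		}
	}
	return s
}

// WithArgs appends flags. Assumption: the real helper behaves like this.
func (s *Service) WithArgs(args ...string) *Service {
	s.Args = append(s.Args, args...)
	return s
}

func main() {
	// Hypothetical argument list in which the boolean flag is still present.
	broken := &Service{Args: []string{"--subscribe-all-subnets", "--target-peers", "8"}}
	broken.ReplaceArgs(map[string]string{"--subscribe-all-subnets": "true"})
	fmt.Println(broken.Args) // "--target-peers" has been overwritten with "true"

	// Replace only the flag that takes a value, then append the boolean flag.
	fixed := &Service{Args: []string{"--target-peers", "8"}}
	fixed.ReplaceArgs(map[string]string{"--target-peers": "1"}).WithArgs("--subscribe-all-subnets")
	fmt.Println(fixed.Args)
}
```

The first call clobbers the neighbouring flag name; the second shows the split between value flags and boolean flags that the review suggests.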

Bug — Index out of bounds in applyReplaceArgs (inline comment)

recipe_yaml.go:467-479: When replaceArgs has an odd length, the warning fires but execution continues. At i = len(replaceArgs)-1, replaceArgs[i+1] panics with index out of range. The warning should either return early or truncate the slice.

Behavioral change — Post-hook timing (inline comment)

Post-hooks now execute in LocalRunner.Run() after docker compose up -d but before WaitForReady. Previously they ran after services were healthy. The registerBuilder hook calls the beacon API and reth API — these may not be ready yet at the new timing. This needs clarification on whether it's intentional.

Code duplication — removeService vs Component.RemoveService

The standalone removeService function in recipe_yaml.go:314-326 is now an exact duplicate of the new Component.RemoveService method in manifest.go:91-101. The findService function was correctly deduplicated (replaced by Component.FindService), but removeService was not. Line 262 in recipe_yaml.go should use component.RemoveService(serviceName) and the standalone function should be deleted.

API inconsistency — Two ReplaceArgs interfaces (inline comment)

Service.ReplaceArgs takes map[string]string, while the YAML-layer applyReplaceArgs takes []string of alternating pairs. Two different interfaces for the same concept.

Minor Notes

  • ovmf.sh depends on jq being installed but doesn't check for it — will produce confusing errors if missing.
  • start.sh:25: the command substitution in kill -0 $(cat "${PIDFILE}") is unquoted; if the PID file is empty or corrupt, this will misbehave. Consider quoting it.
  • Tests are well structured and cover the new custom recipe format comprehensively.

Comment on lines +64 to +67
beacon.ReplaceArgs(map[string]string{
    "--target-peers": "1",
    "--subscribe-all-subnets": "true",
})

Bug: --subscribe-all-subnets was removed from LighthouseBeaconNode.Apply() in components.go (this PR), so ReplaceArgs will not find it. Even if it were still present, --subscribe-all-subnets is a boolean flag in Lighthouse (it doesn't take a separate value argument), so the ReplaceArgs approach of replacing args[i+1] would corrupt the next unrelated argument.

You likely need to use WithArgs("--subscribe-all-subnets") to add the flag back, and only use ReplaceArgs for --target-peers which does take a value.

Comment on lines +467 to +479
func applyReplaceArgs(args, replaceArgs []string) []string {
    if len(replaceArgs)%2 != 0 {
        slog.Warn("replace_args should contain pairs of flag and value", "count", len(replaceArgs))
    }

    result := make([]string, len(args))
    copy(result, args)

    for i := 0; i < len(replaceArgs); i += 2 {
        flag := replaceArgs[i]
        newValue := replaceArgs[i+1]
        result = applyReplacePair(flag, newValue, result)
    }

Bug: Index out of bounds panic when replaceArgs has an odd length. The warning on line 469 fires but execution continues. At i = len(replaceArgs)-1 (the unpaired last element), replaceArgs[i+1] will panic with an index out of range.

After the warning, you should return early or truncate the iteration bound:

Suggested change
- func applyReplaceArgs(args, replaceArgs []string) []string {
-     if len(replaceArgs)%2 != 0 {
-         slog.Warn("replace_args should contain pairs of flag and value", "count", len(replaceArgs))
-     }
-     result := make([]string, len(args))
-     copy(result, args)
-     for i := 0; i < len(replaceArgs); i += 2 {
-         flag := replaceArgs[i]
-         newValue := replaceArgs[i+1]
-         result = applyReplacePair(flag, newValue, result)
-     }
+ func applyReplaceArgs(args, replaceArgs []string) []string {
+     if len(replaceArgs)%2 != 0 {
+         slog.Warn("replace_args should contain pairs of flag and value, ignoring last unpaired element", "count", len(replaceArgs))
+         replaceArgs = replaceArgs[:len(replaceArgs)-1]
+     }
+     result := make([]string, len(args))
+     copy(result, args)
+     for i := 0; i < len(replaceArgs); i += 2 {
+         flag := replaceArgs[i]
+         newValue := replaceArgs[i+1]
+         result = applyReplacePair(flag, newValue, result)
+     }
+     return result
+ }
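
For anyone who wants to try the truncating variant in isolation, the following is a runnable sketch. The body of `applyReplacePair` is not shown in this review, so the version here is a hypothetical stand-in, and `log.Printf` replaces the `slog.Warn` call only to keep the example small.

```go
package main

import (
	"fmt"
	"log"
)

// applyReplacePair is a hypothetical stand-in for the real helper: it
// overwrites the value that follows the first occurrence of flag, if any.
func applyReplacePair(flag, newValue string, args []string) []string {
	for i := 0; i+1 < len(args); i++ {
		if args[i] == flag {
			args[i+1] = newValue
			break
		}
	}
	return args
}

// applyReplaceArgs follows the suggested change: warn on odd-length input,
// drop the unpaired trailing element, then apply the remaining pairs.
func applyReplaceArgs(args, replaceArgs []string) []string {
	if len(replaceArgs)%2 != 0 {
		log.Printf("replace_args should contain pairs of flag and value, ignoring last unpaired element (count=%d)", len(replaceArgs))
		replaceArgs = replaceArgs[:len(replaceArgs)-1]
	}
	result := make([]string, len(args))
	copy(result, args) // work on a copy so the caller's slice is untouched
	for i := 0; i < len(replaceArgs); i += 2 {
		result = applyReplacePair(replaceArgs[i], replaceArgs[i+1], result)
	}
	return result
}

func main() {
	args := []string{"--foo", "bar", "--port", "8080"}
	// Odd-length input: the trailing "--port" has no value and is dropped.
	fmt.Println(applyReplaceArgs(args, []string{"--foo", "baz", "--port"}))
}
```

With the truncation in place, the loop bound is always even, so `replaceArgs[i+1]` stays in range.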

Comment on lines +1177 to +1180
// run post hook operations
if err := d.manifest.ExecutePostHookActions(); err != nil {
    return fmt.Errorf("failed to execute post-hook operations: %w", err)
}

Moving post-hook execution here changes the timing: it now runs after docker compose up -d but before services are healthy (the WaitForReady call happens in main.go after Run() returns). Previously it was called after WaitForReady.

If any post-hook depends on services being fully ready (e.g., the registerBuilder hook that calls the beacon API and reth API), this will fail because those services may not be accepting connections yet.

Is this intentional? If so, consider adding a comment explaining why the timing change is safe.
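
The ordering concern can be sketched as follows. The `step` type, `runInOrder`, and the phase names are hypothetical and only illustrate the sequencing (start, wait for readiness, then post-hooks); they are not the playground's real API.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// step is a hypothetical unit of startup work; the names used below are
// labels for this sketch, not the playground's real API.
type step struct {
	name string
	run  func(ctx context.Context) error
}

// runInOrder executes steps one after another and stops at the first error,
// so post-hooks are never reached unless the readiness step succeeded.
func runInOrder(ctx context.Context, steps []step) ([]string, error) {
	var done []string
	for _, s := range steps {
		if err := ctx.Err(); err != nil {
			return done, err
		}
		if err := s.run(ctx); err != nil {
			return done, fmt.Errorf("%s: %w", s.name, err)
		}
		done = append(done, s.name)
	}
	return done, nil
}

// startup wires the three phases in the order discussed in this comment.
// The ready argument simulates whether the services become healthy.
func startup(ready bool) ([]string, error) {
	ok := func(context.Context) error { return nil }
	wait := func(context.Context) error {
		if !ready {
			return errors.New("services not healthy")
		}
		return nil
	}
	return runInOrder(context.Background(), []step{
		{"compose-up", ok},
		{"wait-for-ready", wait},
		{"post-hooks", ok},
	})
}

func main() {
	fmt.Println(startup(true))
	fmt.Println(startup(false))
}
```

Keeping the readiness check as an explicit step ahead of the hooks makes the dependency visible in one place, whichever function ends up owning the sequence.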

Comment on lines +540 to +550
// ReplaceArgs replaces argument values in the service's Args.
// The replacements map contains flag -> new_value pairs.
// For each flag found in Args, the following value is replaced.
func (s *Service) ReplaceArgs(replacements map[string]string) *Service {
    for i := 0; i < len(s.Args); i++ {
        if newValue, ok := replacements[s.Args[i]]; ok && i+1 < len(s.Args) {
            s.Args[i+1] = newValue
        }
    }
    return s
}

Nit: Service.ReplaceArgs (takes a map[string]string) has a different signature than the YAML-layer applyReplaceArgs (takes a []string of alternating pairs). Two different interfaces for the same concept could be confusing. Consider unifying them or clearly documenting why they differ.

Also, if a flag appears as the last element of Args (no value follows), the i+1 < len(s.Args) check silently skips it without any warning, which could mask misconfiguration.
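
One possible way to reconcile the two shapes, shown only as a sketch: convert the YAML-style pairs into the map form once, and have the replacement routine report flags it had to skip. Both `pairsToMap` and the free function `replaceArgs` are hypothetical names introduced for this example.

```go
package main

import "fmt"

// pairsToMap converts alternating flag/value pairs (the YAML replace_args
// shape) into the map shape taken by Service.ReplaceArgs.
// Odd-length input is rejected instead of being silently truncated.
func pairsToMap(pairs []string) (map[string]string, error) {
	if len(pairs)%2 != 0 {
		return nil, fmt.Errorf("replace_args needs flag/value pairs, got %d elements", len(pairs))
	}
	m := make(map[string]string, len(pairs)/2)
	for i := 0; i < len(pairs); i += 2 {
		m[pairs[i]] = pairs[i+1]
	}
	return m, nil
}

// replaceArgs returns a copy of args with values replaced, plus the flags
// that were found in last position with no value to replace.
func replaceArgs(args []string, replacements map[string]string) (out, skipped []string) {
	out = append([]string(nil), args...)
	for i := 0; i < len(out); i++ {
		newValue, ok := replacements[out[i]]
		if !ok {
			continue
		}
		if i+1 >= len(out) {
			skipped = append(skipped, out[i])
			continue
		}
		out[i+1] = newValue
		i++ // skip the value just written so it is not treated as a flag
	}
	return out, skipped
}

func main() {
	m, err := pairsToMap([]string{"--target-peers", "1", "--datadir", "/tmp/x"})
	if err != nil {
		panic(err)
	}
	out, skipped := replaceArgs([]string{"--target-peers", "8", "--datadir"}, m)
	fmt.Println(out, skipped)
}
```

Returning the skipped flags lets the caller decide whether to warn or fail, which addresses the silent skip noted above.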

canercidam and others added 3 commits February 10, 2026 16:08
…358)

This PR changes the definitions of the custom recipes we include in the
playground CLI binary. We're switching from
```
custom-recipes/
  rbuilder/
    bin.yaml
    custom.yaml
    rbuilder.toml
```
to
```
 custom-recipes/
   rbuilder/
     bin/
       playground.yaml
       rbuilder.toml
     custom/
       playground.yaml
       rbuilder.toml
```
 
This separates extra files (e.g. `rbuilder.toml`) for each recipe
variant and enables `rbuilder/container` and `buildernet/qemu` custom
recipes.
- Add `rbuilder/container` custom recipe (currently runs on Mac)
- Remove `rbuilder/custom` in favor of reusing `rbuilder/bin` with a
suggestion

---------

Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
This allows defining custom services like:
```yaml
recipe:
  my-component:
    services:
      my-service:
        lifecycle_hooks: true
        init:
          - cmd1
          - cmd2
        start: cmd3
        stop:
          - cmd4
          - cmd5
```

This is useful in bringing in external VM handling for the buildernet
recipe.
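
As a rough illustration of how such a definition might be consumed, the sketch below models the fields in Go and lists the commands in the order a runner would presumably issue them. The struct, its field names, and the `Plan` helper are assumptions made for this example; the ordering (init, then start, then stop at shutdown) is inferred from the field names rather than taken from the implementation.

```go
package main

import "fmt"

// LifecycleService mirrors the YAML fields in the commit message above.
// The Go field names and the Plan helper are assumptions for this sketch.
type LifecycleService struct {
	Name  string
	Init  []string // run once, in order, before Start
	Start string   // long-running command
	Stop  []string // run in order on shutdown
}

// Plan returns the commands in the assumed execution order:
// every init command, then start, and finally the stop commands.
func (s LifecycleService) Plan() []string {
	plan := make([]string, 0, len(s.Init)+1+len(s.Stop))
	for _, c := range s.Init {
		plan = append(plan, "init: "+c)
	}
	if s.Start != "" {
		plan = append(plan, "start: "+s.Start)
	}
	for _, c := range s.Stop {
		plan = append(plan, "stop: "+c)
	}
	return plan
}

func main() {
	svc := LifecycleService{
		Name:  "my-service",
		Init:  []string{"cmd1", "cmd2"},
		Start: "cmd3",
		Stop:  []string{"cmd4", "cmd5"},
	}
	for _, line := range svc.Plan() {
		fmt.Println(line)
	}
}
```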
@claude
Contributor

claude bot commented Feb 11, 2026

Claude finished @canercidam's task in 3m 25s


PR #360 Review

  • Gather context and read changed files
  • Analyze code quality, bugs, and security
  • Post inline comments for specific issues
  • Post overall review summary

Summary

This PR adds lifecycle hooks for custom services (init/start/stop commands), restructures custom recipes to group/variant/playground.yaml directory convention, internalizes buildernet recipe overrides, moves post-hook execution into LocalRunner.Run(), and adds FindService/RemoveService/ReplaceArgs methods on Component/Service.

The lifecycle hooks feature is well-designed with good test coverage. The directory restructuring is clean. Several issues from the previous review remain unaddressed.

Issues

Bug — --subscribe-all-subnets in BuilderNet recipe (inline comment)

recipe_buildernet.go:64-67: --subscribe-all-subnets was removed from LighthouseBeaconNode.Apply() in this PR, so ReplaceArgs won't find it. Additionally, it's a boolean flag (no value argument), so ReplaceArgs is the wrong mechanism. Use beacon.WithArgs("--subscribe-all-subnets") to add it back, and only use ReplaceArgs for --target-peers.

Bug — Index out of bounds in applyReplaceArgs (inline comment)

recipe_yaml.go:490-504: When replaceArgs has an odd length, the warning fires but execution continues. On the final iteration, i = len(replaceArgs)-1 and replaceArgs[i+1] panics. Need to truncate or return early after the warning.

Race condition — lifecycle start command (inline comment)

local_runner_lifecycle.go:105-118: startCmd.Run() is launched in a goroutine before the handle is appended to d.handles. The process could exit (or fail to start) before being tracked. Use cmd.Start() explicitly, append the handle, then cmd.Wait() in the goroutine.

Behavioral change — Post-hook timing (inline comment)

local_runner.go:1199-1201 — Post-hooks now execute after docker compose up -d but before WaitForReady. The registerBuilder hook calls beacon and reth APIs which may not be ready yet. If this is intentional, add a comment explaining why.

Code duplication — removeService vs Component.RemoveService

The standalone removeService function in recipe_yaml.go:331-342 is an exact duplicate of the new Component.RemoveService method in manifest.go:100-110. The findService function was correctly deduplicated (replaced by Component.FindService), but removeService was not. Line 278 in recipe_yaml.go should use component.RemoveService(serviceName) and the standalone function should be deleted.

API inconsistency — Two ReplaceArgs interfaces

Service.ReplaceArgs (in manifest.go:552) takes map[string]string, while the YAML-layer applyReplaceArgs (in recipe_yaml.go:490) takes []string of alternating pairs. Two different interfaces for the same concept.

Minor Notes

  • ovmf.sh depends on jq but doesn't check for it — will produce confusing errors if missing.
  • start.sh:25: the command substitution in kill -0 $(cat "${PIDFILE}") is unquoted; if the PID file is empty or corrupt, this will misbehave.
  • Tests are well structured and comprehensively cover the new lifecycle hooks and custom recipe format.
  • The resp.Body.Close() and error check additions in recipe_buildernet.go:163,178 are good fixes.

Comment on lines +493 to +500
    }

    result := make([]string, len(args))
    copy(result, args)

    for i := 0; i < len(replaceArgs); i += 2 {
        flag := replaceArgs[i]
        newValue := replaceArgs[i+1]

Bug (still present from previous review): Index out of bounds panic when replaceArgs has an odd length. The warning on line 491 fires but execution continues. At i = len(replaceArgs)-1, replaceArgs[i+1] panics.

Suggested change
-     }
-     result := make([]string, len(args))
-     copy(result, args)
-     for i := 0; i < len(replaceArgs); i += 2 {
-         flag := replaceArgs[i]
-         newValue := replaceArgs[i+1]
+         replaceArgs = replaceArgs[:len(replaceArgs)-1]
+     }
+     result := make([]string, len(args))
+     copy(result, args)
+     for i := 0; i < len(replaceArgs); i += 2 {
+         flag := replaceArgs[i]
+         newValue := replaceArgs[i+1]

Comment on lines +64 to +67
beacon.ReplaceArgs(map[string]string{
    "--target-peers": "1",
    "--subscribe-all-subnets": "true",
})

Bug (still present from previous review): --subscribe-all-subnets was removed from LighthouseBeaconNode.Apply() in this PR, so ReplaceArgs won't find it. Even if it were present, it's a boolean flag (no value argument), so ReplaceArgs replacing args[i+1] is semantically wrong.

This should be:

beacon.ReplaceArgs(map[string]string{
    "--target-peers": "1",
})
beacon.WithArgs("--subscribe-all-subnets")

Comment on lines +105 to +118
startCmd := lc.newCmd(ctx, svc.Start)
go func() {
    if err := startCmd.Run(); err != nil {
        if mainctx.IsExiting() {
            return
        }
        slog.Error("Lifecycle service failed", "service", svc.Name, "error", err)
        d.sendExitError(fmt.Errorf("%s", lc.formatError("start", svc.Start, err)))
    }
}()

d.handlesMu.Lock()
defer d.handlesMu.Unlock()
d.handles = append(d.handles, startCmd)

Race condition: The goroutine on line 106 calls startCmd.Run() (which internally calls Start() then Wait()). The command is appended to d.handles here, after the goroutine is launched. If the process exits very quickly (or fails to start), sendExitError could fire before the handle is tracked.

Consider starting the command explicitly before the goroutine so you have a guaranteed running process:

if err := startCmd.Start(); err != nil {
    return fmt.Errorf("%s", lc.formatError("start", svc.Start, err))
}

d.handlesMu.Lock()
d.handles = append(d.handles, startCmd)
d.handlesMu.Unlock()

go func() {
    if err := startCmd.Wait(); err != nil {
        ...
    }
}()

This also avoids the window where stopAllProcessesWithSignal could find a handle whose Process is still nil.
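
A self-contained version of the same Start, track, Wait pattern is sketched below. The `runner` type and its `exitErr` channel are hypothetical stand-ins for LocalRunner and sendExitError, and the demo assumes a POSIX `sh` is available on PATH.

```go
package main

import (
	"fmt"
	"os/exec"
	"sync"
)

// runner is a hypothetical, pared-down stand-in for the LocalRunner in this PR.
type runner struct {
	handlesMu sync.Mutex
	handles   []*exec.Cmd
	exitErr   chan error // buffered for this single-process demo
}

// startTracked starts cmd synchronously, records the handle, and only then
// waits in the background. A failure to start is returned to the caller.
func (r *runner) startTracked(cmd *exec.Cmd) error {
	if err := cmd.Start(); err != nil {
		return fmt.Errorf("start %s: %w", cmd.Path, err)
	}

	r.handlesMu.Lock()
	r.handles = append(r.handles, cmd) // cmd.Process is non-nil from here on
	r.handlesMu.Unlock()

	go func() {
		if err := cmd.Wait(); err != nil {
			r.exitErr <- err
		}
	}()
	return nil
}

// demo assumes a POSIX "sh" is on PATH. It returns the number of tracked
// handles and the exit error reported by the background waiter.
func demo() (int, error) {
	r := &runner{exitErr: make(chan error, 1)}
	if err := r.startTracked(exec.Command("sh", "-c", "exit 3")); err != nil {
		return 0, err
	}
	waitErr := <-r.exitErr
	r.handlesMu.Lock()
	defer r.handlesMu.Unlock()
	return len(r.handles), waitErr
}

func main() {
	fmt.Println(demo())
}
```

Because the handle is appended before the goroutine is created, any exit error reported by the waiter refers to a process that is already tracked.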

Comment on lines +1199 to +1201
// run post hook operations
if err := d.manifest.ExecutePostHookActions(); err != nil {
    return fmt.Errorf("failed to execute post-hook operations: %w", err)

Behavioral change (still present from previous review): Post-hooks now execute after docker compose up -d but before WaitForReady (which is called in main.go after Run() returns). The registerBuilder hook calls the beacon API and reth API — these may not be accepting connections yet.

If lifecycle host services (like the VM) need the post-hook to have run before they start, this ordering makes sense. But it's worth adding a comment explaining the intentional timing, since the previous code ran post-hooks after services were healthy.

@claude
Contributor

claude bot commented Feb 11, 2026

Claude finished @canercidam's task in 3m 13s


PR #360 Review

  • Gather context and read changed files
  • Analyze code quality, bugs, and security
  • Post inline comments for specific issues
  • Post overall review summary

Summary

This PR restructures custom recipes to a group/variant/playground.yaml directory convention, internalizes buildernet recipe overrides, adds lifecycle hooks for custom services (init/start/stop), moves WaitForReady and post-hook execution into LocalRunner.Run(), adds FindService/RemoveService/ReplaceArgs/DependsOnNone methods on Component/Service, and adds a validate command.

The latest commit (2d00abc) fixes several issues from previous reviews:

  • Fixed: --subscribe-all-subnets now correctly uses beacon.WithArgs("--subscribe-all-subnets") instead of ReplaceArgs
  • Fixed: Post-hook timing — WaitForReady is now called before ExecutePostHookActions inside Run(), so services are healthy when the registerBuilder hook runs
  • Fixed: postHook.Action now receives a context.Context for proper cancellation propagation

Remaining Issues

Bug — Index out of bounds in applyReplaceArgs (inline comment)

recipe_yaml.go:490-502: When replaceArgs has an odd length, the warning fires but execution continues. On the final iteration, i = len(replaceArgs)-1 and replaceArgs[i+1] panics with index out of range. Truncate the slice after the warning.

Race condition — lifecycle start command (inline comment)

local_runner_lifecycle.go:105-118: startCmd.Run() is launched in a goroutine before the handle is appended to d.handles. The process could exit (or fail to start) before being tracked. Use cmd.Start() explicitly, append the handle, then cmd.Wait() in the goroutine.

Code duplication — removeService vs Component.RemoveService

The standalone removeService function in recipe_yaml.go:331-342 is an exact duplicate of Component.RemoveService in manifest.go:92-102. Line 278 in recipe_yaml.go should use component.RemoveService(serviceName) and the standalone function should be deleted.

Bug — RunnerConfig.AddCallback silently drops callbacks

local_runner.go:114-118: AddCallback only appends when r.Callbacks is nil. Once initialized (e.g., by NewLocalRunner setting a noop callback, or by the first call), subsequent calls are silently dropped. This means the watchdog callback registered in main.go:811 and the debug logging callback in main.go:746 are never actually added. Should be:

func (r *RunnerConfig) AddCallback(c Callback) {
    r.Callbacks = append(r.Callbacks, c)
}
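
The difference between the guarded and the unconditional append can be reproduced with a small sketch. The `Callback` signature here is hypothetical (the real type is not shown in this review), and `addCallbackGuarded` is a name introduced only to model the behavior described above.

```go
package main

import "fmt"

// Callback is a hypothetical signature; the real type is not shown in this review.
type Callback func(event string)

// RunnerConfig is reduced to the one field relevant to this issue.
type RunnerConfig struct {
	Callbacks []Callback
}

// addCallbackGuarded reproduces the behavior described above: it appends only
// while Callbacks is nil, so every call after the first is dropped.
func (r *RunnerConfig) addCallbackGuarded(c Callback) {
	if r.Callbacks == nil {
		r.Callbacks = append(r.Callbacks, c)
	}
}

// AddCallback is the suggested fix: append unconditionally.
func (r *RunnerConfig) AddCallback(c Callback) {
	r.Callbacks = append(r.Callbacks, c)
}

func main() {
	noop := func(string) {}

	buggy := &RunnerConfig{}
	buggy.addCallbackGuarded(noop)
	buggy.addCallbackGuarded(noop) // silently dropped

	fixed := &RunnerConfig{}
	fixed.AddCallback(noop)
	fixed.AddCallback(noop)

	fmt.Println(len(buggy.Callbacks), len(fixed.Callbacks)) // prints "1 2"
}
```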

API inconsistency — Two ReplaceArgs interfaces

Service.ReplaceArgs (in manifest.go:553) takes map[string]string, while the YAML-layer applyReplaceArgs (in recipe_yaml.go:490) takes []string of alternating pairs. Two different interfaces for the same concept.

Minor Notes

  • ovmf.sh depends on jq being installed but doesn't check for it — will produce confusing errors if missing.
  • start.sh:25: the command substitution in kill -0 $(cat "${PIDFILE}") is unquoted; if the PID file is empty or corrupt, this will misbehave.
  • The Fileserver component was moved from L1 recipe to BuilderNet recipe — verify this is intentional since L1 recipe users will no longer get the file server.
  • Tests are well structured and comprehensively cover lifecycle hooks, validation, and custom recipe format.

Comment on lines +490 to +502
func applyReplaceArgs(args, replaceArgs []string) []string {
    if len(replaceArgs)%2 != 0 {
        slog.Warn("replace_args should contain pairs of flag and value", "count", len(replaceArgs))
    }

    result := make([]string, len(args))
    copy(result, args)

    for i := 0; i < len(replaceArgs); i += 2 {
        flag := replaceArgs[i]
        newValue := replaceArgs[i+1]
        result = applyReplacePair(flag, newValue, result)
    }

Bug (persists from previous reviews): Index out of bounds panic when replaceArgs has an odd length. The warning on line 492 fires but execution continues. On the final iteration, i = len(replaceArgs)-1 and replaceArgs[i+1] on line 500 panics.

Suggested change
- func applyReplaceArgs(args, replaceArgs []string) []string {
-     if len(replaceArgs)%2 != 0 {
-         slog.Warn("replace_args should contain pairs of flag and value", "count", len(replaceArgs))
-     }
-     result := make([]string, len(args))
-     copy(result, args)
-     for i := 0; i < len(replaceArgs); i += 2 {
-         flag := replaceArgs[i]
-         newValue := replaceArgs[i+1]
-         result = applyReplacePair(flag, newValue, result)
-     }
+ func applyReplaceArgs(args, replaceArgs []string) []string {
+     if len(replaceArgs)%2 != 0 {
+         slog.Warn("replace_args should contain pairs of flag and value, ignoring last unpaired element", "count", len(replaceArgs))
+         replaceArgs = replaceArgs[:len(replaceArgs)-1]
+     }
+     result := make([]string, len(args))
+     copy(result, args)
+     for i := 0; i < len(replaceArgs); i += 2 {
+         flag := replaceArgs[i]
+         newValue := replaceArgs[i+1]
+         result = applyReplacePair(flag, newValue, result)
+     }
+     return result
+ }

Comment on lines +105 to +119
startCmd := lc.newCmd(ctx, svc.Start)
go func() {
    if err := startCmd.Run(); err != nil {
        if mainctx.IsExiting() {
            return
        }
        slog.Error("Lifecycle service failed", "service", svc.Name, "error", err)
        d.sendExitError(fmt.Errorf("%s", lc.formatError("start", svc.Start, err)))
    }
}()

d.handlesMu.Lock()
defer d.handlesMu.Unlock()
d.handles = append(d.handles, startCmd)
return nil

Race condition (persists from previous reviews): startCmd.Run() (which internally calls Start() then Wait()) is launched in a goroutine. The handle is appended to d.handles only afterward on line 118. If the process exits very quickly, sendExitError could fire before the handle is tracked, and stopAllProcessesWithSignal could find a handle whose Process is still nil.

Consider using Start() explicitly, appending the handle, then Wait() in the goroutine:

if err := startCmd.Start(); err != nil {
    return fmt.Errorf("%s", lc.formatError("start", svc.Start, err))
}

d.handlesMu.Lock()
d.handles = append(d.handles, startCmd)
d.handlesMu.Unlock()

go func() {
    if err := startCmd.Wait(); err != nil {
        if mainctx.IsExiting() {
            return
        }
        slog.Error("Lifecycle service failed", "service", svc.Name, "error", err)
        d.sendExitError(fmt.Errorf("%s", lc.formatError("start", svc.Start, err)))
    }
}()
return nil

@claude
Contributor

claude bot commented Feb 11, 2026

Claude finished @canercidam's task in 3m 41s


PR #360 Review

  • Gather context and read changed files
  • Analyze code quality, bugs, and security
  • Post inline comments for specific issues
  • Post overall review summary

Summary

This PR restructures custom recipes to a group/variant/playground.yaml directory convention, internalizes buildernet recipe overrides, adds lifecycle hooks for custom services (init/start/stop), moves WaitForReady and post-hook execution into main.go (correctly sequenced), adds FindService/RemoveService/ReplaceArgs/DependsOnNone methods on Component/Service, and adds a validate command.

Compared to previous reviews, several issues have been fixed:

  • Fixed: --subscribe-all-subnets now correctly uses beacon.WithArgs("--subscribe-all-subnets") instead of ReplaceArgs
  • Fixed: Post-hook timing — WaitForReady is now called before ExecutePostHookActions in main.go, so services are healthy when the registerBuilder hook runs
  • Fixed: postHook.Action now receives a context.Context for proper cancellation propagation

Remaining Issues

Bug — Index out of bounds in applyReplaceArgs (inline comment)

recipe_yaml.go:490-504: When replaceArgs has an odd length, the warning fires but execution continues. On the final iteration, i = len(replaceArgs)-1 and replaceArgs[i+1] panics. Truncate the slice after the warning.

Race condition — lifecycle start command (inline comment)

local_runner_lifecycle.go:105-118: startCmd.Run() is launched in a goroutine before the handle is appended to d.handles. The process could exit (or fail to start) before being tracked. Use cmd.Start() explicitly, append the handle, then cmd.Wait() in the goroutine. The same pattern exists in local_runner.go:953-974 for the regular runOnHost path.

Code duplication — removeService vs Component.RemoveService

The standalone removeService function in recipe_yaml.go:331-342 is an exact duplicate of Component.RemoveService in manifest.go:92-102. The findService function was correctly deduplicated (replaced by Component.FindService), but removeService was not. Line 278 should use component.RemoveService(serviceName) and the standalone function should be deleted.

Bug — RunnerConfig.AddCallback silently drops callbacks (pre-existing, not in this PR's diff)

local_runner.go:114-118: AddCallback only appends when r.Callbacks is nil. Once initialized (e.g., by NewLocalRunner setting a noop on line 174, or by the first AddCallback call), subsequent calls are silently dropped. This means the watchdog callback registered in main.go:824 and the debug logging callback in main.go:747 are never actually added. While this is pre-existing, it affects the new lifecycle hooks flow. The fix is to remove the if r.Callbacks == nil guard.

API inconsistency — Two ReplaceArgs interfaces

Service.ReplaceArgs (manifest.go:553) takes map[string]string, while the YAML-layer applyReplaceArgs (recipe_yaml.go:490) takes []string of alternating pairs. Two different interfaces for the same concept.

Duplicate validation block in Manifest.Validate (pre-existing)

manifest.go:323-349: The "validate that the mounts are correct" block is copy-pasted — lines 337-349 are an exact duplicate of lines 323-335.

Minor Notes

  • ovmf.sh depends on jq being installed but doesn't check for it — will produce confusing errors if missing.
  • start.sh:25: the command substitution in kill -0 $(cat "${PIDFILE}") is unquoted; if the PID file is empty or corrupt, this will misbehave.
  • Tests are well structured and comprehensively cover lifecycle hooks, validation, and custom recipe format.

This avoids a chicken-and-egg situation when there are cross-dependencies.
This allows replacing flags like:
```yaml
  - "--foo"
  - "bar"
```
with:
```yaml
replace_args:
  - "--foo"
  - "baz"
```
without redefining the entire flag set. It is not used when the
custom recipe prefers defining the full set of args with `args`.

(This was considered for supporting the buildernet recipe but is no
longer necessary for that. It is still a potentially useful feature.)
@claude
Contributor

claude bot commented Feb 11, 2026

Claude finished @canercidam's task in 2m 38s


PR #360 Review

  • Gather context and read changed files
  • Analyze code quality, bugs, and security
  • Post inline comments for specific issues
  • Post overall review summary

Summary

This PR restructures custom recipes to a group/variant/playground.yaml directory convention, internalizes buildernet recipe overrides, adds lifecycle hooks for custom services (init/start/stop), moves WaitForReady and post-hook execution into main.go (correctly sequenced), adds FindService/RemoveService/ReplaceArgs/DependsOnNone methods on Component/Service, and adds a validate command.

Several issues from previous reviews have been fixed:

  • Fixed: --subscribe-all-subnets now correctly uses beacon.WithArgs("--subscribe-all-subnets") instead of ReplaceArgs
  • Fixed: Post-hook timing — WaitForReady is called before ExecutePostHookActions in main.go, so services are healthy when the registerBuilder hook runs
  • Fixed: Lifecycle hooks run after WaitForReady and post-hooks, which is correct ordering

Remaining Issues

Bug — Index out of bounds in applyReplaceArgs (inline comment)

recipe_yaml.go:491-501: When replaceArgs has an odd length, the warning fires but execution continues. On the final iteration, i = len(replaceArgs)-1 and replaceArgs[i+1] panics. Truncate the slice after the warning.

Race condition — lifecycle start command (inline comment)

local_runner_lifecycle.go:111-125: startCmd.Run() is launched in a goroutine before the handle is appended to d.handles. The process could exit before being tracked. The same pattern exists in local_runner.go:959-980 for the regular runOnHost path. Use cmd.Start() explicitly, append the handle, then cmd.Wait() in the goroutine.

Code duplication — removeService vs Component.RemoveService

The standalone removeService function in recipe_yaml.go:331-342 is an exact duplicate of Component.RemoveService in manifest.go:92-102. Line 278 should use component.RemoveService(serviceName) and the standalone function should be deleted.

Bug — RunnerConfig.AddCallback silently drops callbacks

local_runner.go:114-118: AddCallback only appends when r.Callbacks is nil. Once initialized (e.g., by NewLocalRunner setting a noop callback on line 174), subsequent calls are silently dropped. The watchdog callback (main.go:832) and debug logging callback (main.go:747) are never actually registered. The fix is to remove the if r.Callbacks == nil guard.

Bug — Duplicate validation block in Manifest.Validate

manifest.go:323-349: The "validate that the mounts are correct" block is copy-pasted: lines 337-349 are an exact duplicate of lines 323-335. Delete the second copy.

API inconsistency — Two ReplaceArgs interfaces

Service.ReplaceArgs (manifest.go:555) takes map[string]string, while the YAML-layer applyReplaceArgs (recipe_yaml.go:491) takes []string of alternating pairs. Two different interfaces for the same concept.

Minor Notes

  • ovmf.sh depends on jq being installed but doesn't check for it — will produce confusing errors if missing.
  • start.sh:25: the command substitution in kill -0 $(cat "${PIDFILE}") is unquoted; if the PID file is empty, this will misbehave.
  • Tests are well structured and comprehensively cover lifecycle hooks, validation, and custom recipe format.

@canercidam canercidam marked this pull request as ready for review February 11, 2026 22:41
@claude
Contributor

claude bot commented Feb 11, 2026

Claude finished @canercidam's task in 2m 16s


PR #360 Review

  • Gather context and read changed files
  • Analyze code quality, bugs, and security
  • Post inline comments for specific issues
  • Post overall review summary

Summary

This PR restructures custom recipes to a group/variant/playground.yaml directory convention, internalizes buildernet recipe overrides, adds lifecycle hooks for custom services (init/start/stop), moves WaitForReady and post-hook execution into main.go (correctly sequenced), adds FindService/RemoveService/ReplaceArgs/DependsOnNone methods on Component/Service, adds a validate command, and separates lifecycle hook execution from docker+host service startup.

Several issues from previous reviews have been fixed:

  • Fixed: --subscribe-all-subnets now correctly uses beacon.WithArgs("--subscribe-all-subnets") instead of ReplaceArgs
  • Fixed: Post-hook timing — WaitForReady is called before ExecutePostHookActions in main.go:767-776
  • Fixed: Lifecycle hooks run after WaitForReady and post-hooks in main.go:778-782, so docker services are healthy first
  • Fixed: postHook.Action now receives a context.Context for cancellation propagation

Remaining Issues

Bug — Index out of bounds in applyReplaceArgs (inline comment)

recipe_yaml.go:491-505: When replaceArgs has an odd length, the warning fires but execution continues. On the final iteration, i = len(replaceArgs)-1 and replaceArgs[i+1] panics. Truncate the slice after the warning. This has been flagged in every previous review.

Race condition — lifecycle start command (inline comment)

local_runner_lifecycle.go:112-125: startCmd.Run() is launched in a goroutine before the handle is appended to d.handles. If the process exits quickly, sendExitError fires before the handle is tracked. Use cmd.Start() explicitly, append the handle, then cmd.Wait() in the goroutine. The same pattern exists in local_runner.go:959-980.

Code duplication — removeService vs Component.RemoveService

The standalone removeService function in recipe_yaml.go:331-342 is an exact duplicate of Component.RemoveService in manifest.go:92-102. Line 278 should use component.RemoveService(serviceName) and the standalone function should be deleted.

Bug — RunnerConfig.AddCallback silently drops callbacks

local_runner.go:114-118: AddCallback only appends when r.Callbacks is nil. Once initialized (by NewLocalRunner setting a noop on line 174, or by the first call), subsequent calls are silently dropped. The watchdog callback (main.go:832) and debug logging callback (main.go:747) are never added. Remove the if r.Callbacks == nil guard.
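
To illustrate why the guard drops callbacks, here is a small sketch. The Callback type and the trimmed-down RunnerConfig are hypothetical stand-ins (the real definitions are not shown in this review); only the nil-guard behaviour described above is modelled.

```go
package main

import "fmt"

// Callback is a hypothetical stand-in for the runner's callback type.
type Callback func(serviceName string)

// RunnerConfig is a simplified stand-in holding only the callback list.
type RunnerConfig struct {
	Callbacks []Callback
}

// addCallbackBuggy models the behaviour described in the review:
// it appends only while the slice is still nil.
func (r *RunnerConfig) addCallbackBuggy(c Callback) {
	if r.Callbacks == nil {
		r.Callbacks = append(r.Callbacks, c)
	}
}

// AddCallback appends unconditionally; append works on a nil slice,
// so no guard is needed.
func (r *RunnerConfig) AddCallback(c Callback) {
	r.Callbacks = append(r.Callbacks, c)
}

func main() {
	noop := func(string) {}
	watchdog := func(string) {}
	debugLog := func(string) {}

	buggy := &RunnerConfig{}
	buggy.addCallbackBuggy(noop)     // first call initializes the slice
	buggy.addCallbackBuggy(watchdog) // silently dropped
	buggy.addCallbackBuggy(debugLog) // silently dropped
	fmt.Println("buggy:", len(buggy.Callbacks)) // prints buggy: 1

	fixed := &RunnerConfig{}
	fixed.AddCallback(noop)
	fixed.AddCallback(watchdog)
	fixed.AddCallback(debugLog)
	fmt.Println("fixed:", len(fixed.Callbacks)) // prints fixed: 3
}
```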

Duplicate validation block in Manifest.Validate

manifest.go:323-349: The "validate that the mounts are correct" block is copy-pasted — lines 337-349 are an exact duplicate of lines 323-335. Delete the second block.

API inconsistency — Two ReplaceArgs interfaces

Service.ReplaceArgs (manifest.go:555) takes map[string]string, while applyReplaceArgs (recipe_yaml.go:491) takes []string of alternating pairs. Two different interfaces for the same concept. Consider unifying.
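
One possible direction, sketched under assumptions: keep map[string]string as the single internal form and convert the YAML pair list at the boundary. pairsToMap below is a hypothetical helper, not something that exists in the PR. Note that a map discards ordering and collapses repeated flags, so this only fits if neither matters for replace_args.

```go
package main

import (
	"errors"
	"fmt"
)

// pairsToMap is a hypothetical adapter that converts alternating
// flag/value pairs into the map form taken by Service.ReplaceArgs.
// An odd-length input is rejected instead of being silently truncated.
func pairsToMap(pairs []string) (map[string]string, error) {
	if len(pairs)%2 != 0 {
		return nil, errors.New("replace_args must contain flag/value pairs")
	}
	out := make(map[string]string, len(pairs)/2)
	for i := 0; i < len(pairs); i += 2 {
		out[pairs[i]] = pairs[i+1]
	}
	return out, nil
}

func main() {
	m, err := pairsToMap([]string{"--target-peers", "0"})
	fmt.Println(m, err)

	// A dangling flag is reported as an error.
	_, err = pairsToMap([]string{"--target-peers"})
	fmt.Println(err)
}
```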

Minor Notes

  • start.sh:25: kill -0 $(cat "${PIDFILE}") is unquoted; if the PID file is empty or corrupt, this will misbehave. Consider quoting: kill -0 "$(cat "${PIDFILE}")".
  • Tests are well structured and comprehensively cover lifecycle hooks, validation, and the custom recipe format.
  • The Fileserver component was moved from L1 recipe to BuilderNet recipe — verify this is intentional since L1 recipe users will no longer get the file server.

@canercidam canercidam changed the title from "WIP: Unify BuilderNet recipe" to "Unify BuilderNet mkosi recipe" Feb 11, 2026
Comment on lines +491 to +501
func applyReplaceArgs(args, replaceArgs []string) []string {
    if len(replaceArgs)%2 != 0 {
        slog.Warn("replace_args should contain pairs of flag and value", "count", len(replaceArgs))
    }

    result := make([]string, len(args))
    copy(result, args)

    for i := 0; i < len(replaceArgs); i += 2 {
        flag := replaceArgs[i]
        newValue := replaceArgs[i+1]
Contributor

Bug (persists from previous reviews): Index out of bounds panic when replaceArgs has an odd length. The warning on line 493 fires but execution continues. On the final iteration i equals len(replaceArgs)-1, so replaceArgs[i+1] reads one past the end of the slice and panics.

Suggested change
-func applyReplaceArgs(args, replaceArgs []string) []string {
-    if len(replaceArgs)%2 != 0 {
-        slog.Warn("replace_args should contain pairs of flag and value", "count", len(replaceArgs))
-    }
-    result := make([]string, len(args))
-    copy(result, args)
-    for i := 0; i < len(replaceArgs); i += 2 {
-        flag := replaceArgs[i]
-        newValue := replaceArgs[i+1]
+func applyReplaceArgs(args, replaceArgs []string) []string {
+    if len(replaceArgs)%2 != 0 {
+        slog.Warn("replace_args should contain pairs of flag and value, ignoring last unpaired element", "count", len(replaceArgs))
+        replaceArgs = replaceArgs[:len(replaceArgs)-1]
+    }
+    result := make([]string, len(args))
+    copy(result, args)
+    for i := 0; i < len(replaceArgs); i += 2 {
+        flag := replaceArgs[i]
+        newValue := replaceArgs[i+1]

Comment on lines +111 to +125
startCmd := lc.newCmd(ctx, svc.Start)
go func() {
    if err := startCmd.Run(); err != nil {
        if mainctx.IsExiting() {
            return
        }
        slog.Error("Lifecycle service failed", "service", svc.Name, "error", err)
        d.sendExitError(fmt.Errorf("%s", lc.formatError("start", svc.Start, err)))
    }
}()

d.handlesMu.Lock()
defer d.handlesMu.Unlock()
d.handles = append(d.handles, startCmd)
return nil
Contributor

Race condition (persists from previous reviews): startCmd.Run() (which internally calls Start() then Wait()) is launched in a goroutine. The handle is appended to d.handles only on line 124, after the goroutine is launched. If the process exits very quickly, sendExitError fires before the handle is tracked, and stopAllProcessesWithSignal could encounter a handle whose Process is still nil.

The same pattern exists in local_runner.go:959-980 for the regular runOnHost path.

Use Start() explicitly, append the handle, then Wait() in the goroutine:

Suggested change
-startCmd := lc.newCmd(ctx, svc.Start)
-go func() {
-    if err := startCmd.Run(); err != nil {
-        if mainctx.IsExiting() {
-            return
-        }
-        slog.Error("Lifecycle service failed", "service", svc.Name, "error", err)
-        d.sendExitError(fmt.Errorf("%s", lc.formatError("start", svc.Start, err)))
-    }
-}()
-d.handlesMu.Lock()
-defer d.handlesMu.Unlock()
-d.handles = append(d.handles, startCmd)
-return nil
+startCmd := lc.newCmd(ctx, svc.Start)
+if err := startCmd.Start(); err != nil {
+    return fmt.Errorf("%s", lc.formatError("start", svc.Start, err))
+}
+d.handlesMu.Lock()
+d.handles = append(d.handles, startCmd)
+d.handlesMu.Unlock()
+go func() {
+    if err := startCmd.Wait(); err != nil {
+        if mainctx.IsExiting() {
+            return
+        }
+        slog.Error("Lifecycle service failed", "service", svc.Name, "error", err)
+        d.sendExitError(fmt.Errorf("%s", lc.formatError("start", svc.Start, err)))
+    }
+}()
+return nil

Comment on lines +491 to +505
func applyReplaceArgs(args, replaceArgs []string) []string {
    if len(replaceArgs)%2 != 0 {
        slog.Warn("replace_args should contain pairs of flag and value", "count", len(replaceArgs))
    }

    result := make([]string, len(args))
    copy(result, args)

    for i := 0; i < len(replaceArgs); i += 2 {
        flag := replaceArgs[i]
        newValue := replaceArgs[i+1]
        result = applyReplacePair(flag, newValue, result)
    }

    return result
Contributor

Bug (persists from previous reviews): Index out of bounds panic when replaceArgs has an odd length. The warning on line 493 fires but execution continues. On the final iteration i equals len(replaceArgs)-1, so replaceArgs[i+1] on line 501 reads one past the end of the slice and panics.

Suggested change
-func applyReplaceArgs(args, replaceArgs []string) []string {
-    if len(replaceArgs)%2 != 0 {
-        slog.Warn("replace_args should contain pairs of flag and value", "count", len(replaceArgs))
-    }
-    result := make([]string, len(args))
-    copy(result, args)
-    for i := 0; i < len(replaceArgs); i += 2 {
-        flag := replaceArgs[i]
-        newValue := replaceArgs[i+1]
-        result = applyReplacePair(flag, newValue, result)
-    }
-    return result
+func applyReplaceArgs(args, replaceArgs []string) []string {
+    if len(replaceArgs)%2 != 0 {
+        slog.Warn("replace_args should contain pairs of flag and value, ignoring last unpaired element", "count", len(replaceArgs))
+        replaceArgs = replaceArgs[:len(replaceArgs)-1]
+    }
+    result := make([]string, len(args))
+    copy(result, args)
+    for i := 0; i < len(replaceArgs); i += 2 {
+        flag := replaceArgs[i]
+        newValue := replaceArgs[i+1]
+        result = applyReplacePair(flag, newValue, result)
+    }
+    return result
+}

Comment on lines +112 to +125
go func() {
    if err := startCmd.Run(); err != nil {
        if mainctx.IsExiting() {
            return
        }
        slog.Error("Lifecycle service failed", "service", svc.Name, "error", err)
        d.sendExitError(fmt.Errorf("%s", lc.formatError("start", svc.Start, err)))
    }
}()

d.handlesMu.Lock()
defer d.handlesMu.Unlock()
d.handles = append(d.handles, startCmd)
return nil
Contributor

Race condition (persists from previous reviews): startCmd.Run() (which internally calls Start() then Wait()) is launched in a goroutine on line 113. The handle is appended to d.handles only on line 124, after the goroutine is launched. If the process exits very quickly, sendExitError fires before the handle is tracked. Additionally, stopAllProcessesWithSignal could encounter a handle whose Process is still nil (the stopProcessWithSignal helper polls for non-nil Process, but this still creates an unnecessary window).

The same pattern exists in local_runner.go:959-980 for the regular runOnHost path.

Consider using Start() explicitly, appending the handle, then Wait() in the goroutine:

if err := startCmd.Start(); err != nil {
    return fmt.Errorf("%s", lc.formatError("start", svc.Start, err))
}

d.handlesMu.Lock()
d.handles = append(d.handles, startCmd)
d.handlesMu.Unlock()

go func() {
    if err := startCmd.Wait(); err != nil {
        if mainctx.IsExiting() {
            return
        }
        slog.Error("Lifecycle service failed", "service", svc.Name, "error", err)
        d.sendExitError(fmt.Errorf("%s", lc.formatError("start", svc.Start, err)))
    }
}()
return nil

Base automatically changed from fryd/mkosi-playground to main February 11, 2026 22:49
@canercidam canercidam changed the base branch from main to fryd/mkosi-playground February 11, 2026 23:22
@canercidam canercidam merged commit c50fee9 into fryd/mkosi-playground Feb 11, 2026
26 checks passed
@canercidam canercidam deleted the caner/buildernet-recipe branch February 11, 2026 23:25