Description
Describe the bug
agent-stack-k8s Alpine Hardcoding Issue
Version: v0.37.0 (current latest)
Location: internal/controller/scheduler/scheduler.go
Problem
The controller hardcodes Alpine-specific shell and commands, making it impossible to use Ubuntu/Debian-based agent images:
- Shell hardcoded to ash (Alpine shell):
// Line 987 - copy-agent init container
Command: []string{"ash"}
Args: []string{"-cefx", containerArgs.String()}
// Lines 1111, 1129, 1142 - checkout container
Command: []string{"ash", "-c"}
- Alpine-specific user/group commands:
// Lines 1112-1115
checkoutContainer.Args = []string{fmt.Sprintf(`set -exufo pipefail
addgroup -g %d buildkite-agent
adduser -D -u %d -G buildkite-agent -h /workspace buildkite-agent
su buildkite-agent -c "%s && buildkite-agent-entrypoint kubernetes-bootstrap"`,
- addgroup / adduser -D are BusyBox/Alpine commands
- Ubuntu/Debian use groupadd / useradd
Why This Matters
Some Kubernetes environments require the use-vc resolv.conf option to force TCP-based DNS queries. musl libc (Alpine) doesn't support use-vc, causing DNS resolution to fail. glibc-based images (Ubuntu, Rocky) work correctly. In general I feel as though it'd be good for all images published by buildkite/agent to be compatible with this stack
Requested Enhancement
Add a configuration option to specify the shell, and create the user in a way that works on both BusyBox and shadow-utils images, or detect the image type and adapt accordingly. Example:
config:
  shell: "/bin/bash" # or auto-detect
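A minimal sketch of how such a `shell` option might be threaded through to the container spec. `Config`, `shellCommand` and `defaultShell` are hypothetical names for illustration, not the controller's actual types; the only thing taken from the current code is that `ash` is the hardcoded value.

```go
package main

import "fmt"

// Config is a hypothetical slice of the controller configuration.
type Config struct {
	// Shell is the interpreter used for generated container scripts.
	// Empty means "keep the current behaviour".
	Shell string
}

// defaultShell mirrors the value currently hardcoded in scheduler.go.
const defaultShell = "ash"

// shellCommand returns the Command and Args for a container that runs
// script under the configured shell, falling back to defaultShell.
func shellCommand(cfg Config, script string) (command, args []string) {
	shell := cfg.Shell
	if shell == "" {
		shell = defaultShell
	}
	return []string{shell}, []string{"-c", script}
}

func main() {
	command, args := shellCommand(Config{Shell: "/bin/bash"}, "echo hello")
	fmt.Println(command, args)
}
```

Keeping `ash` as the fallback would leave existing Alpine deployments unchanged, while Ubuntu-based images could opt in by setting the option.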
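Or use an approach that works on both. Note that groupadd / useradd are not POSIX either and are missing from a stock Alpine image, so a portable script has to use whichever tool the image provides. The shadow-utils form would be:
# Instead of Alpine-specific adduser/addgroup
# (AGENT_GID / AGENT_UID are placeholder names; $UID is read-only in bash)
getent group buildkite-agent >/dev/null || groupadd -g "$AGENT_GID" buildkite-agent
getent passwd buildkite-agent >/dev/null || useradd -u "$AGENT_UID" -g buildkite-agent -d /workspace buildkite-agent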
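As a rough illustration of the "adapt accordingly" idea, the script passed to the checkout container could pick the user-management tool at run time. The sketch below follows the `fmt.Sprintf` pattern quoted above; `userSetupScript` is a hypothetical helper, not a function in the controller, and it assumes the image ships either shadow-utils (`groupadd`/`useradd`) or BusyBox (`addgroup`/`adduser`). It does not handle a UID or GID that is already taken by a different user or group.

```go
package main

import "fmt"

// userSetupScript returns a shell fragment that creates the buildkite-agent
// group and user with whichever tool the image provides. It only uses
// constructs available in both BusyBox ash and bash.
func userSetupScript(uid, gid int) string {
	return fmt.Sprintf(`if command -v groupadd >/dev/null 2>&1; then
  # shadow-utils (Debian, Ubuntu, Rocky)
  grep -q '^buildkite-agent:' /etc/group || groupadd -g %[2]d buildkite-agent
  id -u buildkite-agent >/dev/null 2>&1 || useradd -u %[1]d -g buildkite-agent -d /workspace buildkite-agent
else
  # BusyBox (Alpine)
  grep -q '^buildkite-agent:' /etc/group || addgroup -g %[2]d buildkite-agent
  id -u buildkite-agent >/dev/null 2>&1 || adduser -D -u %[1]d -G buildkite-agent -h /workspace buildkite-agent
fi
`, uid, gid)
}

func main() {
	// Print the generated fragment for inspection.
	fmt.Print(userSetupScript(1000, 1000))
}
```

Checking for the tool rather than for the distribution keeps the script working on images that are neither Alpine nor Debian-based, as long as one of the two tool families is installed.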
To Reproduce
Steps to reproduce the behavior:
- Deploy with configuration '...':
# Helm values for agent-stack-k8s
config:
  # Custom agent image (Ubuntu-based instead of default Alpine)
  image: "ghcr.io/buildkite/agent:3.115.4-ubuntu-24.04"
  # Required for our environment - forces TCP DNS queries
  pod-spec-patch:
    dnsPolicy: "None"
    dnsConfig:
      options:
        - name: use-vc # Force TCP for DNS (not supported by musl/Alpine)
- Run pipeline on agents
- See error
Expected behavior
Pods start and jobs run when the agent image is Ubuntu/Debian-based. More generally, every image published by buildkite/agent should be compatible with this stack.
Environment
- agent-stack-k8s version: v0.37.0
- Kubernetes version: v1.34.2
- Deployment method: modified helm chart
Logs
The following init containers failed:
CONTAINER   EXIT CODE  SIGNAL  REASON      MESSAGE
copy-agent  128        0       StartError  failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: exec: "ash": executable file not found in $PATH