Releases: buildkite/agent-stack-k8s
v0.37.1
v0.37.1 (2026-01-29)
Fixed
- PS-1583: Replace Alpine-specific shell commands with portable alternatives #811 (@zhming0, @seemethere)
- Bump github.com/buildkite/agent/v3 from 3.115.3 to 3.116.0
- Fixed #806
Changed
Internal
- chore(deps): Bump sigs.k8s.io/controller-runtime from 0.22.4 to 0.23.0 in the k8s group #810 (@dependabot[bot])
- PB-1062: dogfooding resource class #804 (@zhming0)
- Quiet down logs by default #803 (@moskyb)
Images
Helm chart
Image: public.ecr.aws/buildkite/helm/agent-stack-k8s:0.37.1
Image: ghcr.io/buildkite/helm/agent-stack-k8s:0.37.1
Digest: sha256:18a4a3773f6cbf6ef6103140cd4c754843a555dbaf1ab611639fc46ce2027b06
Controller
Image: public.ecr.aws/buildkite/agent-stack-k8s/controller:0.37.1
Image: ghcr.io/buildkite/agent-stack-k8s/controller:0.37.1
Digest: sha256:2f2815e1f15d8daa55a0dd595ef6148431e0c986cdb8cb74e7cd4b06e74e9114
Agent
Image: ghcr.io/buildkite/agent:3.116.0
Digest: sha256:6cfb4f832fc071bf647d57b54dfb93d57ea5c3ba3786739bb998100d63197c61
v0.37.0
v0.37.0 (2026-01-15)
Warning
Internal: Viper → Kong Migration
In this release, we replaced Viper/Cobra with Kong for CLI parsing and configuration management. This change fixes a long-standing bug where Viper lowercased keys in nested config structures (e.g., volumeAttributes.secretProviderClass became secretproviderclass), which broke CSI volume configurations and other case-sensitive Kubernetes specs.
All existing CLI flags, environment variables, and config file formats remain backward compatible.
If you encounter any unexpected behavior with configuration parsing, please open an issue (https://github.com/buildkite/agent-stack-k8s/issues/new).
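To illustrate why lowercased keys break case-sensitive specs, here is a minimal Python sketch. It is not the controller's Go code: the lowercase_keys helper is hypothetical and only imitates the key normalization described above, and the driver name and my-provider value are example values.

```python
def lowercase_keys(value):
    """Recursively lowercase every mapping key, imitating the behaviour described above."""
    if isinstance(value, dict):
        return {key.lower(): lowercase_keys(inner) for key, inner in value.items()}
    if isinstance(value, list):
        return [lowercase_keys(item) for item in value]
    return value


# Example CSI volume config; the driver and class names are illustrative only.
csi_volume = {
    "csi": {
        "driver": "secrets-store.csi.k8s.io",
        "volumeAttributes": {"secretProviderClass": "my-provider"},
    }
}

mangled = lowercase_keys(csi_volume)

# The case-sensitive key a consumer expects is gone after normalization.
original_attrs = csi_volume["csi"]["volumeAttributes"]
mangled_attrs = mangled["csi"]["volumeattributes"]
print("secretProviderClass" in original_attrs)  # True
print("secretProviderClass" in mangled_attrs)   # False
```

Anything that looks up secretProviderClass by its exact name no longer finds it after normalization, which matches the kind of breakage described for CSI volume configurations.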
New 🎉
- Add default-resource-class-name config option #798 (Thanks @gempesaw)
Fixed
- PS-1547: Restore completionsWatcher for pod cleanup #802 (@zhming0)
This fixes cleanup of unmanaged containers added via podSpecPatch that would otherwise run indefinitely after the agent terminates. It also addresses an issue when upgrading from v0.34.0 to v0.35.0+, where in-flight jobs with legacy sidecars might not be properly cleaned up.
- PS-1530: Viper -> Kong, rework controller entrypoint #799 (@zhming0)
Internal
- chore(deps): Bump github.com/buildkite/agent/v3 from 3.115.2 to 3.115.3 #801 (@dependabot[bot])
Images
Helm chart
Image: public.ecr.aws/buildkite/helm/agent-stack-k8s:0.37.0
Image: ghcr.io/buildkite/helm/agent-stack-k8s:0.37.0
Digest: sha256:04d88191a728135b4893c3ec1e10e54eab5d2ad4cdc222337b96d4336602c162
Controller
Image: public.ecr.aws/buildkite/agent-stack-k8s/controller:0.37.0
Image: ghcr.io/buildkite/agent-stack-k8s/controller:0.37.0
Digest: sha256:d37cc249a154ead68078be68de6b1afa1ca3ad062aacffb8becfef1ab9c9e89c
Agent
Image: ghcr.io/buildkite/agent:3.115.3
Digest: sha256:6ad3702da271bd8e31b5905b7b7f1afc83bf6d5302ea4a47c4d1c10b7b803b12
v0.36.3
v0.36.3 (2026-01-15)
Fixed
- PS-1547: Restore completionsWatcher for pod cleanup #802 (@zhming0)
This fixes cleanup of unmanaged containers added via podSpecPatch that would otherwise run indefinitely after the agent terminates. It also addresses an issue when upgrading from v0.34.0 to v0.35.0+, where in-flight jobs with legacy sidecars might not be properly cleaned up.
Images
Helm chart
Image: public.ecr.aws/buildkite/helm/agent-stack-k8s:0.36.3
Image: ghcr.io/buildkite/helm/agent-stack-k8s:0.36.3
Digest: sha256:12658343d0024d6651a0b0a462c4b9b4ee843655cc3204b26e3293cd75c1f04b
Controller
Image: public.ecr.aws/buildkite/agent-stack-k8s/controller:0.36.3
Image: ghcr.io/buildkite/agent-stack-k8s/controller:0.36.3
Digest: sha256:2bc28273071f5eb9bdf14ea158fee8c9f631f1a12a7a609df15ec18e8490616c
Agent
Image: ghcr.io/buildkite/agent:3.115.2
Digest: sha256:aa3f647dc4abe889281cbe492b6a2778f0300ad17d0f513983122649cc965be0
v0.36.2
v0.36.2 (2025-12-31)
Warning
If you are upgrading from a version prior to v0.35.0, in-flight jobs with legacy sidecars may not be properly cleaned up, causing K8s Jobs to remain stuck. This is fixed in v0.36.3+. We recommend upgrading directly to v0.36.3 or later.
Fixed
- Fix an issue where helm charts couldn't be deployed without --skip-schema-validation #795 (@JoeColeman95)
Changed
- Pin images, go tool gotestsum #785 (@DrJosh9000)
- Dependency updates #793 #794 #789 #788 #790 #786 #782 #780 #784 (@dependabot)
Images
Helm chart
Image: public.ecr.aws/buildkite/helm/agent-stack-k8s:0.36.2
Image: ghcr.io/buildkite/helm/agent-stack-k8s:0.36.2
Digest: sha256:9d9324d12897e3dd5381ddc85c69b377719bc24294640b2bd4902a69234090a3
Controller
Image: public.ecr.aws/buildkite/agent-stack-k8s/controller:0.36.2
Image: ghcr.io/buildkite/agent-stack-k8s/controller:0.36.2
Digest: sha256:b13244a655431e844f130f16fcfc9703d1670d6dbf2dc2641da907f5c0ae9bfe
Agent
Image: ghcr.io/buildkite/agent:3.115.2
Digest: sha256:aa3f647dc4abe889281cbe492b6a2778f0300ad17d0f513983122649cc965be0
v0.36.1
v0.36.1 (2025-12-11)
Warning
If you are upgrading from a version prior to v0.35.0, in-flight jobs with legacy sidecars may not be properly cleaned up, causing K8s Jobs to remain stuck. This is fixed in v0.36.3+. We recommend upgrading directly to v0.36.3 or later.
Fixed
- Fix a bug where, in rare cases, agents in a pod would retry errors for up to 20 minutes before exiting. #3628 (@zhming0)
Internal
- chore(deps): Bump github.com/buildkite/agent/v3 from 3.114.0 to 3.115.1 #783 (@dependabot[bot])
Agent changelog
v3.115.1 (2025-12-12)
Fixes
Internal
- PB-1023: remove old kubernetes bootstrap setup #3629 (@zhming0)
- chore(deps): update zstash to v0.6.0 and update progress callback #3630 (@wolfeidau)
- feat: add support for concurrent save and restore operations #3627 (@wolfeidau)
v3.115.0 (2025-12-10)
Added
- --changed-files-path for pipeline upload, which allows users to specify a list of files changed for if_changed computation #3620 (@pyrocat101)
Fixes
Internal
v3.114.1 (2025-12-05)
Fixed
- Fix issue where artifacts uploaded to customer-managed s3 buckets could not be downloaded #3607 (@moskyb)
Internal
- Add an end-to-end testing framework! #3611 #3610 #3609 #3608 #3606 #3604 #3599 (@DrJosh9000)
- Dependency updates #3601 #3600 (@dependabot[bot])
- Update MIME types #3603 (@DrJosh9000)
Images
Helm chart
Image: public.ecr.aws/buildkite/helm/agent-stack-k8s:0.36.1
Image: ghcr.io/buildkite/helm/agent-stack-k8s:0.36.1
Digest: sha256:a78f377eb01c8c9790edc8656e2a1d8b9e25274763f5a3cfbb6fe3a1c35b9d9f
Controller
Image: public.ecr.aws/buildkite/agent-stack-k8s/controller:0.36.1
Image: ghcr.io/buildkite/agent-stack-k8s/controller:0.36.1
Digest: sha256:cbc4a690308612a19bfd1bd604d208968eb4c93e1388beb45f3ee8f0b9505983
Agent
Image: ghcr.io/buildkite/agent:3.115.1
Digest: sha256:d7eb24041e158196721622d036a07b24bf1fc1b2e4073e7ac7ac188419a2d54d
v0.36.0
v0.36.0 (2025-12-05)
(Note: this release was meant to be a patch release)
Warning
If you are upgrading from a version prior to v0.35.0, in-flight jobs with legacy sidecars may not be properly cleaned up, causing K8s Jobs to remain stuck. This is fixed in v0.36.3+. We recommend upgrading directly to v0.36.3 or later.
Fixes
- Add queue info to agentTags in controller #779 (@JoeColeman95)
Internal
- Some cleanups #778 (@DrJosh9000)
- PB-946 part 2: backfill test for stack notification batcher #776 (@zhming0)
- Upgrade go to 1.25.4 #777 (@zhming0)
- PB-946 part 1: backfill unit test for reserver #771 (@zhming0)
- chore(deps): Bump github.com/buildkite/agent/v3 from 3.112.0 to 3.114.0 #775 (@dependabot[bot])
- chore(deps): Bump github.com/jedib0t/go-pretty/v6 from 6.7.1 to 6.7.5 #774 (@dependabot[bot])
- Remove old job checker #772 (@zhming0)
- chore(deps): Bump the k8s group with 3 updates #768 (@dependabot[bot])
Agent changelog
v3.114.0 (2025-11-25)
Added
- feat: add agent metadata to OTEL trace attributes #3587 (@pyrocat101)
Fixed
- Fix for the agent sometimes failing to disconnect properly when exiting - agent pool: Send error after disconnecting #3596 (@DrJosh9000)
Internal
- internal/redact: Add another test with minor cleanup #3591 (@DrJosh9000)
- Run gofumpt as part of CI #3589 (@moskyb)
Dependency updates
- build(deps): bump the cloud-providers group with 7 updates #3593 (@dependabot[bot])
- build(deps): bump the container-images group across 5 directories with 1 update #3594 (@dependabot[bot])
- build(deps): bump the container-images group across 1 directory with 2 updates #3595 (@dependabot[bot])
- build(deps): bump golang.org/x/crypto from 0.44.0 to 0.45.0 #3590 (@dependabot[bot])
Images
Helm chart
Image: public.ecr.aws/buildkite/helm/agent-stack-k8s:0.36.0
Image: ghcr.io/buildkite/helm/agent-stack-k8s:0.36.0
Digest: sha256:68c86f205e83b33a602733c3400d3b0a81ef32fd177a7c57bc334428ac620724
Controller
Image: public.ecr.aws/buildkite/agent-stack-k8s/controller:0.36.0
Image: ghcr.io/buildkite/agent-stack-k8s/controller:0.36.0
Digest: sha256:5169862ac3f86f463cb4c173cdc248ca7a50b597ccd2535647873a03f34520c7
Agent
Image: ghcr.io/buildkite/agent:3.114.0
Digest: sha256:cb1972b3e9f8a34bb8b77884d2519955ef3229a2c6c05b5b1cf74eca2a5510d3
v0.35.0
v0.35.0 (2025-11-13)
Warning
Breaking changes in v0.35
Kubernetes Version Requirement
Starting with v0.35.0, we use Kubernetes's native sidecar container mechanism to power our sidecar feature. This requires Kubernetes 1.29 or later.
Sidecar Behavior Change
With the native sidecar mechanism, sidecars are now expected to be long-running processes rather than ad-hoc tasks. If your sidecars are designed to run and exit (similar to init containers), you'll need to refactor them to run continuously or consider using init containers instead.
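For orientation, upstream Kubernetes models a native sidecar as an init container whose restartPolicy is "Always". The Python sketch below builds such a pod spec fragment as a plain dictionary. It shows the general Kubernetes shape only, not necessarily how this stack renders pods; the native_sidecar helper, container names, images, and command are all hypothetical placeholders.

```python
import json


def native_sidecar(name, image, command):
    """Build a container entry that Kubernetes treats as a native sidecar.

    A native sidecar is an init container whose restartPolicy is "Always",
    so it is expected to keep running for the life of the pod.
    """
    return {
        "name": name,
        "image": image,
        "command": command,
        "restartPolicy": "Always",
    }


# Hypothetical pod spec fragment; names and images are placeholders.
pod_spec = {
    "initContainers": [
        native_sidecar(
            "log-forwarder",
            "example.com/log-forwarder:1.0",
            ["/bin/forward", "--follow"],
        ),
    ],
    "containers": [
        {"name": "main", "image": "example.com/build:1.0"},
    ],
}

print(json.dumps(pod_spec, indent=2))
```

A container that runs once and exits fits the plain init container model (no restartPolicy field) better than this one, which is the refactoring choice described above.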
Warning
If you are upgrading from a version prior to v0.35.0, in-flight jobs with legacy sidecars may not be properly cleaned up, causing K8s Jobs to remain stuck. This is fixed in v0.36.3+. We recommend upgrading directly to v0.36.3 or later.
Added
- PB-862: use Kubernetes native sidecar mechanism #759 (@zhming0)
- Add explicit queue config option #762 (@moskyb)
Fixed
- Mount signing JWKS volume to agent container #765 (@petetomasik)
Internal
- chore(deps): Bump github.com/buildkite/agent/v3 from 3.111.0 to 3.112.0 #769 (@dependabot[bot])
- chore(deps): Bump the k8s group with 4 updates #680 (@dependabot[bot])
- chore(deps): Bump github.com/spf13/cobra from 1.9.1 to 1.10.1 #684 (@dependabot[bot])
- chore(deps): Bump github.com/jedib0t/go-pretty/v6 from 6.6.8 to 6.7.1 #767 (@dependabot[bot])
- chore(deps): Bump github.com/spf13/viper from 1.20.1 to 1.21.0 #691 (@dependabot[bot])
- chore(deps): Bump github.com/buildkite/agent/v3 from 3.110.0 to 3.111.0 #766 (@dependabot[bot])
- chore(deps): Bump github.com/prometheus/client_golang from 1.23.0 to 1.23.2 #685 (@dependabot[bot])
- Fix kubeVersion to include pre-release #764 (@zhming0)
Images
Helm chart
Image: public.ecr.aws/buildkite/helm/agent-stack-k8s:0.35.0
Image: ghcr.io/buildkite/helm/agent-stack-k8s:0.35.0
Digest: sha256:0e7e6bfcee223d65631bd08a2587f1bb4fd431c36504ce429e19f8e3294fc114
Controller
Image: public.ecr.aws/buildkite/agent-stack-k8s/controller:0.35.0
Image: ghcr.io/buildkite/agent-stack-k8s/controller:0.35.0
Digest: sha256:de8ab3cce237ccbefd016ec3d162e330cd4a266f2447a43c53ae17c85c5b3360
Agent
Image: ghcr.io/buildkite/agent:3.112.0
Digest: sha256:e74668ee67923f109b57ebf3f3b2ba2e8ff6cbdcc5df3fe0387f47086e5d9d27
v0.34.0
v0.34.0 (2025-10-29)
Added
New settings around logs: #756 #761 (@moskyb)
- You can now set the format of logs using the log-format config option, which supports logfmt (the default) and json
- You can now set log level using the log-level config option
- The default log format has changed a little in this version as a result of switching logging libraries
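To show how the two formats differ in general, the Python sketch below renders one record both ways. The field names (level, msg, queue) and the helper functions are hypothetical; the controller's actual fields and exact output may differ.

```python
import json


def to_logfmt(fields):
    """Render key/value pairs as one logfmt line, quoting values that need it."""
    parts = []
    for key, value in fields.items():
        text = str(value)
        if " " in text or '"' in text or "=" in text:
            text = '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'
        parts.append(f"{key}={text}")
    return " ".join(parts)


def to_json(fields):
    """Render the same pairs as a single JSON object."""
    return json.dumps(fields)


record = {"level": "info", "msg": "job reserved", "queue": "default"}
print(to_logfmt(record))  # level=info msg="job reserved" queue=default
print(to_json(record))    # {"level": "info", "msg": "job reserved", "queue": "default"}
```

logfmt tends to be easier to read in a terminal, while json is usually simpler to parse in a log pipeline.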
Fixed
- A bug where if there were more than 1000 jobs to reserve, reservations would fail #760 (@moskyb)
- When setting env vars, update existing rather than appending #758 (@DrJosh9000)
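The env var fix above (update existing rather than appending) can be pictured with a small Python sketch. This is not the controller's code: set_env is a hypothetical helper operating on a Kubernetes-style list of name/value entries, and the variable names and values are examples.

```python
def set_env(env, name, value):
    """Set an env var in a Kubernetes-style list of {"name", "value"} entries.

    If the name is already present, its value is replaced in place;
    otherwise a new entry is appended.
    """
    for entry in env:
        if entry["name"] == name:
            entry["value"] = value
            return env
    env.append({"name": name, "value": value})
    return env


env = [{"name": "BUILDKITE_SHELL", "value": "/bin/sh -ec"}]
set_env(env, "BUILDKITE_SHELL", "/bin/bash -ec")  # replaces the existing entry
set_env(env, "EXAMPLE_FLAG", "1")                 # appends a new entry
print(env)
```

Appending unconditionally would leave two entries with the same name in the list, so which value wins would depend on how the consumer resolves duplicates.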
Internal
- Modernise the buildkite.yaml for this repo to make use of the new features we've been shipping #754 #755 (@zhming0)
- Use the standalone stacksapi module #753 (@moskyb)
- Run release after pushing helm chart #752 (@DrJosh9000)
Images
Helm chart
Image: public.ecr.aws/buildkite/helm/agent-stack-k8s:0.34.0
Image: ghcr.io/buildkite/helm/agent-stack-k8s:0.34.0
Digest: sha256:c2eecce9fb6e4f0a4fa237cb40a2e878770abca646c2eb16b12f8002f012ecc5
Controller
Image: public.ecr.aws/buildkite/agent-stack-k8s/controller:0.34.0
Image: ghcr.io/buildkite/agent-stack-k8s/controller:0.34.0
Digest: sha256:1565e27f5dcba25acb6b747669138ce620f27a6f5252d6d041ad927fe000b0f5
Agent
Image: ghcr.io/buildkite/agent:3.110.0
Digest: sha256:a5e20b66d3aa10277b590137bf6f91b5025bdd1696778844e2c6c60bd89db7af
v0.33.1
v0.33.1 (2025-10-22)
This release primarily fixes the agent behaviour when a job's pod is deleted, allowing more time for post-command hooks, post-exit hooks, and artifact uploads to complete with logs uploaded and the job correctly marked as finished. The grace period is based on terminationGracePeriodSeconds.
Changed
- chore(deps): Bump github.com/buildkite/agent/v3 from 3.109.1 to 3.110.0 #750 (@dependabot[bot])
- Set {signal,cancel} grace period agent config #748 (@DrJosh9000)
Agent changelog
v3.110.0 (2025-10-22)
Added
- Configurable chunks interval #3521 (@catkins)
- Inject OpenTelemetry context to all child processes #3548 (@zhming0)
- This is done using environment variables. This may interfere with existing OTel environment variables if they are manually added some other way.
- Add --literal and --delimiter flags to artifact upload #3543 (@DrJosh9000)
Changed
Various improvements and fixes to do with signal and cancel grace periods, and signal handling, most notably:
- When cancelling a job, the timeout before sending a SIGKILL to the job has changed from cancel-grace-period to signal-grace-period (--signal-grace-period-seconds flag, BUILDKITE_SIGNAL_GRACE_PERIOD_SECONDS env var) to allow the agent some extra time to upload job logs and mark the job as finished. By default, signal-grace-period is 1 second shorter than cancel-grace-period. You may wish to increase cancel-grace-period accordingly.
- When SIGQUIT is handled by the bootstrap, the exit code is now 131, and it no longer dumps a stacktrace.
- The recently-added --kubernetes-log-collection-grace-period flag is now deprecated. Instead, use --cancel-grace-period.
- When running the agent interactively, you can now Ctrl-C a third time to exit immediately.
- In Kubernetes mode, the agent now begins shutting down on the first SIGTERM. The kubernetes-bootstrap now swallows SIGTERM with a logged message, and waits for the agent container to send an interrupt.
- When the agent is cancelling jobs because it is stopping, all jobs start cancellation simultaneously. This allows the agent to exit sooner when multiple workers (--spawn flag) are used.
See #3549, #3547, #3534 (@DrJosh9000)
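The default relationship between the two grace periods, and the SIGQUIT exit code, can be checked with a little arithmetic. The Python sketch below is not the agent's code: the function name is hypothetical, 60 is an arbitrary example value, and reading 131 as 128 plus the signal number follows the usual shell convention, which is assumed here rather than stated in the changelog.

```python
import signal


def default_signal_grace_period(cancel_grace_period_seconds):
    """Default described above: 1 second shorter than the cancel grace period."""
    return cancel_grace_period_seconds - 1


cancel_grace = 60  # arbitrary example value, in seconds
signal_grace = default_signal_grace_period(cancel_grace)
print(signal_grace)  # 59

# Assumed convention: exit code for death-by-signal is 128 + signal number.
# SIGQUIT is signal 3 on POSIX systems.
sigquit_exit_code = 128 + int(signal.SIGQUIT)
print(sigquit_exit_code)  # 131
```

With these defaults the agent has about one second between the job being killed and its own cancel deadline, which is why increasing cancel-grace-period may be worthwhile when log uploads are slow.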
Fixed
- Refresh checkout root file handle after checkout hook #3546 (@zhming0)
- Bump zzglob to v0.4.2 to fix uploading artifact paths containing ~ #3539 (@DrJosh9000)
Internal
- Docs: Add examples for step update commands for priority and notify attributes #3532 (@tomowatt)
- Docs: Update URLs in agent cfg comments #3536 (@petetomasik)
Dependency updates
- Upgrade Datadog-go to v5.8.1 to work around mod checksum issues #3538 (@dannyfallon)
- build(deps): bump the container-images group across 3 directories with 2 updates #3545 (@dependabot[bot])
- build(deps): bump gopkg.in/DataDog/dd-trace-go.v1 from 1.74.6 to 1.74.7 #3544 (@dependabot[bot])
- build(deps): bump github.com/gofrs/flock from 0.12.1 to 0.13.0 #3523 (@dependabot[bot])
- build(deps): bump docker/library/golang from 1.24.8 to 1.24.9 in /.buildkite in the container-images group across 1 directory #3542 (@dependabot[bot])
- build(deps): bump the cloud-providers group across 1 directory with 6 updates #3541 (@dependabot[bot])
- build(deps): bump the container-images group across 3 directories with 1 update #3540 (@dependabot[bot])
- build(deps): bump the golang-x group with 5 updates #3525 (@dependabot[bot])
Images
Helm chart
Image: public.ecr.aws/buildkite/helm/agent-stack-k8s:0.33.1
Image: ghcr.io/buildkite/helm/agent-stack-k8s:0.33.1
Digest: sha256:0b04e99b40079a30d4b71245d726d74ff64260965b38a2f1217050d28c8cace0
Controller
Image: public.ecr.aws/buildkite/agent-stack-k8s/controller:0.33.1
Image: ghcr.io/buildkite/agent-stack-k8s/controller:0.33.1
Digest: sha256:556758dd95c85437fd3b720a2557ee4f8ae1eb0efdea72c48e6a75f117355311
Agent
Image: ghcr.io/buildkite/agent:3.110.0
Digest: sha256:a5e20b66d3aa10277b590137bf6f91b5025bdd1696778844e2c6c60bd89db7af
v0.32.5
v0.32.5 (2025-10-22)
This release primarily fixes the agent behaviour when a job's pod is deleted, allowing more time for post-command hooks, post-exit hooks, and artifact uploads to complete with logs uploaded and the job correctly marked as finished. The grace period is based on terminationGracePeriodSeconds.
Changed
- Bump buildkite-agent to v3.110.0 #751 (@DrJosh9000)
- Set {signal,cancel} grace period agent config to match terminationGracePeriodSeconds #749 (@DrJosh9000)
Internal
Agent changelog
v3.110.0 (2025-10-22)
Added
- Configurable chunks interval #3521 (@catkins)
- Inject OpenTelemetry context to all child processes #3548 (@zhming0)
- This is done using environment variables. This may interfere with existing OTel environment variables if they are manually added some other way.
- Add --literal and --delimiter flags to artifact upload #3543 (@DrJosh9000)
Changed
Various improvements and fixes to do with signal and cancel grace periods, and signal handling, most notably:
- When cancelling a job, the timeout before sending a SIGKILL to the job has changed from cancel-grace-period to signal-grace-period (--signal-grace-period-seconds flag, BUILDKITE_SIGNAL_GRACE_PERIOD_SECONDS env var) to allow the agent some extra time to upload job logs and mark the job as finished. By default, signal-grace-period is 1 second shorter than cancel-grace-period. You may wish to increase cancel-grace-period accordingly.
- When SIGQUIT is handled by the bootstrap, the exit code is now 131, and it no longer dumps a stacktrace.
- The recently-added --kubernetes-log-collection-grace-period flag is now deprecated. Instead, use --cancel-grace-period.
- When running the agent interactively, you can now Ctrl-C a third time to exit immediately.
- In Kubernetes mode, the agent now begins shutting down on the first SIGTERM. The kubernetes-bootstrap now swallows SIGTERM with a logged message, and waits for the agent container to send an interrupt.
- When the agent is cancelling jobs because it is stopping, all jobs start cancellation simultaneously. This allows the agent to exit sooner when multiple workers (--spawn flag) are used.
See #3549, #3547, #3534 (@DrJosh9000)
Fixed
- Refresh checkout root file handle after checkout hook #3546 (@zhming0)
- Bump zzglob to v0.4.2 to fix uploading artifact paths containing ~ #3539 (@DrJosh9000)
Internal
- Docs: Add examples for step update commands for priority and notify attributes #3532 (@tomowatt)
- Docs: Update URLs in agent cfg comments #3536 (@petetomasik)
Dependency updates
- Upgrade Datadog-go to v5.8.1 to work around mod checksum issues #3538 (@dannyfallon)
- build(deps): bump the container-images group across 3 directories with 2 updates #3545 (@dependabot[bot])
- build(deps): bump gopkg.in/DataDog/dd-trace-go.v1 from 1.74.6 to 1.74.7 #3544 (@dependabot[bot])
- build(deps): bump github.com/gofrs/flock from 0.12.1 to 0.13.0 #3523 (@dependabot[bot])
- build(deps): bump docker/library/golang from 1.24.8 to 1.24.9 in /.buildkite in the container-images group across 1 directory #3542 (@dependabot[bot])
- build(deps): bump the cloud-providers group across 1 directory with 6 updates #3541 (@dependabot[bot])
- build(deps): bump the container-images group across 3 directories with 1 update #3540 (@dependabot[bot])
- build(deps): bump the golang-x group with 5 updates #3525 (@dependabot[bot])
Images
Helm chart
Image: public.ecr.aws/buildkite/helm/agent-stack-k8s:0.32.5
Image: ghcr.io/buildkite/helm/agent-stack-k8s:0.32.5
Digest: sha256:f8572c5fee827025c8e63dbda6a0451a8024cab67420310be76acd7c6bef786a
Controller
Image: public.ecr.aws/buildkite/agent-stack-k8s/controller:0.32.5
Image: ghcr.io/buildkite/agent-stack-k8s/controller:0.32.5
Digest: sha256:7402184570454b1bc91ae41692826a402d17a0a1fda510e2bb688f449869d639
Agent
Image: ghcr.io/buildkite/agent:3.110.0
Digest: sha256:a5e20b66d3aa10277b590137bf6f91b5025bdd1696778844e2c6c60bd89db7af