-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Open
Labels
bugSomething isn't workingSomething isn't workinggha-runner-scale-setRelated to the gha-runner-scale-set modeRelated to the gha-runner-scale-set modeneeds triageRequires review from the maintainersRequires review from the maintainers
Description
Checks
- I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
- I am using charts that are officially provided
Controller Version
0.6.1
Deployment Method
Helm
Checks
- This isn't a question or user support case (For Q&A and community support, go to Discussions).
- I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
To Reproduce
Have a scale set like:
spec:
githubConfigSecret: gha-runner-scale-set
githubConfigUrl: xxxxxx
maxRunners: 20
minRunners: 2
runnerGroup: k8s-standard
runnerScaleSetName: arc-k8s-standard-azure
template:
spec:
containers:
- command:
- /home/runner/run.sh
env:
- name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
value: "false"
- name: ACTIONS_RUNNER_CONTAINER_HOOKS
value: /home/runner/k8s/index.js
- name: ACTIONS_RUNNER_POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
image: actions/actions-runner:latest
name: runner
resources:
limits:
memory: 2G
requests:
cpu: 1
memory: 2G
volumeMounts:
- mountPath: /home/runner/_work
name: work
restartPolicy: Never
securityContext:
fsGroup: 123
serviceAccountName: arc-k8s-standard-azure-gha-rs-kube-mode
volumes:
- ephemeral:
volumeClaimTemplate:
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 25Gi
storageClassName: managed-csi-premium
name: work
Run a simple workflow that sleeps for 10 minutes and then exits with a 1:
name: "Azure test"
on:
workflow_dispatch:
Using a runner scale-set like:
`
jobs:
test:
name: "Test Long-running Fail job"
runs-on: arc-k8s-standard-azure
container:
image: ubuntu-22.04-0.44.0
steps:
- name: "Run Test"
run: |
echo "This is a test job that will sleep and then fail after 10 minutes."
sleep 600
exit 1
Run the job
Describe the bug
The step runs for just over 5 minutes, then "succeeds" as if the script returned a 0. (Logs even report that the script process exited 0)
[WORKER 2025-07-28 15:30:52Z INFO ProcessInvokerWrapper] Finished process 159 with exit code 0, and elapsed time 00:05:02.5167359.
Oddly, this only seems to occur if we run in AKS. The same scale set and job configuration, when running in Amazon EKS works as expected (job runs for 10 minutes then fails with exit 1)
Describe the expected behavior
Job should run for 10 minutes then fail
Additional Context
n/a
Controller Logs
https://gist.github.com/jharlow1/b83430815bb3c4bd6fe8a2efae25cb93
Runner Pod Logs
https://gist.github.com/jharlow1/6080b1b26e4e16c3c175ab07c9f2e95c
cborosonvertex
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't workinggha-runner-scale-setRelated to the gha-runner-scale-set modeRelated to the gha-runner-scale-set modeneeds triageRequires review from the maintainersRequires review from the maintainers