Description
Checks
- I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
- I am using charts that are officially provided
Controller Version
0.12.1
Deployment Method
Helm
Checks
- This isn't a question or user support case (For Q&A and community support, go to Discussions).
- I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
To Reproduce
Project: Actions-Runner-Controller (not Summerwind)
Version: 0.12.1
Deployment Method: Helm
Kubernetes Version: v1.32.5
---
1. Deploy Actions-Runner-Controller version 0.11.0 via Helm on a Kubernetes cluster (v1.32.5) with GitHub Enterprise integration.
2. Verify that runners operate correctly under normal load.
3. Upgrade to version 0.12.1 by fully removing all ARC-related resources, including CustomResourceDefinitions (CRDs), and perform a clean installation using Helm.
4. Reconfigure and deploy runners as before.
5. Execute various GitHub Actions workflows across multiple repositories.
6. After some time, observe that:
• Certain jobs appear completed or failed on GitHub Enterprise.
• Some runner pods remain active indefinitely and do not exit.
• Logs within those pods show repeated registration failures with messages like:
"Registration was not found or is not medium trust."
• The issue affects different runners at different times with no identifiable pattern (i.e., across various repos and workflows).
7. As a result, the runner pool becomes blocked, and new jobs are not executed until affected pods are manually terminated.
Describe the bug
Hello team,
After upgrading to ARC version 0.11.0, we noticed that some runners enter a state where they run indefinitely and block new jobs from being picked up. Inside the runner containers, we observed registration failures due to expired tokens.
On the GitHub Enterprise side, those jobs appear to have already completed or failed, but the corresponding runners keep running. It seems that the listener is unable to properly clean up the runner after a job finishes and continuously attempts to re-register it with GitHub.
We were hoping this issue would be resolved in version 0.12.1, but unfortunately, it still persists. In one instance, a pod even ended up in an evicted state.
As a temporary workaround to prevent the job queue from stalling, we’ve implemented a cron job that monitors runner logs and forcefully terminates any pod where the log contains:
"Registration was not found or is not medium trust."
This helps keep the runners processing jobs but doesn’t address the root cause.
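For anyone who wants to reproduce the workaround, here is a minimal sketch of the detection step, not the exact cron job we run. The function names (`find_stuck_pods`, `delete_command`, `fetch_logs`) and the 200-line tail are assumptions made for illustration; the container name `runner` comes from the values in Additional Context. The marker is matched as a substring, so it also catches the full log line that ends in "ClientType:".

```python
import subprocess
from typing import Dict, List

# Substring of the error seen in the runner pod logs of this report.
STUCK_MARKER = "Registration was not found or is not medium trust"


def find_stuck_pods(logs_by_pod: Dict[str, str], marker: str = STUCK_MARKER) -> List[str]:
    """Return the sorted names of pods whose log text contains the marker."""
    return sorted(name for name, text in logs_by_pod.items() if marker in text)


def delete_command(pod: str, namespace: str) -> List[str]:
    """Build (but do not run) the kubectl command that removes one pod."""
    return ["kubectl", "delete", "pod", pod, "--namespace", namespace]


def fetch_logs(pod: str, namespace: str, container: str = "runner") -> str:
    """Read recent log output of one container via kubectl (needs cluster access)."""
    result = subprocess.run(
        ["kubectl", "logs", pod, "--namespace", namespace,
         "--container", container, "--tail=200"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


# Example with in-memory logs (hypothetical pod names, no cluster required).
sample_logs = {
    "runner-aaa": "Retriable exception: Registration was not found or is not medium trust. ClientType:",
    "runner-bbb": "Running job: build",
}
stuck = find_stuck_pods(sample_logs)
commands = [delete_command(pod, "arc-runners") for pod in stuck]
```

In a real job, `fetch_logs` would fill `logs_by_pod` and each command would be passed to `subprocess.run`. Keeping the detection separate from the deletion makes it easy to dry-run the check before anything is terminated.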
Is this a known issue, and do you have any recommendations or a potential fix?
Describe the expected behavior
Runners should terminate properly after job completion or failure. They should not attempt to re-register if the job has already ended and the registration token has expired. Additionally, the controller should ensure that expired or stuck runners are cleaned up automatically to avoid blocking the job queue.
Additional Context
githubConfigUrl: "https://github.enterprise.example.com/enterprises/***"
githubConfigSecret: "github-token"
proxy:
  http:
    url: http://**********
  https:
    url: http://**********
  noProxy:
    - localhost
    - 127.0.0.1
    - 10.0.0.0/8
    - 172.16.0.0/12
    - 192.168.0.0/16
maxRunners: 5
minRunners: 1
runnerGroup: "enterprise-gpr-m-02"
runnerScaleSetName: "enterprise-gpr-m"
labels:
  group: enterprise-runners
githubServerTLS:
  certificateFrom:
    configMapKeyRef:
      name: ca
      key: ca.crt
  runnerMountPath: /usr/local/share/ca-certificates/
template:
  spec:
    initContainers:
      - name: init-dind-externals
        image: actions/actions-runner/full:2.322.1
        imagePullPolicy: Always
        command: ["cp", "-r", "-v", "/home/runner/externals/.", "/home/runner/tmpDir/"]
        volumeMounts:
          - name: dind-externals
            mountPath: /home/runner/tmpDir
        resources:
          requests:
            cpu: "50m"
            memory: "200Mi"
          limits:
            memory: "250Mi"
      - name: init-dind-rootless
        image: docker:27.3.1-dind-rootless
        imagePullPolicy: IfNotPresent
        command:
          - sh
          - -c
          - |
            set -x
            cp -a /etc/. /dind-etc/
            echo 'runner:x:1001:1001:runner:/home/runner:/bin/ash' >> /dind-etc/passwd
            echo 'runner:x:1001:' >> /dind-etc/group
            echo 'runner:100000:65536' >> /dind-etc/subgid
            echo 'runner:100000:65536' >> /dind-etc/subuid
            chmod 755 /dind-etc;
            chmod u=rwx,g=rx+s,o=rx /dind-home
            chown 1001:1001 /dind-home
            mkdir -p /var/lib/docker
            chmod u=rwx,g=rx+s,o=rx /var/lib/docker
            chown -R 1001:1001 /var/lib/docker
        securityContext:
          runAsUser: 0
        volumeMounts:
          - mountPath: /dind-etc
            name: dind-etc
          - mountPath: /dind-home
            name: dind-home
          - name: docker-data-root
            mountPath: /var/lib/docker
        resources:
          requests:
            cpu: "50m"
            memory: "200Mi"
          limits:
            memory: "250Mi"
      - name: init-qemu-registrar
        image: tonistiigi/binfmt:latest
        command: [ "/usr/bin/binfmt", "--install", "all" ]
        imagePullPolicy: Always
        securityContext:
          runAsUser: 0
          privileged: true
        resources:
          requests:
            cpu: "25m"
            memory: "50Mi"
          limits:
            memory: "100Mi"
    containers:
      - name: runner
        image: actions/actions-runner/full:2.322.1
        imagePullPolicy: Always
        command: ["/home/runner/run.sh"]
        env:
          - name: DOCKER_HOST
            value: unix:///run/user/1001/docker.sock
        volumeMounts:
          - name: work
            mountPath: /home/runner/_work
          - name: dind-sock
            mountPath: /var/run
          - mountPath: /tmp
            name: tmpdir
          - name: sysfs
            mountPath: /sys
            readOnly: false
        resources:
          requests:
            cpu: "100m"
            memory: "500Mi"
          limits:
            memory: "500Mi"
        securityContext:
          capabilities:
            add:
              - SYS_ADMIN
              - SYS_PTRACE
              - DAC_OVERRIDE
              - FOWNER
              - CHOWN
              - SETUID
              - SETGID
          runAsUser: 1001
          runAsGroup: 1001
          privileged: false
      - name: dind
        image: docker:27.3.1-dind-rootless
        imagePullPolicy: IfNotPresent
        args:
          - dockerd
          - --config-file=/etc/docker/daemon.json
        securityContext:
          privileged: true
          runAsUser: 1001
          runAsGroup: 1001
          capabilities:
            add:
              - SYS_ADMIN
              - MKNOD
              - CHOWN
              - SETUID
              - SETGID
        resources:
          requests:
            cpu: "200m"
            memory: "650Mi"
          limits:
            memory: "3346Mi"
        volumeMounts:
          - name: work
            mountPath: /home/runner/_work
          - name: dind-sock
            mountPath: /var/run
          - name: dind-externals
            mountPath: /home/runner/externals
          - name: dind-etc
            mountPath: /etc
          - name: dind-home
            mountPath: /home/runner
          - name: docker-data-root
            mountPath: /var/lib/docker
          - name: sysfs
            mountPath: /sys
            readOnly: false
    volumes:
      - name: work
        emptyDir: {}
      - name: dind-externals
        emptyDir: {}
      - name: dind-sock
        emptyDir: {}
      - name: dind-etc
        emptyDir: {}
      - name: dind-home
        emptyDir: {}
      - name: tmpdir
        emptyDir: {}
      - name: docker-data-root
        emptyDir: {}
      - name: sysfs
        hostPath:
          path: /sys
          type: Directory
Controller Logs
-
Runner Pod Logs
√ Connected to GitHub
[RUNNER 2025-07-16 07:45:34Z INFO Terminal] WRITE LINE:
[RUNNER 2025-07-16 07:45:34Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/runner/.credentials_rsaparams
[RUNNER 2025-07-16 07:45:35Z ERR GitHubActionsService] POST request to https://github.enterprise.example.com/_services/vstoken/_apis/oauth2/token/eb530d92-6032-4cac-8ece-acf7fa59845f failed. HTTP Status: BadRequest
[RUNNER 2025-07-16 07:45:35Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
[RUNNER 2025-07-16 07:45:35Z ERR MessageListener] Catch exception during create session.
[RUNNER 2025-07-16 07:45:35Z ERR MessageListener] GitHub.Services.OAuth.VssOAuthTokenRequestException: Registration was not found or is not medium trust. ClientType:
[RUNNER 2025-07-16 07:45:35Z ERR MessageListener] at GitHub.Services.OAuth.VssOAuthTokenProvider.OnGetTokenAsync(IssuedToken failedToken, CancellationToken cancellationToken)
[RUNNER 2025-07-16 07:45:35Z ERR MessageListener] at GitHub.Services.Common.IssuedTokenProvider.GetTokenOperation.GetTokenAsync(VssTraceActivity traceActivity)
[RUNNER 2025-07-16 07:45:35Z ERR MessageListener] at GitHub.Services.Common.IssuedTokenProvider.GetTokenAsync(IssuedToken failedToken, CancellationToken cancellationToken)
[RUNNER 2025-07-16 07:45:35Z ERR MessageListener] at GitHub.Services.Common.VssHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
[RUNNER 2025-07-16 07:45:35Z ERR MessageListener] at GitHub.Services.Common.VssHttpRetryMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
[RUNNER 2025-07-16 07:45:35Z ERR MessageListener] at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
[RUNNER 2025-07-16 07:45:35Z ERR MessageListener] at GitHub.Services.WebApi.VssHttpClientBase.SendAsync(HttpRequestMessage message, HttpCompletionOption completionOption, Object userState, CancellationToken cancellationToken)
[RUNNER 2025-07-16 07:45:35Z ERR MessageListener] at GitHub.Services.WebApi.VssHttpClientBase.SendAsync[T](HttpRequestMessage message, Object userState, CancellationToken cancellationToken)
[RUNNER 2025-07-16 07:45:35Z ERR MessageListener] at GitHub.Services.WebApi.VssHttpClientBase.SendAsync[T](HttpMethod method, IEnumerable`1 additionalHeaders, Guid locationId, Object routeValues, ApiResourceVersion version, HttpContent content, IEnumerable`1 queryParameters, Object userState, CancellationToken cancellationToken)
[RUNNER 2025-07-16 07:45:35Z ERR MessageListener] at GitHub.Runner.Listener.MessageListener.CreateSessionAsync(CancellationToken token)
[RUNNER 2025-07-16 07:45:35Z ERR MessageListener] Test oauth app registration.
[RUNNER 2025-07-16 07:45:35Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/runner/.credentials_rsaparams
[RUNNER 2025-07-16 07:45:35Z ERR GitHubActionsService] POST request to https://github.enterprise.example.com/_services/vstoken/_apis/oauth2/token/eb530d92-6032-4cac-8ece-acf7fa59845f failed. HTTP Status: BadRequest
[RUNNER 2025-07-16 07:45:35Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
[RUNNER 2025-07-16 07:45:35Z INFO MessageListener] Retriable exception: Registration was not found or is not medium trust. ClientType:
[RUNNER 2025-07-16 07:45:35Z INFO MessageListener] Sleeping for 30 seconds before retrying.
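The excerpt above is one iteration of a loop that repeats every 30 seconds. To tell a single transient failure from a runner that is stuck in that loop, the log lines can be parsed and the retries counted. The sketch below assumes the `[RUNNER <timestamp>Z <LEVEL> <Component>] <message>` layout seen in this excerpt; the regular expression and the helper names `parse_line` and `count_registration_retries` are illustrative and not part of the runner.

```python
import re
from typing import Iterable, Optional, Tuple

# Layout inferred from the log excerpt in this report.
LINE_RE = re.compile(
    r"^\[RUNNER (?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})Z\s+"
    r"(?P<level>[A-Z]+)\s+(?P<component>\w+)\]\s?(?P<message>.*)$"
)


def parse_line(line: str) -> Optional[Tuple[str, str, str, str]]:
    """Split one log line into (timestamp, level, component, message), or None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return (match.group("ts"), match.group("level"),
            match.group("component"), match.group("message"))


def count_registration_retries(lines: Iterable[str]) -> int:
    """Count MessageListener retry notices caused by the registration error."""
    count = 0
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # not a runner log line, e.g. the "Connected to GitHub" banner
        _, _, component, message = parsed
        if component == "MessageListener" and message.startswith(
            "Retriable exception: Registration was not found"
        ):
            count += 1
    return count


sample = [
    "√ Connected to GitHub",
    "[RUNNER 2025-07-16 07:45:35Z INFO MessageListener] Retriable exception: "
    "Registration was not found or is not medium trust. ClientType:",
    "[RUNNER 2025-07-16 07:45:35Z INFO MessageListener] Sleeping for 30 seconds before retrying.",
]
retries = count_registration_retries(sample)
```

A watchdog could then act only when the count exceeds a threshold of its choosing, which avoids killing a pod on a single failed attempt.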