Long-running container steps "succeed" prematurely when running on AKS #4190

@jharlow1

Description

Controller Version

0.6.1

Deployment Method

Helm

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

Have a scale set like:


spec:
  githubConfigSecret: gha-runner-scale-set
  githubConfigUrl: xxxxxx
  maxRunners: 20
  minRunners: 2
  runnerGroup: k8s-standard
  runnerScaleSetName: arc-k8s-standard-azure
  template:
    spec:
      containers:
      - command:
        - /home/runner/run.sh
        env:
        - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
          value: "false"
        - name: ACTIONS_RUNNER_CONTAINER_HOOKS
          value: /home/runner/k8s/index.js
        - name: ACTIONS_RUNNER_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        image: actions/actions-runner:latest
        name: runner
        resources:
          limits:
            memory: 2G
          requests:
            cpu: 1
            memory: 2G
        volumeMounts:
        - mountPath: /home/runner/_work
          name: work
      restartPolicy: Never
      securityContext:
        fsGroup: 123
      serviceAccountName: arc-k8s-standard-azure-gha-rs-kube-mode
      volumes:
      - ephemeral:
          volumeClaimTemplate:
            spec:
              accessModes:
              - ReadWriteOnce
              resources:
                requests:
                  storage: 25Gi
              storageClassName: managed-csi-premium
        name: work


Run a simple workflow that sleeps for 10 minutes and then exits with exit code 1:


name: "Azure test"
on:
  workflow_dispatch:

jobs:
  test:
    name: "Test Long-running Fail job"
    runs-on: arc-k8s-standard-azure
    container:
      image: ubuntu-22.04-0.44.0
    steps:
      - name: "Run Test"
        run: |
          echo "This is a test job that will sleep and then fail after 10 minutes."
          sleep 600
          exit 1
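Independent of the runner, the step body can be sanity-checked locally with a shortened sleep (1 second substituted for 600) to confirm it really produces a nonzero exit code:

```shell
# Shortened version of the step body (1s instead of 600s):
# verify it really exits nonzero, independent of the runner.
bash -c 'sleep 1; exit 1'
echo "exit code: $?"   # prints: exit code: 1
```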


Run the job

Describe the bug

The step runs for just over 5 minutes, then "succeeds" as if the script had returned 0; the logs even report that the script process exited with code 0:

[WORKER 2025-07-28 15:30:52Z INFO ProcessInvokerWrapper] Finished process 159 with exit code 0, and elapsed time 00:05:02.5167359.

Oddly, this only seems to occur on AKS. With the same scale set and job configuration on Amazon EKS, everything works as expected: the job runs for 10 minutes and then fails with exit code 1.
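Since the process ends at roughly five minutes (00:05:02 in the log above), a hypothetical matrix workflow could help bisect where the cutoff lies by varying the sleep duration; the durations below are illustrative, not from this report:

```yaml
jobs:
  bisect:
    strategy:
      fail-fast: false
      matrix:
        duration: [240, 270, 300, 330, 360]   # seconds; illustrative values
    runs-on: arc-k8s-standard-azure
    container:
      image: ubuntu-22.04-0.44.0
    steps:
      - name: "Sleep then fail"
        run: |
          sleep ${{ matrix.duration }}
          exit 1
```

Any duration whose run reports success despite the trailing `exit 1` would sit past the cutoff, bracketing the point at which the step is being cut short.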

Describe the expected behavior

The job should run for 10 minutes and then fail with exit code 1.

Additional Context

n/a

Controller Logs

https://gist.github.com/jharlow1/b83430815bb3c4bd6fe8a2efae25cb93

Runner Pod Logs

https://gist.github.com/jharlow1/6080b1b26e4e16c3c175ab07c9f2e95c

Labels

bug (Something isn't working), gha-runner-scale-set (Related to the gha-runner-scale-set mode), needs triage (Requires review from the maintainers)
