investigation(Job/gokore-runner-9zx7q-runner-6cc6m-step-9c2d569c): false positive - no infrastructure issue #1600

Open: k8s-mendabot[bot] wants to merge 1 commit into main from fix/mechanic-5670d2926db6

k8s-mendabot (bot) commented Apr 13, 2026

Summary

This PR documents an investigation of a failed Job detected by mechanic. The finding is a false positive: there is no infrastructure or GitOps configuration issue to fix. The Job is a GitHub Actions workflow step that failed because gosec reported security vulnerabilities, which is the scanner's intended behavior.

Finding

  • Kind: Job
  • Resource: gokore-runner-9zx7q-runner-6cc6m-step-9c2d569c
  • Namespace: actions-runner-system
  • Parent: Job/gokore-runner-9zx7q-runner-6cc6m-step-9c2d569c
  • Fingerprint: 5670d2926db6

Evidence

Job Details

  • Image: securego/gosec:2.22.3
  • Command: -fmt sarif -out gosec-results.sarif ./...
  • Exit Code: 1
  • BackoffLimit: 0
  • TTLSecondsAfterFinished: 300

Pod State

  • Pod gokore-runner-9zx7q-runner-6cc6m-step-9c2d569c-jwghc terminated with exit code 1
  • Container ran for about 7 minutes (21:15:11 to 21:22:12)
  • PVC gokore-runner-9zx7q-runner-6cc6m-work was created (25Gi) and is now in Terminating state
  • No scheduling failures or infrastructure errors observed
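
As a quick check of the run time quoted above, the two timestamps can be subtracted directly. This is a minimal sketch; the helper name `run_duration_seconds` is hypothetical and it assumes both timestamps fall on the same day.

```python
from datetime import datetime


def run_duration_seconds(start: str, end: str) -> int:
    """Return elapsed seconds between two same-day HH:MM:SS timestamps."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


# Timestamps taken from the Pod State evidence above.
seconds = run_duration_seconds("21:15:11", "21:22:12")
minutes, remainder = divmod(seconds, 60)
print(f"{minutes}m{remainder}s")  # prints "7m1s"
```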

Infrastructure Health

  • Actions Runner Controller pod running (7d uptime)
  • gokore-runner listener pod running (7d uptime)
  • Multiple other runner pods running successfully
  • HelmRelease gokore-runner is functioning correctly

GitOps Configuration

The gokore runner is configured in /workspace/repo/kubernetes/apps/actions-runner-system/actions-runner-controller/runners/gokore/helmrelease.yaml:

  • Uses GitHub Actions Runner Scale Set Controller
  • minRunners: 0, maxRunners: 15
  • Container mode: kubernetes
  • Storage: 25Gi PVC with openebs-hostpath StorageClass
  • Runner pod scheduled on worker nodes

Root Cause

This Job is not a Kubernetes infrastructure component. It is a GitHub Actions workflow step that was dynamically created by the Actions Runner Controller to run a gosec security scan on the goKore repository.

The Job "failed" because gosec correctly identified security vulnerabilities in the codebase and exited with code 1. This is expected and correct behavior for a security scanner:

  • Exit code 0: No vulnerabilities found
  • Exit code 1: Vulnerabilities found (job marked as failed in CI/CD)

The backoffLimit: 0 does not come from the GitOps configuration; it is applied on the runner side when the step Job is created. It is appropriate for security scans: retrying a scan that found vulnerabilities would only reproduce the same result.
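
The exit-code reasoning above can be sketched as a small triage heuristic. This is an illustration only, not mechanic's actual logic: the `Termination` type, the `classify` function, and the category names are hypothetical. It assumes the container's terminated state exposes an exit code and a reason, and it treats exit codes of 128 and above as signal kills by shell convention.

```python
from dataclasses import dataclass


@dataclass
class Termination:
    """Subset of a container's terminated state (hypothetical type)."""
    exit_code: int
    reason: str  # e.g. "Completed", "Error", "OOMKilled"


# Assumption: reasons that point at the platform rather than the tool.
INFRA_REASONS = {"OOMKilled", "ContainerCannotRun"}


def classify(term: Termination) -> str:
    """Heuristically separate tool-reported failures from platform failures."""
    if term.exit_code == 0:
        return "passed"
    # Exit codes >= 128 conventionally mean the process died from a signal.
    if term.reason in INFRA_REASONS or term.exit_code >= 128:
        return "infrastructure"
    # A small non-zero code with reason "Error" means the tool itself chose
    # to fail, as gosec does when it reports findings.
    return "tool-reported-failure"


# The Job in this finding: exit code 1, no infrastructure errors observed.
print(classify(Termination(exit_code=1, reason="Error")))
```

A heuristic like this cannot tell a scanner that found issues from one that failed for its own reasons, so the step logs remain the authoritative source.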

Fix

No fix required in the GitOps repository.

The infrastructure is working correctly:

  1. The Actions Runner Controller is functioning properly
  2. Runner pods are being created and scheduled correctly
  3. PVCs are being created and cleaned up as expected
  4. The workflow step ran to completion and reported its findings correctly

Recommendations

  1. For mechanic agent: Consider filtering out ephemeral Jobs created by the Actions Runner Controller, or specifically excluding workflow step Jobs from failure detection. These Jobs represent CI/CD workflow execution results, not infrastructure failures.

  2. Alternative: Configure mechanic to only alert on Jobs that are managed by Flux/Helm (have specific labels or annotations) rather than all Jobs in the cluster.
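
Both recommendations could be combined into a single pre-filter, sketched below. This is not mechanic's real configuration or API: the function names and the excluded-namespace default are hypothetical. The label keys are the ones Flux's helm-controller and kustomize-controller, and Helm itself, normally attach to resources they manage; whether they are present on the Jobs in this cluster is an assumption to verify.

```python
# Namespace where the Actions Runner Controller creates workflow step Jobs.
ARC_NAMESPACE = "actions-runner-system"

# Labels that Flux controllers normally set on the objects they apply.
FLUX_LABEL_KEYS = (
    "helm.toolkit.fluxcd.io/name",
    "kustomize.toolkit.fluxcd.io/name",
)


def is_gitops_managed(job: dict) -> bool:
    """Return True if the Job manifest carries Flux or Helm ownership labels."""
    labels = job.get("metadata", {}).get("labels") or {}
    if labels.get("app.kubernetes.io/managed-by") == "Helm":
        return True
    return any(key in labels for key in FLUX_LABEL_KEYS)


def should_investigate(job: dict, excluded_namespaces=frozenset({ARC_NAMESPACE})) -> bool:
    """Skip ephemeral CI Jobs; keep only Jobs owned by the GitOps tooling."""
    namespace = job.get("metadata", {}).get("namespace", "")
    if namespace in excluded_namespaces:
        return False
    return is_gitops_managed(job)


# The Job from this finding would be skipped.
step_job = {
    "metadata": {
        "name": "gokore-runner-9zx7q-runner-6cc6m-step-9c2d569c",
        "namespace": "actions-runner-system",
        "labels": {},
    }
}
print(should_investigate(step_job))  # prints "False"
```

Excluding a whole namespace also hides real failures of the controller's own Jobs in that namespace, so the label check alone may be the safer of the two criteria.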

Confidence

Medium - I am confident there is no infrastructure issue, but I recommend human review to decide whether mechanic should keep flagging failed workflow step Jobs or be adjusted to handle these ephemeral Jobs differently.

Notes

  • The PVC was successfully created and used, indicating the storage configuration is correct
  • Other GitHub Actions runners in the same scale set are functioning normally
  • The TTLSecondsAfterFinished: 300 setting is standard for the Actions Runner Controller
  • This finding does not indicate any degradation of the runner infrastructure

For Human Reviewers

Please consider:

  1. Should the mechanic agent detect failed ephemeral Jobs created by CI/CD systems?
  2. If not, what criteria should be used to exclude these from monitoring?
  3. Is there any GitOps configuration change that would help reduce false positives?

Opened automatically by mechanic

…cument false positive - no infrastructure issue found
k8s-mendabot (bot) added the needs-human-review label (Requires human review before merging) on Apr 13, 2026