
Requeue if create pod returns already exists error #4201


Open: wants to merge 2 commits into base: master

Conversation

nikola-jokic
Collaborator

In some situations, a newly created pod has not yet been delivered through the watcher when the next reconciliation runs, so that reconciliation issues a second create pod request, which fails with an already exists error.

This change checks for that condition, logs it at the info level, and re-queues the event after 5s.
The same handling is not necessary for secrets: secrets are not watched, so the GET request for them is issued directly.

This reduces the noise in the log while backing off the reconciliation for a bit.
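The handling described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the code from this pull request: `errAlreadyExists`, `result`, and `createPodOrRequeue` are hypothetical stand-ins. In the actual controller, the check would presumably use `IsAlreadyExists` from `k8s.io/apimachinery/pkg/api/errors`, and the return value would be a `ctrl.Result` with `RequeueAfter` set.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errAlreadyExists is a hypothetical stand-in for the API server's
// "already exists" response. A real controller would typically detect it
// with IsAlreadyExists from k8s.io/apimachinery/pkg/api/errors.
var errAlreadyExists = errors.New("pod already exists")

// requeueDelay mirrors the 5s back-off described in the pull request.
const requeueDelay = 5 * time.Second

// result is a simplified stand-in for controller-runtime's ctrl.Result.
type result struct {
	RequeueAfter time.Duration
}

// createPodOrRequeue calls create and maps its outcome to a reconcile result.
// Success and "already exists" both return a nil error (the latter with a
// delayed requeue); any other failure is returned to the caller.
func createPodOrRequeue(create func() error) (result, error) {
	err := create()
	switch {
	case err == nil:
		return result{}, nil
	case errors.Is(err, errAlreadyExists):
		// The pod exists but the watcher has not delivered it yet: back off.
		fmt.Println("pod already exists; requeueing in", requeueDelay)
		return result{RequeueAfter: requeueDelay}, nil
	default:
		return result{}, fmt.Errorf("failed to create pod: %w", err)
	}
}

func main() {
	res, err := createPodOrRequeue(func() error { return errAlreadyExists })
	fmt.Println(res.RequeueAfter, err)
}
```

Returning a nil error together with a requeue delay is what keeps the condition out of the error log while still scheduling another reconciliation.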

Copilot AI review requested due to automatic review settings August 8, 2025 10:48
Contributor

Copilot AI left a comment


Pull Request Overview

This PR addresses a race condition where the ephemeral runner controller attempts to create a pod that already exists but hasn't been received through the Kubernetes event watcher yet. The change adds error handling to detect "already exists" errors and requeues the reconciliation with a 5-second delay instead of logging an error.

  • Adds detection for "already exists" errors when creating pods
  • Implements a requeue mechanism with 5-second delay for the race condition
  • Updates log message to be more descriptive about the pod creation process
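To make the race itself concrete, here is a toy simulation under stated assumptions. The `cluster` type and its methods are hypothetical and model only the idea of a watch-fed cache lagging behind the API server; they are not Kubernetes or controller-runtime APIs.

```go
package main

import "fmt"

// cluster is a hypothetical model, not a Kubernetes API: server holds the
// authoritative set of pods, cache holds what the watcher has delivered so far.
type cluster struct {
	server map[string]bool
	cache  map[string]bool
}

func newCluster() *cluster {
	return &cluster{server: map[string]bool{}, cache: map[string]bool{}}
}

// create writes straight to the server and reports whether the pod was
// already there (the "already exists" case).
func (c *cluster) create(name string) (alreadyExists bool) {
	if c.server[name] {
		return true
	}
	c.server[name] = true
	return false
}

// sync models the watcher catching up with the server.
func (c *cluster) sync() {
	for name := range c.server {
		c.cache[name] = true
	}
}

// reconcile reads from the cache, as a watch-backed client would, and
// reports which action it took.
func (c *cluster) reconcile(name string) string {
	if c.cache[name] {
		return "found"
	}
	if c.create(name) {
		// Stale cache: the pod exists on the server but was not delivered yet.
		return "requeue"
	}
	return "created"
}

func main() {
	c := newCluster()
	fmt.Println(c.reconcile("runner-pod")) // first pass creates the pod
	fmt.Println(c.reconcile("runner-pod")) // cache still stale: conflict, requeue
	c.sync()                               // watcher delivers the pod
	fmt.Println(c.reconcile("runner-pod")) // pod is now visible in the cache
}
```

In this model the second reconciliation is the one the pull request targets: the lookup misses, the create conflicts, and backing off gives the watcher time to deliver the pod.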

@@ -268,13 +268,15 @@ func (r *EphemeralRunnerReconciler) Reconcile(ctx context.Context, req ctrl.Requ
log.Error(err, "Failed to fetch the pod")
return ctrl.Result{}, err
}
log.Info("Ephemeral runner pod does not exist. Creating new ephemeral runner")

Copilot AI Aug 8, 2025


The log message is misleading. At this point, the code has determined the pod doesn't exist, but the new error handling suggests the pod might actually exist. Consider a more accurate message like 'Attempting to create new ephemeral runner pod'.

Suggested change
- log.Info("Ephemeral runner pod does not exist. Creating new ephemeral runner")
+ log.Info("Attempting to create new ephemeral runner pod")


nikola-jokic added the gha-runner-scale-set label (Related to the gha-runner-scale-set mode) Aug 8, 2025