(workaround) codejail-service and all k8s pods are not seen by GoCD after coming up #1073

codejail-service deploys get stuck because a pod fails to come up within the readiness probe retry window. Replacements for deleted pods also fail to come up.

Sometimes pods *do* come up; we're not sure under which circumstances. It seems like k8s eventually replaces the broken pods.

A/C:

- [ ] Give GoCD longer timeouts to allow an overloaded ArgoCD time to say "yes, the sync has finished". Monitor the timeouts to confirm that the new value covers the failures without being longer than necessary.
- [ ] Possibly reduce priority level on deployment alerts (since they're less reliable)
- [ ] File a ticket for 1) reducing ArgoCD staleness and 2) then tightening up our deploy standards again (undoing the above -- search for code references to this ticket)
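
To make the first item concrete, here is a minimal sketch of a wait loop with a configurable timeout that also records how long each wait took, plus a helper that suggests a timeout from those recorded waits. This is not GoCD's or ArgoCD's actual mechanism: `is_synced` is a hypothetical callable standing in for whatever query the pipeline makes against ArgoCD, and `wait_for_sync`, `suggest_timeout`, the percentile, and the headroom factor are all illustrative assumptions.

```python
import math
import time
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class WaitResult:
    synced: bool     # True if the check reported success before the deadline
    elapsed: float   # seconds spent waiting; worth logging for later tuning
    attempts: int    # number of times the check was called


def wait_for_sync(
    is_synced: Callable[[], bool],
    timeout_seconds: float,
    poll_interval_seconds: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> WaitResult:
    """Poll is_synced() until it returns True or timeout_seconds elapses.

    is_synced is a hypothetical stand-in for the real sync-status query.
    """
    start = clock()
    attempts = 0
    while True:
        attempts += 1
        if is_synced():
            return WaitResult(True, clock() - start, attempts)
        remaining = timeout_seconds - (clock() - start)
        if remaining <= 0:
            return WaitResult(False, clock() - start, attempts)
        # Never sleep past the deadline; one last check happens at the deadline.
        sleep(min(poll_interval_seconds, remaining))


def suggest_timeout(
    observed_waits: Sequence[float], percent: int = 99, headroom: float = 1.2
) -> float:
    """Nearest-rank percentile of successful wait times, padded by headroom."""
    if not observed_waits:
        raise ValueError("need at least one observed wait")
    ordered = sorted(observed_waits)
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1] * headroom
```

Logging `elapsed` for every deploy gives the data needed for the monitoring half of that item: if nearly all successful waits finish well under the timeout, the value can come down; if many waits end with `synced=False` and later turn out to have been healthy deploys, it is still too short.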

Notes:

- This problem wasn't occurring in the first month that codejail-service was deployed. But as of June 23, if we kill a pod, the replacement passes its startup checks within 6 seconds yet apparently fails to respond to readiness and liveness checks after 1-2 minutes.
- We tried increasing retry counts but it didn't help: https://github.com/edx/edx-internal/pull/12996
- We've also been experiencing some Datadog metrics and APM cutouts that have impeded diagnosing this. Unclear if related. (Infrastructure issue?)
- EKS upgrades seem to have fixed some issues with pod readiness/liveness probes.
- Our latest information (as of early July 2025) is that ArgoCD seems to just be overloaded and is providing stale information at times (to both GoCD and in the UI).
