Add a 'restart --replica' option by jfongatyelp · Pull Request #4180 · Yelp/paasta

jfongatyelp · 2025-12-16T22:02:55Z

We sometimes have folks who identify one or a few pods that happen to show higher timings or errors, and they often want to try restarting just the bad pod rather than forcing all pods to restart.

This PR adds the option through the PaaSTA API at /services/{service}/{instance}/replicas/{replica}/restart, and reuses the PaaSTA API auth we added for remote-run. (This means I also moved a helper that had originally only been for remote-run to utils so I could use it too.)
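For anyone wanting to poke at the endpoint by hand, here is a minimal sketch of how the request path could be assembled. The template is the one quoted above; the "/v1" prefix is an assumption based on the "/v1/services" mention later in this description, and `replica_restart_path` is a hypothetical helper name, not something from the PR.

```python
from urllib.parse import quote

# Path template quoted from the PR description. The "/v1" prefix is an
# assumption, not something this description states for this endpoint.
REPLICA_RESTART_TEMPLATE = (
    "/v1/services/{service}/{instance}/replicas/{replica}/restart"
)


def replica_restart_path(service: str, instance: str, replica: str) -> str:
    """Fill in the template, percent-encoding each path segment."""
    return REPLICA_RESTART_TEMPLATE.format(
        service=quote(service, safe=""),
        instance=quote(instance, safe=""),
        replica=quote(replica, safe=""),
    )


print(
    replica_restart_path(
        "paasta-contract-monitor",
        "main",
        "paasta-contract-monitor-main-84564c8585-698x5",
    )
)
```

The HTTP method and the auth headers are deliberately left out, since the description does not spell them out.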

Since this seemed like a pretty straightforward addition to the existing paasta restart syntax, I just added it there as the --replica flag. I opted for --replica instead of pod to keep the paasta terminology consistent with paasta status output (tho instances/instances still exists, tbf :p)

Running this locally seemed to do the right thing:

$ PAASTA_SYSTEM_CONFIG_DIR=./etc_paasta_for_development/ python paasta_tools/cli/cli.py restart --replica paasta-contract-monitor-main-84564c8585-698x5 -s paasta-contract-monitor -i main  -c eksstage
[service paasta-contract-monitor] Issued request to restart replica 'paasta-contract-monitor-main-84564c8585-698x5' of paasta-contract-monitor.main by jfong@dev208-uswest1adevc.uswest1-devc.yelpcorp.com
✓ Successfully initiated restart of replica 'paasta-contract-monitor-main-84564c8585-698x5'
  Service: paasta-contract-monitor
  Instance: main
  Cluster: eksstage
  Replica: paasta-contract-monitor-main-84564c8585-698x5

Kubernetes will automatically create a replacement pod.

The actual implementation is for the PaaSTA API to send a pod deletion request rather than actually 'restart' the pod's main container, but I think this is what we generally reach for if someone asks us as oncall for a similar thing.
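To make the "restart means delete the pod" idea concrete, here is a rough sketch, not the code in this PR. It assumes a client object shaped like the Kubernetes Python client's `CoreV1Api` (which provides `delete_namespaced_pod` with an optional `grace_period_seconds`). The function name, the `paasta` namespace, and the "force means grace period 0" behavior are all assumptions for illustration. A fake client is used so the snippet runs without a cluster.

```python
from typing import Any, Dict, List, Tuple


def restart_replica_by_deleting_pod(
    core_api: Any,
    pod_name: str,
    namespace: str,
    force: bool = False,
) -> Any:
    """Delete one pod so that its controller creates a replacement.

    `core_api` is expected to behave like kubernetes.client.CoreV1Api.
    """
    kwargs: Dict[str, Any] = {}
    if force:
        # Assumption: "force" means skipping graceful shutdown entirely.
        kwargs["grace_period_seconds"] = 0
    # Without grace_period_seconds, Kubernetes falls back to the pod's own
    # terminationGracePeriodSeconds.
    return core_api.delete_namespaced_pod(
        name=pod_name, namespace=namespace, **kwargs
    )


class FakeCoreApi:
    """Stand-in for CoreV1Api that records calls instead of hitting a cluster."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str, Dict[str, Any]]] = []

    def delete_namespaced_pod(self, name: str, namespace: str, **kwargs: Any) -> str:
        self.calls.append((name, namespace, kwargs))
        return "deleted"  # placeholder; the real client returns a model object


fake = FakeCoreApi()
pod = "paasta-contract-monitor-main-84564c8585-698x5"
restart_replica_by_deleting_pod(fake, pod, "paasta")  # graceful
restart_replica_by_deleting_pod(fake, pod, "paasta", force=True)  # immediate
print(fake.calls)
```

Passing the client in as a parameter keeps the sketch testable; a real caller would hand over an authenticated `CoreV1Api` instance instead of the fake.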

I do think that, more often than not, people fighting 'bad' containers turn out to have a bad node rather than a bad replica, but this would at least give service owners a troubleshooting option that doesn't require elevated privileges or restarting all running pods.

For now, this makes a call to our PaaSTA API auth, but /v1/services calls are not restricted; I'll attempt to add a rule with a more specific path to restrict this particular call. This also reuses the existing client-side LDAP check that paasta restart already uses.

nemacysts · 2025-12-17T00:33:14Z

paasta_tools/api/views/instance.py

+)
+def instance_replica_restart(
+    request: Request,
+) -> dict[str, Any]:


non-blocking: i wonder if a typeddict or using the generated model would maybe be nicer here?

jfongatyelp · Dec 20, 2025

Interesting, at the very least having a named model for the response would be great... i don't really see us importing the models into api/views/ files or instance/kubernetes.py at all; they seem to be used only on the client side. So I suppose I could do both: a typeddict on the api side, and the generated model on the client side?

nemacysts · Dec 20, 2025

yea, i'm not sure if there's a reason why we haven't tried using the generated named models or if we just never bothered since a lot of these endpoints predated our typing enthusiasm :p
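For reference, the typeddict idea from this thread could look roughly like the sketch below. The field names are hypothetical (loosely mirroring the CLI output in the PR description) and are not taken from the actual change or from the generated API models.

```python
from typing import TypedDict


class ReplicaRestartResponse(TypedDict):
    # Hypothetical fields; the real response shape is not shown in this PR page.
    service: str
    instance: str
    replica: str
    message: str


def build_replica_restart_response(
    service: str, instance: str, replica: str
) -> ReplicaRestartResponse:
    """Return a response dict that a type checker can verify key by key."""
    return {
        "service": service,
        "instance": instance,
        "replica": replica,
        "message": f"Initiated restart of replica '{replica}'",
    }


resp = build_replica_restart_response(
    "paasta-contract-monitor",
    "main",
    "paasta-contract-monitor-main-84564c8585-698x5",
)
print(resp["message"])
```

At runtime this is still a plain dict, so the view could keep returning it as before; the gain over `dict[str, Any]` is that mypy flags missing or misspelled keys.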

paasta_tools/cli/cmds/start_stop_restart.py

paasta_tools/instance/kubernetes.py

paasta_tools/kubernetes_tools.py

nemacysts · 2025-12-17T21:59:40Z

paasta_tools/run-paasta-api-in-dev-mode.py

    port = pick_random_port("paasta-dev-api")
    # Generate api endpoints
-    api_endpoints = {"api_endpoints": {cluster: f"http://localhost:{port}"}}
+    # Just create both non-eks and eks endpoints, one has to be right :D


paasta_tools/instance/kubernetes.py

Co-authored-by: Luis Pérez <luisp@yelp.com>

nemacysts

some minor comments, but nothing blocking

paasta_tools/instance/kubernetes.py

paasta_tools/kubernetes_tools.py

nemacysts

my bad, missed that there were updates!

jfongatyelp requested a review from a team as a code owner December 16, 2025 22:02

nemacysts reviewed Dec 17, 2025

jfongatyelp commented Dec 23, 2025

paasta_tools/instance/kubernetes.py (outdated)

jfongatyelp and others added 12 commits January 5, 2026 14:02

Add a restart replica option

d0878f3

Handle api errors like remote_run

ca068d2

Add tests (thx claude)

f467d98

Apply suggestion from @nemacysts

44a3c1e

Co-authored-by: Luis Pérez <luisp@yelp.com>

Whoops, missed a codegenerated file?

06d02de

Update paasta_tools/cli/cmds/start_stop_restart.py

cad99f1

Co-authored-by: Luis Pérez <luisp@yelp.com>

Update paasta_tools/cli/cmds/start_stop_restart.py

1042f3d

Co-authored-by: Luis Pérez <luisp@yelp.com>

Apply suggestion from @nemacysts

b2407b3

Add force param, default to configured termination_grace_period_seconds

eebd01e

Apply suggestion from @nemacysts

8c53a3e

Apply suggestion from @jfongatyelp

79478c4

precommit

16fab34

jfongatyelp force-pushed the jfong/restart_replica branch from 32e61cd to 16fab34 Compare January 5, 2026 22:16

nemacysts previously approved these changes Jan 6, 2026

paasta_tools/instance/kubernetes.py (outdated)

paasta_tools/kubernetes_tools.py (outdated)

jfongatyelp added 3 commits January 8, 2026 10:31

Fix defaults and formatting on delete_pod_by_name

5cfbd67

Just use pod's configured grace period instead of pulling from job_config for non-forced replica restart

fa98d1e

Fix tests

77160ba

jfongatyelp dismissed nemacysts’s stale review via 77160ba January 8, 2026 19:15

nemacysts approved these changes Jan 29, 2026

jfongatyelp wants to merge 15 commits into master from jfong/restart_replica

2 participants
