Conversation
```python
def instance_replica_restart(
    request: Request,
) -> dict[str, Any]:
```
non-blocking: i wonder if a typeddict or using the generated model would maybe be nicer here?
Interesting, at the very least having a named model for the response would be great... i don't really see us importing the models into api/views/ files or instance/kubernetes.py at all--they only seem to be used on the client side--so I suppose I could do both: typeddict on the api side, reference the generated model on the client side?
yea, i'm not sure if there's a reason for why we haven't tried using the generated named models or if we just never bothered since a lot of these endpoints predated our typing enthusiasm :p
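For illustration, the TypedDict approach discussed above might look something like this on the api/views/ side (the names and fields here are hypothetical, not taken from the PR):

```python
from typing import TypedDict


class ReplicaRestartResponse(TypedDict):
    """Hypothetical shape for the replica-restart endpoint response."""

    service: str
    instance: str
    replica: str
    restarted: bool


def make_response(service: str, instance: str, replica: str) -> ReplicaRestartResponse:
    # A TypedDict gives the view a named, type-checkable response shape
    # without importing the generated client models into api/views/.
    return {
        "service": service,
        "instance": instance,
        "replica": replica,
        "restarted": True,
    }
```

The client side could then reference the generated model for the same endpoint, keeping the two sides decoupled.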
```python
port = pick_random_port("paasta-dev-api")
# Generate api endpoints
api_endpoints = {"api_endpoints": {cluster: f"http://localhost:{port}"}}
# Just create both non-eks and eks endpoints, one has to be right :D
```
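A rough sketch of what "create both non-eks and eks endpoints" could look like as a helper (the `eks-` prefix naming scheme here is a guess, not taken from the PR):

```python
def build_api_endpoints(cluster: str, port: int) -> dict:
    # Point both the plain and the eks-prefixed cluster names at the
    # local dev API; only one of them will actually be looked up.
    local = f"http://localhost:{port}"
    return {"api_endpoints": {cluster: local, f"eks-{cluster}": local}}
```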
Co-authored-by: Luis Pérez <luisp@yelp.com>
Force-pushed from 32e61cd to 16fab34
nemacysts left a comment:
some minor comments, but nothing blocking
…nfig for non-forced replica restart
nemacysts left a comment:
my bad, missed that there were updates!
We sometimes have folks who identify one or a few pods that happen to show higher timings or errors, and often they have wanted to try restarting just those bad pods rather than forcing all pods to be restarted.
This PR adds the option through the PaaSTA API at `/services/{service}/{instance}/replicas/{replica}/restart`, and reuses our PaaSTA API auth added for remote-run. (This means I also moved a helper that had originally only been for remote-run to utils so I could use it too.) As this seemed pretty straightforward as an additional option on top of the existing `paasta restart` syntax, I just added it there, with the `--replica` flag. I opted for `--replica` instead of pod to keep the paasta terminology consistent w/ `paasta status` output (tho instances/instances still exists, tbf :p). Running this locally seemed to do the right thing:
The actual implementation has the PaaSTA API send a pod deletion request rather than actually 'restarting' the main container, but I think this is what we generally reach for if someone asks us as oncall for something similar.
I do think that, more often than not, people fighting 'bad' containers are actually dealing with a bad node rather than a bad replica, but this would at least give service owners a troubleshooting option that doesn't require elevated privileges or restarting all running pods.
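Since the 'restart' is really a pod deletion, one plausible safeguard is verifying that the named replica actually belongs to the requested service/instance before deleting it. A minimal sketch (the helper name and the dash-separated pod naming convention are assumptions, not taken from the PR):

```python
def replica_belongs_to_instance(service: str, instance: str, pod_name: str) -> bool:
    # Assumes pods are named "<service>-<instance>-<suffix>"; real PaaSTA
    # name sanitization may differ (e.g. underscore handling).
    prefix = f"{service}-{instance}"
    return pod_name.startswith(prefix + "-")
```

Rejecting non-matching names up front keeps a typo'd replica argument from deleting some other instance's pod.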
For now, this makes a call to our PaaSTA API auth, but /v1/services calls are not restricted; I'll attempt to add a rule with a more specific path to restrict this particular call, but this also reuses the existing client-side LDAP check that `paasta restart` already uses.
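A more specific auth rule could be as simple as matching the restart path; purely illustrative (the real rules in the PR live in the PaaSTA API auth config, and the exact path prefix is assumed):

```python
import re

# Hypothetical pattern for the replica-restart endpoint only, leaving
# other /v1/services calls unrestricted.
RESTART_PATH = re.compile(r"^/v1/services/[^/]+/[^/]+/replicas/[^/]+/restart$")


def needs_restart_auth(path: str) -> bool:
    # Only requests matching the restart endpoint require the extra check.
    return RESTART_PATH.match(path) is not None
```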