fix: don't error if pod has no containerStatuses (#4471)#4473
ianroberts wants to merge 3 commits into canonical:master
Conversation
neoaggelos
left a comment
LGTM, thanks a lot for the catch @ianroberts!

Would you also mind signing the CLA? And I believe we should also backport this change to older branches.
I did just sign the CLA, though the email address I've used is a secondary email on my launchpad account.

The CLA check fails with:

Can you amend the commit and use the email that you signed the CLA with?
That's what I mean, the email I signed the CLA with is that secondary address. I've just edited my launchpad profile to swap the primary and secondary addresses, can you run the CLA checker again now?
Sorry, my bad, I had added the address to my Ubuntu One account but not directly to my Launchpad account; it should be there now. But I'm not showing as a member of https://launchpad.net/~contributor-agreement-canonical, maybe I need to fill in the CLA again now that the right emails are all linked?
Yes, can you please sign the CLA with the email address used in the commit? Thank you!
I already had, twice... just filled in the form again, let's see if it takes this time. Edit: I see it's at least found the matching LP account for my email now.
This bug is still present. I've just tried again to sign the CLA, can you see whether it has succeeded this time?
I'd like to chip in with some collateral damage this issue has produced in our setup. We've had a major outage in production and almost another one (we caught it early and knew how to intervene), due to this bug.
In certain circumstances `pod["status"]` may not have a `containerStatuses` child property at all (e.g. if the pod is Pending and not yet scheduled to any node). Guard against this by using `.get` (returns `None`) instead of `[]` (raises `KeyError`).
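The difference between the two access patterns can be shown in isolation; the `status` dict below is a hypothetical stand-in for a Pending pod's status as returned by the Kubernetes API:

```python
# Hypothetical pod status dict standing in for what the Kubernetes API
# returns for a Pending pod that has not been scheduled yet: there is
# no "containerStatuses" key at all.
status = {"phase": "Pending"}

# Subscript access raises KeyError when the key is absent.
crashed = False
try:
    status["containerStatuses"]
except KeyError:
    crashed = True

# .get() returns None instead, so the caller can simply skip the pod.
statuses = status.get("containerStatuses")

print(crashed, statuses)  # → True None
```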
Force-pushed from 4366e96 to 4b55c8a
Rebased on latest master
This has been kind of superseded (at least in 1.32) by #4819 - this bug is still present in
I say only "kind of" superseded, because the default hook scripts in 1.32 have been changed to only look for pods on the current node, but these are only the default scripts that get installed if there are no hooks already present in |
HomayoonAlimohammadi
left a comment
Thanks a lot for the PR @ianroberts! LGTM overall, just a minor nit and we can merge it.
@ianroberts That's right, sorry, I committed the change since your response had not loaded for some reason.
I believe the failing addon tests are not related to this PR. Should we merge this one, or wait for a fix and rebase this PR on top of it, @louiseschmidtgen?
@HomayoonAlimohammadi The issue is resolved, can you re-run the workflow please?
Summary
`kill-host-pods.py` iterates over all pods in all namespaces looking for pods that are running on the current node. But if any pods in the iteration do not have `containerStatuses` in their pod status structure (e.g. if they're Pending, waiting for a valid node to schedule to) then this currently causes the script to crash with a `KeyError`. This PR changes the code to treat a missing `containerStatuses` the same as an empty one, so the pod is correctly skipped rather than the whole script crashing.

Closes #4471
Changes
Changed `pod["status"]["containerStatuses"]` (which raises an error if `containerStatuses` is missing) to `pod["status"].get("containerStatuses")` (which simply returns `None`).

Testing
Tested locally with a copy of the script, I believe the change is trivial enough not to require automated tests beyond this.
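A minimal sketch of the guarded iteration, assuming hypothetical pod dicts and a hypothetical `pods_on_node` helper; the real `kill-host-pods.py` reads pod data from the Kubernetes API rather than from in-memory dicts:

```python
# Sketch of the guarded iteration described above. The pod dicts and the
# pods_on_node helper are hypothetical stand-ins; the real
# kill-host-pods.py reads pod data from the Kubernetes API.

def pods_on_node(pods, node_name):
    """Yield pods that have container statuses and run on node_name."""
    for pod in pods:
        # A Pending pod that is not yet scheduled may have no
        # "containerStatuses" key at all. Using .get() returns None
        # instead of raising KeyError, so the pod is skipped rather
        # than crashing the whole iteration.
        statuses = pod["status"].get("containerStatuses")
        if not statuses:
            continue
        if pod["spec"].get("nodeName") == node_name:
            yield pod

running = {
    "spec": {"nodeName": "node-1"},
    "status": {"containerStatuses": [{"name": "c1", "ready": True}]},
}
pending = {"spec": {}, "status": {"phase": "Pending"}}  # no containerStatuses

print(len(list(pods_on_node([running, pending], "node-1"))))  # → 1
```

With subscript access, the `pending` pod would have raised `KeyError: 'containerStatuses'` and aborted the loop; with `.get` it is treated the same as a pod with an empty status list and skipped.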
Checklist
Notes