Fix ContainerExecProc joinWithTimeout deadlock #2789
Open
cronik wants to merge 1 commit into jenkinsci:master from
Thread dump of deadlock
I noticed that #1538 attempted to fix the same deadlock I observed. This PR takes a different approach to the problem.
This change updates the `ContainerExecProc#kill` method to force the `finished` countdown latch to decrement. In some high-load clusters, the `joinWithTimeout` timeout has been observed to elapse while the proc remains blocked.

`joinWithTimeout` calls `kill` if the task does not complete in time.

https://github.com/jenkinsci/jenkins/blob/368f1ccbc967a85c0ff801f3729cb77a269afd41/core/src/main/java/hudson/Proc.java#L165

But if `kill` fails to trigger the `finished` countdown latch, the `join` method continues to wait indefinitely.

https://github.com/jenkinsci/kubernetes-plugin/blob/676ab933d12ad8b25e4d7f78594a32066aad2569/src/main/java/org/csanchez/jenkins/plugins/kubernetes/pipeline/ContainerExecProc.java#L100

Forcing `finished.countDown()` after `close` should unblock the join even if the `ctl-c` command did not trigger the exec listener. `countDown` is a no-op if the latch is already at zero.
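For readers unfamiliar with the latch pattern described above, here is a minimal, self-contained sketch of the idea. It is not the plugin's code: `SketchProc`, `onExit`, `awaitFinished`, and `LatchKillSketch` are hypothetical names invented for illustration, and the empty `close` stands in for whatever the real class does when it shuts down the exec session. It only shows that a `join` blocked on a `CountDownLatch` is released once `kill` counts the latch down, and that a repeated `countDown` is harmless.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for ContainerExecProc; NOT the plugin's real code.
class SketchProc {
    // Released by the exec listener on a normal exit, or by kill() as a fallback.
    private final CountDownLatch finished = new CountDownLatch(1);
    private volatile int exitCode = -1;

    // Simulates the exec listener reporting that the remote process exited.
    void onExit(int code) {
        exitCode = code;
        finished.countDown();
    }

    // Blocks until the latch reaches zero.
    int join() throws InterruptedException {
        finished.await();
        return exitCode;
    }

    // Bounded wait, used only by this sketch to check whether join() would block.
    boolean awaitFinished(long timeout, TimeUnit unit) throws InterruptedException {
        return finished.await(timeout, unit);
    }

    void kill() {
        close();
        // Force the latch open even if the listener never fired.
        // countDown() does nothing when the count is already zero.
        finished.countDown();
    }

    // Placeholder (assumption): the real class would tear down the exec session here.
    private void close() {
    }
}

class LatchKillSketch {
    public static void main(String[] args) throws InterruptedException {
        SketchProc proc = new SketchProc();
        // The listener never fires, so a bounded wait times out.
        System.out.println(proc.awaitFinished(50, TimeUnit.MILLISECONDS)); // false
        proc.kill();
        // join() now returns immediately with the default exit code.
        System.out.println(proc.join()); // -1
        proc.kill(); // a second kill is harmless
    }
}
```

In this sketch the exit code stays at its default when the listener never reports one. How the real `join` should report a forced kill is outside what the sketch tries to show.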
be5a61b to 7c90ef4
Fixes #2683
Testing done
Submitter checklist