Fix CancellableTasksIT.testChildrenTasksCancelledOnTimeout() #129513
                
     Closed
            
            
          
A race condition exists in `TransportTestAction` that could result in a subrequest timing out before the corresponding task has been registered on the remote node. This change adds a trace log message to the `TaskManager` for when a children list is not found in `cancelChildLocal()`, along with a counter that is incremented each time this happens. The test still verifies that at least one child task is cancelled via the expected cancellation mechanism. It then manually cancels any orphaned tasks and verifies that the number of manual cancellations required equals the number of `cancelChildLocal()` calls observed where a children list was not found.

To reproduce the failure reliably, I changed the timeout in `TransportTestAction` from 400 milliseconds to 1 millisecond when testing locally.

See #123568 for details about possible follow-ups: implementing cancellation retry support, refactoring `TransportTestAction` to enforce ordering of requests so timeouts do not occur until the corresponding remote task has been registered, and discussion of support for transport-level timeouts in general.

Closes #123568
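
For reviewers who want the invariant spelled out, here is a minimal, single-threaded sketch of the accounting the test relies on. It is a toy model, not the actual `TaskManager` code: the class `OrphanedChildSketch`, its map-based bookkeeping, and the method `childrenListNotFoundCount()` are hypothetical names made up for illustration. Only the idea is taken from this PR: a cancellation that arrives before the child is registered finds no children list, is counted, and leaves an orphan that must be cancelled manually.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical toy model of the race; NOT the Elasticsearch TaskManager.
final class OrphanedChildSketch {
    // Children registered on this "node", keyed by parent task id.
    private final Map<Long, List<String>> childrenByParent = new ConcurrentHashMap<>();
    // Counts cancellations that found no children list (assumed analogue of the new counter).
    private final AtomicInteger childrenListNotFound = new AtomicInteger();

    public void registerChild(long parentId, String childName) {
        childrenByParent.computeIfAbsent(parentId, id -> new CopyOnWriteArrayList<>()).add(childName);
    }

    // Cancels and returns all children of the parent, or records a miss if none are registered yet.
    public List<String> cancelChildLocal(long parentId) {
        List<String> children = childrenByParent.remove(parentId);
        if (children == null) {
            childrenListNotFound.incrementAndGet(); // the real change also emits a trace log here
            return List.of();
        }
        return children;
    }

    public int childrenListNotFoundCount() {
        return childrenListNotFound.get();
    }

    public static void main(String[] args) {
        OrphanedChildSketch node = new OrphanedChildSketch();

        // Expected ordering: the child is registered before the cancellation arrives.
        node.registerChild(1L, "child-of-1");
        int cancelledNormally = node.cancelChildLocal(1L).size();

        // Racy ordering: the timeout fires, so the cancellation arrives first and finds nothing.
        node.cancelChildLocal(2L);
        node.registerChild(2L, "child-of-2"); // registered too late, now orphaned

        // Manual cleanup of the orphan, as the test does for leftover tasks.
        int manualCancellations = node.cancelChildLocal(2L).size();

        System.out.println("cancelled normally: " + cancelledNormally);
        System.out.println("manual cancellations: " + manualCancellations);
        System.out.println("children list not found: " + node.childrenListNotFoundCount());
    }
}
```

In this model the two checks described above correspond to `cancelledNormally >= 1` and `manualCancellations == childrenListNotFoundCount()`.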