Skip to content

Conversation

simongdavies
Copy link
Contributor

Fixes a race condition where a sandbox kill arrives after a sandbox has successfully exited causing the subsequent run to fail.

There is a breaking change in this PR, previously if kill was called on an InterruptHandle before or while a guest call was not in progress the next guest call made on the Sandbox would be cancelled , now this scenario is a no-op. kill only takes effect if there is a guest call running.

/// retrying until either:
/// - The signal is successfully delivered (VCPU transitions from running to not running)
/// - The VCPU stops running for another reason (e.g., call completes normally)
/// - No call is active (call_active=false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it block? I thought it would just return false?

/// - Set to true at the start of call_guest_function_by_name_no_reset()
/// - Cleared at the end of call_guest_function_by_name_no_reset()
/// - kill() only stamps cancel_requested if call_active is true
/// - If kill() is called when call_active=false, it returns false and has no effect
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually the case? I don't see it returning early when call is not active.

target_generation = Some(generation);
}

// If not running, we've stamped the generation (if requested), so we're done
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should there be a check that call_active is false so this exits early in the case that the call ins't active?

I think this is currently working due to the fact that when call_active running is also active. I think there is one case where the call might not be active but the vm is running (during initial set up) and i think it might hang in that scenario?

Signed-off-by: James Sturtevant <[email protected]>
Signed-off-by: James Sturtevant <[email protected]>
// The virtualization stack can use this function to return the control
// of a virtual processor back to the virtualization stack in case it
// needs to change the state of a VM or to inject an event into the processor
debug!("Internal cancellation detected, returning Retry error");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Signed-off-by: James Sturtevant <[email protected]>
Signed-off-by: James Sturtevant <[email protected]>
Signed-off-by: James Sturtevant <[email protected]>
Signed-off-by: James Sturtevant <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/bugfix For PRs that fix bugs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants