Skip to content

Conversation

@jwhitham
Copy link
Contributor

@jwhitham jwhitham commented May 14, 2025

Fixes #2454

This bug can be triggered if the lockout request times out. If the victim CPU core has interrupts disabled for a long time, it will not respond quickly enough to the lockout request, which times out. However, the request is still pending in the FIFO, and when it is eventually handled by the victim CPU core, that core enters an infinite loop, waiting for a LOCKOUT_MAGIC_END message which never arrives.

I considered the solution of always sending a LOCKOUT_MAGIC_END message even on timeout, but this wouldn't work if the FIFO was already full. I could not use a blocking wait, as this would not respect the timeout, and as far as I can tell, there is no hardware support for clearing the FIFO without also resetting the CPU.

In this PR, the lockout state is controlled by a shared variable. The FIFO is used to begin a lockout and acknowledge it as before. But the end of the lockout is now signalled by updating the shared variable. This ensures that the end of the lockout request is recognised reliably by the victim CPU core, regardless of whether the end was caused by a timeout or whether the lockout completed normally. __wfe and __sev are used to signal updates to the shared variable in order to avoid polling.

The semantics of the multicore_lockout_end_... functions change: they will no longer wait for the lockout to end. This is described further in the updated doxygen and in the comments below.

The lockout state is controlled by a shared variable. The FIFO
is used to begin a lockout and acknowledge it (i.e.
multicore_lockout_handshake works as before) but the end
of the lockout is now signalled by updating the shared variable.
This ensures that timeouts are recognised reliably by the victim core.
__wfe and __sev are used to signal updates to the shared variable
in order to avoid polling.
@jwhitham jwhitham changed the base branch from master to develop May 14, 2025 12:01
@lurch
Copy link
Contributor

lurch commented May 14, 2025

Does this PR mean that any of the Doxygen at https://github.com/raspberrypi/pico-sdk/blob/develop/src/rp2_common/pico_multicore/include/pico/multicore.h#L418 also needs updating to match, or is that all still "correct"?

@kilograham kilograham self-assigned this May 14, 2025
@kilograham kilograham self-requested a review May 14, 2025 17:40
@jwhitham
Copy link
Contributor Author

Does this PR mean that any of the Doxygen at https://github.com/raspberrypi/pico-sdk/blob/develop/src/rp2_common/pico_multicore/include/pico/multicore.h#L418 also needs updating to match, or is that all still "correct"?

You're right, this did need updating. And, I didn't realise this until now, but the semantics of multicore_lockout_end_blocking and multicore_lockout_end_timeout_us are different now, because these functions do not wait for the lockout to end. They just guarantee that it will end.

I thought about whether it was a good idea to push ahead with this behaviour change. I think it is a good idea, because it's simpler, and I don't see the value in knowing that the end of the lockout has been acknowledged by the victim core. The new design avoids the possibility for multicore_lockout_end_timeout_us that "a timeout here will leave the "lockout" functionality in a bad state" (as previously stated in doxygen). Timeouts always leave the lockout functionality in a usable state, both for start and end. I think the old blocking behaviour for the multicore_lockout_end_... functions could be preserved but at the cost of a more complex design, and I feel like it wouldn't be justified... however, I may be missing something!


static mutex_t lockout_mutex;
static bool lockout_in_progress;
static io_rw_32 lockout_request_id = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small style point, we would probably use volatile uint32_t explicitly here vs io_rw_32 which is really intended for memory mapped IO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, replaced.

return rc;
}

static uint32_t update_lockout_request_id() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a (void) i'm pretty sure without this it makes at least one of the many GCC or Clang compiler versions unhappy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed.

Copy link
Contributor

@kilograham kilograham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks great except a few nits

@jwhitham jwhitham requested a review from kilograham May 16, 2025 18:39
@kilograham kilograham added this to the 2.1.2 milestone May 20, 2025
@kilograham kilograham merged commit 3515dad into raspberrypi:develop May 20, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

When using flash_safe_execute, if the victim core does not respond within the timeout, it will be locked out forever

3 participants