Fix multicore_lockout features so that the victim core cannot become stuck in an infinite loop if the lockout attempt times out #2467

jwhitham · 2025-05-14T12:00:41Z

This bug can be triggered if the lockout request times out. If the victim CPU core has interrupts disabled for a long time, it will not respond quickly enough to the lockout request, which times out. However, the request is still pending in the FIFO, and when it is eventually handled by the victim CPU core, that core enters an infinite loop, waiting for a LOCKOUT_MAGIC_END message which never arrives.
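The failure mode described above can be reproduced in a small host-side model. This is only a sketch under stated assumptions: the FIFO depth, the magic values and every `model_` name are hypothetical stand-ins, not the SDK's definitions, and the acknowledgement step of the handshake is left out. A bounded poll budget stands in for the real handler's unbounded wait.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values for illustration only; not the SDK's definitions. */
#define MODEL_MAGIC_START 0x1u
#define MODEL_MAGIC_END   0x2u
#define MODEL_FIFO_DEPTH  4u

/* Minimal ring buffer standing in for the inter-core FIFO. */
typedef struct {
    uint32_t slots[MODEL_FIFO_DEPTH];
    size_t head;
    size_t count;
} model_fifo_t;

static bool model_fifo_push(model_fifo_t *f, uint32_t value) {
    if (f->count == MODEL_FIFO_DEPTH) {
        return false; /* full: a non-blocking send fails */
    }
    f->slots[(f->head + f->count) % MODEL_FIFO_DEPTH] = value;
    f->count++;
    return true;
}

static bool model_fifo_pop(model_fifo_t *f, uint32_t *value) {
    if (f->count == 0) {
        return false; /* empty */
    }
    *value = f->slots[f->head];
    f->head = (f->head + 1) % MODEL_FIFO_DEPTH;
    f->count--;
    return true;
}

/* Old-style victim handler: once it has seen START, it waits for END.
 * poll_budget stands in for "forever"; the function returns false if
 * the budget runs out before END arrives. */
static bool model_old_victim_handler(model_fifo_t *to_victim, unsigned poll_budget) {
    uint32_t msg;
    if (!model_fifo_pop(to_victim, &msg) || msg != MODEL_MAGIC_START) {
        return true; /* no lockout request pending */
    }
    while (poll_budget-- > 0) {
        if (model_fifo_pop(to_victim, &msg) && msg == MODEL_MAGIC_END) {
            return true; /* lockout ended normally */
        }
    }
    return false; /* END never arrived: the victim would be stuck */
}

/* Scenario from the description: the request is queued, the requester
 * times out and gives up, and the victim handles the request later. */
static bool model_timed_out_request_strands_victim(void) {
    model_fifo_t to_victim = {0};
    if (!model_fifo_push(&to_victim, MODEL_MAGIC_START)) {
        return false;
    }
    /* The requester times out here; END is never sent. */
    return !model_old_victim_handler(&to_victim, 1000u);
}
```

In this model `model_timed_out_request_strands_victim()` returns true: the stale START is still delivered, and nothing ever supplies the matching END.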

I considered the solution of always sending a LOCKOUT_MAGIC_END message even on timeout, but this wouldn't work if the FIFO was already full. I could not use a blocking wait, as this would not respect the timeout, and as far as I can tell, there is no hardware support for clearing the FIFO without also resetting the CPU.

In this PR, the lockout state is controlled by a shared variable. The FIFO is used to begin a lockout and acknowledge it as before. But the end of the lockout is now signalled by updating the shared variable. This ensures that the end of the lockout request is recognised reliably by the victim CPU core, regardless of whether the end was caused by a timeout or whether the lockout completed normally. __wfe and __sev are used to signal updates to the shared variable in order to avoid polling.
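A minimal host-side sketch of the shared-variable idea follows. It assumes an id-comparison scheme, in which the victim stays locked out only while the shared variable still holds the id it received through the FIFO; that scheme and all `model_` names are assumptions made for illustration, not the code in this PR. The FIFO handshake itself is omitted, and the `__wfe`/`__sev` calls appear only as comments because the model runs on a single thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Shared between both cores in the real design; a plain global here. */
static volatile uint32_t model_request_id = 0;

/* Requester: begin a lockout. The returned id is what would be sent
 * to the victim through the FIFO. */
static uint32_t model_lockout_begin(void) {
    model_request_id = model_request_id + 1u;
    return model_request_id;
}

/* Requester: end a lockout, whether it completed normally or timed out.
 * Nothing is written to the FIFO, so this cannot block on a full FIFO. */
static void model_lockout_end(void) {
    model_request_id = model_request_id + 1u;
    /* On hardware, __sev() here would wake a victim sleeping in __wfe(). */
}

/* Victim: true while the request it took from the FIFO is still current.
 * On hardware the victim would loop on this condition, calling __wfe()
 * between checks rather than polling continuously. */
static bool model_victim_must_wait(uint32_t id_from_fifo) {
    return model_request_id == id_from_fifo;
}

/* Checks the two cases discussed in the description. */
static bool model_scenarios_hold(void) {
    /* Stale request: the requester timed out before the victim responded. */
    uint32_t stale = model_lockout_begin();
    model_lockout_end();
    if (model_victim_must_wait(stale)) {
        return false; /* the victim would have been stranded */
    }

    /* Normal request: the victim waits until the requester ends the lockout. */
    uint32_t live = model_lockout_begin();
    if (!model_victim_must_wait(live)) {
        return false;
    }
    model_lockout_end();
    return !model_victim_must_wait(live);
}
```

Because the end of the lockout is a write to memory rather than a FIFO message, a request that is handled late is simply recognised as no longer current.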

The semantics of the multicore_lockout_end_... functions change: they will no longer wait for the lockout to end. This is described further in the updated doxygen and in the comments below.

The lockout state is controlled by a shared variable. The FIFO is used to begin a lockout and acknowledge it (i.e. multicore_lockout_handshake works as before) but the end of the lockout is now signalled by updating the shared variable. This ensures that timeouts are recognised reliably by the victim core. __wfe and __sev are used to signal updates to the shared variable in order to avoid polling.

src/rp2_common/pico_multicore/multicore.c

lurch · 2025-05-14T12:42:09Z

Does this PR mean that any of the Doxygen at https://github.com/raspberrypi/pico-sdk/blob/develop/src/rp2_common/pico_multicore/include/pico/multicore.h#L418 also needs updating to match, or is that all still "correct"?

jwhitham · 2025-05-14T17:54:24Z

> Does this PR mean that any of the Doxygen at https://github.com/raspberrypi/pico-sdk/blob/develop/src/rp2_common/pico_multicore/include/pico/multicore.h#L418 also needs updating to match, or is that all still "correct"?

You're right, this did need updating. And, I didn't realise this until now, but the semantics of multicore_lockout_end_blocking and multicore_lockout_end_timeout_us are different now, because these functions do not wait for the lockout to end. They just guarantee that it will end.

I thought about whether it was a good idea to push ahead with this behaviour change. I think it is, because it's simpler, and I don't see the value in knowing that the end of the lockout has been acknowledged by the victim core. For multicore_lockout_end_timeout_us, the new design removes the possibility that "a timeout here will leave the 'lockout' functionality in a bad state" (as previously stated in the doxygen). Timeouts now always leave the lockout functionality in a usable state, both for start and end. I think the old blocking behaviour of the multicore_lockout_end_... functions could be preserved, but at the cost of a more complex design, and I feel it wouldn't be justified... however, I may be missing something!

kilograham · 2025-05-16T16:27:55Z

src/rp2_common/pico_multicore/multicore.c

-
 static mutex_t lockout_mutex;
-static bool lockout_in_progress;
+static io_rw_32 lockout_request_id = 0;


Small style point: we would probably use volatile uint32_t explicitly here rather than io_rw_32, which is really intended for memory-mapped IO.

jwhitham · 2025-05-16

Thanks, replaced.

kilograham · 2025-05-16T16:28:27Z

src/rp2_common/pico_multicore/multicore.c

    return rc;
 }

+static uint32_t update_lockout_request_id() {


Can you add a (void)? I'm pretty sure that without it, at least one of the many GCC or Clang compiler versions is unhappy.

jwhitham · 2025-05-16

Thanks, fixed.
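For context on the `(void)` request: before C23, an empty parameter list in C means "parameters unspecified" rather than "no parameters", so a definition written as `f()` does not provide a prototype. GCC's `-Wstrict-prototypes` is one warning that flags this; which compiler and option the reviewer had in mind is not stated in the thread. The sketch below uses hypothetical names and is not the function from this PR.

```c
#include <stdint.h>

/* Hypothetical counter, for illustration only. */
static uint32_t model_request_counter = 0;

/* With (void) this is a full prototype: the compiler knows the function
 * takes no arguments and can reject a call that passes any.
 * Written as model_update_request_id() instead, a pre-C23 compiler would
 * treat the parameter list as unspecified and may warn that the
 * declaration is not a prototype. */
static uint32_t model_update_request_id(void) {
    model_request_counter += 1u;
    return model_request_counter;
}
```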

kilograham

Thanks, this looks great except for a few nits.

jwhitham changed the base branch from master to develop May 14, 2025 12:01

lurch reviewed May 14, 2025

lurch added the pico_multicore label May 14, 2025

Update documentation for multicore_lockout_end functions

eadf08b

kilograham self-assigned this May 14, 2025

kilograham self-requested a review May 14, 2025 17:40

Simplification, remove magic number (not required)

5551101

kilograham reviewed May 16, 2025

kilograham requested changes May 16, 2025

Review improvements

d928e45

jwhitham requested a review from kilograham May 16, 2025 18:39

Restore use of non-zero magic number

38de127

kilograham approved these changes May 20, 2025

kilograham added this to the 2.1.2 milestone May 20, 2025

kilograham merged commit 3515dad into raspberrypi:develop May 20, 2025
4 checks passed
