Add goroutine core affinity support for RP2040/RP2350 systems #5092

amken3d · 2025-11-16T23:00:25Z

This PR proposes

Support for CPU core pinning and affinity for tasks and goroutines.
Updated the scheduler to respect affinity constraints with separate queues for pinned and shared tasks.
Added new runtime API functions LockToCore, UnlockFromCore, GetAffinity, and CurrentCPU.
Example program demonstrates core pinning and unpinned execution behavior.
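
For illustration only, here is a minimal sketch of what "separate queues for pinned and shared tasks" could look like. It is not the code in this PR: the `task` and `sched` types, the `push`/`pop` names, and the choice to serve a core's pinned queue before the shared queue are all assumptions made for the example.

```go
package main

import "fmt"

// task is a hypothetical stand-in for a runtime task.
// affinity -1 means "may run on any core".
type task struct {
	name     string
	affinity int
}

// sched models one queue per core for pinned tasks plus one shared queue.
type sched struct {
	pinned [][]*task
	shared []*task
}

func newSched(numCPU int) *sched {
	return &sched{pinned: make([][]*task, numCPU)}
}

// push enqueues t on the queue that matches its affinity.
func (s *sched) push(t *task) {
	if t.affinity < 0 {
		s.shared = append(s.shared, t)
		return
	}
	s.pinned[t.affinity] = append(s.pinned[t.affinity], t)
}

// pop returns the next task the given core may run: its own pinned queue
// first (an assumption), then the shared queue. It returns nil if neither
// queue has work for this core.
func (s *sched) pop(core int) *task {
	if q := s.pinned[core]; len(q) > 0 {
		s.pinned[core] = q[1:]
		return q[0]
	}
	if len(s.shared) > 0 {
		t := s.shared[0]
		s.shared = s.shared[1:]
		return t
	}
	return nil
}

func main() {
	s := newSched(2)
	s.push(&task{"worker", 1})
	s.push(&task{"free", -1})
	s.push(&task{"main", 0})
	for core := 0; core < 2; core++ {
		for t := s.pop(core); t != nil; t = s.pop(core) {
			fmt.Printf("core %d runs %s\n", core, t.name)
		}
	}
}
```

The property the sketch tries to capture is that a core never dequeues a task pinned to a different core, while unpinned tasks remain available to whichever core asks first.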

API Functions

`runtime.NumCPU() int`

Returns the number of CPU cores available (returns 2 on RP2040/RP2350).

`runtime.CurrentCPU() int`

Returns the current CPU core number (0 or 1).

`runtime.LockToCore(core int)`

Pins the current goroutine to the specified core:

core = 0 - Pin to core 0
core = 1 - Pin to core 1
core = -1 - Unpin (allow running on any core)

Panics if core is invalid (not -1, 0, or 1).

`runtime.UnlockFromCore()`

Unpins the current goroutine, allowing it to run on any core.
Equivalent to runtime.LockToCore(-1).

`runtime.GetAffinity() int`

Returns the current goroutine's CPU affinity:

Returns -1 if not pinned (can run on any core)
Returns 0 or 1 if pinned to that specific core
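
The contract described above can be modelled in plain Go without any TinyGo runtime support. The sketch below is a hypothetical model, not the implementation: `goroutineState` and its lower-case methods are invented names, and the assumption that a fresh goroutine starts unpinned (-1) is mine.

```go
package main

import "fmt"

const numCPU = 2 // core count stated above for RP2040/RP2350

// goroutineState is a hypothetical stand-in for per-goroutine scheduler state.
type goroutineState struct {
	affinity int // -1 = not pinned, otherwise the core number
}

// newGoroutineState returns a state that is not pinned (assumed default).
func newGoroutineState() *goroutineState {
	return &goroutineState{affinity: -1}
}

// lockToCore mirrors the documented LockToCore contract:
// -1 unpins, 0..numCPU-1 pins, anything else panics.
func (g *goroutineState) lockToCore(core int) {
	if core < -1 || core >= numCPU {
		panic(fmt.Sprintf("invalid core %d", core))
	}
	g.affinity = core
}

// unlockFromCore is equivalent to lockToCore(-1).
func (g *goroutineState) unlockFromCore() {
	g.lockToCore(-1)
}

// getAffinity returns -1 if not pinned, or the pinned core number.
func (g *goroutineState) getAffinity() int {
	return g.affinity
}

func main() {
	g := newGoroutineState()
	fmt.Println("initial affinity:", g.getAffinity())
	g.lockToCore(1)
	fmt.Println("after lockToCore(1):", g.getAffinity())
	g.unlockFromCore()
	fmt.Println("after unlockFromCore:", g.getAffinity())
}
```

The model covers only argument validation and the affinity value; it says nothing about when a pinned goroutine actually migrates to its target core.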

Example program included in the examples directory

Tested on both pico and pico2.
Output of example program:

=== Core Pinning Example ===                                             
Number of CPU cores: 2                                                   
Main starting on core: 0                                                 
                                                                         
Main pinned to core: 0

Core 0 (main): 0 on CPU 0
Worker pinned to core: 1
  Core 1 (worker): 0 on CPU 0
Unpinned worker starting, affinity: 0
    Unpinned worker: 0 on CPU 0
Core 0 (main): 1 on CPU 0
    Unpinned worker: 1 on CPU 0
Core 0 (main): 2 on CPU 0
  Core 1 (worker): 2 on CPU 1
    Unpinned worker: 2 on CPU 0                                                 
Core 0 (main): 3 on CPU 0                                                       
  Core 1 (worker): 3 on CPU 1
Core 0 (main): 4 on CPU 0                                                       
  Core 1 (worker): 4 on CPU 1                                                   
    Unpinned worker: 3 on CPU 0                                                 
Core 0 (main): 5 on CPU 0                                                       
  Core 1 (worker): 5 on CPU 1                                                   
    Unpinned worker: 4 on CPU 0                                                 
Core 0 (main): 6 on CPU 0                                                       
  Core 1 (worker): 6 on CPU 1                                                   
Core 0 (main): 7 on CPU 0                                                       
  Core 1 (worker): 7 on CPU 1                                                   
    Unpinned worker: 5 on CPU 0                                                 
Core 0 (main): 8 on CPU 0                                                       
  Core 1 (worker): 8 on CPU 1                                                   
    Unpinned worker: 6 on CPU 0                                                 
Core 0 (main): 9 on CPU 0                                                       
  Core 1 (worker): 9 on CPU 1                                                   
    Unpinned worker: 7 on CPU 0                                                 
                                                                                
Main unpinned, affinity: -1                                                     
Unpinned main on CPU 0                                                          
  Core 1 worker finished                                                        
Unpinned main on CPU 0                                                          
    Unpinned worker: 8 on CPU 0                                                 
Unpinned main on CPU 0                                                          
    Unpinned worker: 9 on CPU 0                                                 
Unpinned main on CPU 0                                                          
Unpinned main on CPU 1                                                          
    Unpinned worker finished                                                    
                                                                                
Example complete!

- Introduced support for CPU core pinning and affinity for tasks and goroutines.
- Updated the scheduler to respect affinity constraints with separate queues for pinned and shared tasks.
- Added new runtime API functions `LockToCore`, `UnlockFromCore`, `GetAffinity`, and `CurrentCPU`.
- Example program demonstrates core pinning and unpinned execution behavior.

eliasnaur · 2025-11-17T10:54:18Z

Do you actually care about the particular core? If not, are the existing runtime.LockOSThread and runtime.UnlockOSThread calls sufficient to lock/unlock a goroutine to a core?

amken3d · 2025-11-17T12:53:53Z

This is what I see for LockOSThread:

// LockOSThread wires the calling goroutine to its current operating system thread.
// Stub for now
// Called by go1.18 standard library on windows, see golang/go#49320
func LockOSThread() {
}

// UnlockOSThread undoes an earlier call to LockOSThread.
// Stub for now
func UnlockOSThread() {
}

There seems to be no implementation behind it.

For the RP2, since it is a symmetric multiprocessor, it probably doesn't matter which exact core. But for something like the STM32H7, it would matter which core you pin to. (I know we don't support multicore on it yet.)

eliasnaur · 2025-11-17T12:58:06Z

I know. What I'm saying is to change LockOSThread to mean "lock the current goroutine to a core" (when using the cores scheduler).
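
As a rough illustration of the call pattern being suggested, the sketch below uses only the standard `runtime.LockOSThread` and `runtime.UnlockOSThread`. In standard Go these wire the goroutine to its OS thread; pinning to a core under the "cores" scheduler is what is being proposed here, not something this example demonstrates. `runPinned` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"runtime"
)

// runPinned runs fn with the calling goroutine locked to its current thread
// (or, under the proposal above, to its current core) and unlocks afterwards.
func runPinned(fn func()) {
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()
	fn()
}

func main() {
	done := make(chan int)
	go func() {
		sum := 0
		// The work inside fn stays on one thread for its whole duration.
		runPinned(func() {
			for i := 1; i <= 10; i++ {
				sum += i
			}
		})
		done <- sum
	}()
	fmt.Println("sum:", <-done)
}
```

Note that this form takes no core argument, which is the limitation raised in the next reply.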

amken3d · 2025-11-17T13:04:08Z

Fair point. That seems reasonable to me. I can use those function names instead.
The only issue I see is that LockOSThread can't take any arguments. I'd like to be able to tell it which core to lock to.

eliasnaur · 2025-11-17T16:37:38Z

Right. So LockOSThread is enough for use cases where you only care about exclusive access to some core. For heterogeneous cores, I suggest:

Move the API to package machine.
Drop CurrentCPU - it's racy (its result may be invalidated at any time).
Drop GetAffinity - it seems that code that pins itself to a particular core should know what it's doing.
Drop the -1 special case from LockToCore.
Rename LockToCore to LockCore to mimic LockOSThread naming. Rename UnlockFromCore to UnlockCore for the same reason.

An important issue to think about is what happens if the requested core is busy? LockOSThread doesn't have this problem (some core must be running it).

- Dropped CurrentCPU
- Dropped GetAffinity
- Renamed LockToCore to LockCore to mimic LockOSThread naming.
- Updated examples program

amken3d · 2025-11-18T15:05:45Z

Looks like it passed all checks except the macos(13) test, which failed with this error:

This is a scheduled macos-13 brownout. The macOS-13 based runner images are being deprecated. For more details, see actions/runner-images#13046.

deadprogram · 2025-11-18T15:25:43Z

@amken3d I just created #5093 to address the macOS 13 runner deprecation.

eliasnaur

Thanks. I've commented on the implementation, but I'm still not a fan of the LockCore API, because it may block indefinitely if a long-running goroutine is running on the target core. In a sense, LockCore acts as a per-core mutex that some arbitrary other goroutine may have taken, with the usual deadlock risks.

One way of getting around this issue is by requiring LockCore to be called before any other goroutine has started. A good place would be in an init function.

eliasnaur · 2025-11-18T17:58:30Z

src/runtime/runtime.go

-// Stub for now
+// On microcontrollers with multiple cores (e.g., RP2040/RP2350), this pins the
+// goroutine to the core it's currently running on.
 // Called by go1.18 standard library on windows, see https://github.com/golang/go/issues/49320


While here, remove this now irrelevant comment.

eliasnaur · 2025-11-18T17:58:58Z

src/runtime/runtime.go

+// On microcontrollers with multiple cores (e.g., RP2040/RP2350), this pins the
+// goroutine to the core it's currently running on.


Is it more precise to say "with the "cores" scheduler"?

eliasnaur · 2025-11-18T18:02:50Z

src/machine/machine_rp2_cores.go

+
+const numCPU = 2 // RP2040 and RP2350 both have 2 cores
+
+// LockCore implementation for the cores scheduler.


This needs a more detailed description. For example, it doesn't say what happens if the target core is busy. I believe LockCore returns. If so, this is surprising to me; I would expect that once LockCore returns, the calling goroutine is running on the target core.


eliasnaur

@aykevl should probably take a look.

src/machine/machine_rp2_cores.go

eliasnaur · 2025-11-19T14:13:10Z

src/machine/machine_rp2_cores.go

+// After calling UnlockCore, the scheduler is free to schedule the goroutine on
+// any core for automatic load balancing.
+//
+// Only available on RP2040 and RP2350 with the "cores" scheduler.


Superfluous comment.

eliasnaur · 2025-11-19T14:14:21Z

src/machine/machine_rp2_cores.go

+// To avoid potential blocking on a busy core, consider calling LockCore in an
+// init function before any other goroutines have started. This guarantees the
+// target core is available.


I think this should be a hard requirement; that is, LockCore should panic if any other goroutine has started.

Regarding "panic if goroutines started" - I have a specific use case for dynamic pinning:

Motion control board where:

Core 0: Communications and non-critical tasks
Core 1: Hard real-time step generation (must not be interrupted)
The pattern I need is:

func main() {
	go func() {
		machine.LockCore(1) // Pin worker to core 1
		stepGenerationLoop()
	}()
	machine.LockCore(0) // Pin main to core 0
	commsLoop()
}
If LockCore panics when goroutines have started, this pattern wouldn't work.

Would you accept one of these:

Adding Gosched() in LockCore (so it actually migrates before returning)

Keeping dynamic pinning allowed, with clear documentation of risks

Perhaps a convention that each core should only have ONE pinned goroutine?

The deadlock risk is manageable if users follow the pattern of pinning early and using one goroutine per core for pinned work.

Adding Gosched() in LockCore (so it actually migrates before returning)

Yes, if we do this LockCore should not return before the core is pinned.

However, given your use case: is it possible to run the step generation off a hardware timer and an interrupt handler? That seems a better fit, and you can keep both cores running non-critical code.

No, that is precisely the implementation that I am trying to get away from.

Can you elaborate why? Assuming your stepGenerationLoop sometimes sleeps, it seems much less efficient to lock a core 100% of the time, even during sleeps, than to run an interrupt off a timer.

The interrupt-based approach has several issues for precision step generation:

Interrupt Latency & Jitter

Interrupts can be delayed by:
Other higher-priority interrupts
Critical sections in the runtime (GC, scheduler locks)
interrupt.Disable() calls in drivers
For step generation, even microseconds of jitter cause visible artifacts (rough surfaces, inaccurate positioning). A dedicated core can be made to have deterministic timing.

My step generation loop doesn't time.Sleep() - it does tight timing loops or busy-waits on hardware timers for sub-microsecond precision:

func stepGenerationLoop() {
	for {
		waitForNextStepTime() // Busy-wait or timer poll
		pulseStepPins()       // Must happen at EXACT time
		calculateNextStep()   // Acceleration/deceleration curves
	}
}

The core isn't idle - it's maintaining precise timing. An interrupt would add latency between "timer fired" and "step pin toggled."

Interrupt Context Limitations

In an interrupt handler, I can't:
Use blocking operations
Allocate memory (GC issues)
Have complex state machines easily
Use goroutine synchronization primitives
With a dedicated core, I can write normal Go code with loops, state, and even coordinate multiple axes cleanly.

The Alternative is Worse

Without core pinning, my options are:
Interrupt-based (jitter issues above)
PIO-only (limited for complex multi-axis coordination)
External step generator chip (added cost/complexity)
Core pinning gives me a way to isolate real-time work.

I see. I believe I'm doing something similar on a PIO-capable chip (RP2350). My solution is a 3-layer architecture:

Regular Go code generates high-level primitives (line segments, bezier curves etc.) and sends them over a channel. This code is only timing sensitive to the extent that it needs to run often enough to not starve the interrupt handler. Depending on your application, you can get rid of the timing requirement by generating batches of primitives where each batch ends with the machine at a safe point (zero velocity/acceleration etc.).

Interrupt handler receives from the channel and chops the primitives into fixed-duration updates and packs them into DMA buffers. For example, 2 steppers need 4 bits (2x dir, 2x step) per update, which gives 8 updates per 32-bit PIO FIFO word.
The interrupt handler is only timing sensitive insofar as it must run often enough to fill the DMA buffers. If you don't have too much going on, you may not even need DMA and can get away with feeding the PIO FIFO buffer (8 words I believe) directly.

A PIO state machine receives updates from its FIFO (fed directly or through DMA) and multiplexes them onto the corresponding GPIO pins. The nature of PIOs makes the timing very robust and decoupled from the activity of the system, except for contention on the memory bus.
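
The packing arithmetic in the second layer (4 bits per update, 8 updates per 32-bit word) can be sketched as follows. The bit order inside each nibble and the placement of the first update in the lowest nibble are assumptions for illustration; the comment above only gives the sizes. `update` and `packWord` are hypothetical names.

```go
package main

import "fmt"

// update holds one fixed-duration tick for two steppers.
type update struct {
	dir0, step0, dir1, step1 bool
}

// nibble encodes u as 4 bits. Assumed layout:
// bit 0 = dir0, bit 1 = step0, bit 2 = dir1, bit 3 = step1.
func (u update) nibble() uint32 {
	var n uint32
	if u.dir0 {
		n |= 1 << 0
	}
	if u.step0 {
		n |= 1 << 1
	}
	if u.dir1 {
		n |= 1 << 2
	}
	if u.step1 {
		n |= 1 << 3
	}
	return n
}

// packWord packs 8 updates into one 32-bit word, with the first update in
// the lowest nibble (assumed ordering).
func packWord(us [8]update) uint32 {
	var w uint32
	for i, u := range us {
		w |= u.nibble() << (4 * uint(i))
	}
	return w
}

func main() {
	var us [8]update
	us[0] = update{step0: true}             // step stepper 0
	us[1] = update{dir1: true, step1: true} // step stepper 1, direction set
	fmt.Printf("FIFO word: %08x\n", packWord(us))
}
```

A PIO program consuming such words would need to shift bits out in the matching order, so the layout has to be agreed between the packer and the state machine.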

There are always multiple ways to skin a cat. I am using an approach similar to what you described as well. But core pinning has its advantages too. Also, it seems like several people have asked for it previously. I appreciate your reviews and responses.


Refactored based on Elias' comments

870af4f


eliasnaur reviewed Nov 18, 2025


Improved LockCore and UnlockCore documentation for clarity on usage, …

c00fc84

…behavior, and limitations with the "cores" scheduler. Updated LockOSThread and UnlockOSThread comments to reflect core pinning behavior on RP2040/RP2350.

eliasnaur reviewed Nov 19, 2025


amken3d added 4 commits November 22, 2025 15:56

Add runtime.Gosched call in LockCore for improved scheduling

9025053

Removed superfluous comments

8820b4e

Added GoSched() to scheduler_cores.go instead of incorrectly calling …

3ca2b71

…it in machine_rp2_cores.go

Remove unused import of "runtime" in machine_rp2_cores.go

217adfb

