Skip to content

Conversation

robamu
Copy link
Contributor

@robamu robamu commented Mar 25, 2025

One thing I am not sure about is how to best determine the CPU ID. The MPIDR register is implementation defined, and the individual affinity fields might have completely different meanings depending on the actual implementation (https://developer.arm.com/documentation/ddi0406/b/System-Level-Architecture/Virtual-Memory-System-Architecture--VMSA-/CP15-registers-for-a-VMSA-implementation/c0--Multiprocessor-Affinity-Register--MPIDR-?lang=en#CIHIEGCJ) . The previous implementation in the Cortex-A run-time, which reads the least significant 2 bits of affinity 0, was just the Zynq7000 specific implementation of the MPIDR register. I replaced this with reading all the bits in affinity 0 as well.

Should we still keep this implementation where we assume that a value of 0 in affinity 0 field means CPU 0?

@robamu robamu force-pushed the harmonize-entry-points branch 2 times, most recently from 248831f to 0293e3e Compare March 25, 2025 11:00
_default_start:
// only allow cpu0 through for initialization
// Read MPIDR
mrc p15,0,r1,c0,c0,5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't write it down, and there's no tool to help, but I've been trying for format the assembly like:

some_label:
    bl      thing
    cmp     r1, #0 <- comma and space between arguments
    ^^^^^^^^ <- eight characters, with spaces for padding
^^^^ <- four spaces
```

wait_loop:
wfe
// When Core 0 emits a SEV, the other cores will wake up.
// Load CPU ID, we are CPU0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are not CPU 0, because if you were CPU 0 you'd skip this loop and go straight to initialize?

wait_loop:
wfe
// When Core 0 emits a SEV, the other cores will wake up.
// Load CPU ID, we are CPU0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As previous comment

@jonathanpallant
Copy link
Contributor

It may also be better to use numeric labels like 1 and branch targets like 1f (go forward to the label 1) or 1b (go backward to the label 1), because then you always get a unique symbol name generated. I worry that with a label like initialize that might be visible to the linker and clash with any other C / no_mangle function of the same name.

mrc p15,0,r1,c0,c0,5
// Extract CPU ID bits by reading affinity level 0.
// For single-core systems, this should always be 0.
and r1, r1, #0xFF
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should check the bottom 16 bits are zero (affinity level 0 and affinity level 1)?

// Load CPU ID, we are CPU0
mrc p15,0,r0,c0,c0,5
// Extract CPU ID bits.
and r0, r0, #0x3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://developer.arm.com/documentation/100026/0104/System-Control/AArch32-register-descriptions/Multiprocessor-Affinity-Register?lang=en suggests the bottom 24 bits are important. But actually we might be running on a cluster than isn't cluster 0, so I would select maybe the bottom 16 bits?

@robamu robamu force-pushed the harmonize-entry-points branch 2 times, most recently from 040977f to 6235f2e Compare March 26, 2025 09:26
@robamu robamu force-pushed the harmonize-entry-points branch from 6235f2e to ad06b54 Compare March 26, 2025 09:28
@robamu
Copy link
Contributor Author

robamu commented Mar 26, 2025

I could check out the datasheets of some other Cortex-A devices about MPIDR usage, e.g. STM32MP, or Ultrascale+

@jonathanpallant
Copy link
Contributor

jonathanpallant commented Mar 26, 2025

I could check out the datasheets of some other Cortex-A devices about MPIDR usage, e.g. STM32MP, or Ultrascale+

That would be great. I think it's OK to have a default for 80% of the people, if there's a clear mechanism to override the default for specific use-cases. Like, 'Hey, if MPIDR is 0x4000_0100 then that's my default boot core - spin all the others'. I don't know how to plug user-specified constants into the runtime though. Environment variables? Eww.

@robamu
Copy link
Contributor Author

robamu commented Mar 26, 2025

There is

I have not found a system with clusters yet, but wouldn´t each cluster most probably run a different software anyway?

@jonathanpallant
Copy link
Contributor

I was thinking the bottom byte might be for hyper threading (then cores, then clusters), but now I think about it, Arm doesn't do that. So the bottom byte is probably fine then.

@jonathanpallant
Copy link
Contributor

I'm going to try and make a multi-core example to verify this all works as expected.

AN536 defaults to only creating a single CPU; this is the equivalent of the way the real FPGA image usually runs with the second Cortex-R52 held in halt via the initial SCC CFG_REG0 register setting. You can create the second CPU with -smp 2; both CPUs will then start execution immediately on startup.

(https://www.qemu.org/docs/master/system/arm/mps2.html)

@jonathanpallant
Copy link
Contributor

jonathanpallant commented Mar 26, 2025

criticalup run cargo run --target=armv8r-none-eabihf -- -smp 2

This doesn't work because the second CPU never waits on the wfe. I've observed this before - I don't know if it's a qemu-system-arm bug, or if there's always an event pending, or what.

Perhaps spinning on a shared variable is safer than a WFE?

wait:
        wfe
        ldr     r0, =shared_variable
        beq     wait

Then if they can WFE, they'll save power, and if not, it'll still work.

Does QEMU emulate WFE on Cortex-A SMP systems?

@robamu
Copy link
Contributor Author

robamu commented Mar 27, 2025

Hmm, I don't know. I have not found anything in the QEMU docs, so this is probably somewhere in the source code..
The core never waiting on wfe seems weird. Spinning on a shared variable as a safety feature? If this just happens in QEMU / because of a QEMU bug, maybe a an loop in the test app boot core method also works?

There are definitely alternatives to WFE. I think oftentimes, this logic will be overriden for specific CPU families. For example, special handling is required for the zynq7000. Maybe spinning would be a simpler default solution? Requires one shared / global variable though.

@robamu
Copy link
Contributor Author

robamu commented Mar 27, 2025

I looked for some more resources:

Code:

Docs:

Summary:

  • Zynq7000 boot seems to directly use the MPIDR, but I override the default boot routine anyway for other reasons. device specific bootup for secondary cores
  • STM32MP ROM code parks all processors except the primary core inside an infinite loop and uses a device specific bootup routine for the secondary core
  • Ultrascale+ has a dedicated platform management unit for handling the bootup. Startup code can only run on dedicated cores.

Maybe it would be a better idea to just ignore the MPIDR/SMP aspect in the provided default boot routine, and assume that either the startup routine is overrident if this is an issue, or the HW takes care of just executing the startup code with one core.. But then, boot_core does not really make much sense anymore.

@jonathanpallant
Copy link
Contributor

perhaps you're right - just let users add a bit of code to handle multi-core start up. They can always call our default start-up routine once they've worked out which core they're on.

So we can close this PR, but what should we do with multi-core support in cortex-a-rt?

@robamu
Copy link
Contributor Author

robamu commented Mar 27, 2025

The support could be removed and it could be changed to also use kmain again, similar to the Cortex-R runtime

@robamu
Copy link
Contributor Author

robamu commented Apr 2, 2025

Replaced by #22

@robamu robamu closed this Apr 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants