Support booting the kernel in EL1 with a PSCI proxy #8

Open

m4tx wants to merge 6 commits into el2-support from el1-support

Conversation

@m4tx (Collaborator) commented Nov 26, 2025

No description provided.

m4tx force-pushed the el1-support branch 2 times, most recently from 66d20eb to 4ede863 on November 26, 2025 at 15:24
m4tx force-pushed the el1-support branch 3 times, most recently from 0dc830b to bbacedb on November 28, 2025 at 14:29
m4tx requested review from qperret2 and qwandor on November 28, 2025 at 14:30
/// A pointer to a stack allocated for a secondary CPU.
///
/// This must not be dropped as long as the secondary CPU is running.
struct SecondaryStack {
Collaborator

What's the advantage of using this wrapper type over just using Box<Stack<...>> directly?

Collaborator Author

Not much; I thought it would be a bit more readable to have a separate named type, but I'm also fine with removing it if you think otherwise.
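
For illustration, a minimal sketch of the two options being compared (the type and constant names below are placeholders, not the exact ones in this PR; Box comes from the alloc crate in this no_std context):

use alloc::boxed::Box;

const PAGE_SIZE: usize = 4096;
const SECONDARY_STACK_PAGE_COUNT: usize = 4;

/// Hypothetical page-aligned stack buffer, standing in for the crate's `Stack` type.
#[repr(C, align(4096))]
struct Stack([u8; SECONDARY_STACK_PAGE_COUNT * PAGE_SIZE]);

// Option 1: a named wrapper whose type name documents the ownership rule.
/// A pointer to a stack allocated for a secondary CPU.
/// This must not be dropped as long as the secondary CPU is running.
struct SecondaryStack(Box<Stack>);

// Option 2: use `Box<Stack>` directly and keep the same rule as a comment at the
// use site; the data layout and behaviour are identical either way.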

src/main.rs Outdated

const HEAP_SIZE: usize = 40 * PAGE_SIZE;
#[unsafe(no_mangle)]
#[unsafe(link_section = ".heap")]
Collaborator

Why does this need to be in a separate section, and does it still get properly initialised?

Collaborator Author

When I thought I needed to prepare a stack for the EL1 guest, it made sense that the heap space should be accessible to it, and hence it needed to have a well-known address (because otherwise it would be reserved in the Device Tree/stage-2 pagetables). Now it's not needed.
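
For reference, a minimal sketch of what the quoted diff was doing (the static's name and initializer here are illustrative): putting the heap in a dedicated linker section gives it a well-known, linker-script-controlled address that could then be described to the EL1 guest.

const PAGE_SIZE: usize = 4096;
const HEAP_SIZE: usize = 40 * PAGE_SIZE;

#[unsafe(no_mangle)]
#[unsafe(link_section = ".heap")]
static mut HEAP: [u8; HEAP_SIZE] = [0; HEAP_SIZE];

// Note: once the static lives in a custom section rather than .bss, the linker
// script / startup code has to ensure it is still zeroed, which is what the
// reviewer's initialisation question is about.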

.build(),
);

let new_dtb = dt.to_dtb().leak();
Collaborator

So you just put the new device tree somewhere on the heap of RITM and return that to the EL1 payload? Presumably that means that you'll have to map a page-aligned region containing it into the stage 2 pagetable so that EL1 can read it; is there likely to be any sensitive data in the heap that EL1 shouldn't be able to read?

Collaborator Author

This is what I'm doing in #16. In this PR, I just don't enable stage-2 page tables at all, so the guest should be able to read everything in ritm's memory.

> is there likely to be any sensitive data in the heap that EL1 shouldn't be able to read?

It doesn't at the moment, but given that RITM is meant to be modified by vendors, it might contain sensitive data in the future. Hence in #16 I only expose the device tree to the guest, so the guest cannot read anything it doesn't need.
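
For illustration, a rough sketch of the alignment concern (the names below are hypothetical; this is not code from the PR): the leaked DTB buffer is an ordinary heap allocation, so a stage-2 mapping exposing it has to cover whole pages, and everything else in those pages becomes readable to EL1 too.

const PAGE_SIZE: usize = 4096;

/// Returns the page-aligned [start, end) span that a stage-2 mapping of `dtb` would cover.
fn dtb_stage2_span(dtb: &[u8]) -> (usize, usize) {
    let start = dtb.as_ptr() as usize & !(PAGE_SIZE - 1); // round down to a page boundary
    let end = (dtb.as_ptr() as usize + dtb.len() + PAGE_SIZE - 1) & !(PAGE_SIZE - 1); // round up
    // Everything in [start, end) is visible to EL1 once mapped, not just the DTB bytes.
    (start, end)
}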


// Reset the timer offset.
// SAFETY: Resetting the CNTVOFF_EL2 is needed as part of
// preparing an environment to run Linux.
Collaborator

This is true, but doesn't explain why it is safe. I guess it's actually safe to write any value to this system register, as it doesn't affect anything to do with memory safety, right?

Collaborator Author (@m4tx, Feb 5, 2026)

Correct. Also, I see that arm_sysregs marks this register as write-safe, so we don't even need the unsafe block anymore.
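
For reference, a minimal sketch of the register reset under discussion (raw inline asm is shown only for illustration; with arm_sysregs treating CNTVOFF_EL2 as safe to write, this becomes an ordinary safe register write):

use core::arch::asm;

/// Reset the virtual counter offset seen by the EL1 guest.
fn reset_cntvoff_el2() {
    // Writing any value to CNTVOFF_EL2 only shifts the virtual counter observed at
    // EL1/EL0; it cannot affect memory safety, which is why a safe wrapper is fine.
    unsafe { asm!("msr cntvoff_el2, xzr", options(nomem, nostack, preserves_flags)) };
}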

///
/// This function is unsafe because it may call into the secure monitor via SMC or perform other
/// privileged operations.
unsafe fn handle_psci(fn_id: u64, arg0: u64, arg1: u64, arg2: u64) -> Result<u64, arm_psci::Error> {
Collaborator

I'm not entirely sure this needs to be unsafe, but if it does then so does try_handle_psci by the same argument.

Collaborator Author

Well, with the current state of the code, since we wrap the potentially unsafe PSCI calls and handle all the others (possibly by panicking), it's most likely fine to drop the unsafe attribute from this function.
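
A minimal sketch of the shape being discussed (the function IDs and the `smc_call` helper below are hypothetical): the dispatcher itself is safe, and only the path that actually forwards to the secure monitor carries an unsafe block with its own safety argument.

const PSCI_VERSION: u64 = 0x8400_0000;
const PSCI_FEATURES: u64 = 0x8400_000A;

/// Hypothetical helper that performs the actual SMC (inline asm or the smccc crate).
unsafe fn smc_call(_fn_id: u64, _arg0: u64, _arg1: u64, _arg2: u64) -> u64 {
    unimplemented!()
}

fn handle_psci(fn_id: u64, arg0: u64, arg1: u64, arg2: u64) -> Result<u64, arm_psci::Error> {
    match fn_id {
        // Calls that are simply proxied to EL3.
        PSCI_VERSION | PSCI_FEATURES => {
            // SAFETY: the arguments are forwarded to the secure monitor unchanged;
            // the SMC itself cannot break our memory safety invariants.
            Ok(unsafe { smc_call(fn_id, arg0, arg1, arg2) })
        }
        // Everything else is either handled in Rust or panics, so no unsafe is
        // needed at this level and the function itself can be safe.
        _ => unimplemented!("unhandled PSCI function {fn_id:#x}"),
    }
}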

let context = SUSPEND_CONTEXTS
.lock()
.remove(&context.data.mpidr)
.expect("context not found for resuming CPU");
Collaborator

The fact that restore_from_suspend is being passed a mutable reference to the context from SUSPEND_CONTEXTS while the lock isn't held means that something dodgy is going on here, I guess with the aarch64-rt suspend API that you've added. It also seems superfluous, as the only thing you are getting from it here is the MPIDR, which you could just read from the system register anyway. I'm not sure what exactly we should do instead though.

Collaborator Author (@m4tx, Feb 5, 2026)

Well, indeed it's unsound; presumably the aarch64-rt code should then pass a pointer to the callback rather than a reference (sadly making the API even more low-level), so that the callback has to ensure it's okay to dereference it.

In this case, I think it's reasonable to read the mpidr from the system register and then access the data properly by locking the SpinMutex.

Okay, after consideration, I don't really think it's unsound as it is. Yes, the data is behind a mutex, but after the CPU-specific data is stored by a CPU, it never gets removed or otherwise modified by any other CPU. Hence, I believe it should be safe to read this after resuming, even without acquiring the lock. I'll update the safety comments with this info.

A completely safe alternative would be to store the data on the heap, but I believe that having this in a static global is better, even if unsafe.

Collaborator

I don't think there's any undefined behaviour as it stands, but I think it's still unsound in that adding safe code could introduce undefined behaviour.

I think the best solution is to make the _context passed to restore_from_suspend a pointer rather than a reference. That shouldn't be a problem here, as you're not using it anyway. You could even remove the parameter entirely.

Collaborator Author

I've used your latest changes in my aarch64-rt PR - the way we have it now seems to be perfectly sound.
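
A minimal sketch of the MPIDR-based lookup mentioned earlier in this thread (it assumes the `SUSPEND_CONTEXTS` map and context type from this PR; the register read is shown with raw asm, and the function name is hypothetical):

use core::arch::asm;

/// Take the saved suspend context for the resuming CPU, looked up under the lock.
fn take_suspend_context() -> SuspendContext {
    let mpidr: u64;
    // SAFETY: reading MPIDR_EL1 has no side effects.
    unsafe { asm!("mrs {}, mpidr_el1", out(reg) mpidr, options(nomem, nostack)) };
    // Look the context up under the lock instead of dereferencing a reference that
    // was captured before the suspend call. (Depending on how the map is keyed,
    // the non-affinity bits of MPIDR_EL1 may need masking off first.)
    SUSPEND_CONTEXTS
        .lock()
        .remove(&mpidr)
        .expect("context not found for resuming CPU")
}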

"Restoring from suspend: entry={:#x}, ctx={:#x}",
context.data.entry_point, context.data.context_id
);

Collaborator

I think you might need to reset a bunch of EL1 system registers first.

Collaborator Author

Are you sure? My understanding is that the guest should expect essentially the same state as after CPU_ON, so using the same entry point function (entry_point_el1) should suffice.

Collaborator

Not totally sure. Certainly in both cases EL1 will expect the EL1 system registers to have their reset values, but it's not clear to me whether EL2, EL3, or hardware is responsible for ensuring this. For example, is it possible that in some suspend states hardware will preserve system register values, so that EL2 is responsible for resetting the EL1 system registers? But maybe that doesn't happen, because EL3 will return from the suspend call in that case rather than going through the resume path.

Let's leave this as is for now, but talk to someone who understands PSCI better than I do.

&raw mut **stack
} else {
&raw mut **stack_map.insert(mpidr, Box::default())
}
Collaborator

What's the advantage of allocating secondary stacks dynamically on the heap, rather than reserving SECONDARY_STACK_PAGE_COUNT * (MAX_CORES - 1) pages statically? The disadvantage is that you'll only find out at runtime when starting one of the secondary cores that your heap wasn't big enough for all the stacks, whereas if you do it statically you'll find out at compile time.

Collaborator Author (@m4tx, Feb 5, 2026)

Well, Box gives some confidence that writes from multiple CPUs are safe, but as long as we never remove the items, it should be sound to have this stored in a single array as well.
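
A minimal sketch of the static alternative being suggested (the constants and names below are hypothetical): reserving every secondary stack up front turns "heap too small for the stacks" from a runtime failure into a size that is fixed at build time.

const PAGE_SIZE: usize = 4096;
const SECONDARY_STACK_PAGE_COUNT: usize = 4;
const MAX_CORES: usize = 8;

#[repr(C, align(4096))]
struct SecondaryStack([u8; SECONDARY_STACK_PAGE_COUNT * PAGE_SIZE]);

// One statically reserved stack per possible secondary core. Each core only ever
// touches its own entry, so handing out disjoint raw pointers (as the quoted code
// does for the boxed version) stays sound.
static mut SECONDARY_STACKS: [SecondaryStack; MAX_CORES - 1] =
    [const { SecondaryStack([0; SECONDARY_STACK_PAGE_COUNT * PAGE_SIZE]) }; MAX_CORES - 1];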

"mov x19, x0",
"bl {disable_mmu_and_caches}",
"mov x0, x19",
"eret",
Collaborator

Is it possible that some of the other general-purpose registers here contain sensitive EL2 data? If so, you should zero them all before the eret. The same applies to any other registers shared between EL2 and EL1.

Collaborator Author (@m4tx, Feb 5, 2026)

Good call; and if I'm not mistaken, zeroing out the general-purpose registers should suffice, as the rest is not shared with EL1 when EL1h is set.
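
A rough sketch of the scrubbing being discussed (only the asm fragment is shown; the surrounding function and operands are as in the quoted code): x0 keeps the value handed to EL1, and every other general-purpose register is zeroed before the eret so no EL2 contents leak across the exception return.

"mov x19, x0",
"bl {disable_mmu_and_caches}",
"mov x0, x19",
// Scrub x1-x30 so the EL1 payload cannot observe stale EL2 register values.
"mov x1, xzr", "mov x2, xzr", "mov x3, xzr", "mov x4, xzr", "mov x5, xzr",
"mov x6, xzr", "mov x7, xzr", "mov x8, xzr", "mov x9, xzr", "mov x10, xzr",
"mov x11, xzr", "mov x12, xzr", "mov x13, xzr", "mov x14, xzr", "mov x15, xzr",
"mov x16, xzr", "mov x17, xzr", "mov x18, xzr", "mov x19, xzr", "mov x20, xzr",
"mov x21, xzr", "mov x22, xzr", "mov x23, xzr", "mov x24, xzr", "mov x25, xzr",
"mov x26, xzr", "mov x27, xzr", "mov x28, xzr", "mov x29, xzr", "mov x30, xzr",
"eret",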

m4tx requested a review from qwandor on February 5, 2026 at 16:47

m4tx requested a review from qwandor on February 20, 2026 at 13:20