Skip to content

Conversation

@nwf
Copy link
Member

@nwf nwf commented Sep 11, 2025

This is an extremely untested thought for CHERIoT v2.

This is an extremely untested thought for CHERIoT v2.
Copy link
Collaborator

@vmurali vmurali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine using x0 loads/stores to force using an SCR.

But I think we should make it always refer to a single SCR, rather than depend on whether you have ASR permission or not. I guess you want to replace both AUICGP and simplify switcher spills? Switcher spill is actually not that complicated in my opinion and allows changing the implementation as we see fit in the future.

If you really want to have SCR-based loads and stores, you can create new SCR/CSR instructions that have new SCR indices (and one of the registers can be an offset) that translate to memory writes.

@nwf
Copy link
Member Author

nwf commented Sep 12, 2025

Replying a little bit out of order...

I guess you want to replace both AUICGP

Well, sort of, yes, depending on where in the "let's make changes to CHERIoT" story you start. The first vague proposal is to move cgp out to a CSR (mtidc in the standard draft's parlance) and free up the c3 GPR. In that world, AUICGP is a transfer from the mtidc CSR holding cgp to a GPR (with address arithmetic and bounds checks along the way) for use with subsequent GPR-authorized load or store instructions. In the current world, however, since cgp is just the c3 GPR, our AUICGP is basically solely used for its address arithmetic and can be completely elided if that isn't needed; but, if cgp is over in a CSR, then AUICGP becomes as necessary as AUIPCC... unless we also define a way for load/store instructions to use that CSR as a base. Thus the starting point of this additional (also vague) proposal.

and simplify switcher spills?

This is sort of a separate consideration, arising from the observation that the switcher never dereferences a compartment's cgp, and so all the c0-base load/store instructions that use mtidc are useless therein, and we really would like the switcher to be as tiny and fast as possible. But the switcher sure does make a number of fixed-offset references through trusted stack pointers, and hey, those are also in a CSR most of the time it's doing such things. (Not always, since we zero out the CSR when we've got IRQs deferred and we're running the scheduler pseudo-thread.)

As an aside: both mtidc- and mtdc-authorized loads and stores are almost surely also useless to the loader, the only other component that runs with PCC.ASR set, but we care less about its code density and/or performance, since it runs once per startup in the current world and is almost fully replaced by code that runs once per firmware upgrade if we ever get XIP.

Thinking about RISC-V mode switches / exception entry is enough to make one miss X86 (or Arm M). While it's true that a store-through-mtdc instruction lets us avoid the type-system-defying "atomic exchange with a CSR" dance in exception entry, RISC-V has other state (at least mstatus and mepc) that is precious on any exception entry, and so it's very difficult to imagine a world where we don't have to architecturally defer IRQs as part of mode switch, even in the very simple case of a horizontal synchronous trap. Alas.

If you really want to have SCR-based loads and stores, you can create new SCR/CSR instructions that have new SCR indices (and one of the registers can be an offset) that translate to memory writes.

New loads and stores eat up a lot of encoding space, and this bit of space is laying right here and otherwise useless by definition. Separately, register-displacement operations are otherwise absent in base RISC-V, so I don't think that'd be seen as a particularly friendly amendment. (This is where we chant "micro-op fusion", right?)

But I think we should make it always refer to a single SCR, rather than depend on whether you have ASR permission or not.

I suppose I'd be willing to gate which CSR is targeted by choice of register operand -- like we did with jalr; non-TCB code has absolutely no reason to store sp into cgp, for example -- but I confess I like that less than gating on PCC.ASR, and besides, CSR-related circuitry has to look at PCC.ASR anyway for access control.

I suppose the concern here is code that's intended to be ambivalent about whether its running with or without PCC.ASR? (Maybe we have to do something special when compiling the loader, which always runs with PCC.ASR but does have globals, unlike the switcher...)

All that said, thanks for the food for thought, as ever.

@vmurali
Copy link
Collaborator

vmurali commented Sep 12, 2025

New loads and stores eat up a lot of encoding space, and this bit of space is laying right here and otherwise useless by definition. Separately, register-displacement operations are otherwise absent in base RISC-V, so I don't think that'd be seen as a particularly friendly amendment. (This is where we chant "micro-op fusion", right?)

Discussed about this offline. What I meant is to hijack the SCR (or CSR in RISCV spec) space to encode new instructions (with a 5-bit offset instead of specifying a source or a destination register for loads or stores respectively). The problem in that, as we discussed, is that we only have a 5-bit offset, as opposed to a 12 bit offset.

I suppose the concern here is code that's intended to be ambivalent about whether its running with or without PCC.ASR?

Yes. The context of the code would suddenly matter, making it hard to reason about the code - especially when we start "lending" ASR capabilities to other code regions, instead of putting all ASR code into the switcher as we do currently. But ultimately, we do need to understand whether a code runs with ASR or not, to actually reason about it. So this ASR-dependency is not going to change that disposition, but nevertheless it makes it harder to reason about.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants