Skip to content

Conversation

@etterli
Copy link
Contributor

@etterli etterli commented Jan 14, 2026

This extends the OTBN simulator with the masking accelerator interface (MAI). This interface allows OTBN to offload A2B, B2A and secAdd operations to hardware accelerators via WSRs.

For now, the accelerators are simple dummy accelerators which simply unmask the secrets, do the operation and remask the secret with a fixed masked.

@etterli etterli changed the title Add masking accelerator interface to OTBN simulator [otbn] Add masking accelerator interface to OTBN simulator Jan 14, 2026
@etterli etterli force-pushed the otbn-masked-isa branch 2 times, most recently from daf7666 to 1db00a7 Compare January 14, 2026 15:52
Copy link
Member

@nasahlpa nasahlpa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @etterli!

LGTM from an implementation perspective. Maybe @andrea-caforio can also have a look to see if this interfaces works for him.

0b00: A2B
0b01: B2A
0b10: secAdd
Invalid values and writing to these bits when MAI is busy will cause a MAI_ERROR software error.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we should enclosure MAI_ERROR in quotation marks or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe.. But this documentation has to change anyway. Currently the rendered text disregards any formatting here (everything is printed in a single line).

# and write it back.
if self.grs1 != 0:
new_val = old_val | bits_to_set
if self.csr == 0x7f0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use an enum here instead of the hardcoded 0x7f0 value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I simply followed the style of the existing CSR address checks like in line 429. But yes, an enum would be cleaner

Comment on lines +97 to +100
if CHECK_ACCELERATOR_CONSTRAINTS:
assert self._modulus() < 2**32
assert 0 <= s < self._modulus()
assert 0 <= r < self._modulus()
assert 0 <= secret < self._modulus()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to have this :)


# Check if MAI is ready to accept new inputs. If not stop with MAI
# error.
if self.wsr in [12, 13, 14, 15]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we have an else branch here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because the lines 1964-1295 are common for all WSRs. This check here simply aborts the instruction in case the MAI is not ready for inputs. But if the MAI is ready, we want to perform the update.

This abort works the same way as the abort above where we check whether the given WSR address exists (Line 1282).

self._all_accelerators = {
MaiOperation.A2B: A2BAccelerator(self.wsrs.MOD),
MaiOperation.B2A: B2AAccelerator(self.wsrs.MOD),
MaiOperation.SECADD: SecAddModqAccelerator(self.wsrs.MOD),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a SecAddMod2k accelerator instead of mod p. It is needed for the gadgets and
inside the A2B/B2A. @h-filali

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The secAddModq is nice to have since a A2B transfomation can be boiled down to a secAddModq but to accelerate ML-DSA we would need secAddMod2k.

@andrea-caforio
Copy link
Contributor

Some required changes:

  1. Replace the mod p adder with a mod 2^32 adder.
  2. Use a constant masks that are not zero.
  3. Make sure that Boolean shares are masked over all 32 bits.
  4. Make sure that Arithmetic shares are masked in mod p.

Use the definition from Section 2.1 here.

Copy link
Contributor

@rswarbrick rswarbrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some minor notes about the (first half of the) first commit. The structure looks sensible and I'm excited that the new feature can be added so cleanly!

Bit [ 0] MAI_BUSY: This bit is set to 1 when an MAI operation is in progress. If reset, a new execution can be started by writing to the MAI_START bit in the MAI_CTRL CSR.
Bit [ 1] MAI_READY: This bit is set to 1 when the MAI_INx_Sx WSRs are ready to accept new values for the next execution.
Bits [31:2] are reserved/ignored.
# TODO: Expand the documentation format such that bitfields can be described.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the todo here :-) The structure might actually make it easier to write the register documentation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I solved this TODO. See the new CSR documentation feature (#29165 )

Write to this CSR to begin a request to fill the RND cache.
Always reads as 0.

- name: mai_ctrl
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this register supposed to have swaccess: "wo"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we should also be able to read the currently set operation back.

A `KEY_INVALID` software error is raised on read if the Key Manager has not provided a valid key.

- name: mai_res_s0
address: 10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No big deal, but was there a problem with addresses 8 and 9?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

8 and 9 are already reserved for the new KMAC interface. See #29025 . This KMAC interface also has many overlapping stuff, so maybe worth looking at it as well..

doc: |
This WSR holds shares 0 of the masked results produced by the MAI.
The results are organized as 8 consecutive 32-bit values.
Results are valid when MAI is not busy anymore.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you might need something about when they get invalidated too? (the when the next operation starts?)

address: 10
read-only: true
doc: |
This WSR holds shares 0 of the masked results produced by the MAI.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this wants to be "share 0" (and similarly for the WSRs below)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used shareS because this WSR actually contains multiple results / is vectorized.

Trace.hex_value(self.new_value, 32))


class CSR:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is copied from the WSR class. Can you make it so they both extend some base class? I think the result would be shorter (for example, the default implementations of has_value, on_start could be shared)


def read_start_bit(self) -> bool:
'''Get the start bit from the CSR.'''
bit = (self.read_unsigned() >> self.START_BIT_OFFSET) & self.START_BIT_MASK
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Can't you store the fields as separate variables? I think the end result would be much clearer, and it will make quite a lot of the code below simpler.

'''Returns whether the CSR value contains valid operation bits.'''
op = (value >> self.OPERATION_OFFSET) & self.OPERATION_MASK
# TODO: Clean this up once we have python 3.12+
return op in MaiOperation._value2member_map_
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably easier just to use op <= SECADD? (Since the enum is dense)


def write_unsigned(self, value: int) -> None:
'''Ignore writes to the MAI STATUS CSR.
Note this is different to set_ methods. This is used by instructions and the set_ methods
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This should be "different from"


def read_ready_bit(self) -> bool:
'''Get the ready bit from the CSR.'''
bit = (self.read_unsigned() >> self.READY_BIT_OFFSET) & self.READY_BIT_MASK
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, this would be dramatically simpler if you just had a pair of booleans.

# If CSR should be updated, compute update, check if update is allowed
# and write it back.
if self.grs1 != 0:
new_val = old_val | bits_to_set
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is possible to set the start bit with the algorithm bit, then this line is a bug because
it would not be possible to do an A2B after an ADD because bit 2 is stuck.

If this line is not a bug, then it is not possible to set both config bits at the same time.

Can you clarify? @etterli

Copy link
Contributor Author

@etterli etterli Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is exactly how CSRRS is defined by the RISC-V standard and OTBN follows this definition (see here). This instruction will set the bits of the CSR specified in grs1.

You can use the CSRRW instruction instead of the CSRRS instruction to set A2B after ADD and start it with the same instruction. It is not required to set the start bit with the CSRRS instruction, it is just required that the start bit is written to 1. The CSRRW will write the full 32-bit value in grs1 to the CSR.

So you should be able to start an A2B after an ADD with this:

addi x2, x0, 0x1       /* 0x1 = start bit set and the OPERATION field is 0b00 (= A2B) */
csrrw x0, x2, MAI_CTRL /* set and start the operation */

@andrea-caforio
Copy link
Contributor

Something broke after the latest force push.

Invalid schema in ISR YAML file at '/.opentitan/hw/ip/otbn/data/csr.yml': bit field 31-3 of ISR 'mai_ctrl' doesn't have the right keys. The following required fields were missing: values.

@etterli
Copy link
Contributor Author

etterli commented Jan 27, 2026

The Bazel BUILD file has to be updated with the new files

s = in0_s1
# We take a zero mask for simplicity and to avoid assertion errors due to modulus being
# smaller than this fixed constant.
r = 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use non-zero mask.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I updated it.

@etterli
Copy link
Contributor Author

etterli commented Jan 28, 2026

This needs to be rebased once the new generalized CSR/WSR implementation from #29025 is ready

Signed-off-by: Pascal Etterli <[email protected]>
This adds the CSRs and WSRs required for the masking accelerator interface (MAI).
These register enable to move data from/to the accelerators and control them.

Signed-off-by: Pascal Etterli <[email protected]>
@andrea-caforio
Copy link
Contributor

Crazy idea: can you add artificial stalls into the three operations (e.g., 32 cycles, or make it a parameter)?
I want to see how slow ML-DSA sign becomes.


class A2BAccelerator(MaskingAccelerator):
def __init__(self, mod_wsr: DumbWSR):
super().__init__(8, mod_wsr)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrea-caforio Regarding your question about stall cycles. Here you can define the latency of the accelerator (change it on L110 for B2A or on L143 for secAdd). This latency is static. Do you need an option to change the latency during an OTBN execution?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Can you set it to 32 for all 3 operations for the moment? This is a upper bound, I hope
that the actual latency will be lower.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

included in the newest force push (I did it also on the chaos branch)

This adds the simulator implementation of the masking accelerator interface
(MAI). The MAI allows the OTBN to offload A2B, B2A or secAdd computations to
hardened accelerators.

Signed-off-by: Pascal Etterli <[email protected]>
This simple program shows how to use the masking accelerator interface.

Signed-off-by: Pascal Etterli <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants