-
Notifications
You must be signed in to change notification settings - Fork 946
[otbn] Add masking accelerator interface to OTBN simulator #29109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
89d1cc1 to
8c937e1
Compare
daf7666 to
1db00a7
Compare
nasahlpa
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @etterli!
LGTM from an implementation perspective. Maybe @andrea-caforio can also have a look to see if this interfaces works for him.
hw/ip/otbn/README.md
Outdated
| 0b00: A2B | ||
| 0b01: B2A | ||
| 0b10: secAdd | ||
| Invalid values and writing to these bits when MAI is busy will cause a MAI_ERROR software error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if we should enclosure MAI_ERROR in quotation marks or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe.. But this documentation has to change anyway. Currently the rendered text disregards any formatting here (everything is printed in a single line).
| # and write it back. | ||
| if self.grs1 != 0: | ||
| new_val = old_val | bits_to_set | ||
| if self.csr == 0x7f0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use an enum here instead of the hardcoded 0x7f0 value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I simply followed the style of the existing CSR address checks like in line 429. But yes, an enum would be cleaner
| if CHECK_ACCELERATOR_CONSTRAINTS: | ||
| assert self._modulus() < 2**32 | ||
| assert 0 <= s < self._modulus() | ||
| assert 0 <= r < self._modulus() | ||
| assert 0 <= secret < self._modulus() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to have this :)
|
|
||
| # Check if MAI is ready to accept new inputs. If not stop with MAI | ||
| # error. | ||
| if self.wsr in [12, 13, 14, 15]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we have an else branch here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, because the lines 1964-1295 are common for all WSRs. This check here simply aborts the instruction in case the MAI is not ready for inputs. But if the MAI is ready, we want to perform the update.
This abort works the same way as the abort above where we check whether the given WSR address exists (Line 1282).
hw/ip/otbn/dv/otbnsim/sim/mai.py
Outdated
| self._all_accelerators = { | ||
| MaiOperation.A2B: A2BAccelerator(self.wsrs.MOD), | ||
| MaiOperation.B2A: B2AAccelerator(self.wsrs.MOD), | ||
| MaiOperation.SECADD: SecAddModqAccelerator(self.wsrs.MOD), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need a SecAddMod2k accelerator instead of mod p. It is needed for the gadgets and
inside the A2B/B2A. @h-filali
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The secAddModq is nice to have since a A2B transfomation can be boiled down to a secAddModq but to accelerate ML-DSA we would need secAddMod2k.
|
Some required changes:
Use the definition from Section 2.1 here. |
1db00a7 to
5be3760
Compare
rswarbrick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are some minor notes about the (first half of the) first commit. The structure looks sensible and I'm excited that the new feature can be added so cleanly!
hw/ip/otbn/data/csr.yml
Outdated
| Bit [ 0] MAI_BUSY: This bit is set to 1 when an MAI operation is in progress. If reset, a new execution can be started by writing to the MAI_START bit in the MAI_CTRL CSR. | ||
| Bit [ 1] MAI_READY: This bit is set to 1 when the MAI_INx_Sx WSRs are ready to accept new values for the next execution. | ||
| Bits [31:2] are reserved/ignored. | ||
| # TODO: Expand the documentation format such that bitfields can be described. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the todo here :-) The structure might actually make it easier to write the register documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I solved this TODO. See the new CSR documentation feature (#29165 )
| Write to this CSR to begin a request to fill the RND cache. | ||
| Always reads as 0. | ||
|
|
||
| - name: mai_ctrl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this register supposed to have swaccess: "wo"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we should also be able to read the currently set operation back.
| A `KEY_INVALID` software error is raised on read if the Key Manager has not provided a valid key. | ||
|
|
||
| - name: mai_res_s0 | ||
| address: 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No big deal, but was there a problem with addresses 8 and 9?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
8 and 9 are already reserved for the new KMAC interface. See #29025 . This KMAC interface also has many overlapping stuff, so maybe worth looking at it as well..
| doc: | | ||
| This WSR holds shares 0 of the masked results produced by the MAI. | ||
| The results are organized as 8 consecutive 32-bit values. | ||
| Results are valid when MAI is not busy anymore. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you might need something about when they get invalidated too? (the when the next operation starts?)
hw/ip/otbn/data/wsr.yml
Outdated
| address: 10 | ||
| read-only: true | ||
| doc: | | ||
| This WSR holds shares 0 of the masked results produced by the MAI. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this wants to be "share 0" (and similarly for the WSRs below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used shareS because this WSR actually contains multiple results / is vectorized.
| Trace.hex_value(self.new_value, 32)) | ||
|
|
||
|
|
||
| class CSR: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is copied from the WSR class. Can you make it so they both extend some base class? I think the result would be shorter (for example, the default implementations of has_value, on_start could be shared)
|
|
||
| def read_start_bit(self) -> bool: | ||
| '''Get the start bit from the CSR.''' | ||
| bit = (self.read_unsigned() >> self.START_BIT_OFFSET) & self.START_BIT_MASK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. Can't you store the fields as separate variables? I think the end result would be much clearer, and it will make quite a lot of the code below simpler.
| '''Returns whether the CSR value contains valid operation bits.''' | ||
| op = (value >> self.OPERATION_OFFSET) & self.OPERATION_MASK | ||
| # TODO: Clean this up once we have python 3.12+ | ||
| return op in MaiOperation._value2member_map_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably easier just to use op <= SECADD? (Since the enum is dense)
hw/ip/otbn/dv/otbnsim/sim/csr.py
Outdated
|
|
||
| def write_unsigned(self, value: int) -> None: | ||
| '''Ignore writes to the MAI STATUS CSR. | ||
| Note this is different to set_ methods. This is used by instructions and the set_ methods |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This should be "different from"
|
|
||
| def read_ready_bit(self) -> bool: | ||
| '''Get the ready bit from the CSR.''' | ||
| bit = (self.read_unsigned() >> self.READY_BIT_OFFSET) & self.READY_BIT_MASK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, this would be dramatically simpler if you just had a pair of booleans.
| # If CSR should be updated, compute update, check if update is allowed | ||
| # and write it back. | ||
| if self.grs1 != 0: | ||
| new_val = old_val | bits_to_set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is possible to set the start bit with the algorithm bit, then this line is a bug because
it would not be possible to do an A2B after an ADD because bit 2 is stuck.
If this line is not a bug, then it is not possible to set both config bits at the same time.
Can you clarify? @etterli
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is exactly how CSRRS is defined by the RISC-V standard and OTBN follows this definition (see here). This instruction will set the bits of the CSR specified in grs1.
You can use the CSRRW instruction instead of the CSRRS instruction to set A2B after ADD and start it with the same instruction. It is not required to set the start bit with the CSRRS instruction, it is just required that the start bit is written to 1. The CSRRW will write the full 32-bit value in grs1 to the CSR.
So you should be able to start an A2B after an ADD with this:
addi x2, x0, 0x1 /* 0x1 = start bit set and the OPERATION field is 0b00 (= A2B) */
csrrw x0, x2, MAI_CTRL /* set and start the operation */
5be3760 to
3540129
Compare
b6c7766 to
48e7cb6
Compare
|
Something broke after the latest force push.
|
48e7cb6 to
71fb95e
Compare
|
The Bazel BUILD file has to be updated with the new files |
hw/ip/otbn/dv/otbnsim/sim/mai.py
Outdated
| s = in0_s1 | ||
| # We take a zero mask for simplicity and to avoid assertion errors due to modulus being | ||
| # smaller than this fixed constant. | ||
| r = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use non-zero mask.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I updated it.
71fb95e to
1f6e6f5
Compare
|
This needs to be rebased once the new generalized CSR/WSR implementation from #29025 is ready |
Signed-off-by: Pascal Etterli <[email protected]>
This adds the CSRs and WSRs required for the masking accelerator interface (MAI). These register enable to move data from/to the accelerators and control them. Signed-off-by: Pascal Etterli <[email protected]>
1f6e6f5 to
de0deb3
Compare
|
Crazy idea: can you add artificial stalls into the three operations (e.g., 32 cycles, or make it a parameter)? |
hw/ip/otbn/dv/otbnsim/sim/mai.py
Outdated
|
|
||
| class A2BAccelerator(MaskingAccelerator): | ||
| def __init__(self, mod_wsr: DumbWSR): | ||
| super().__init__(8, mod_wsr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andrea-caforio Regarding your question about stall cycles. Here you can define the latency of the accelerator (change it on L110 for B2A or on L143 for secAdd). This latency is static. Do you need an option to change the latency during an OTBN execution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. Can you set it to 32 for all 3 operations for the moment? This is a upper bound, I hope
that the actual latency will be lower.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
included in the newest force push (I did it also on the chaos branch)
This adds the simulator implementation of the masking accelerator interface (MAI). The MAI allows the OTBN to offload A2B, B2A or secAdd computations to hardened accelerators. Signed-off-by: Pascal Etterli <[email protected]>
This simple program shows how to use the masking accelerator interface. Signed-off-by: Pascal Etterli <[email protected]>
de0deb3 to
a667a24
Compare
This extends the OTBN simulator with the masking accelerator interface (MAI). This interface allows OTBN to offload A2B, B2A and secAdd operations to hardware accelerators via WSRs.
For now, the accelerators are simple dummy accelerators which simply unmask the secrets, do the operation and remask the secret with a fixed masked.