src/rationale.adoc: 12 additions & 1 deletion (+12 -1)
@@ -61,4 +61,15 @@ For implementations with small ELEN (32), supporting Zvbc32e brings ISA support
 Zvbc32e is also useful for implementations with ELEN >= 64, as it allows more efficient implementations of algorithms relying on 32-bit (or less) carryless multiplications.
 Selecting only `Zvbc32e` allows implementations to save area while providing identical performance on those algorithms.

-For all implementations, `Zvbc32e` allows better implementations (less instructions and more targeted use of hardware resources) of algorithms relying on 8-bit and 16-bit carryless multiplications (e.g. erasure coding).
+For all implementations, `Zvbc32e` allows better implementations (fewer instructions and more targeted use of hardware resources) of algorithms relying on 8-bit and 16-bit carryless multiplications (e.g. erasure coding).
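
As background for the claim above: a carryless multiplication is a polynomial product over GF(2), in which partial products are combined with XOR instead of addition. The Python sketch below models that operation in software and shows how an 8-bit Galois-field multiply, of the kind used in erasure coding, can be built on top of it. The function names and the default reduction polynomial (`0x11D`) are assumptions made for illustration; they are not taken from the specification.

```python
def clmul(a: int, b: int) -> int:
    """Carryless product of two non-negative integers (XOR instead of add)."""
    result = 0
    while b:
        if b & 1:          # this bit of b selects a shifted copy of a
            result ^= a
        a <<= 1
        b >>= 1
    return result


def gf256_mul(a: int, b: int, poly: int = 0x11D) -> int:
    """Multiply two bytes in GF(2^8): carryless product reduced modulo `poly`.

    `poly` is a degree-8 reduction polynomial; 0x11D is only an assumed
    default here, and other codes use other polynomials.
    """
    p = clmul(a & 0xFF, b & 0xFF)      # product has degree at most 14
    for bit in range(14, 7, -1):       # clear bits 14..8 from the top down
        if p & (1 << bit):
            p ^= poly << (bit - 8)
    return p


if __name__ == "__main__":
    print(hex(clmul(0x57, 0x83)))
    print(hex(gf256_mul(0x57, 0x83, poly=0x11B)))
```

An 8-bit or 16-bit multiply of this kind only ever produces a short product, which is why an element width of 32 bits or less is sufficient for these algorithms.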
+
+
+=== "Zvkgs" Extension for Vector-Scalar GCM/GHASH
+
+One of the key use cases for the vector instructions `vghsh.vv` and `vgmul.vv` defined in <<Zvkg>> is to speed up Galois/Counter Mode (GCM) for a single encryption/decryption stream by computing the GHASH algorithm for multiple blocks of the same message in parallel (using the same symmetric key).
+The parallel processing accumulates multiple blocks of the message and multiplies them by the same power of `H` (`H` is the encryption of the all-zero block under the cipher key).
+The power is equal to the number of blocks processed in parallel.
+The processing completes by reducing the parallel accumulators into a single output tag.
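
The parallel scheme described above can be checked against the serial GHASH recurrence with a small software model. The Python sketch below is an illustration only: the helper names are hypothetical, the field multiplication follows the bit ordering of the GCM specification, and the final reduction step (multiplying lane `j` by a lane-specific power of `H` before XOR-ing the lanes together) is one possible way to combine the accumulators, not necessarily the one intended by the authors.

```python
R = 0xE1 << 120          # GCM reduction constant for x^128 + x^7 + x^2 + x + 1
ONE = 1 << 127           # multiplicative identity in GCM bit order


def gf128_mul(x: int, y: int) -> int:
    """Multiply two 128-bit blocks in GF(2^128) using GCM bit ordering."""
    z, v = 0, y
    for i in range(127, -1, -1):       # GCM bit 0 is the integer's top bit
        if (x >> i) & 1:
            z ^= v
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z


def gf128_pow(h: int, n: int) -> int:
    """Compute h**n in the field by repeated multiplication."""
    result = ONE
    for _ in range(n):
        result = gf128_mul(result, h)
    return result


def ghash_serial(h: int, blocks: list[int]) -> int:
    """Reference recurrence: Y = (Y xor X_i) * H for each block in turn."""
    y = 0
    for x in blocks:
        y = gf128_mul(y ^ x, h)
    return y


def ghash_parallel(h: int, blocks: list[int], lanes: int) -> int:
    """Model of `lanes` accumulators that all multiply by the same H**lanes."""
    assert blocks and len(blocks) % lanes == 0
    h_n = gf128_pow(h, lanes)          # the single power shared by every lane
    groups = [blocks[i:i + lanes] for i in range(0, len(blocks), lanes)]
    acc = [0] * lanes
    for group in groups[:-1]:          # main loop: same multiplier in all lanes
        acc = [gf128_mul(a ^ x, h_n) for a, x in zip(acc, group)]
    tag = 0
    for j, (a, x) in enumerate(zip(acc, groups[-1])):
        # Assumed final step: lane j uses H**(lanes - j), then lanes are XOR-ed.
        tag ^= gf128_mul(a ^ x, gf128_pow(h, lanes - j))
    return tag


if __name__ == "__main__":
    h = 0x66E94BD4EF8A2C3B884CFA59CA342B2E
    msg = [(0x0123456789ABCDEF * (i + 1)) ^ (i << 100) for i in range(8)]
    print(ghash_parallel(h, msg, 4) == ghash_serial(h, msg))
```

In the main loop every lane uses the identical multiplier `h_n`, which is the property that makes a single shared operand sufficient there.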
+With `Zvkg` only, a full vector register group is required to hold the multiple copies of the power of `H`.
+`Zvkgs` reduces the size of the vector register group needed for powers of `H`: it only needs to hold a single 128-bit element group, freeing some vector registers (the exact number of freed registers depends on VLEN and LMUL).
+This exploits the same scalar element group broadcast mechanism used by other instructions defined in the vector crypto extensions (e.g. `vaesem.vs` from <<Zvkned>>).
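
To make the dependence on VLEN and LMUL concrete, the following Python sketch estimates how many vector registers the `H` operand occupies in each form. It rests on two assumptions that are not stated in the text: that the `.vv` operand occupies a full LMUL-register group, and that the `.vs` operand occupies just enough registers to hold one 128-bit element group (at least one). The function names are hypothetical, and only integer LMUL values are modeled.

```python
from math import ceil

EGW = 128  # element group width in bits for GHASH


def regs_for_h_vv(lmul: int) -> int:
    """Registers holding replicated powers of H in the .vv form (assumed: LMUL)."""
    return lmul


def regs_for_h_vs(vlen: int) -> int:
    """Registers holding one 128-bit element group (assumed: ceil(EGW / VLEN))."""
    return max(1, ceil(EGW / vlen))


def freed_registers(vlen: int, lmul: int) -> int:
    """Estimated number of registers freed by switching from .vv to .vs."""
    return max(0, regs_for_h_vv(lmul) - regs_for_h_vs(vlen))


def parallel_blocks(vlen: int, lmul: int) -> int:
    """Number of 128-bit blocks processed per instruction."""
    return (vlen * lmul) // EGW


if __name__ == "__main__":
    for vlen in (128, 256, 512):
        for lmul in (1, 2, 4, 8):
            print(vlen, lmul, parallel_blocks(vlen, lmul),
                  freed_registers(vlen, lmul))
```

Under these assumptions the saving grows with LMUL and disappears when LMUL is 1, since a single register is needed in both forms.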