Commit 6d2a882

Adding rationale for Zvkgs
1 parent fd1c787 commit 6d2a882

1 file changed: +12 -1 lines changed

src/rationale.adoc

Lines changed: 12 additions & 1 deletion
@@ -61,4 +61,15 @@ For implementations with small ELEN (32), supporting Zvbc32e brings ISA support
 Zvbc32e is also useful for implementations with ELEN >= 64, as it allows more efficient implementations of algorithms relying on 32-bit (or less) carryless multiplications.
 Selecting only `Zvbc32e` allows implementations to save area while providing identical performance on those algorithms.

-For all implementations, `Zvbc32e` allows better implementations (less instructions and more targeted use of hardware resources) of algorithms relying on 8-bit and 16-bit carryless multiplications (e.g. erasure coding).
+For all implementations, `Zvbc32e` allows better implementations (less instructions and more targeted use of hardware resources) of algorithms relying on 8-bit and 16-bit carryless multiplications (e.g. erasure coding).
+
+
+=== "Zvkgs" Extension for Vector-Scalar GCM/GHASH
+
+One of the key use cases for the vector instructions `vghsh.vv` and `vgmul.vv` defined in <<Zvkg>> is to speed up the Galois/Counter Mode (GCM) cipher mode for a single encryption/decryption stream by computing the GHASH algorithm for multiple blocks of the same message in parallel (using the same symmetric key).
+The parallel processing accumulates and multiplies multiple blocks of the message by the same power of `H` (where `H` is the encryption of the all-zero block under the cipher key).
+This power is equal to the number of blocks processed in parallel.
+The processing completes by reducing the parallel accumulators into a single output tag.
+With `Zvkg` only, a full vector register group is required to hold the multiple copies of the power of `H`.
+`Zvkgs` reduces the size of the vector register group needed for the powers of `H`: it only needs to contain a single 128-bit element group, which frees some vector registers (the exact number of freed registers depends on VLEN and LMUL).
+This exploits the same scalar element group broadcast mechanism used in other instructions defined in the vector crypto extensions (e.g. `vaesem.vs` from <<Zvkned>>).
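
To illustrate the rationale added by this commit, here is a minimal Python sketch (not part of the commit or of the specification) that computes GHASH both serially and with several blocks processed in parallel. It models 128-bit elements as Python integers rather than using the `vghsh`/`vgmul` instructions themselves. The function names (`gf128_mul`, `gf128_pow`, `ghash_serial`, `ghash_parallel`), the lane count, and the sample value used for `H` are assumptions made for illustration. The sketch also assumes that the number of blocks is a multiple of the lane count, and that the last stripe uses per-lane powers of `H` so that XOR-ing the accumulators gives the serial result; this is one possible arrangement, not necessarily the one an optimized implementation would choose.

```python
# GF(2^128) arithmetic in the GCM bit order: the most significant bit of the
# integer holds the coefficient of x^0.
R = 0xE1 << 120  # reduction constant from the GCM specification


def gf128_mul(x, y):
    """Multiply two 128-bit field elements (bitwise method from NIST SP 800-38D)."""
    z = 0
    v = y
    for i in range(128):
        if (x >> (127 - i)) & 1:
            z ^= v
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z


def gf128_pow(h, e):
    """Compute h^e for e >= 1 by repeated multiplication (hypothetical helper)."""
    result = h
    for _ in range(e - 1):
        result = gf128_mul(result, h)
    return result


def ghash_serial(h, blocks):
    """Reference GHASH: one block at a time, each step multiplying by H."""
    y = 0
    for x in blocks:
        y = gf128_mul(y ^ x, h)
    return y


def ghash_parallel(h, blocks, lanes):
    """GHASH with `lanes` independent accumulators.

    Assumption: len(blocks) is a non-zero multiple of `lanes`.
    """
    assert blocks and len(blocks) % lanes == 0
    h_k = gf128_pow(h, lanes)  # the single power of H shared by every lane
    stripes = [blocks[i:i + lanes] for i in range(0, len(blocks), lanes)]
    acc = [0] * lanes
    # Main loop: every lane multiplies by the same value H^lanes, so only one
    # 128-bit element needs to be supplied for the multiplier.
    for stripe in stripes[:-1]:
        acc = [gf128_mul(a ^ x, h_k) for a, x in zip(acc, stripe)]
    # Last stripe: lane j uses H^(lanes - j) so that all lanes line up.
    tail = [
        gf128_mul(a ^ x, gf128_pow(h, lanes - j))
        for j, (a, x) in enumerate(zip(acc, stripes[-1]))
    ]
    # Reduce the parallel accumulators into a single value.
    tag = 0
    for t in tail:
        tag ^= t
    return tag


# Sample inputs (assumed values, for illustration only).
H = 0x66E94BD4EF8A2C3B884CFA59CA342B2E
blocks = [(i * 0x0123456789ABCDEF0123456789ABCDEF + 1) % (1 << 128) for i in range(8)]
assert ghash_parallel(H, blocks, 4) == ghash_serial(H, blocks)
```

In the main loop every lane is multiplied by the same value, which is the situation the vector-scalar form targets: the multiplier can be held as one 128-bit element group instead of being replicated across a full register group.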
