Integrating Zvkgs specification inside Zvkg chapter

nibrunie · nibrunie · commit fd1c787aed9d · 2025-11-09T11:00:55.000-08:00
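
For reviewers, the behavioural change in this patch (the `hindex` selection added to the `vghsh` pseudocode) can be sketched as a small Python model. This is an illustrative sketch under stated assumptions, not the normative semantics: the names `brev8`, `gf128_mul_brev` and `vghsh` are hypothetical helpers, each 128-bit element group is modelled as a Python integer with byte 0 in the least-significant bits, and the multiply loop (which lies outside the hunks shown here) is assumed to be the usual shift-and-reduce over x^128 + x^7 + x^2 + x + 1. `vl`, `vstart` and `LMUL` handling and the reserved-encoding checks are omitted.

```python
MASK128 = (1 << 128) - 1


def brev8(x: int) -> int:
    """Reverse the bit order inside each of the 16 bytes of a 128-bit value."""
    out = 0
    for byte_index in range(16):
        byte = (x >> (8 * byte_index)) & 0xFF
        rev = int(f"{byte:08b}"[::-1], 2)
        out |= rev << (8 * byte_index)
    return out


def gf128_mul_brev(s: int, h: int) -> int:
    """Multiply s by h in GF(2^128); both operands are already byte-bit-reversed.

    Assumption: shift-and-reduce with the polynomial x^128 + x^7 + x^2 + x + 1.
    """
    z = 0
    for bit in range(128):
        if (s >> bit) & 1:
            z ^= h
        reduce = (h >> 127) & 1
        h = (h << 1) & MASK128
        if reduce:
            h ^= 0x87
    return z


def vghsh(vd, vs1, vs2, suffix):
    """Model of vghsh.vv / vghsh.vs over lists of 128-bit element groups.

    vd holds the partial hashes, vs1 the cipher output blocks, vs2 the subkey(s).
    """
    result = []
    for i in range(len(vd)):
        # The only difference between the two forms: .vs always reads group 0.
        hindex = i if suffix == "vv" else 0
        s = brev8(vd[i] ^ vs1[i])
        h = brev8(vs2[hindex])
        result.append(brev8(gf128_mul_brev(s, h)))
    return result
```

Under this model, `vghsh(vd, vs1, [h], "vs")` produces the same result as `vghsh(vd, vs1, [h] * len(vd), "vv")`, which is the register-pressure saving that the removed NOTE blocks anticipated.
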
diff --git a/src/vector-crypto.adoc b/src/vector-crypto.adoc
@@ -491,7 +491,7 @@ Note: If `Zve32x` is supported then `Zvkb` or `Zvbb` provide support for EEW of
 
 
 All _cryptography-specific_ instructions defined in this Vector Crypto specification (i.e., those
-in <<zvkned>>, <<zvknh,Zvknh[ab]>>, <<Zvkg>>, <<Zvksed>> and <<zvksh>> but _not_ <<zvbb>>, <<zvkb>>, <<zvbc>> or <<zvbc,Zvbc32e>>) shall
+in <<zvkned>>, <<zvknh,Zvknh[ab]>>, <<Zvkg>>, <<Zvkg,Zvkgs>>, <<Zvksed>> and <<zvksh>> but _not_ <<zvbb>>, <<zvkb>>, <<zvbc>> or <<zvbc,Zvbc32e>>) shall
 be executed with data-independent execution latency as defined in the
 <<#crypto_scalar_instructions,RISC-V Scalar Cryptography Extensions specification>>.
 It is important to note that the Vector Crypto instructions are independent of the
@@ -600,11 +600,15 @@ in the Zvbb extension: vbrev.v, vclz.v, vctz.v, vcpop.v, and vwsll.[vv,vx,vi].
 <<<
 
 [[zvkg,Zvkg]]
-==== `Zvkg` - Vector GCM/GMAC
+==== `Zvkg` and `Zvkgs` - Vector GCM/GMAC
 
 Instructions to enable the efficient implementation of GHASH~H~ which is used in Galois/Counter Mode (GCM) and
 Galois Message Authentication Code (GMAC).
 
+Zvkg defines the vector-vector (.vv) versions of the instructions.
+Zvkgs defines the vector-scalar (.vs) versions of the instructions.
+Zvkgs depends on Zvkg.
+
 All of these instructions work on 128-bit element groups comprised of four 32-bit elements.
 
 GHASH~H~ is defined in the
@@ -635,8 +639,8 @@ Likewise, `vstart` must be a multiple of `EGS=4`.
 |EGW
 |Mnemonic
 |Instruction
-| 32 | 128 | vghsh.vv | <<insns-vghsh>>
-| 32 | 128 | vgmul.vv | <<insns-vgmul>>
+| 32 | 128 | vghsh.[vv,vs] | <<insns-vghsh>>
+| 32 | 128 | vgmul.[vv,vs] | <<insns-vgmul>>
 
 |===
 
@@ -880,7 +884,7 @@ This extension is shorthand for the following set of other extensions:
 
 [NOTE]
 ====
-While Zvkg and Zvbc are not part of this extension, it is recommended that at least one of them is implemented with this extension to enable efficient AES-GCM.
+While Zvkg, Zvkgs and Zvbc are not part of this extension, it is recommended that at least one of them is implemented with this extension to enable efficient AES-GCM.
 ====
 
 <<<
@@ -955,7 +959,7 @@ This extension is shorthand for the following set of other extensions:
 
 [NOTE]
 ====
-While Zvkg and Zvbc are not part of this extension, it is recommended that at least one of them is implemented with this extension to enable efficient SM4-GCM.
+While Zvkg, Zvkgs and Zvbc are not part of this extension, it is recommended that at least one of them is implemented with this extension to enable efficient SM4-GCM.
 ====
 
 <<<
@@ -2559,15 +2563,16 @@ Included in::
 <<<
 
 [[insns-vghsh, Vector GHASH Add-Multiply]]
-==== vghsh.vv
+==== vghsh.[vv,vs]
 
 Synopsis::
 Vector Add-Multiply over GHASH Galois-Field
 
 Mnemonic::
-vghsh.vv vd, vs2, vs1
+vghsh.vv vd, vs2, vs1 +
+vghsh.vs vd, vs2, vs1
 
-Encoding::
+Encoding (Vector-Vector)::
 [wavedrom, , svg]
 ....
 {reg:[
@@ -2580,8 +2585,26 @@ Encoding::
 {bits: 6, name: '101100'},
 ]}
 ....
+
+Encoding (Vector-Scalar)::
+[wavedrom, , svg]
+....
+{reg:[
+{bits: 7, name: 'OP-VE'},
+{bits: 5, name: 'vd'},
+{bits: 3, name: 'OPMVV'},
+{bits: 5, name: 'vs1'},
+{bits: 5, name: 'vs2'},
+{bits: 1, name: '1'},
+{bits: 6, name: '100011'},
+]}
+....
+
+
+
 Reserved Encodings::
 * `SEW` is any value other than 32
+* Only for the `.vs` form: the `vd` register group overlaps the `vs2` scalar element group
 
 Arguments::
 
@@ -2604,6 +2627,11 @@ Arguments::
 Description::
 A single "iteration" of the GHASH~H~ algorithm is performed.
 
+The previous partial hashes are read as 4-element groups from `vd`,
+the ciphertext blocks are read as 4-element groups from `vs1`,
+and the hash subkey is read either as 4-element groups from `vs2` (`vghsh.vv`) or from the scalar element group (element group 0) in `vs2` (`vghsh.vs`).
+The resulting partial hashes are written as 4-element groups into `vd`.
+
 This instruction treats all of the inputs and outputs as 128-bit polynomials and
 performs operations over GF[2].
 It produces the next partial hash (Y~i+1~) by adding the current partial
@@ -2634,12 +2662,6 @@ with the NIST specification. These reversals are inexpensive to implement as the
 swap bit positions and therefore do not require any logic.
 ====
 
-[NOTE]
-====
-Since the same hash subkey `H` will typically be used repeatedly on a given message,
-a future extension might define a vector-scalar version of this instruction where
-`vs2` is the scalar element group. This would help reduce register pressure when `LMUL` > 1.
-====
 
 Operation::
 [source,pseudocode]
@@ -2655,9 +2677,10 @@ function clause execute (VGHSH(vs2, vs1, vd)) = {
   eg_start = (vstart/EGS)
 
   foreach (i from eg_start to eg_len-1) {
+    let hindex = if suffix=="vv" then i else 0;
     let Y = (get_velem(vd,EGW=128,i));  // current partial-hash
     let X = get_velem(vs1,EGW=128,i);  // block cipher output
-    let H = brev8(get_velem(vs2,EGW=128,i)); // Hash subkey
+    let H = brev8(get_velem(vs2,EGW=128,hindex)); // Hash subkey
 
     let Z : bits(128) = 0;
 
@@ -2681,21 +2704,25 @@ function clause execute (VGHSH(vs2, vs1, vd)) = {
 }
 --
 
-Included in::
+`vghsh.vv` is included in::
 <<zvkg>>, <<zvkng>>, <<zvksg>>
 
+`vghsh.vs` is included in::
+<<zvkg,Zvkgs>>
+
 <<<
 
 [[insns-vgmul, Vector GHASH Multiply]]
-==== vgmul.vv
+==== vgmul.[vv,vs]
 
 Synopsis::
 Vector Multiply over GHASH Galois-Field
 
 Mnemonic::
-vgmul.vv vd, vs2
+vgmul.vv vd, vs2 +
+vgmul.vs vd, vs2
 
-Encoding::
+Encoding (Vector-Vector)::
 [wavedrom, , svg]
 ....
 {reg:[
@@ -2708,8 +2735,24 @@ Encoding::
 {bits: 6, name: '101000'},
 ]}
 ....
+
+Encoding (Vector-Scalar)::
+[wavedrom, , svg]
+....
+{reg:[
+{bits: 7, name: 'OP-VE'},
+{bits: 5, name: 'vd'},
+{bits: 3, name: 'OPMVV'},
+{bits: 5, name: '10001'},
+{bits: 5, name: 'vs2'},
+{bits: 1, name: '1'},
+{bits: 6, name: '101001'},
+]}
+....
+
 Reserved Encodings::
 * `SEW` is any value other than 32
+* Only for the `.vs` form: the `vd` register group overlaps the `vs2` scalar element group
 
 Arguments::
 
@@ -2731,6 +2774,11 @@ Arguments::
 Description::
 A GHASH~H~ multiply is performed.
 
+
+The multipliers are read as 4-element groups from `vd`,
+and the multiplicands are read either as 4-element groups from `vs2` (`vgmul.vv`) or from the scalar element group (element group 0) in `vs2` (`vgmul.vs`).
+The resulting products are written as 4-element groups into `vd`.
+
 This instruction treats all of the inputs and outputs as 128-bit polynomials and
 performs operations over GF[2].
 It produces the product over GF(2^128^) of the two 128-bit inputs.
@@ -2755,20 +2803,14 @@ with the NIST specification. These reversals are inexpensive to implement as the
 swap bit positions and therefore do not require any logic.
 ====
 
-[NOTE]
-====
-Since the same multiplicand will typically be used repeatedly on a given message,
-a future extension might define a vector-scalar version of this instruction where
-`vs2` is the scalar element group. This would help reduce register pressure when `LMUL` > 1.
-====
 
 [NOTE]
 ====
-This instruction is identical to `vghsh.vv` with vs1=0.
+The `.vv` and `.vs` forms of this instruction are identical to `vghsh.vv` and `vghsh.vs`, respectively, with vs1=0.
 This instruction is often used in GHASH code. In some cases it is followed
 by an XOR to perform a multiply-add. Implementations may choose to fuse these
 two instructions to improve performance on GHASH code that
-doesn't use the add-multiply form of the `vghsh.vv` instruction.
+doesn't use the add-multiply form of the `vghsh.[vv,vs]` instruction.
 ====
 
 
@@ -2786,8 +2828,9 @@ function clause execute (VGMUL(vs2, vs1, vd)) = {
   eg_start = (vstart/EGS)
 
   foreach (i from eg_start to eg_len-1) {
+    let hindex = if suffix=="vv" then i else 0;
     let Y = brev8(get_velem(vd,EGW=128,i));  // Multiplier
-    let H = brev8(get_velem(vs2,EGW=128,i)); // Multiplicand
+    let H = brev8(get_velem(vs2,EGW=128,hindex)); // Multiplicand
     let Z : bits(128) = 0;
 
     for (int bit = 0; bit < 128; bit++) {
@@ -2809,9 +2852,12 @@ function clause execute (VGMUL(vs2, vs1, vd)) = {
 }
 --
 
-Included in::
+`vgmul.vv` is included in::
 <<zvkg>>, <<zvkng>>, <<zvksg>>
 
+`vgmul.vs` is included in::
+<<zvkg,Zvkgs>>
+
 <<<
 
 [[insns-vrev8, Vector Reverse Bytes]]
@@ -4402,7 +4448,7 @@ Crypto Vector instructions except Zvbb and Zvbc
 |100000||||| 100000 |V| | vsm3me      | 100000 | | |
 | 100001 | | | |            | 100001 |V| | vsm4k.vi    | 100001 | | |
 | 100010 | | | |            | 100010 |V| | vaeskf1.vi  | 100010 | | |
-| 100011 | | | |            | 100011 | | |             | 100011 | | |
+| 100011 | | | |            | 100011 |V| | vghsh.vs    | 100011 | | |
 | 100100 | | | |            | 100100 | | |             | 100100 | | |
 | 100101 | | | |            | 100101 | | |             | 100101 | | |
 | 100110 | | | |            | 100110 | | |             | 100110 | | |
@@ -4412,7 +4458,7 @@ Crypto Vector instructions except Zvbb and Zvbc
 | 101001 | | | |            | 101001 |V| | *VAES.vs*   | 101001 | | |
 | 101010 | | | |            | 101010 |V| | vaeskf2.vi  | 101010 | | |
 | 101011 | | | |            | 101011 |V| | vsm3c.vi    | 101011 | | |
-| 101100 | | | |            | 101100 |V| | vghsh      | 101100 | | |
+| 101100 | | | |            | 101100 |V| | vghsh.vv   | 101100 | | |
 | 101101 | | | |            | 101101 |V| | vsha2ms     | 101101 | | |
 | 101110 | | | |            | 101110 |V| | vsha2ch     | 101110 | | |
 | 101111 | | | |            | 101111 |V| | vsha2cl     | 101111 | | |