@@ -302,13 +302,14 @@ all other `SEW` values are _reserved_.
302302| Instructions
303303| Required SEW
304304
305- | vaes* | 32
306- | Zvknha: vsha2* | 32
307- | Zvknhb: vsha2* | 32 or 64
308- | vclmul[h] | 64
309- | vg* | 32
310- | vsm3* | 32
311- | vsm4* | 32
305+ | vaes* | 32
306+ | Zvknha: vsha2* | 32
307+ | Zvknhb: vsha2* | 32 or 64
308+ | vclmul[h] (Zvbc) | 64
309+ | vclmul[h] (Zvbc32e) | 8, 16, 32
310+ | vg* | 32
311+ | vsm3* | 32
312+ | vsm4* | 32
312313
313314
314315|===
@@ -490,14 +491,14 @@ Note: If `Zve32x` is supported then `Zvkb` or `Zvbb` provide support for EEW of
490491
491492
492493All _cryptography-specific_ instructions defined in this Vector Crypto specification (i.e., those
493- in <<zvkned>>, <<zvknh,Zvknh[ab]>>, <<Zvkg>>, <<Zvksed>> and <<zvksh>> but _not_ <<zvbb>>,<<zvkb>>, or <<zvbc>>) shall
494+ in <<zvkned>>, <<zvknh,Zvknh[ab]>>, <<Zvkg>>, <<Zvksed>> and <<zvksh>> but _not_ <<zvbb>>, <<zvkb>>, <<zvbc>> or <<zvbc,Zvbc32e>>) shall
494495be executed with data-independent execution latency as defined in the
495496<<#crypto_scalar_instructions,RISC-V Scalar Cryptography Extensions specification>>.
496497It is important to note that the Vector Crypto instructions are independent of the
497498implementation of the `Zkt` extension and do not require that `Zkt` is implemented.
498499
499500This specification includes a <<Zvkt>> extension that, when implemented, requires certain vector instructions
500- (including <<zvbb>>, <<zvkb>>, and <<zvbc>>) to be executed with data-independent execution latency.
501+ (including <<zvbb>>, <<zvkb>>, <<zvbc,Zvbc>> and <<zvbc,Zvbc32e>>) to be executed with data-independent execution latency.
501502
502503Detection of individual cryptography extensions uses the
503504unified software-based RISC-V discovery method.
@@ -540,12 +541,16 @@ This extension is a superset of the <<Zvkb>> extension.
540541<<<
541542
542543[[zvbc,Zvbc]]
543- ==== `Zvbc` - Vector Carryless Multiplication
544+ ==== `Zvbc` and `Zvbc32e` - Vector Carryless Multiplication
544545
545546General purpose carryless multiplication instructions which are commonly used in cryptography
546547and hashing (e.g., Elliptic curve cryptography, GHASH, CRC).
547548
548- These instructions are only defined for `SEW`=64.
549+ When `Zvbc` is supported, the following instructions are defined for `SEW=64`.
550+ When `Zvbc32e` is supported, the instructions are defined for `SEW=8`, `16`, and `32`.
551+
552+ Note:: `Zvbc` and `Zvbc32e` can be implemented independently of each other.
553+
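As an informal illustration of how the two extensions relate, the following Python sketch models which `SEW` values `vclmul[h]` accepts for a given set of implemented extensions. The names `VCLMUL_SEW` and `vclmul_sew_is_reserved` are hypothetical and do not appear in the specification. The sketch also assumes that, when both extensions are implemented, the supported widths are the union of the two sets; the Reserved Encodings list of each instruction remains the normative text.

```python
# Hypothetical helper, not part of the specification.
# Maps each extension to the SEW values for which vclmul[h] is defined.
VCLMUL_SEW = {
    "Zvbc": frozenset({64}),
    "Zvbc32e": frozenset({8, 16, 32}),
}


def vclmul_sew_is_reserved(sew, implemented):
    """Return True if vclmul[h] at this SEW is a reserved encoding.

    `implemented` is any iterable of extension names. Assumption: the
    supported SEW set is the union over all implemented extensions.
    """
    supported = set()
    for ext in implemented:
        supported |= VCLMUL_SEW.get(ext, frozenset())
    return sew not in supported
```

Under this reading, a `Zve32x` implementation with only `Zvbc32e` treats `SEW`=64 as reserved, while an implementation with both extensions accepts 8, 16, 32, and 64.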
549554
550555[%autowidth]
551556[%header,cols="^2,4"]
@@ -1056,7 +1061,7 @@ All <<Zvkb>> instructions are also covered by DIEL as they are a
10561061proper subset of <<Zvbb>>
10571062====
10581063
1059- ===== All <<Zvbc>> instructions
1064+ ===== All <<Zvbc>> and <<zvbc,Zvbc32e>> instructions
10601065- vclmul[h].v[vx]
10611066
10621067===== add/sub
@@ -2213,7 +2218,9 @@ Encoding (Vector-Scalar)::
22132218]}
22142219....
22152220Reserved Encodings::
2216- * `SEW` is any value other than 64
2221+ * `SEW` is any value other than 64 (Zvbc)
2222+ * `SEW` is any value other than 8, 16 or 32 (Zvbc32e)
2223+
22172224
22182225Arguments::
22192226
@@ -2230,22 +2237,20 @@ Arguments::
22302237|===
22312238
22322239Description::
2233- Produces the low half of 128 -bit carry-less product.
2240+ Produces the low half of the `2*SEW`-bit carry-less product.
22342241
2235- Each 64 -bit element in the `vs2` vector register is carry-less multiplied by
2236- either each 64 -bit element in `vs1` (vector-vector), or the 64 -bit value
2242+ Each `SEW`-bit element in the `vs2` vector register is carry-less multiplied by
2243+ either the corresponding `SEW`-bit element in `vs1` (vector-vector), or the `SEW`-bit value
22372244from integer register `rs1` (vector-scalar). The result is the least
2238- significant 64 bits of the carry-less product.
2245+ significant `SEW` bits of the carry-less product.
22392246
22402247[NOTE]
22412248====
2242- The 64-bit carryless multiply instructions can be used for implementing GCM in the absence of the `zvkg` extension.
2243- We do not make these instructions exclusive as the 64-bit carryless multiply is readily derived from the
2249+ The carryless multiply instructions can be used for implementing GCM in the absence of the `zvkg` extension.
2250+ We do not make these instructions exclusive as the carryless multiply is readily derived from the
22442251instructions in the `zvkg` extension and can have utility in other areas.
2245- Likewise, we treat other SEW values as reserved so as not to preclude
2246- future extensions from using this opcode with different element widths.
2247- For example, a future extension might define an `SEW`=32 version of this instruction to enable `Zve32*` implementations to have
2248- vector carryless multiplication instructions.
2252+
2253+ `Zvbc32e` allows `Zve32x` implementations to support vector carryless multiplication.
22492254====
22502255
22512256Operation::
@@ -2256,10 +2261,10 @@ Operation::
22562261function clause execute (VCLMUL(vs2, vs1, vd, suffix)) = {
22572262
22582263 foreach (i from vstart to vl-1) {
2259- let op1 : bits (64 ) = if suffix =="vv" then get_velem(vs1,i)
2264+ let op1 : bits(SEW) = if suffix == "vv" then get_velem(vs1, i)
22602265 else zext_or_truncate_to_sew(X(vs1));
2261- let op2 : bits (64 ) = get_velem(vs2,i);
2262- let product : bits (64 ) = clmul(op1,op2,SEW);
2266+ let op2 : bits(SEW) = get_velem(vs2, i);
2267+ let product : bits(SEW) = clmul(op1, op2, SEW);
22632268 set_velem(vd, i, product);
22642269 }
22652270 RETIRE_SUCCESS
@@ -2272,10 +2277,12 @@ function clmul(x, y, width) = {
22722277 }
22732278 result
22742279}
2280+
2281+
22752282--
22762283
22772284Included in::
2278- <<zvbc>>, <<zvknc>>, <<zvksc>>
2285+ <<zvbc>>, <<zvbc,Zvbc32e>>, <<zvknc>>, <<zvksc>>
22792286
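The low-half operation described above can be modelled in a few lines of Python. This is only a behavioural sketch under stated assumptions: vector registers are plain lists of integers, and `vstart`, `vl`, masking, and tail handling are ignored. The function names `clmul`, `vclmul_vv`, and `vclmul_vx` are chosen here for illustration; only `clmul` mirrors a name used in the Sail operation.

```python
def clmul(x, y, width):
    """Low `width` bits of the carry-less (XOR-accumulated) product of x and y."""
    mask = (1 << width) - 1
    x &= mask
    result = 0
    for i in range(width):
        if (y >> i) & 1:          # bit i of y selects a shifted copy of x
            result ^= x << i      # XOR instead of add: no carries propagate
    return result & mask          # keep only the least significant `width` bits


def vclmul_vv(vs2, vs1, sew):
    """Vector-vector form: element i of vs2 times element i of vs1."""
    return [clmul(b, a, sew) for a, b in zip(vs2, vs1)]


def vclmul_vx(vs2, rs1, sew):
    """Vector-scalar form: the scalar is truncated (or zero-extended) to SEW."""
    scalar = rs1 & ((1 << sew) - 1)
    return [clmul(scalar, a, sew) for a in vs2]
```

For example, with `SEW`=8, `vclmul_vv([3, 0x80], [3, 2], 8)` yields `[5, 0]`: the product of `0x80` and `2` is `0x100`, whose low byte is zero.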
22802287<<<
22812288
@@ -2317,7 +2324,8 @@ Encoding (Vector-Scalar)::
23172324]}
23182325....
23192326Reserved Encodings::
2320- * `SEW` is any value other than 64
2327+ * `SEW` is any value other than 64 (Zvbc)
2328+ * `SEW` is any value other than 8, 16 or 32 (Zvbc32e)
23212329
23222330Arguments::
23232331
@@ -2334,12 +2342,12 @@ Arguments::
23342342|===
23352343
23362344Description::
2337- Produces the high half of 128 -bit carry-less product.
2345+ Produces the high half of the `2*SEW`-bit carry-less product.
23382346
2339- Each 64 -bit element in the `vs2` vector register is carry-less multiplied by
2340- either each 64 -bit element in `vs1` (vector-vector), or the 64 -bit value
2347+ Each `SEW`-bit element in the `vs2` vector register is carry-less multiplied by
2348+ either the corresponding `SEW`-bit element in `vs1` (vector-vector), or the `SEW`-bit value
23412349from integer register `rs1` (vector-scalar). The result is the most
2342- significant 64 bits of the carry-less product.
2350+ significant `SEW` bits of the carry-less product.
23432351
23442352// This instruction must always be implemented such that its execution latency does not depend
23452353// on the data being operated upon.
@@ -2348,12 +2356,11 @@ Operation::
23482356[source,sail]
23492357--
23502358function clause execute (VCLMULH(vs2, vs1, vd, suffix)) = {
2351-
23522359 foreach (i from vstart to vl-1) {
2353- let op1 : bits (64 ) = if suffix =="vv" then get_velem(vs1,i)
2360+ let op1 : bits(SEW) = if suffix == "vv" then get_velem(vs1, i)
23542361 else zext_or_truncate_to_sew(X(vs1));
2355- let op2 : bits (64 ) = get_velem(vs2, i);
2356- let product : bits (64 ) = clmulh(op1, op2, SEW);
2362+ let op2 : bits(SEW) = get_velem(vs2, i);
2363+ let product : bits(SEW) = clmulh(op1, op2, SEW);
23572364 set_velem(vd, i, product);
23582365 }
23592366 RETIRE_SUCCESS
@@ -2366,11 +2373,10 @@ function clmulh(x, y, width) = {
23662373 }
23672374 result
23682375}
2369-
23702376--
23712377
23722378Included in::
2373- <<zvbc>>, <<zvknc>>, <<zvksc>>
2379+ <<zvbc>>, <<zvbc,Zvbc32e>>, <<zvknc>>, <<zvksc>>
23742380
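To make the relationship between the two halves concrete, the following Python sketch models the high-half computation for a single element and checks it against a full `2*SEW`-bit product. It is an informal model, not the normative Sail: `clmulh` follows the shift-and-XOR form commonly used for the scalar `clmulh` instruction, and `clmul_wide` is a hypothetical helper introduced only for the cross-check.

```python
def clmulh(x, y, width):
    """High `width` bits of the 2*width-bit carry-less product of x and y."""
    mask = (1 << width) - 1
    x &= mask
    result = 0
    for i in range(1, width):
        if (y >> i) & 1:
            # Bits of (x << i) that overflow past `width` land in the high half.
            result ^= x >> (width - i)
    return result


def clmul_wide(x, y, width):
    """Hypothetical helper: the full 2*width-bit carry-less product."""
    mask = (1 << width) - 1
    x &= mask
    result = 0
    for i in range(width):
        if (y >> i) & 1:
            result ^= x << i
    return result
```

With `SEW`=8, `clmul_wide(0xFF, 0xFF, 8)` is `0x5555`, and `clmulh(0xFF, 0xFF, 8)` returns its upper byte, `0x55`. The same split applies at `SEW`=16, 32, and 64.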
23752381<<<
23762382