Adding rationale for Zvbc32e

nibrunie · nibrunie · commit 3e0efea77b08 · 2025-11-09T11:00:55.000-08:00
diff --git a/src/rationale.adoc b/src/rationale.adoc
@@ -43,3 +43,22 @@ Two conditional-zero instructions are included: one that writes zero if the
 comparand is zero, and one that does so if the comparand is nonzero.
 Variants that perform magnitude comparisons with zero were considered but
 ultimately excluded for insufficient quantitative justification.
+
+=== "Zvbc32e" Extension for Vector Carryless Multiplication for `SEW <= 32`
+
+<<Zvbc>> defines vector carryless multiplication instructions for SEW=64 only.
+It therefore cannot be supported by implementations with ELEN=32, and it is inefficient for algorithms in which at least one multiplication operand is 32 bits wide or narrower.
+Such algorithms include the carryless-multiplication-based folding algorithm used to compute widespread 32-bit CRCs (e.g., the Ethernet CRC).
+With `Zvbc`, these algorithms exploit only half of each 64-bit element multiplication.
+This is because CRC acceleration based on carryless multiplication typically multiplies by a precomputed term that is a polynomial reduced modulo the CRC generator polynomial.
+Such a term is therefore no wider than the CRC itself (32 bits for a 32-bit CRC).
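A minimal C sketch of this observation follows. It assumes the Ethernet CRC-32 polynomial in normal (MSB-first) bit order for readability; deployed Ethernet CRC code usually works with the bit-reflected form. The helper names `mul_xk_mod_p`, `clmul32`, `mod_p64`, and `fold32` are hypothetical, and the bit-serial loops stand in for the hardware instructions. It shows that multiplying a 32-bit chunk by the precomputed constant `K = x^k mod P(x)` yields the same residue as shifting the chunk by `x^k`, while both multiplication operands stay within 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Ethernet CRC-32 generator polynomial P(x) = x^32 + ... + 1 in MSB-first
   form; the implicit x^32 term is not stored. */
#define CRC32_POLY 0x04C11DB7u

/* Multiply a 32-bit polynomial a(x) by x^k and reduce modulo P(x), one bit
   at a time. The result always has degree < 32, i.e. fits in 32 bits. */
static uint32_t mul_xk_mod_p(uint32_t a, unsigned k) {
    uint32_t r = a;
    for (unsigned i = 0; i < k; i++) {
        uint32_t top = r & 0x80000000u; /* coefficient of x^31 */
        r <<= 1;                        /* multiply by x */
        if (top)
            r ^= CRC32_POLY;            /* x^32 == CRC32_POLY (mod P) */
    }
    return r;
}

/* Reference model of a 32x32 -> 64-bit carryless multiplication. */
static uint64_t clmul32(uint32_t a, uint32_t b) {
    uint64_t acc = 0;
    for (unsigned i = 0; i < 32; i++)
        if ((b >> i) & 1u)
            acc ^= (uint64_t)a << i;    /* XOR instead of add: no carries */
    return acc;
}

/* Reduce a 64-bit polynomial modulo P(x) by long division over GF(2). */
static uint32_t mod_p64(uint64_t v) {
    const uint64_t p = (1ull << 32) | CRC32_POLY; /* full 33-bit P(x) */
    for (int i = 63; i >= 32; i--)
        if ((v >> i) & 1u)
            v ^= p << (i - 32);         /* cancel the x^i term */
    return (uint32_t)v;
}

/* Folding step: instead of shifting a(x) by x^k, multiply it by the
   constant K = x^k mod P(x). Both operands are at most 32 bits wide, so a
   single 32-bit carryless multiply (low and high halves) suffices. */
static uint32_t fold32(uint32_t a, unsigned k) {
    uint32_t K = mul_xk_mod_p(1u, k);   /* precomputed in a real CRC loop */
    uint64_t prod = clmul32(a, K);
    assert((prod >> 63) == 0);          /* degree <= 62: fits in 2*32 bits */
    return mod_p64(prod);
}
```

Real folding loops usually defer the final reduction and XOR the unreduced 64-bit product into a later chunk; the full reduction in `fold32` only makes the equivalence easy to check.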
+
+Zvbc32e defines the same vector carryless multiplication operations as Zvbc, but for smaller SEW values (32, 16, and 8 bits).
+It can therefore be supported by implementations with any ELEN >= 32.
+For implementations with ELEN=32, Zvbc32e provides vector carryless multiplication, which Zvbc alone cannot.
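The per-element behavior can be modeled in C as below. This sketch assumes Zvbc32e mirrors the semantics of Zvbc's `vclmul.vv` (low SEW bits of the 2*SEW-bit product) and `vclmulh.vv` (high SEW bits) at SEW = 8, 16, and 32. It ignores masking, `vstart`, and tail handling, and stores every element in a `uint32_t` for simplicity; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Full carryless product of two SEW-bit operands (SEW = 8, 16 or 32).
   The result has at most 2*SEW - 1 significant bits, so it fits in 64. */
static uint64_t clmul_sew(uint64_t a, uint64_t b, unsigned sew) {
    assert(sew == 8 || sew == 16 || sew == 32);
    const uint64_t mask = (1ull << sew) - 1; /* keep only SEW bits */
    a &= mask;
    b &= mask;
    uint64_t acc = 0;
    for (unsigned i = 0; i < sew; i++)
        if ((b >> i) & 1u)
            acc ^= a << i; /* XOR replaces addition: no carries */
    return acc;
}

/* Model of vclmul.vv: each element gets the low SEW bits of the product. */
static void vclmul_vv(uint32_t *vd, const uint32_t *vs2, const uint32_t *vs1,
                      size_t vl, unsigned sew) {
    for (size_t i = 0; i < vl; i++)
        vd[i] = (uint32_t)(clmul_sew(vs2[i], vs1[i], sew) & ((1ull << sew) - 1));
}

/* Model of vclmulh.vv: each element gets the high SEW bits of the product. */
static void vclmulh_vv(uint32_t *vd, const uint32_t *vs2, const uint32_t *vs1,
                       size_t vl, unsigned sew) {
    for (size_t i = 0; i < vl; i++)
        vd[i] = (uint32_t)(clmul_sew(vs2[i], vs1[i], sew) >> sew);
}
```

For example, at SEW=8 the operands 0x80 and 0x80 have the carryless product 0x4000, so the `vclmul` model returns 0x00 and the `vclmulh` model returns 0x40.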
+
+Zvbc32e is also useful for implementations with ELEN >= 64, as it enables more efficient implementations of algorithms that rely on carryless multiplications of 32 bits or fewer: each operand occupies an element of matching width instead of half of a 64-bit element.
+Implementing `Zvbc32e` without `Zvbc` lets such implementations save area while providing the same performance on those algorithms.
+
+For all implementations, `Zvbc32e` allows better implementations (fewer instructions and more targeted use of hardware resources) of algorithms relying on 8-bit and 16-bit carryless multiplications (e.g., erasure coding).
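As an illustration of the 8-bit case, here is a hedged sketch of GF(2^8) multiplication, the core operation of Reed-Solomon erasure codes, built only from 8x8-bit carryless products. The low and high bytes of each product correspond to what `vclmul` and `vclmulh` would return at SEW=8, assuming Zvbc semantics carry over. The reduction polynomial 0x11D (x^8 + x^4 + x^3 + x^2 + 1) is an assumption: it is a common choice in erasure-coding libraries, but a given code may use another. The helper names `clmul8` and `gf256_mul` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 8x8 -> 16-bit carryless multiply; the low and high bytes correspond to
   the SEW=8 results of vclmul and vclmulh, respectively. */
static uint16_t clmul8(uint8_t a, uint8_t b) {
    uint16_t acc = 0;
    for (unsigned i = 0; i < 8; i++)
        if ((b >> i) & 1u)
            acc ^= (uint16_t)(a << i);
    return acc;
}

/* GF(2^8) multiply modulo 0x11D using only 8-bit carryless multiplies.
   Since x^8 == 0x1D (mod 0x11D), the high byte of a product is folded back
   into the low byte by multiplying it by 0x1D. */
static uint8_t gf256_mul(uint8_t a, uint8_t b) {
    uint16_t p = clmul8(a, b);               /* degree <= 14 */
    uint8_t lo = (uint8_t)p;
    uint8_t hi = (uint8_t)(p >> 8);          /* degree <= 6 */
    uint16_t t = clmul8(hi, 0x1D);           /* degree <= 10 */
    lo ^= (uint8_t)t;
    hi = (uint8_t)(t >> 8);                  /* degree <= 2 */
    lo ^= (uint8_t)clmul8(hi, 0x1D);         /* degree <= 6: no high byte */
    assert((clmul8(hi, 0x1D) >> 8) == 0);    /* second fold is final */
    return lo;
}
```

Two folding steps are enough because, after the first one, the remaining high byte has degree at most 2, so its product with 0x1D already fits in 8 bits.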