[HW] Add HWVectorization pass #9222

mafeguimaraes · 2025-11-10T17:58:00Z

This patch introduces the HWVectorization pass, which identifies bitwise patterns in hardware modules that can be represented as vectorized operations instead of per-bit logic.
The pass aims to simplify the IR by grouping related scalar bit operations (such as comb.extract and comb.concat) into higher-level vector constructs like comb.reverse, comb.replicate, or direct multi-bit comb.and, comb.or, and comb.xor.

The pass scans each hw.module and identifies groups of bit-level operations that can be merged into vector-level constructs. This version supports several key patterns based on bit-level dataflow analysis and structural analysis.

This patch was co-authored by @RosaUlisses.

Supported transformations include:

1. Linear concatenations (identity):

Pattern: Bits are extracted in ascending order (identity permutation) and concatenated.
Transformation: The entire comb.concat chain is replaced with the original input vector.

// Before
%0 = comb.extract %in from 0 : (i4) -> i1
%1 = comb.extract %in from 1 : (i4) -> i1
%2 = comb.extract %in from 2 : (i4) -> i1
%3 = comb.extract %in from 3 : (i4) -> i1
%concat = comb.concat %3, %2, %1, %0 : i1, i1, i1, i1
hw.output %concat : i4

// After
hw.output %in : i4
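The identity check can be sketched in C++ under one assumption: each `comb.concat` operand has already been traced to a `(source, index)` pair. `ExtractedBit` and `isIdentityConcat` are hypothetical names, not the patch's API. `comb.concat` lists operands MSB first, so operand `k` of an `n`-bit concat lands at result bit `n - 1 - k`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one i1 concat operand produced by
// `comb.extract <source> from <index>`.
struct ExtractedBit {
  std::string source; // extracted value, e.g. "%in"
  unsigned index;     // bit position passed to comb.extract
};

// Returns true if the operands (MSB first, as written in the IR) put every
// bit of a single source back in its original position, i.e. the concat is
// equivalent to the source value itself.
bool isIdentityConcat(const std::vector<ExtractedBit> &operandsMsbFirst,
                      unsigned sourceWidth) {
  const std::size_t n = operandsMsbFirst.size();
  if (n == 0 || n != sourceWidth)
    return false;
  for (std::size_t k = 0; k < n; ++k) {
    const ExtractedBit &bit = operandsMsbFirst[k];
    if (bit.source != operandsMsbFirst[0].source)
      return false; // all bits must come from the same value
    if (bit.index != n - 1 - k)
      return false; // operand k lands at result bit n-1-k
  }
  return true;
}
```

In the example above, the operands `%3, %2, %1, %0` carry indices 3, 2, 1, 0. The check therefore succeeds, and `%concat` can be replaced by `%in`.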

2. Bit reversal:

Pattern: Bits are extracted in descending (reverse) order and concatenated.
Transformation: The chain is replaced with a single comb.reverse.

// Before
%0 = comb.extract %in from 0 : (i4) -> i1
%1 = comb.extract %in from 1 : (i4) -> i1
%2 = comb.extract %in from 2 : (i4) -> i1
%3 = comb.extract %in from 3 : (i4) -> i1
%rev = comb.concat %0, %1, %2, %3 : i1, i1, i1, i1
hw.output %rev : i4

// After
%0 = comb.reverse %in : i4
hw.output %0 : i4
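The reversal check can be sketched in C++ under one assumption: the concat operands have already been traced to `(source, index)` pairs. `ExtractedBit` and `isReverseConcat` are hypothetical names. Because `comb.concat` lists operands MSB first, a reversal means operand `k` reads source bit `k`, so result bit `j` reads source bit `n - 1 - j`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one i1 concat operand.
struct ExtractedBit {
  std::string source;
  unsigned index;
};

// Returns true if the operands, listed MSB first, are bits 0, 1, ..., n-1 of
// a single n-bit source, i.e. the concat equals comb.reverse of that source.
bool isReverseConcat(const std::vector<ExtractedBit> &operandsMsbFirst,
                     unsigned sourceWidth) {
  const std::size_t n = operandsMsbFirst.size();
  if (n == 0 || n != sourceWidth)
    return false;
  for (std::size_t k = 0; k < n; ++k) {
    if (operandsMsbFirst[k].source != operandsMsbFirst[0].source)
      return false;
    if (operandsMsbFirst[k].index != k)
      return false;
  }
  return true;
}
```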

3. Structural patterns (e.g., vectorized mux):

Pattern: Isomorphic, bit-parallel logic cones are detected. For example, a scalarized mux in which every bit's cone uses the same i1 select signal.
Transformation: The replicated scalar operations are collapsed into equivalent vector-level operations (e.g., comb.replicate, comb.and, comb.xor, comb.or).

// Before (scalarized mux)
%sel_inv = comb.xor %sel, %true : i1
%and_a = comb.and %a, %sel : i1
%and_b = comb.and %b, %sel_inv : i1
%mux = comb.or %and_a, %and_b : i1
...
(repeated for each bit)

// After (vectorized mux)
%true = hw.constant true
%sel_vec = comb.replicate %sel : (i1) -> i4
%a_masked = comb.and %a, %sel_vec : i4
%ones = comb.replicate %true : (i1) -> i4
%sel_inv_vec = comb.xor %sel_vec, %ones : i4
%b_masked = comb.and %b, %sel_inv_vec : i4
%mux = comb.or %a_masked, %b_masked : i4
hw.output %mux : i4
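Collapsing the per-bit cones is sound because every cone computes `(a[i] & sel) | (b[i] & ~sel)` with the same `sel`. That is exactly what the i4 operations compute once `sel` is replicated to every lane. The C++ self-check below is not part of the pass. It models `comb.replicate` of an i1 as an all-ones or all-zeros mask and compares both forms exhaustively for i4 inputs.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned kWidth = 4;
constexpr uint32_t kMask = (1u << kWidth) - 1; // 0b1111

// Scalarized mux: one xor/and/and/or cone per bit, as in the "Before" IR.
uint32_t scalarMux(uint32_t a, uint32_t b, bool sel) {
  uint32_t result = 0;
  for (unsigned i = 0; i < kWidth; ++i) {
    uint32_t ai = (a >> i) & 1u, bi = (b >> i) & 1u;
    uint32_t s = sel ? 1u : 0u;
    uint32_t selInv = s ^ 1u;                // comb.xor %sel, %true
    uint32_t bit = (ai & s) | (bi & selInv); // comb.and, comb.and, comb.or
    result |= bit << i;
  }
  return result;
}

// Vectorized mux: the same operations applied once to i4 values.
uint32_t vectorMux(uint32_t a, uint32_t b, bool sel) {
  uint32_t selVec = sel ? kMask : 0u; // comb.replicate %sel : (i1) -> i4
  uint32_t ones = kMask;              // comb.replicate %true : (i1) -> i4
  uint32_t selInvVec = selVec ^ ones; // comb.xor %sel_vec, %ones
  return ((a & selVec) | (b & selInvVec)) & kMask;
}

// Compares both forms over all i4 inputs and both select values.
bool muxFormsAgree() {
  for (uint32_t a = 0; a <= kMask; ++a)
    for (uint32_t b = 0; b <= kMask; ++b)
      for (int s = 0; s < 2; ++s)
        if (scalarMux(a, b, s == 1) != vectorMux(a, b, s == 1))
          return false;
  return true;
}
```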

4. Partial Vectorization (Chunking):

Pattern: The pass identifies contiguous sub-ranges (chunks) that can be vectorized independently, even if the entire bus cannot be.
Transformation: The pass vectorizes the identifiable chunks (e.g., a linear chunk) and leaves the remaining scalar or structural logic as another chunk, then concatenates the chunks back together.

// Before (Mixed linear and structural patterns)
// out[3:1] = in[3:1] (linear)
// out[0]   = in[1] ^ in[0] (structural)
%in_3 = comb.extract %in from 3 : (i4) -> i1
%in_2 = comb.extract %in from 2 : (i4) -> i1
%in_1 = comb.extract %in from 1 : (i4) -> i1
// Logic for bit 0
%in_1_for_0 = comb.extract %in from 1 : (i4) -> i1
%in_0 = comb.extract %in from 0 : (i4) -> i1
%bit_0 = comb.xor %in_1_for_0, %in_0 : i1
// Final concatenation
%concat = comb.concat %in_3, %in_2, %in_1, %bit_0 : i1, i1, i1, i1
hw.output %concat : i4

// After (Partially vectorized)
// Chunk 1: [3:1] (vectorized)
%chunk_1 = comb.extract %in from 1 : (i4) -> i3
// Chunk 0: [0] (scalar logic)
%in_1 = comb.extract %in from 1 : (i4) -> i1
%in_0 = comb.extract %in from 0 : (i4) -> i1
%chunk_0 = comb.xor %in_1, %in_0 : i1
// Re-concat the vectorized chunks
%final = comb.concat %chunk_1, %chunk_0 : i3, i1
hw.output %final : i4
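The chunking step can be sketched as a single scan over the result bits, LSB first. A run is extended while consecutive result bits read consecutive bits of the same source; anything else becomes a one-bit scalar chunk. This C++ sketch assumes the per-bit sources have already been computed. `BitInfo`, `Chunk`, and `findChunks` are hypothetical names, and the patch may group chunks differently (for example, by merging adjacent scalar bits).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-bit description: either a direct copy of bit `index` of
// `source`, or opaque logic that must stay scalar.
struct BitInfo {
  bool direct; // true if the bit is `comb.extract <source> from <index>`
  std::string source;
  unsigned index;
};

// Result bits [lo, lo + width). A linear chunk maps to a single
// `comb.extract <source> from <srcLo>` of width `width`.
struct Chunk {
  unsigned lo, width;
  bool linear;
  std::string source;
  unsigned srcLo;
};

// Splits the result (bit 0 = LSB) into maximal linear runs plus single-bit
// scalar chunks.
std::vector<Chunk> findChunks(const std::vector<BitInfo> &bits) {
  std::vector<Chunk> chunks;
  unsigned i = 0;
  while (i < bits.size()) {
    if (!bits[i].direct) {
      chunks.push_back({i, 1, false, "", 0});
      ++i;
      continue;
    }
    unsigned len = 1;
    while (i + len < bits.size() && bits[i + len].direct &&
           bits[i + len].source == bits[i].source &&
           bits[i + len].index == bits[i].index + len)
      ++len;
    chunks.push_back({i, len, true, bits[i].source, bits[i].index});
    i += len;
  }
  return chunks;
}
```

In the example above, bit 0 (the xor) becomes a scalar chunk. Bits 3:1 become one linear chunk starting at source bit 1, which corresponds to `comb.extract %in from 1 : (i4) -> i3`. Chunks come out LSB first, so they have to be reversed when building the final `comb.concat`.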

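Patterns not transformed:
The pass does not modify modules with cross-bit dependencies (for example, when some bits of a value are computed from other bits of that same value, forming a combinational cycle) or with non-linear control flows.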
For example:

// cross-dependency example (should remain unchanged)
hw.module @cross_dependency(in %in : i2, out out : i2) {
  %0 = comb.extract %in from 0 : (i2) -> i1
  %1 = comb.extract %6 from 1 : (i2) -> i1
  %2 = comb.xor %0, %1 : i1
  %3 = comb.extract %in from 1 : (i2) -> i1
  %4 = comb.extract %6 from 0 : (i2) -> i1
  %5 = comb.xor %3, %4 : i1
  %6 = comb.concat %5, %2 : i1, i1
  hw.output %6 : i2
}
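In `@cross_dependency`, `%1` and `%4` extract bits of `%6`, the very value being assembled, so each output bit depends on the other. One way to detect this is to walk the operands of a value and check whether the walk returns to the value itself. The C++ sketch below does this over a hypothetical name-based use-def graph; it is not the pass's actual check.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical use-def graph: SSA name -> operand names.
// Block arguments such as %in have no entry.
using UseDefGraph = std::map<std::string, std::vector<std::string>>;

// Iterative DFS over operands from `start`; returns true if `target` is
// reached. The visited set ensures shared sub-cones are explored once.
bool reaches(const UseDefGraph &g, const std::string &start,
             const std::string &target) {
  std::vector<std::string> stack{start};
  std::set<std::string> visited;
  while (!stack.empty()) {
    std::string v = stack.back();
    stack.pop_back();
    if (!visited.insert(v).second)
      continue;
    auto it = g.find(v);
    if (it == g.end())
      continue;
    for (const std::string &operand : it->second) {
      if (operand == target)
        return true;
      stack.push_back(operand);
    }
  }
  return false;
}

// A value whose own cone feeds back into it is left untouched.
bool hasSelfDependency(const UseDefGraph &g, const std::string &value) {
  return reaches(g, value, value);
}
```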

pronesto · 2025-12-12T00:44:12Z

Hi everyone, just a gentle ping on this PR. It has been open for a while, and I wanted to check whether there is anything we can do on our side to help move the review forward. Many thanks!

uenoku · 2025-11-22T01:18:17Z

lib/Dialect/HW/Transforms/HWVectorization.cpp

+  bit &operator=(const bit &other);
+  bool operator==(const bit &other) const;
+
+  bool left_adjacent(const bit &other);


nit: please use camelBack (as noted in the MLIR style guide https://mlir.llvm.org/getting_started/DeveloperGuide/#style-guide)

uenoku · 2025-12-15T08:15:06Z

lib/Dialect/HW/Transforms/HWVectorization.cpp

+
+  Block &block = module.getBody().front();
+  auto outputOp = dyn_cast<hw::OutputOp>(block.getTerminator());
+  if (!outputOp)


This if-statement is not necessary as hw::OutputOp is guaranteed by a verifier.

uenoku · 2025-12-15T08:22:23Z

lib/Dialect/HW/Transforms/HWVectorization.cpp

+
+    bool containsLLHD = false;
+    module.walk([&](mlir::Operation *op) {
+      if (op->getDialect()->getNamespace() == "llhd") {


Why does this give up when there is LLHD?

uenoku · 2025-12-15T08:31:41Z

lib/Dialect/HW/Transforms/HWVectorization.cpp

+  } else if (auto andOp = dyn_cast<comb::AndOp>(op)) {
+    Value lhs = andOp.getInputs()[0];
+    Value rhs = andOp.getInputs()[1];
+    if (isa_and_nonnull<hw::ConstantOp>(rhs.getDefiningOp()))
+      return findBitSource(lhs, bitIndex, depth + 1);
+    if (isa_and_nonnull<hw::ConstantOp>(lhs.getDefiningOp()))
+      return findBitSource(rhs, bitIndex, depth + 1);


I'm not sure I'm following the code correctly, but these parts don't seem correct. Isn't it necessary to check the value of the constant? Also, can't we simply treat and/or/xor as the source op here?
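The concern about the constant's value can be made concrete. Bit `i` of `and(x, c)` is a copy of `x[i]` only when `c[i]` is 1; when `c[i]` is 0, the bit is constant 0 regardless of `x`. Dually, bit `i` of `or(x, c)` copies `x[i]` only when `c[i]` is 0. A minimal C++ sketch of that per-bit rule follows. `BitSource`, `traceAndWithConstant`, and `orBitIsCopyOfX` are hypothetical names, not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result of tracing one bit of and(x, c).
enum class BitSource {
  FromX,    // the bit is a copy of x[i]
  ConstZero // the bit is 0 for every x
};

// Bit i of and(x, c) equals x[i] only if c[i] == 1; otherwise it is 0.
// Skipping over the constant without checking c[i] would be wrong.
BitSource traceAndWithConstant(uint64_t c, unsigned i) {
  return ((c >> i) & 1u) ? BitSource::FromX : BitSource::ConstZero;
}

// Dual rule for or(x, c): bit i is a copy of x[i] only if c[i] == 0;
// otherwise it is the constant 1.
bool orBitIsCopyOfX(uint64_t c, unsigned i) { return ((c >> i) & 1u) == 0; }
```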

uenoku · 2025-12-15T08:34:51Z

lib/Dialect/HW/Transforms/HWVectorization.cpp

+bool vectorizer::cleanup_dead_ops(Block &block) {
+  bool overallChanged = false;
+  bool changedInIteration = true;
+  while (changedInIteration) {
+    changedInIteration = false;
+    llvm::SmallVector<Operation *, 16> deadOps;
+    for (Operation &op : block) {
+      if (op.use_empty() && !op.hasTrait<mlir::OpTrait::IsTerminator>()) {
+        deadOps.push_back(&op);
+      }
+    }
+    if (!deadOps.empty()) {
+      changedInIteration = true;
+      overallChanged = true;
+      for (Operation *op : deadOps) {
+        op->erase();
+      }
+    }
+  }
+  return overallChanged;
+}


Could you use https://github.com/llvm/llvm-project/blob/db557bee1e2c128e77805deb86c1f364b5c29e70/mlir/lib/Transforms/Utils/RegionUtils.cpp#L495? There are a few issues here around side-effecting ops and O(N^2) fixpoint iterations, so it would be nice to simply use a library function.
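The complexity point can be illustrated with a worklist. Instead of rescanning the whole block after every round, the pass erases an op and then re-examines only the producers of its operands. The C++ sketch below works on a hypothetical toy IR (`ToyOp`, `eraseDeadOps`) and is not the MLIR utility linked in the comment. It also keeps ops marked as side-effecting, which the fixpoint loop in the patch does not check.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy IR: ops are identified by index; `operands` lists the
// indices of the ops that produce each operand.
struct ToyOp {
  std::vector<std::size_t> operands;
  bool hasSideEffects = false; // must be kept even when unused
  bool isTerminator = false;
};

// Worklist dead-code elimination. Each op is erased at most once and each
// operand edge is visited a bounded number of times, so the work is linear
// in the IR size. Returns a mask of surviving ops.
std::vector<bool> eraseDeadOps(const std::vector<ToyOp> &ops) {
  const std::size_t n = ops.size();
  std::vector<std::size_t> useCount(n, 0);
  for (const ToyOp &op : ops)
    for (std::size_t def : op.operands)
      ++useCount[def];

  auto isErasable = [&](std::size_t i) {
    return useCount[i] == 0 && !ops[i].hasSideEffects && !ops[i].isTerminator;
  };

  std::vector<bool> alive(n, true);
  std::vector<std::size_t> worklist;
  for (std::size_t i = 0; i < n; ++i)
    if (isErasable(i))
      worklist.push_back(i);

  while (!worklist.empty()) {
    std::size_t i = worklist.back();
    worklist.pop_back();
    alive[i] = false;
    // Erasing op i drops one use from each producer; a producer that just
    // became unused (and is erasable) joins the worklist exactly once.
    for (std::size_t def : ops[i].operands)
      if (--useCount[def] == 0 && isErasable(def))
        worklist.push_back(def);
  }
  return alive;
}
```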

uenoku · 2025-12-15T08:38:31Z

lib/Dialect/HW/Transforms/HWVectorization.cpp

+  llvm::DenseSet<mlir::Value> sources;
+  for (const auto &[_, bit] : bits) {
+    if (!sources.contains(bit.source))
+      sources.insert(bit.source);
+    if (sources.size() >= 2)
+      return false;
+  }
+  return true;


super nit: using DenseSet is certainly overkill here, e.g.:

Suggested change

-  llvm::DenseSet<mlir::Value> sources;
-  for (const auto &[_, bit] : bits) {
-    if (!sources.contains(bit.source))
-      sources.insert(bit.source);
-    if (sources.size() >= 2)
-      return false;
-  }
-  return true;
+  mlir::Value source;
+  for (const auto &[_, bit] : bits) {
+    if (source && source != bit.source)
+      return false;
+    source = bit.source;
+  }
+  return true;

mafeguimaraes · 2025-12-16T12:56:09Z

Hi @uenoku,

Thank you very much for the review and for pointing out the issues with the previous approach. It was really helpful.

I’ve reworked findBitSource to keep it strictly structural again, and moved all boolean reasoning into a separate helper (isBitConstant). This helper is intentionally limited: it only proves constants through structural traversal and identity propagation (e.g., and(x, 1) and or(x, 0)), and does not attempt general boolean simplification.

The helper is used only to recognize identity masks in and/or, which allows handling the mux-like pattern in test_mux without turning findBitSource into a semantic evaluator.
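A helper with the limited scope described above can be sketched in C++ over a hypothetical toy expression tree rather than CIRCT ops. Constants report their bit, `and(x, 1)` and `or(x, 0)` forward to `x`, and everything else is reported as unknown. `Expr`, this `isBitConstant`, and the `make*` helpers are illustrative names; the patch's `isBitConstant` may have a different signature and coverage.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <vector>

// Hypothetical toy node standing in for an SSA value.
struct Expr {
  enum Kind { Const, Input, And, Or } kind;
  uint64_t value = 0;                          // used by Const
  std::vector<std::shared_ptr<Expr>> operands; // And / Or (binary)
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr makeConst(uint64_t v) { return std::make_shared<Expr>(Expr{Expr::Const, v, {}}); }
ExprPtr makeInput() { return std::make_shared<Expr>(Expr{Expr::Input, 0, {}}); }
ExprPtr makeBinary(Expr::Kind k, ExprPtr a, ExprPtr b) {
  return std::make_shared<Expr>(Expr{k, 0, {a, b}});
}

// Proves bit i of e constant only by structural traversal and identity
// propagation. and(x, 0) is deliberately NOT simplified: no general boolean
// reasoning is attempted, so such cases come back as std::nullopt.
std::optional<bool> isBitConstant(const ExprPtr &e, unsigned i) {
  switch (e->kind) {
  case Expr::Const:
    return ((e->value >> i) & 1u) != 0;
  case Expr::And:
  case Expr::Or: {
    const bool identityBit = e->kind == Expr::And; // 1 for and, 0 for or
    for (int k = 0; k < 2; ++k) {
      const ExprPtr &mask = e->operands[k];
      const ExprPtr &other = e->operands[1 - k];
      if (isBitConstant(mask, i) == identityBit) // identity mask: forward
        return isBitConstant(other, i);
    }
    return std::nullopt;
  }
  default:
    return std::nullopt; // inputs and anything else: unknown
  }
}
```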

Please let me know if this direction looks more reasonable to you, or if you’d prefer an even more conservative restriction.

Thanks again for the review!

mafeguimaraes · 2026-01-19T19:34:50Z

Hi @uenoku, hope you’re doing well and had a great holiday season!

I just wanted to gently follow up on this PR. It’s been rebased, all checks are passing, and it addresses the feedback about the isBitConstant helper.

Happy to make any further changes if needed. Thanks a lot!

uenoku · 2026-01-24T21:33:10Z

lib/Dialect/HW/Transforms/HWVectorization.cpp

+  if (!allBitsHaveSameSource() || bits.empty()) {
+    return nullptr;
+  }


Suggested change

-  if (!allBitsHaveSameSource() || bits.empty()) {
-    return nullptr;
-  }
+  if (!allBitsHaveSameSource() || bits.empty())
+    return nullptr;

uenoku · 2026-01-26T07:47:42Z

lib/Dialect/HW/Transforms/HWVectorization.cpp

+
+bool BitArray::allBitsHaveSameSource() const {
+  mlir::Value source;
+  for (const auto &[_, bit] : bits) {


Note that the iteration order of DenseMap is non-deterministic; is there any place that depends on the order?
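If some code does depend on the order (for example, when emitting concat operands or picking a "first" source), one common fix is to iterate over the keys in sorted order. `llvm::DenseMap` has no defined iteration order, and this C++ sketch uses `std::unordered_map` as a stand-in with the same property. `sortedBitIndices` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Returns the bit positions in ascending order, so anything built from them
// (such as a list of concat operands) is identical on every run, independent
// of the hash table's internal layout.
std::vector<int> sortedBitIndices(const std::unordered_map<int, std::string> &bits) {
  std::vector<int> keys;
  keys.reserve(bits.size());
  for (const auto &entry : bits)
    keys.push_back(entry.first);
  std::sort(keys.begin(), keys.end());
  return keys;
}
```

Order-independent queries such as `allBitsHaveSameSource`, which only asks whether every entry agrees, give the same answer in any order and do not need this.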

uenoku · 2026-01-26T07:48:57Z

lib/Dialect/HW/Transforms/HWVectorization.cpp

+  return true;
+}
+
+Bit BitArray::getBit(int n) { return bits[n]; }


Is it fine to mutate bits here? If n is not registered yet, it returns nullptr; is that handled by the caller?
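The mutation concern comes from `operator[]`. On `llvm::DenseMap`, as on `std::map`, it default-constructs and inserts an entry for a missing key. A lookup that leaves the map untouched and makes the missing case explicit can be sketched with `std::map` and `std::optional`. The `Bit` and `BitArray` types here are simplified stand-ins for the patch's types.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Simplified stand-in for the patch's Bit.
struct Bit {
  std::string source;
  int index = -1;
};

struct BitArray {
  std::map<int, Bit> bits;

  // Const lookup: never inserts; callers must handle std::nullopt.
  std::optional<Bit> getBit(int n) const {
    auto it = bits.find(n);
    if (it == bits.end())
      return std::nullopt;
    return it->second;
  }
};
```

With `llvm::DenseMap`, `find` gives the same non-mutating behavior, as does `lookup`, which returns a default-constructed value without inserting.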

uenoku · 2026-01-26T07:51:00Z

lib/Dialect/HW/Transforms/HWVectorization.cpp

+  IRRewriter rewriter(module.getContext());
+  bool changed = false;
+
+  for (Value oldOutputVal : outputOp->getOperands()) {


Does this only apply the transformation to output values? (Though it makes sense as a first step, you might want to consider other operations like hw.instance/seq.compreg etc.)

uenoku · 2026-01-26T07:55:43Z

lib/Dialect/HW/Transforms/HWVectorization.cpp

+}
+
+bool Bit::operator==(const Bit &other) const {
+  return source == other.source and index == other.index;


nit: for consistency with other places.

Suggested change

-  return source == other.source and index == other.index;
+  return source == other.source && index == other.index;

uenoku · 2026-01-26T08:41:54Z

lib/Dialect/HW/Transforms/HWVectorization.cpp

+  return false;
+}
+
+bool Vectorizer::canVectorizeStructurally(mlir::Value output) {


The current pattern separates analysis (can* functions) from transformation (apply* functions), which requires analyzing the IR twice and makes it very difficult to understand what's been validated before mutation occurs.

Please consider combining these into try* methods that return LogicalResult:

LogicalResult tryStructuralVectorization(OpBuilder &builder, Value value);
LogicalResult tryPartialVectorization(OpBuilder &builder, Value value);

if (succeeded(tryLinearVectorization(oldOutputVal, sourceInput)))
  continue;
if (succeeded(tryReverseVectorization(rewriter, oldOutputVal, sourceInput)))
  continue;
if (succeeded(tryStructuralVectorization(rewriter, oldOutputVal)))
  continue;
if (succeeded(tryPartialVectorization(rewriter, oldOutputVal)))
  continue;
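The suggested control flow can be sketched in C++ as follows. Each `try*` strategy either validates and rewrites in one step, or fails without touching the IR, and the driver stops at the first success. `Result`, `Strategy`, and `vectorizeValue` are stand-ins so the sketch compiles without MLIR; in the pass, `mlir::LogicalResult` and `succeeded` would play the role of `Result`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Stand-in for mlir::LogicalResult so the sketch compiles on its own.
enum class Result { Success, Failure };
inline bool succeeded(Result r) { return r == Result::Success; }

// A strategy validates and, only if every check passes, rewrites. On failure
// it must leave the IR untouched so the next strategy sees the original.
// Here the "IR" is just a string.
using Strategy = std::function<Result(std::string &ir)>;

// Tries strategies in order; returns the index of the first that succeeds,
// or -1 if none applied.
int vectorizeValue(std::string &ir, const std::vector<Strategy> &strategies) {
  for (std::size_t i = 0; i < strategies.size(); ++i)
    if (succeeded(strategies[i](ir)))
      return static_cast<int>(i);
  return -1;
}
```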

uenoku · 2026-01-26T08:45:12Z

lib/Dialect/HW/Transforms/HWVectorization.cpp

+      while ((i - len) >= 0) {
+        Value nextBitSource = findBitSource(oldOutputVal, i - len);
+        auto nextExtractOp =
+            dyn_cast_or_null<comb::ExtractOp>(nextBitSource.getDefiningOp());


Suggested change

-            dyn_cast_or_null<comb::ExtractOp>(nextBitSource.getDefiningOp());
+            nextBitSource.getDefiningOp<comb::ExtractOp>();

uenoku · 2026-01-26T08:46:22Z

lib/Dialect/HW/Transforms/HWVectorization.cpp

+  cone.insert(val);
+
+  Operation *definingOp = val.getDefiningOp();
+  if (!definingOp || isa<BlockArgument>(val) ||


nit: !definingOp already implies isa<BlockArgument>(val), so the second check is redundant.

uenoku · 2026-01-26T08:49:38Z

lib/Dialect/HW/Transforms/HWVectorization.cpp

+  }
+}
+
+bool Vectorizer::isSafeSharedValue(


Could you elaborate on why this is safe to share, and leave a comment explaining it? It looks to me like it always returns true.

uenoku · 2026-01-26T08:51:42Z

lib/Dialect/HW/Transforms/HWVectorization.cpp

+
+  if (auto *op = val.getDefiningOp()) {
+    for (auto operand : op->getOperands()) {
+      if (!isSafeSharedValue(operand, visited))


Also, because visited is re-initialized on every call (line 593), I think this function ends up visiting the entire use-def chain each time.
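A common way to avoid re-walking shared cones is to keep the memo (or visited set) in an object that outlives a single query, so each value is analyzed at most once across all outputs. The C++ sketch below does this over a hypothetical name-based graph. Treating a value as unsafe when it reaches a listed "unsafe leaf" is a placeholder, since what "safe to share" should mean is exactly what the earlier comment asks the authors to document.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical use-def graph: value -> operand names. Values without an
// entry are leaves (e.g. block arguments).
using Graph = std::map<std::string, std::vector<std::string>>;

class SafetyAnalysis {
public:
  SafetyAnalysis(const Graph &g, std::vector<std::string> unsafeLeaves)
      : graph(g), unsafe(unsafeLeaves.begin(), unsafeLeaves.end()) {}

  // Memoized across calls: each value's cone is analyzed at most once,
  // however many outputs query it.
  bool isSafe(const std::string &v) {
    auto memoIt = memo.find(v);
    if (memoIt != memo.end())
      return memoIt->second;
    ++visits;
    memo[v] = false; // conservative answer while v is in progress (cycles)
    bool safe = unsafe.count(v) == 0;
    auto it = graph.find(v);
    if (safe && it != graph.end())
      for (const std::string &operand : it->second)
        if (!isSafe(operand)) {
          safe = false;
          break;
        }
    memo[v] = safe;
    return safe;
  }

  int visits = 0; // number of distinct values analyzed so far

private:
  const Graph &graph;
  std::set<std::string> unsafe;
  std::map<std::string, bool> memo;
};
```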

uenoku · 2026-01-26T08:53:38Z

lib/Dialect/HW/Transforms/HWVectorization.cpp

+struct HWVectorizationPass
+    : public hw::impl::HWVectorizationBase<HWVectorizationPass> {
+
+  void getDependentDialects(mlir::DialectRegistry &registry) const override {


Please define this in TableGen. Also, I don't think the SV dialect is necessary.

uenoku · 2026-01-26T09:01:59Z

lib/Dialect/HW/Transforms/HWVectorization.cpp

+    return false;
+
+  if (auto c = dyn_cast<hw::ConstantOp>(defOp)) {
+    if (bitIndex < c.getValue().getBitWidth())


Does this condition ever happen?

uenoku

I apologize for the long delay - I'm still working through understanding what the pass does, and it's taking me some time.

Since this PR is quite large and does non-trivial IR transformations, it might be much easier to review and merge quickly if you could split it into smaller PRs. Something like:

1. Pass boilerplate and basic infrastructure
2. BitArray data structure + ExtractOp preprocessing + linear vectorization (applyLinearVectorization)
3. Reverse and mixed permutation vectorization + ConcatOp handling
4. Logical op preprocessing (and/or/xor) + structural vectorization
5. Partial vectorization

Keeping each PR under ~300 LOC would make it much easier to review, provide targeted feedback, and merge incrementally. It would also help me (and other reviewers) understand the design better by seeing it build up piece by piece.

What do you think?

mafeguimaraes requested a review from darthscsi as a code owner November 10, 2025 17:58

mafeguimaraes force-pushed the feature/hw-vectorization-pass branch 3 times, most recently from 59a27df to 7dfbad0 (November 10, 2025 19:23)

uenoku reviewed Dec 15, 2025

mafeguimaraes force-pushed the feature/hw-vectorization-pass branch 3 times, most recently from a756a7e to ae4c6b6 (December 16, 2025 11:43)

[HW] Add HWVectorization pass (1f3df90)

mafeguimaraes force-pushed the feature/hw-vectorization-pass branch from ae4c6b6 to 1f3df90 (December 16, 2025 11:46)

[HW] Enable HWVectorization build in CMakeLists (80f2389)

mafeguimaraes force-pushed the feature/hw-vectorization-pass branch from dcbe132 to 80f2389 (December 16, 2025 12:32)

Merge branch 'main' into feature/hw-vectorization-pass (baf7f88)

Add comments (5d0dd2a)

uenoku reviewed Jan 26, 2026


3 participants