Skip to content

Conversation

@SushmitaThakallapalli1980
Copy link

@SushmitaThakallapalli1980 SushmitaThakallapalli1980 commented Aug 20, 2025

REMOVE_BINARY summary:

This RemoveBinary class is a pattern-matching and rewrite utility that finds and removes binary ops (Add, Sub, Mul, Div) in quantized ONNX graphs when one input is effectively a constant. It folds the constant into the quantization parameters instead of keeping an explicit binary op in the graph.

This class rewrites patterns like:

 Dequant(x) ----\
                             Binary(Add/Sub/Mul/Div) → Quant → Dequant
Dequant(const)-/

into something like

  Dequant'(x, new_scale/zp) → Quant → Dequant
  
  Where the constant is folded into scale or zero-point. 
  
  Then, a check is performed to see if we can possibly remove the Quant → Dequant chain.

@tvivies-amd tvivies-amd requested a review from xiaohanAMD August 20, 2025 11:52
@tvivies-amd
Copy link

I added @xiaohanAMD as reviewer too since he wrote the initial pattern and some improvements

Copy link

@tvivies-amd tvivies-amd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of small comments, I like the usage of if constexpr and the state structure.

}
state.kValue = kValueOpt.value();

// Debug - To be removed

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally you would not remove the debug ouptut that you used during the implementation, I would prefer if you could log them instead. @ehsan-toosi do you know if there is any logging capabilities in this repo ?

Comment on lines +225 to +226
if (failed(match_qdq(state, dqOp1, dqOp2)))
return failure();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add bracket here too, in order to keep the coding style consistent

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Comment on lines 228 to 231
// Debug - To be removed
llvm::outs() << "B. SUCCESS\n";
llvm::outs() << "kValue = " << state.kValue << "\n";
printOnnxNodeName(binaryOp, "[RemoveBinary] matched");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can put this in the match_qdq method and return directly the output of the call to match_qdq

Comment on lines 243 to 252
for (Value res : op->getResults()) {
for (Operation *user : res.getUsers()) {
if (auto q = dyn_cast<ONNXQuantizeLinearOp>(user)) {
quantOutputOp = q;
break;
}
}
if (quantOutputOp)
break;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the binary op is a qdq operation, I would expect that the output of the binOp only has one user that is the QuantizeLinear operation. This is what the python implementation is expecting and it is fine if we have the same check here

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok.

.add<FoldBinaryThroughQDQ<ONNXDivOp>, FoldBinaryThroughQDQ<ONNXSubOp>,
FoldBinaryThroughQDQ<ONNXMulOp>, FoldBinaryThroughQDQ<ONNXAddOp>>(
&getContext());
if (failed(applyPatternsAndFoldGreedily(function, std::move(patterns))))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we need to use the greedy approach here, since we are not creating any new ops that would needs to be visited by the pass and same for the modified ops they do not need to be matched again. See documentation of the walker configuration: https://mlir.llvm.org/docs/PatternRewriter/#walk-pattern-rewrite-driver
What do you think ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried replacing applyPatternsAndFoldGreedily with walkAndApplyPatterns. But the code is crashing for some test cases. Not sure why. Therefore, restored greedy driver again.

@xiaohanAMD
Copy link

xiaohanAMD commented Aug 22, 2025

this pattern is very complicated when there is fork in the match chain. Please add test case to fix this scenario, here Q1 has a fork, we expect not to fold into DQ1, but in Q2:

        const ---\
                 |
                 v
Q1 ---> DQ1 -> BinOp -> Q2 -> DQ2
    |
    \-> something else

@SushmitaThakallapalli1980
Copy link
Author

this pattern is very complicated when there is fork in the match chain. Please add test case to fix this scenario, here Q1 has a fork, we expect not to fold into DQ1, but in Q2:

        const ---\
                 |
                 v
Q1 ---> DQ1 -> BinOp -> Q2 -> DQ2
    |
    \-> something else

@xiaohanAMD
In the following https://jira.xilinx.com/browse/AIESW-8092, as per the initial scoping, folding is expected only on DQ1...Is this not the case?

@xiaohanAMD
Copy link

this pattern is very complicated when there is fork in the match chain. Please add test case to fix this scenario, here Q1 has a fork, we expect not to fold into DQ1, but in Q2:

        const ---\
                 |
                 v
Q1 ---> DQ1 -> BinOp -> Q2 -> DQ2
    |
    \-> something else

@xiaohanAMD In the following https://jira.xilinx.com/browse/AIESW-8092, as per the initial scoping, folding is expected only on DQ1...Is this not the case?

I didn't see that, sure, let's implement the basic in this PR. More complicate case we can do later.

@jorickert jorickert closed this Aug 25, 2025
@jorickert jorickert reopened this Aug 25, 2025
@SushmitaThakallapalli1980 SushmitaThakallapalli1980 marked this pull request as ready for review August 25, 2025 17:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants