Quantization folding pass #7240
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7240
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure as of commit 2a03d6f with merge base 3f7eb3b. The following job has failed: pull / unittest / macos / macos-job (pull_request).
This comment was automatically generated by Dr. CI and updates every 15 minutes.
backends/arm/operators/op_max.py (Outdated)

    )
    output.shape = tosa_shape(output.shape, output.dim_order)
    min_output = tosa_graph.addIntermediate(output.shape, ts.DType.INT32)

Review suggestion:

    - min_output = tosa_graph.addIntermediate(output.shape, ts.DType.INT32)
    + max_output = tosa_graph.addIntermediate(output.shape, ts.DType.INT32)

Yepp!
backends/arm/operators/op_max.py (Outdated)

    x_scale = input_qparams[0].scale
    x_zp = input_qparams[0].zp
    y_scale = input_qparams[1].scale
    y_zp = input_qparams[1].zp

    assert (
        x_zp == y_zp
    ), "Different zp for inputs, MAX should be quantized with shared quantization!"
    assert (
        x_scale == y_scale
    ), "Different scale for input, MAX should be quantized with shared quantization!"

Refactor this as a util to assert shared qconfigs across inputs? (A sketch of such a helper follows below.)

Yes, will fix it up.
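A minimal sketch of such a util, assuming the `input_qparams` meta layout and the `QuantArgs` fields (`scale`, `zp`) seen in the excerpts above; the helper name `assert_shared_input_qparams` is hypothetical, not the PR's final API:

```python
from torch.fx import Node


def assert_shared_input_qparams(node: Node, op_name: str) -> None:
    """Assert that every input of `node` shares scale and zero-point."""
    qparams = list(node.meta["input_qparams"].values())
    first = qparams[0]
    for q in qparams[1:]:
        assert q.zp == first.zp, (
            f"Different zp for inputs, {op_name} should be quantized "
            "with shared quantization!"
        )
        assert q.scale == first.scale, (
            f"Different scale for inputs, {op_name} should be quantized "
            "with shared quantization!"
        )
```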
    class SimpleQuantizeModel(torch.nn.Module):
        def forward(self, x):
            return x + x

nit: maybe make it slightly more complicated, with >1 input tensors and >1 add-nodes? Maybe something like max((x + x), (y + y)).
Also a chain of nodes, i.e. q0->dq0->op1->q2->dq2->op2->q3->dq3 => q0->op1*->op2*->dq3. (A sketch follows below.)

Ack.
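For illustration, a hedged sketch of the test model the reviewer suggests; the class name is illustrative, not taken from the PR:

```python
import torch


class MaxOfAddsModel(torch.nn.Module):
    """Two input tensors and two add-nodes feeding a max, so the folding
    pass has to handle several quantized ops chained together rather than
    a single add."""

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        return torch.max(x + x, y + y)
```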
backends/arm/tosa_quant_utils.py (Outdated)

    dim_order = tensor.dim_order
    tensor.shape = [tensor.shape[i] for i in dim_order]

    qargs = list(cast(dict[int, QuantArgs], node.meta["input_qparams"]).values())

Assert `input_qparams` in node.meta.
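One way to realize the suggestion, as a sketch; the assertion message is illustrative, not the PR's actual wording:

```python
# Guard the meta lookup before reading it, per the review comment above.
assert "input_qparams" in node.meta, (
    f"No input quantization parameters found in node {node}"
)
qargs = list(cast(dict[int, QuantArgs], node.meta["input_qparams"]).values())
```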
backends/arm/tosa_quant_utils.py (Outdated)

    """
    assert len(node.meta["output_qparams"]) == 1

    qargs_out = cast(dict[int, QuantArgs], node.meta["output_qparams"])[0]

Same here: assert `output_qparams` in node.meta before indexing it.
backends/arm/tosa_quant_utils.py (Outdated)

    return rescaled_nodes, min_scale


    def insert_rescale_node_back_to_int8(

Review suggestion:

    - def insert_rescale_node_back_to_int8(
    + def insert_rescale_node_to_int8(
Commits:
- Reuse the logic from the node visiting quantization handling, but replace the quantization parameter fetching from the node meta values. Signed-off-by: Per Åstrand <[email protected]> Change-Id: I9a7bbf6384284e60118756ec5661f6b11847aba7
- Fold DQ/Q nodes into the target operators specified to the pass. Signed-off-by: Per Åstrand <[email protected]> Change-Id: I8a09dc0b887dd5f3915ca157f578ecf51772a1a2
- Uses the fold DQ/Q pass to encapsulate the quantization information within the node. Signed-off-by: Per Åstrand <[email protected]> Change-Id: I3adbab7e2a23a0208a03bbc423b38c15221a4959
- Signed-off-by: Per Åstrand <[email protected]> Change-Id: I9230209ed3d6cc0b5ec7a35512248648bb8380ee
- Signed-off-by: Per Åstrand <[email protected]> Change-Id: I6154e13a5a6b75549862709d632ee6dd5c8b0e7f
- Adds a helper function to retrieve QuantArgs from node.meta and cleans up the handling a bit by introducing the __eq__ operator for QuantArgs. Signed-off-by: Per Åstrand <[email protected]> Change-Id: I519a9a286a36a278f40ffb6c679192a54d9f940d
- Signed-off-by: Per Åstrand <[email protected]> Change-Id: I2d133f4347d9999c770e5337162c222368c212f2
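For illustration, a hedged sketch of a QuantArgs container with value-based __eq__, as the commit above describes; the exact field set is an assumption based on the review excerpts (scale, zp), not the PR's definition:

```python
from dataclasses import dataclass

import torch


@dataclass(frozen=True)
class QuantArgs:
    """Quantization parameters carried in node.meta. frozen=True generates
    a value-based __eq__, so shared-quantization checks reduce to
    `input_qparams[0] == input_qparams[1]`."""

    scale: float
    zp: int
    qmin: int
    qmax: int
    dtype: torch.dtype
```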
Force-pushed from 4a46eec to 2a03d6f.
The pull / unittest / macos / macos-job (pull_request) failure seems to be unrelated (test_flamingo_vision_encoder).
Summary
Adds a folding pass that folds Q and DQ nodes into the targeted operators, encapsulating the quantization information within the node meta. A conceptual sketch follows below.
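A conceptual sketch of the Q/DQ folding idea, under stated assumptions: the function name, helper structure, and the positional-args unpacking of the quantized_decomposed ops are illustrative, not this PR's implementation. It follows the chain rewrite from the review discussion (q0->dq0->op1->q2->dq2->op2->q3->dq3 => q0->op1*->op2*->dq3):

```python
import torch
from torch.fx import GraphModule, Node

# Handles for the decomposed quantize/dequantize ops; these exist once the
# quantized_decomposed op library has been registered (e.g. by the pt2e
# quantization flow). Treat their availability as an environment assumption.
Q = torch.ops.quantized_decomposed.quantize_per_tensor.default
DQ = torch.ops.quantized_decomposed.dequantize_per_tensor.default


def fold_qdq(gm: GraphModule, targeted_ops) -> GraphModule:
    """Fold dq nodes on the inputs and q nodes on the output of each
    targeted op into that op's meta, then drop the folded nodes."""
    for node in gm.graph.nodes:
        if node.op != "call_function" or node.target not in targeted_ops:
            continue
        # Record input qparams from the dq nodes feeding this op, then
        # rewire the op to consume the still-quantized tensors directly.
        node.meta["input_qparams"] = {}
        for i, arg in enumerate(node.args):
            if isinstance(arg, Node) and arg.target == DQ:
                _, scale, zp, qmin, qmax, dtype = arg.args  # positional args assumed
                node.meta["input_qparams"][i] = (scale, zp, qmin, qmax, dtype)
                node.replace_input_with(arg, arg.args[0])
        # Record output qparams from the q node consuming this op, then
        # route that node's users back to the op itself.
        node.meta["output_qparams"] = {}
        for user in list(node.users):
            if user.target == Q:
                _, scale, zp, qmin, qmax, dtype = user.args
                node.meta["output_qparams"][0] = (scale, zp, qmin, qmax, dtype)
                user.replace_all_uses_with(node)
                gm.graph.erase_node(user)
    gm.graph.eliminate_dead_code()
    gm.recompile()
    return gm
```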
Test plan
Added a test for the new pass.