Commit 9409412
authored
Fix AutoQuantize config for MoE (#641)
## What does this PR do?
**Type of change:**
Bug fix
[aq_Qwen3-235B-A22B-Thinking-2507_scores.html](https://github.com/user-attachments/files/23915284/aq_Qwen3-235B-A22B-Thinking-2507_scores.html)
**Overview:**
For Qwen3 MoE, sensitivity should be measured at `layer.x.mlp` not
`layer.x.mlp.experts` (`layer.x.mlp.experts` module forward is never
called. Hence sensitivity was not correctly estimated previously).
After this PR, Qwen3 MoE sensitivity is correctly estimated. (See
attached file)
## Usage
<!-- You can potentially add a usage example below. -->
```python
# Add a code snippet demonstrating how to use this
```
## Testing
<!-- Mention how have you tested your change if applicable. -->
## Before your PR is "*Ready for review*"
<!-- If you haven't finished some of the above items you can still open
`Draft` PR. -->
- **Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CONTRIBUTING.md)**
and your commits are signed.
- **Is this change backward compatible?**: Yes/No <!--- If No, explain
why. -->
- **Did you write any new necessary tests?**: Yes/No
- **Did you add or update any necessary documentation?**: Yes/No
- **Did you update
[Changelog](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CHANGELOG.rst)?**:
Yes/No <!--- Only for new features, API changes, critical bug fixes or
bw breaking changes. -->
## Additional Information
<!-- E.g. related issue. -->
---------
Signed-off-by: realAsma <[email protected]>1 parent 9e280f4 commit 9409412
File tree
4 files changed
+33
-9
lines changed- examples/llm_ptq
- modelopt/torch/quantization
- tests
- _test_utils/torch
- unit/torch/quantization/plugins
4 files changed
+33
-9
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
50 | 50 | | |
51 | 51 | | |
52 | 52 | | |
53 | | - | |
| 53 | + | |
54 | 54 | | |
55 | 55 | | |
56 | 56 | | |
| |||
155 | 155 | | |
156 | 156 | | |
157 | 157 | | |
158 | | - | |
| 158 | + | |
| 159 | + | |
159 | 160 | | |
160 | 161 | | |
161 | 162 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
755 | 755 | | |
756 | 756 | | |
757 | 757 | | |
758 | | - | |
| 758 | + | |
759 | 759 | | |
760 | 760 | | |
761 | 761 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
30 | 30 | | |
31 | 31 | | |
32 | 32 | | |
| 33 | + | |
| 34 | + | |
33 | 35 | | |
34 | 36 | | |
35 | 37 | | |
| |||
61 | 63 | | |
62 | 64 | | |
63 | 65 | | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
64 | 88 | | |
65 | 89 | | |
66 | 90 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
24 | 24 | | |
25 | 25 | | |
26 | 26 | | |
| 27 | + | |
27 | 28 | | |
28 | 29 | | |
29 | 30 | | |
| |||
137 | 138 | | |
138 | 139 | | |
139 | 140 | | |
140 | | - | |
141 | | - | |
142 | | - | |
143 | | - | |
144 | | - | |
145 | | - | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
146 | 145 | | |
147 | 146 | | |
148 | 147 | | |
| |||
0 commit comments