Commit fcbdc31
[NVBUG: 5373030] Disable the weight adjustment for int32 bias from onnxruntime (NVIDIA#510)
## What does this PR do?
**Type of change:**
Bug Fix
**Overview:**
- Disable the weight adjustment for int32 bias in onnxruntime by default
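For context, here is a minimal sketch of the situation the weight adjustment targets. It assumes the usual QDQ convention that an int32 bias is quantized with scale `input_scale * weight_scale`. The helper names and numbers are illustrative and not taken from this PR. The `QDQDisableWeightAdjustForInt32Bias` key is assumed to be the onnxruntime `extra_options` switch for this behavior; the exact lines this commit adds are not reproduced here.

```python
# Illustrative only: helper names and values are hypothetical, not from this PR.
INT32_MAX = 2**31 - 1


def bias_scale(input_scale: float, weight_scale: float) -> float:
    """Scale conventionally used for an int32 bias in a QDQ graph."""
    return input_scale * weight_scale


def bias_overflows_int32(bias, input_scale: float, weight_scale: float) -> bool:
    """Return True if any bias value divided by the bias scale exceeds int32 range."""
    scale = bias_scale(input_scale, weight_scale)
    return any(round(abs(b) / scale) > INT32_MAX for b in bias)


bias = [0.5, -3.0]
input_scale = 1e-4

# A very small weight scale makes the bias scale tiny, so the quantized bias
# no longer fits in int32. This is the case a weight-scale adjustment would
# compensate for by enlarging the weight scale.
print(bias_overflows_int32(bias, input_scale, weight_scale=1e-7))

# With a larger weight scale the quantized bias fits comfortably.
print(bias_overflows_int32(bias, input_scale, weight_scale=1e-3))

# Assumed onnxruntime extra option that turns the adjustment off. It would be
# passed as `extra_options=` to the onnxruntime quantizer; treat the key name
# as an assumption and check it against your onnxruntime version.
extra_options = {"QDQDisableWeightAdjustForInt32Bias": True}
```

With the adjustment disabled, weight scales are left as calibrated even when the resulting bias scale is very small.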
## Usage
```bash
python -m modelopt.onnx.quantization --onnx_path=code031_gemm_batch.onnx --simplify --calibration_eps trt --quantize_mode fp8 --disable_mha_qdq --high_precision_dtype fp16
```
## Testing
Verified that the `code031_gemm_batch.onnx` model quantizes successfully with the command in the Usage section.
## Before your PR is "*Ready for review*"
- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: No
- **Did you add or update any necessary documentation?**: Yes
- **Did you update [Changelog](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CHANGELOG.rst)?**: No
---------
Signed-off-by: ajrasane <[email protected]>

1 parent 69c258f commit fcbdc31

3 files changed, +9 -0 lines changed

| File | Added lines (new numbering) | Lines added |
|---|---|---|
| 1 | 275-278 | 4 |
| 2 | 240-243 | 4 |
| 3 | 1603 | 1 |