Commit c6c9905
[OMNIML-2244] Create the nvfp4 quant exporter (NVIDIA#636)
## What does this PR do?
**Type of change:** New feature
**Overview:**
- Implemented the `NVFP4QuantExporter`
- Deprecated `fp4qdq_to_2dq`
- Updated tests
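As background (the PR itself does not describe the format): NVFP4 stores each value as a 4-bit E2M1 float and shares one scale across every block of 16 consecutive values. The block scales are kept in FP8 E4M3, with an additional per-tensor FP32 scale on top. The NumPy sketch below fake-quantizes (quantizes, then dequantizes) an array under those assumptions. It is illustrative only and is not the exporter's implementation. It keeps the block scale in float32, skips the per-tensor scale, and breaks rounding ties toward the smaller magnitude.

```python
import numpy as np

# Non-negative magnitudes representable by FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float32)
E2M1_MAX = 6.0
BLOCK_SIZE = 16  # NVFP4 shares one scale across each block of 16 consecutive values


def round_to_e2m1(x: np.ndarray) -> np.ndarray:
    """Round each element to the nearest signed E2M1 value.

    Ties go to the smaller grid point (argmin returns the first minimum), a
    simplification of the round-to-nearest-even used by real hardware.
    """
    mag = np.abs(x)
    idx = np.argmin(np.abs(mag[..., None] - E2M1_GRID), axis=-1)
    return np.sign(x) * E2M1_GRID[idx]


def nvfp4_fake_quant(x: np.ndarray) -> np.ndarray:
    """Quantize then dequantize an array whose size is a multiple of BLOCK_SIZE."""
    blocks = x.reshape(-1, BLOCK_SIZE).astype(np.float32)
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    # The per-block scale maps each block's largest magnitude onto the E2M1 maximum.
    # Real NVFP4 stores this scale in FP8 E4M3 next to a per-tensor FP32 scale;
    # here it stays in float32. All-zero blocks get scale 1 to avoid dividing by 0.
    scale = np.where(amax > 0, amax / E2M1_MAX, 1.0)
    q = round_to_e2m1(blocks / scale)  # every value now lies on the E2M1 grid
    return (q * scale).reshape(x.shape)
```

For example, `nvfp4_fake_quant(np.arange(16, dtype=np.float32))` uses a block scale of 15 / 6 = 2.5. The largest input, 15, is reproduced exactly, while every other value snaps to 2.5 times an E2M1 grid value.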
## Usage
```bash
python torch_quant_to_onnx.py --quantize_mode=nvfp4 \
--onnx_save_path=vit_base_patch16_224.nvfp4.onnx \
--calibration_data_size 64 \
--batch_size 128
```
## Testing
```bash
python evaluate.py --onnx_path=vit_base_patch16_224.nvfp4.onnx \
--model_name=vit_base_patch16_224 \
--results_path=./results.txt \
--batch_size 128
```
Results:
```
The top1 accuracy of the model is 84.39%
The top5 accuracy of the model is 97.312%
Inference latency of the model is 7.22412 ms
```
## Before your PR is "*Ready for review*"
- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: No
  - Deprecated `fp4qdq_to_2dq`
- **Did you write any new necessary tests?**: No
- **Did you add or update any necessary documentation?**: No
- **Did you update [Changelog](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CHANGELOG.rst)?**: No
Signed-off-by: ajrasane <[email protected]>

Parent commit: 097037d
7 files changed (+394, -287 lines) under these directories:
- examples/diffusers/quantization/onnx_utils
- examples/onnx_ptq
- modelopt/onnx/export
- modelopt/onnx/quantization
- modelopt/torch/_deploy/utils
- tests/gpu/torch/quantization
- tests/unit/onnx