
Commit ca94c96

[5620660][ONNX] Remove toposort after quantization (#524)
## What does this PR do?

**Type of change:** Bug fix

**Overview:** Loading the model with ONNX GraphSurgeon after quantization and FP16 conversion results in an ONNX model with an FP16 output instead of FP32, even though the Cast_to_fp32 layer was correctly placed at the graph output. This PR fixes that issue by removing the redundant GraphSurgeon toposort round-trip that ran after `remove_graph_input_q`.

## Usage

```shell
$ python -m modelopt.onnx.quantization --onnx_path=$MODEL_NAME.onnx --high_precision_dtype=fp16
```

## Testing

See bug 5620660.

## Before your PR is "*Ready for review*"

- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: No
- **Did you add or update any necessary documentation?**: No
- **Did you update [Changelog](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CHANGELOG.rst)?**: No

Signed-off-by: gcunhase <[email protected]>
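Since the symptom is the dtype of the graph output, a quick way to confirm the fix is to inspect the output tensor type of the quantized model. A minimal sketch, assuming `model.quant.onnx` is the quantized artifact produced by the command above (the filename is hypothetical):

```python
# Inspect the ONNX graph-output dtype (the model path is hypothetical).
import onnx

model = onnx.load("model.quant.onnx")
for output in model.graph.output:
    elem_type = output.type.tensor_type.elem_type
    # With the fix, the graph output should report FLOAT (FP32), not FLOAT16.
    print(output.name, onnx.TensorProto.DataType.Name(elem_type))
```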
1 parent: fc92e98

modelopt/onnx/quantization/quantize.py (0 additions, 4 deletions)
```diff
@@ -528,10 +528,6 @@ def quantize(
     )
     if direct_io_types:
         onnx_model = remove_graph_input_q(onnx_model)
-        # Sort nodes topologically
-        graph = gs.import_onnx(onnx_model)
-        graph.toposort().cleanup()
-        onnx_model = gs.export_onnx(graph)
     else:
         # Remove redundant cast nodes in the quantized model
         # Note. This is called within the qdq_to_dq function as well
```
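For reference, the four removed lines amount to the standalone round-trip below: the model is re-imported into ONNX GraphSurgeon, topologically sorted, cleaned up, and re-exported. Per the bug report, it is this re-export after FP16 conversion that replaced the FP32 output type with FP16. A minimal sketch, assuming a quantized model on disk (the path is hypothetical):

```python
# The GraphSurgeon round-trip removed by this commit, as a standalone snippet.
import onnx
import onnx_graphsurgeon as gs

onnx_model = onnx.load("model.quant.onnx")  # hypothetical path

# Per bug 5620660, re-exporting the graph after quantization and FP16
# conversion clobbered the FP32 type set by the Cast_to_fp32 output layer.
graph = gs.import_onnx(onnx_model)
graph.toposort().cleanup()
onnx_model = gs.export_onnx(graph)
```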
