
Commit ca94c96

[5620660][ONNX] Remove toposort after quantization (#524)
## What does this PR do?

**Type of change:** Bug fix

**Overview:** Loading the model with ONNX GraphSurgeon after quantization and FP16 conversion results in an ONNX model with an FP16 output instead of FP32, even though the Cast_to_fp32 layer was correctly placed at the graph output. This PR fixes that issue by removing the redundant GraphSurgeon toposort round-trip that ran after `remove_graph_input_q`.

## Usage

```shell
$ python -m modelopt.onnx.quantization --onnx_path=$MODEL_NAME.onnx --high_precision_dtype=fp16
```

## Testing

See bug 5620660.

## Before your PR is "*Ready for review*"

- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: No
- **Did you add or update any necessary documentation?**: No
- **Did you update [Changelog](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CHANGELOG.rst)?**: No

Signed-off-by: gcunhase <[email protected]>
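Since the symptom is the dtype of the graph output, a quick way to confirm the fix is to inspect the output tensor type of the quantized model. A minimal sketch, assuming `model.quant.onnx` is the quantized artifact produced by the command above (the filename is hypothetical):

```python
# Inspect the ONNX graph-output dtype (the model path is hypothetical).
import onnx

model = onnx.load("model.quant.onnx")
for output in model.graph.output:
    elem_type = output.type.tensor_type.elem_type
    # With the fix, the graph output should report FLOAT (FP32), not FLOAT16.
    print(output.name, onnx.TensorProto.DataType.Name(elem_type))
```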
1 parent: fc92e98

modelopt/onnx/quantization/quantize.py (0 additions, 4 deletions)
```diff
@@ -528,10 +528,6 @@ def quantize(
     )
     if direct_io_types:
         onnx_model = remove_graph_input_q(onnx_model)
-        # Sort nodes topologically
-        graph = gs.import_onnx(onnx_model)
-        graph.toposort().cleanup()
-        onnx_model = gs.export_onnx(graph)
     else:
         # Remove redundant cast nodes in the quantized model
         # Note. This is called within the qdq_to_dq function as well
```
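For reference, the four removed lines amount to the standalone round-trip below: the model is re-imported into ONNX GraphSurgeon, topologically sorted, cleaned up, and re-exported. Per the bug report, it is this re-export after FP16 conversion that replaced the FP32 output type with FP16. A minimal sketch, assuming a quantized model on disk (the path is hypothetical):

```python
# The GraphSurgeon round-trip removed by this commit, as a standalone snippet.
import onnx
import onnx_graphsurgeon as gs

onnx_model = onnx.load("model.quant.onnx")  # hypothetical path

# Per bug 5620660, re-exporting the graph after quantization and FP16
# conversion clobbered the FP32 type set by the Cast_to_fp32 output layer.
graph = gs.import_onnx(onnx_model)
graph.toposort().cleanup()
onnx_model = gs.export_onnx(graph)
```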
