Description
I tried to convert a mixed-precision ONNX model to a mixed-precision TensorRT engine.
In the ONNX model, I kept some ops (ReduceSum, Pow) in FP32, along with the back-to-back Cast ops around them (for example, ReduceSum(fp32) -> output(fp32) -> Cast(fp32) -> Pow(fp32)).
In my build_engine.py, I set the following config: enable FP16, set OBEY_PRECISION_CONSTRAINTS, and pin the corresponding layers to FP32:
```python
config.set_flag(trt.BuilderFlag.FP16)
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

for i in range(network.num_layers):
    op_name = network.get_layer(i).name.split('/')[-1]
    if op_name in ('Pow', 'ReduceSum', 'Pow_1'):
        print(network.get_layer(i).name)
        # Pin both the layer's compute precision and its output type to FP32.
        network.get_layer(i).precision = trt.DataType.FLOAT
        network.get_layer(i).set_output_type(0, trt.DataType.FLOAT)
    if op_name in ('Pow_1_output_cast0', 'ReduceSum_input_cast1', 'Pow_output_cast0',
                   'Pow_1_input_cast0', 'ReduceSum_input_cast0', 'Pow_input_cast0'):
        print(network.get_layer(i).name)
        network.get_layer(i).precision = trt.DataType.FLOAT
```
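To confirm which precision each layer actually ran in, a sketch like the following can dump per-layer engine information after the build (this assumes TensorRT >= 8.2, where IEngineInspector is available; the exact JSON field names may vary by TensorRT version, and `builder`, `network`, `config` are the objects from the snippet above):

```python
import json
import tensorrt as trt

# Detailed profiling verbosity is required for per-layer information
# to be recorded in the engine.
config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED
serialized = builder.build_serialized_network(network, config)

runtime = trt.Runtime(trt.Logger(trt.Logger.WARNING))
engine = runtime.deserialize_cuda_engine(serialized)

# Dump per-layer info; the output tensor entries include the
# format/datatype the builder actually chose for each layer.
inspector = engine.create_engine_inspector()
info = json.loads(inspector.get_engine_information(trt.LayerInformationFormat.JSON))
for layer in info.get("Layers", []):
    print(layer.get("Name"), layer.get("LayerType"), layer.get("Outputs"))
```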
The outputs of the TensorRT engine are quite different from the ONNX model's.
Any idea how I can solve this?
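For reference, this is a minimal sketch of how I compare the engine against an FP32 ONNX Runtime baseline on the same random inputs (it uses Polygraphy's Python API; the module and class names are my assumption and may differ across Polygraphy versions, and `model.engine` / `model.onnx` are placeholder paths):

```python
from polygraphy.backend.onnxrt import OnnxrtRunner, SessionFromOnnx
from polygraphy.backend.trt import EngineFromBytes, TrtRunner
from polygraphy.comparator import Comparator

# Placeholder path to the serialized engine built above.
with open("model.engine", "rb") as f:
    engine_bytes = f.read()

runners = [
    OnnxrtRunner(SessionFromOnnx("model.onnx")),  # FP32 reference
    TrtRunner(EngineFromBytes(engine_bytes)),     # mixed-precision engine
]

# Runs both backends on the same randomly generated inputs and
# reports per-output absolute/relative differences.
results = Comparator.run(runners)
Comparator.compare_accuracy(results)
```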
Environment
TensorRT Version:
NVIDIA GPU: A100
NVIDIA Driver Version: 12.5
CUDA Version: 12.5
CUDNN Version: 12.5
Operating System: Linux
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Model link:
Steps To Reproduce
Commands or scripts:
Have you tried the latest release?:
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):