Description
I have an ONNX model (a T5 encoder that I exported from PyTorch) that I want to convert to TensorRT. The FP32 conversion works great, but when I build the engine with FP16 enabled, accuracy collapses and the model produces nothing useful. I also tried converting the ONNX model itself to FP16 before building the engine (a sketch of that conversion is below): the FP16 ONNX model's accuracy is good, but the conversion to TRT once again hurts it badly.
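For reference, the ONNX-level FP16 conversion was done roughly like this (a minimal sketch; I'm showing onnxconverter-common here as an illustration, the exact tool and flags may differ):

```python
import onnx
from onnxconverter_common import float16

# Load the exported FP32 encoder and down-cast its weights/ops to FP16.
model = onnx.load("t5_fp32_encoder.onnx")

# keep_io_types=True leaves graph inputs/outputs in FP32 so the model stays
# drop-in compatible with the original; only the internals become FP16.
model_fp16 = float16.convert_float_to_float16(model, keep_io_types=True)

onnx.save(model_fp16, "t5_fp16_encoder.onnx")
```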
Environment
TensorRT Version: 8.5.3.1
NVIDIA GPU: NVIDIA RTX A4000
NVIDIA Driver Version: 515.43.04
CUDA Version: 11.7
CUDNN Version: 8.5.0
Operating System: Ubuntu 22.04
Python Version (if applicable): 3.10
Tensorflow Version (if applicable): N/A
PyTorch Version (if applicable): 2.0.0
Baremetal or Container (if so, version): Baremetal
Relevant Files
The FP32 and FP16 ONNX models can be downloaded from this link: https://drive.google.com/drive/folders/1zeAW2oPP-2VwnK-SKcRVVqed30BZLMKk?usp=sharing
Steps To Reproduce
This can be reproduced using Polygraphy. Note that on recent NumPy versions, TensorRT's Python bindings must first be patched to use np.bool_ instead of the removed np.bool alias. One way to apply that patch without editing the package (a sketch of the workaround; the exact mechanism I used may differ) is a sitecustomize.py shim:
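```python
# sitecustomize.py -- place on PYTHONPATH so it runs at interpreter startup.
# NumPy >= 1.24 removed the deprecated np.bool alias, which TensorRT 8.5's
# Python bindings still reference; restore it before tensorrt is imported.
import numpy as np

if not hasattr(np, "bool"):
    np.bool = np.bool_
```

With that in place, the following commands reproduce the issue: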
polygraphy run t5_fp32_encoder.onnx --onnxrt --trt
polygraphy run t5_fp32_encoder.onnx --onnxrt --trt --fp16
polygraphy run t5_fp16_encoder.onnx --onnxrt --trt --fp16

The first command builds the FP32 model in FP32 and matches ONNX Runtime. The second builds the FP32 model as a TRT FP16 engine, and the third builds the pre-converted FP16 model as a TRT FP16 engine; both fail the accuracy comparison. An equivalent comparison via the Polygraphy Python API is sketched below.
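A minimal sketch equivalent to the second command, assuming Polygraphy's default random input data and default tolerances:

```python
from polygraphy.backend.onnxrt import OnnxrtRunner, SessionFromOnnx
from polygraphy.backend.trt import (CreateConfig, EngineFromNetwork,
                                    NetworkFromOnnxPath, TrtRunner)
from polygraphy.comparator import Comparator

MODEL = "t5_fp32_encoder.onnx"

# Build a TRT FP16 engine from the FP32 ONNX model.
build_engine = EngineFromNetwork(NetworkFromOnnxPath(MODEL),
                                 config=CreateConfig(fp16=True))

# Run identical inputs through ONNX Runtime and TensorRT, then compare.
results = Comparator.run([OnnxrtRunner(SessionFromOnnx(MODEL)),
                          TrtRunner(build_engine)])
assert bool(Comparator.compare_accuracy(results))  # fails for the FP16 engine
```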
EDIT: I just noticed that the FP16 TRT engine's output is a single constant value (effectively zero everywhere), as can be seen from the following line of the Polygraphy output (note std-dev=0 and min == max):
...
[I] trt-runner-N0-05/01/23-14:36:19: encoder_last_hidden_state | Stats: mean=2.2204e-16, std-dev=0, var=0, median=2.2204e-16, min=2.2204e-16 at (0, 0, 0), max=2.2204e-16 at (0, 0, 0), avg-magnitude=2.2204e-16
...
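For anyone else debugging this: FP16 overflow in a handful of layers (T5's residual/LayerNorm paths are a common suspect) can sometimes be worked around by pinning those layers to FP32 while keeping the rest of the network in FP16. A minimal sketch using the TensorRT Python API; the layer-name keywords here are guesses and would need to be adjusted to the actual layer names in the parsed network:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_mixed_precision_engine(onnx_path,
                                 fp32_keywords=("LayerNorm", "Pow")):
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)
    # Make the builder honor the per-layer precision requests below.
    config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

    # Pin layers whose names match the keywords back to FP32.
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        if any(k in layer.name for k in fp32_keywords):
            layer.precision = trt.float32
            layer.set_output_type(0, trt.float32)

    return builder.build_serialized_network(network, config)
```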