Description
I converted a DAT model into a TensorRT engine at fp16 precision, but when I perform inference with it, the output contains only nan values.
Environment
TensorRT Version: 10.0.1.6
NVIDIA GPU: RTX 3060
NVIDIA Driver Version: 551.61
CUDA Version: 12.3
CUDNN Version: 8.9.7.29
Operating System: Windows 11
Relevant Files
Model Link: 4x-Nomos8kDAT (.onnx format)
Steps To Reproduce
- Convert the model into an `fp16` engine:

trtexec --onnx=4xNomos8kDAT.onnx --saveEngine=4xNomos8kDAT-fp16.trt --shapes=input:1x3x128x128 --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --fp16
Perform Inference
- Code Used: https://github.com/Haoming02/TensorRT-Cpp/tree/bf16
- Replace every `__nv_bfloat16` with `half`, and `cuda_bf16.h` with `cuda_fp16.h`
- See only a pure black output
- When adding a debug log to the `outputData`, it simply prints `nan` (see the sketch after this list)
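For reference, the debug check on `outputData` looked roughly like the sketch below. This is a minimal reconstruction rather than the exact code from the linked repo; the device pointer `d_output`, the element count `outputSize`, and the helper name `dumpOutputStats` are placeholders, assuming the engine has already written its fp16 output into `d_output` via the usual enqueue call.

```cpp
// Minimal sketch (not the exact code from the linked repo) of the fp16 output
// check: copy the raw half output back to the host, convert to float, and
// count how many values are nan.
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <cmath>
#include <cstdio>
#include <vector>

void dumpOutputStats(const half* d_output, size_t outputSize)
{
    // Copy the fp16 output buffer from device to host.
    std::vector<half> outputData(outputSize);
    cudaMemcpy(outputData.data(), d_output, outputSize * sizeof(half),
               cudaMemcpyDeviceToHost);
    if (outputData.empty()) return;

    // With the fp16 engine every element comes back as nan;
    // with the bf16 engine the values look sane.
    size_t nanCount = 0;
    for (size_t i = 0; i < outputSize; ++i)
    {
        const float v = __half2float(outputData[i]);
        if (std::isnan(v)) ++nanCount;
    }
    std::printf("output[0] = %f, nan count: %zu / %zu\n",
                __half2float(outputData[0]), nanCount, outputSize);
}
```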
Misc
Interestingly, if I convert the model into bf16 precision with the following command:

trtexec --onnx=4xNomos8kDAT.onnx --saveEngine=4xNomos8kDAT-bf16.trt --shapes=input:1x3x128x128 --inputIOFormats=bf16:chw --outputIOFormats=bf16:chw --bf16

and use the same code above to perform inference, the output is correct. So only fp16 causes the nan issue...
- The model size is ~120 MB for `fp32`, ~70 MB for `fp16`, and ~100 MB for `bf16`
- The inference speed is similar between `fp32` and `bf16`, but almost twice as fast for `fp16`
Previously, I also tried converting the model with TensorRT 8.6. When specifying the fp16 flag, it printed some warnings about inaccuracy; however, these warnings were not present when converting the model with TensorRT 10.0.