
[Help Needed] Convert ONNX into fp16 Engine #3921

@Haoming02

Description

I tried to convert a DAT model into TensorRT format at fp16 precision, but when I perform inference with the resulting engine, it only produces NaN.

Environment

TensorRT Version: 10.0.1.6

NVIDIA GPU: RTX 3060

NVIDIA Driver Version: 551.61

CUDA Version: 12.3

CUDNN Version: 8.9.7.29

Operating System: Windows 11

Relevant Files

Model Link: 4x-Nomos8kDAT (.onnx format)

Steps To Reproduce

  1. Convert the model into an fp16 engine:
trtexec --onnx=4xNomos8kDAT.onnx --saveEngine=4xNomos8kDAT-fp16.trt --shapes=input:1x3x128x128 --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --fp16

  2. Perform inference (a rough sketch of the inference code is shown after this list).

  3. See only a pure black output.

    • When adding a debug log on the outputData, it simply prints nan.
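
For context, the inference is done roughly along the lines of the sketch below, using the TensorRT 10 Python API with pycuda. The tensor names ("input"/"output"), the engine path, and the random test input are assumptions for illustration; this is not the exact script from the original run.

# Minimal sketch: run the fp16 engine with the TensorRT 10 Python API and pycuda.
# Assumptions: the I/O tensors are named "input" and "output", and the engine was
# built with fp16 I/O formats, so the host buffers use np.float16.
import numpy as np
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)

logger = trt.Logger(trt.Logger.WARNING)
with open("4xNomos8kDAT-fp16.trt", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# 1x3x128x128 test tile in CHW order, fp16 because of --inputIOFormats=fp16:chw
h_input = np.random.rand(1, 3, 128, 128).astype(np.float16)
context.set_input_shape("input", h_input.shape)
h_output = np.empty(tuple(context.get_tensor_shape("output")), dtype=np.float16)

d_input = cuda.mem_alloc(h_input.nbytes)
d_output = cuda.mem_alloc(h_output.nbytes)
context.set_tensor_address("input", int(d_input))
context.set_tensor_address("output", int(d_output))

stream = cuda.Stream()
cuda.memcpy_htod_async(d_input, h_input, stream)
context.execute_async_v3(stream.handle)
cuda.memcpy_dtoh_async(h_output, d_output, stream)
stream.synchronize()

# Debug log on the output buffer; with the fp16 engine this reports nan.
print(h_output.min(), h_output.max(), "any NaN:", np.isnan(h_output).any())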

Misc

Interestingly, if I convert the model into bf16 precision with the following:

trtexec --onnx=4xNomos8kDAT.onnx --saveEngine=4xNomos8kDAT-bf16.trt --shapes=input:1x3x128x128 --inputIOFormats=bf16:chw --outputIOFormats=bf16:chw --bf16

and then run the same inference code, the output is correct. So only fp16 causes the NaN issue...

  • The model size is ~120 MB for fp32, ~70 MB for fp16, and ~100 MB for bf16.
  • Inference speed is similar for fp32 and bf16, but almost twice as fast for fp16.
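
One detail that may matter here (a hedged aside, not a confirmed diagnosis): fp16 can only represent magnitudes up to about 65504, while bf16 keeps the full fp32 exponent range, so activations that overflow fp16 would become inf/NaN only in the fp16 engine. A small sketch for checking values captured from a working fp32/bf16 run (the example array is a placeholder):

# Sketch: count values whose magnitude cannot be represented in fp16 (> ~65504),
# e.g. on outputs or intermediate activations captured from the fp32/bf16 run.
import numpy as np

def count_fp16_overflow(x: np.ndarray) -> int:
    return int((np.abs(x.astype(np.float32)) > np.finfo(np.float16).max).sum())

# Placeholder data; in practice x would come from the working fp32/bf16 engine.
x = np.array([1.0, 300.0, 70000.0], dtype=np.float32)
print(count_fp16_overflow(x), "value(s) outside the fp16 range")  # -> 1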

Previously, I also tried using TensorRT 8.6 to convert the model. When specifying the fp16 flag, it would print out some warnings about inaccuracy. However, these warnings were not present when converting the model using TensorRT 10.0.
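
For completeness, the same fp16 build can also be expressed through the TensorRT Python builder API, which additionally allows pinning individual layers back to fp32, one common way to experiment when fp16 overflows. The sketch below is only illustrative: the "softmax" name filter and the static 128x128 profile are assumptions, the fp16 I/O formats from the trtexec command are omitted, and none of this is a verified fix.

# Sketch: build the fp16 engine with the TensorRT Python API instead of trtexec,
# optionally keeping selected layers in fp32 via precision constraints.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(0)
parser = trt.OnnxParser(network, logger)
with open("4xNomos8kDAT.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

# Illustrative only: pin layers matching a name filter to fp32. Which layers
# (if any) actually need this would have to be found by experiment.
for i in range(network.num_layers):
    layer = network.get_layer(i)
    if "softmax" in layer.name.lower():
        layer.precision = trt.float32
        layer.set_output_type(0, trt.float32)

# Optimization profile, only needed if the ONNX input has dynamic dimensions.
inp = network.get_input(0)
if any(d == -1 for d in inp.shape):
    profile = builder.create_optimization_profile()
    profile.set_shape(inp.name, (1, 3, 128, 128), (1, 3, 128, 128), (1, 3, 128, 128))
    config.add_optimization_profile(profile)

serialized = builder.build_serialized_network(network, config)
with open("4xNomos8kDAT-fp16.trt", "wb") as f:
    f.write(serialized)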

Metadata

Labels

  • Investigating: Issue is under investigation by TensorRT devs
  • Module:Engine Build: Issues with building TensorRT engines
  • triaged: Issue has been triaged by maintainers
  • waiting for feedback: Requires more information from author of item to make progress on the issue
