
Failure to keep LayerNorm in FP32 with TensorRT 8.6.11 #4540

@zhengye1995

Description


When I used TensorRT 8.6.11 to build an engine for a standard transformer model, the FP32 results were consistent with ONNX and the inference accuracy remained very good, but FP16 inference lost a lot of accuracy. Note that I exported the ONNX model with opset version 17, and the engine was built without errors.
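
For context, a minimal FP16 build along these lines would look like the sketch below; the file paths are placeholders and the flags shown are just the standard FP16 build options, not copied from the original script:

```python
# Minimal FP16 engine build that reproduces the setup (paths are placeholders).
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model_opset17.onnx", "rb") as f:  # hypothetical file name
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # whole-network FP16

engine_bytes = builder.build_serialized_network(network, config)
with open("model_fp16.engine", "wb") as f:
    f.write(engine_bytes)
```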

Through Polygraphy debugging, I found that LayerNorm was the source of the precision loss, so I tried to force just the LayerNorm layers to FP32, but graph fusion then turned the entire transformer into FP32. I also tried setting some other ops inside the transformer, such as GEMM, to FP32, which likewise caused the whole transformer to fall back to FP32 inference. I would like LayerNorm to stay in FP32 while the other operators stay in FP16, so that I get higher accuracy while keeping efficient inference. How can I achieve this?
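
Below is a minimal sketch of the per-layer precision pinning I am trying to express. The NORMALIZATION-layer check (and the name-based fallback) is an assumption about how opset-17 LayerNormalization nodes appear in the parsed network, and OBEY_PRECISION_CONSTRAINTS is my understanding of the flag that makes the builder honor such requests instead of treating them as hints:

```python
# Keep LayerNorm in FP32 while leaving the rest of the network in FP16.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model_opset17.onnx", "rb") as f:  # hypothetical file name
    assert parser.parse(f.read()), "ONNX parse failed"

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
# Ask the builder to honor the per-layer precision requests below rather than
# treat them as hints; PREFER_PRECISION_CONSTRAINTS is the softer alternative.
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

for i in range(network.num_layers):
    layer = network.get_layer(i)
    # Assumption: with opset 17, LayerNormalization is parsed as a
    # NORMALIZATION layer; the name check is only a fallback heuristic.
    is_layernorm = (layer.type == trt.LayerType.NORMALIZATION
                    or "LayerNorm" in layer.name)
    if is_layernorm:
        layer.precision = trt.float32
        for j in range(layer.num_outputs):
            layer.set_output_type(j, trt.float32)

engine_bytes = builder.build_serialized_network(network, config)
with open("model_mixed.engine", "wb") as f:
    f.write(engine_bytes)
```

If building with trtexec instead, the `--precisionConstraints=obey` and `--layerPrecisions=...` options should express the same intent, though I have not verified the exact spelling on this version.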

Environment

TensorRT Version: 8.6.11

NVIDIA GPU: RTX 2070 and Drive ORIN-X

NVIDIA Driver Version: 530.41.03

CUDA Version: 11.7 and 12.1

CUDNN Version: 9.7.1

Operating System:

Python Version (if applicable): 3.8

Tensorflow Version (if applicable):

PyTorch Version (if applicable): 1.13.1

Baremetal or Container (if so, version):

Metadata

Labels

Module:Accuracy (Output mismatch between TensorRT and other frameworks), triaged (Issue has been triaged by maintainers)
