Description
When I used TensorRT 8.6.11 to build an engine for a standard Transformer model, the FP32 results matched the ONNX results and the inference accuracy was very good, but FP16 inference lost a lot of accuracy. Note that I exported the ONNX model with opset 17, and the engine was generated without errors.
Through Polygraphy debugging, I found that LayerNorm causes the precision loss, so I tried forcing only the LayerNorm layers to FP32, but graph fusion turned the entire Transformer into FP32. I also tried constraining some other ops inside the Transformer, such as GEMM, and that likewise caused the whole Transformer to run in FP32. I want to keep LayerNorm in FP32 while leaving the other operators in FP16, so that I get better accuracy while still keeping inference efficient. How can I achieve this?
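For reference, this is roughly how I tried to pin LayerNorm to FP32 with the TensorRT Python API (a minimal sketch; `model.onnx` is a placeholder, and the LayerNorm layers are matched by layer type or by a name pattern, which may not correspond to the fused layer names in an actual build):

```python
import tensorrt as trt

# Minimal sketch: build an FP16 engine but pin LayerNorm layers to FP32.
# "model.onnx" and the name/type matching below are placeholders.
logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    assert parser.parse(f.read()), parser.get_error(0)

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
# Make the builder respect the per-layer precisions set below
# instead of overriding them during tactic selection.
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

for i in range(network.num_layers):
    layer = network.get_layer(i)
    # Match LayerNorm either by the 8.6 normalization layer type
    # or by the ONNX node name (placeholder pattern).
    if layer.type == trt.LayerType.NORMALIZATION or "LayerNorm" in layer.name:
        layer.precision = trt.float32
        for j in range(layer.num_outputs):
            layer.set_output_type(j, trt.float32)

engine_bytes = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:
    f.write(engine_bytes)
```

Even with OBEY_PRECISION_CONSTRAINTS set, the surrounding layers end up in FP32 as described above, which is exactly the behavior I would like to avoid.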
Environment
TensorRT Version: 8.6.11
NVIDIA GPU: RTX 2070 and Drive ORIN-X
NVIDIA Driver Version: 530.41.03
CUDA Version: 11.7 and 12.1
CUDNN Version: 9.7.1
Operating System:
Python Version (if applicable): 3.8
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 1.13.1
Baremetal or Container (if so, version):