Obvious performance degradation for TRT-FP16 model compared to the original PyTorch model #2942

## Description

The performance of the TRT-FP32 engine and of ONNX Runtime matches the original PyTorch model, while the TRT-FP16 engine shows an obvious performance degradation. What is the reason, and how can it be solved?

![image](https://user-images.githubusercontent.com/35302032/236766447-a6bbf498-b39f-48e0-948a-2a21212bc7ea.png)

## Environment

PyTorch: 2.0.0
CUDA: 11.4
cuDNN: 8.6.0
TensorRT: 8.5-GA
GPU: NVIDIA A100
GPU Driver version: 515.86.01
Operating System: Ubuntu 20.04
Python: 3.10

If some layers or operations of the model need to be modified to improve the TRT-FP16 results, how can these layers or operations be located?
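One possible way to narrow this down, sketched here under assumptions rather than as a confirmed diagnosis: dump the intermediate tensors of the FP32 and FP16 runs for the same input, then rank layers by how far the FP16 output drifts from the FP32 reference. The helper `rank_layers_by_divergence`, the dict-of-arrays input format, and the layer names `conv1` and `attn_scores` are all hypothetical, and the FP16 outputs below are simulated with a NumPy cast instead of a real TensorRT engine.

```python
import numpy as np

# Largest finite value representable in IEEE half precision (65504.0).
FP16_MAX = float(np.finfo(np.float16).max)


def rank_layers_by_divergence(ref_outputs, test_outputs, eps=1e-6):
    """Return (layer_name, max_rel_error, overflow_risk) tuples, worst first.

    ref_outputs / test_outputs: dicts mapping a layer name to its output
    array. This format is an assumption; adapt it to however the
    intermediate tensors are actually dumped.
    """
    report = []
    for name, ref in ref_outputs.items():
        if name not in test_outputs:
            continue  # layer may have been fused away in one of the runs
        ref = np.asarray(ref, dtype=np.float64)
        test = np.asarray(test_outputs[name], dtype=np.float64)
        if not np.all(np.isfinite(test)):
            # inf/NaN in the reduced-precision output: treat as worst case
            max_rel = float("inf")
        else:
            max_rel = float(np.max(np.abs(ref - test) / (np.abs(ref) + eps)))
        # The reference itself exceeds the FP16 range, so overflow is likely
        overflow_risk = bool(np.max(np.abs(ref)) > FP16_MAX)
        report.append((name, max_rel, overflow_risk))
    report.sort(key=lambda row: row[1], reverse=True)
    return report


# Synthetic stand-ins for real per-layer dumps (hypothetical layer names).
rng = np.random.default_rng(0)
fp32 = {
    "conv1": rng.standard_normal(64).astype(np.float32),
    "attn_scores": np.array([1.0e5, 2.0, -3.0], dtype=np.float32),
}
# Simulate FP16 execution by casting; 1.0e5 overflows to inf in float16.
with np.errstate(over="ignore"):
    fp16 = {name: out.astype(np.float16) for name, out in fp32.items()}

report = rank_layers_by_divergence(fp32, fp16)
for name, max_rel, overflow_risk in report:
    print(f"{name}: max_rel_error={max_rel:.3g}, overflow_risk={overflow_risk}")
```

Layers at the top of such a ranking, especially those whose FP32 activations exceed the FP16 range, would be the first candidates to keep in FP32 or to rescale. Whether that resolves the degradation reported here is not confirmed by this issue.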
