
Obvious performance degradation for TRT-fp16 model compared to original Pytorch model #2942

@KingofAsianPopJC

Description

The performance of TRT-fp32 and ONNX Runtime is equal to that of the original PyTorch model, but TRT-fp16 shows an obvious performance degradation. What is the reason, and how can it be solved?

image

Environment

PyTorch: 2.0.0
CUDA: 11.4
cuDNN: 8.6.0
TensorRT: 8.5-GA
Graphics card: NVIDIA A100
GPU Driver version: 515.86.01
Operating System: Ubuntu 20.04
Python: 3.10

If some layers or operations of the model need to be modified to improve the performance of TRT-fp16, how can these layers or operations be located?
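One possible way to narrow this down, assuming the degradation refers to output accuracy rather than speed: dump the intermediate activations from an FP32 run and check which ones exceed the range FP16 can represent (about 65504). The sketch below is only an illustration. The function name `find_fp16_overflow` and the layer names are hypothetical, and it assumes the activations have already been collected into a dict of NumPy arrays (for example by marking intermediate tensors as outputs in an FP32 run).

```python
import numpy as np

# Largest finite value representable in IEEE half precision (65504.0).
FP16_MAX = float(np.finfo(np.float16).max)


def find_fp16_overflow(activations):
    """Return {tensor_name: peak_abs_value} for tensors that overflow FP16.

    `activations` maps a tensor name to its FP32 values. How these values
    are collected is left open here (assumption: an FP32 reference run
    with intermediate tensors exposed as outputs).
    """
    flagged = {}
    for name, values in activations.items():
        arr = np.asarray(values, dtype=np.float32)
        if arr.size == 0:
            continue  # nothing to check in an empty tensor
        peak = float(np.max(np.abs(arr)))
        if peak > FP16_MAX:
            flagged[name] = peak
    return flagged


if __name__ == "__main__":
    # Hypothetical layer names and values, for illustration only.
    dumped = {
        "conv1_out": np.array([0.5, -3.0, 12.0], dtype=np.float32),
        "attn_scores": np.array([1.0e5, -2.0], dtype=np.float32),
    }
    for name, peak in find_fp16_overflow(dumped).items():
        print(f"{name}: peak |x| = {peak:.1f} exceeds FP16 max {FP16_MAX}")
```

Layers flagged this way are candidates for being kept in FP32. In the TensorRT Python API this is typically done by setting `layer.precision = trt.float32` on the network layer and building with `trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS`. Overflow is only one possible cause, so a clean result here does not rule out other sources of FP16 error.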

Labels

triaged: Issue has been triaged by maintainers
