Description
NMS layers are much slower on TensorRT than on PyTorch (roughly 44% of PyTorch's throughput), and I'm looking for any possible workaround. This appears to be acknowledged as a known issue in the TensorRT release notes here:
> A performance regression is expected for TensorRT 10.x with respect to TensorRT 8.6 for networks with operations that involve data-dependent shapes, such as non-max suppression or non-zero operations
Is there any possible workaround, or a fix planned for a specific future version? I am specifically using these layers inside a FasterRCNN network (as implemented in torchvision here). I observe this network to be much slower when running with either a single image or a batch of 4 images:
- Single image inference latency: 7.8ms on PyTorch, 13.3ms on TensorRT
- 4 image inference latency: 22.8ms on PyTorch, 53.5ms on TensorRT
When I run this network with per-layer profiling, I see that the NonMaxSuppression layers account for 75%+ of the overall inference time. I have verified this on TensorRT 10.0 and 10.6. I have tested using ONNX opset 11 and opset 17.
Environment
TensorRT Version: 10.0, 10.6
NVIDIA GPU: GeForce RTX 4090
NVIDIA Driver Version: 550.54.15
CUDA Version: 12.4
CUDNN Version: unsure
Operating System:
Python Version (if applicable): 3.9
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 2.2
Baremetal or Container (if so, version):
Relevant Files
Model link: https://pytorch.org/vision/main/models/faster_rcnn.html
Steps To Reproduce
- Export FasterRCNN to ONNX
- Pass the ONNX model into trtexec
- Compare the trtexec output to the equivalent PyTorch benchmark
Commands or scripts:
Have you tried the latest release?: Yes, I have tried TensorRT 10.0 and 10.6.
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt): Yes, it runs with ONNX Runtime.