Non-Maximum Suppression (NMS) layers slow on TensorRT 10.0-10.6 #4248

@darrin-willis

Description

NMS layers are much slower in TensorRT than in PyTorch (roughly 44% of PyTorch's performance), and I'm looking for any possible workaround. This appears to be a known issue acknowledged in the TensorRT release notes:

A performance regression is expected for TensorRT 10.x with respect to TensorRT 8.6 for networks with operations that involve data-dependent shapes, such as non-max suppression or non-zero operations
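The release note's point about data-dependent shapes is easy to see with a minimal greedy NMS sketch. This is plain Python written for illustration, not the torchvision or TensorRT implementation, and the names `iou` and `greedy_nms` are hypothetical. Both calls below receive exactly three boxes, yet they return different numbers of indices, so the output size is only known after the computation has run.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float], iou_threshold: float) -> List[int]:
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep box i only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Same input shape (3 boxes), different box positions.
overlapping = [(0, 0, 10, 10), (1, 1, 11, 11), (2, 2, 12, 12)]
separated = [(0, 0, 10, 10), (20, 20, 30, 30), (40, 40, 50, 50)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(overlapping, scores, iou_threshold=0.3))  # [0]
print(greedy_nms(separated, scores, iou_threshold=0.3))    # [0, 1, 2]
```

Because the output length varies with the data, an engine has to size or synchronize on that output at runtime, which is the class of operation the release note says regressed.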

Is there a workaround, or is a fix planned for a specific future version? I am using these layers inside a FasterRCNN network (the torchvision implementation). The network is much slower on TensorRT than on PyTorch with both a single image and a batch of 4 images:

  • Single image inference latency: 7.8ms on PyTorch, 13.3ms on TensorRT
  • 4 image inference latency: 22.8ms on PyTorch, 53.5ms on TensorRT
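For context, the end-to-end ratios implied by these measurements can be computed directly. The helper name `compare_latency` is illustrative only; the inputs are the latencies listed above.

```python
from typing import Tuple


def compare_latency(pytorch_ms: float, tensorrt_ms: float) -> Tuple[float, float]:
    """Return (TensorRT slowdown factor, TensorRT speed as a percentage of PyTorch)."""
    slowdown = tensorrt_ms / pytorch_ms
    relative_pct = 100.0 * pytorch_ms / tensorrt_ms
    return slowdown, relative_pct


# End-to-end latencies (ms) from the measurements above.
for case, (pt, trt) in {"1 image": (7.8, 13.3), "4 images": (22.8, 53.5)}.items():
    slowdown, pct = compare_latency(pt, trt)
    print(f"{case}: {slowdown:.2f}x slower, {pct:.1f}% of PyTorch speed")
# 1 image: 1.71x slower, 58.6% of PyTorch speed
# 4 images: 2.35x slower, 42.6% of PyTorch speed
```

The gap widens with batch size, which is consistent with the per-image NMS work growing with the number of images.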

With per-layer profiling enabled, the NonMaxSuppression layers account for more than 75% of the overall inference time. I have reproduced this on TensorRT 10.0 and 10.6, with both ONNX opset 11 and opset 17.
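A share like this can be computed from a per-layer profile written by `trtexec --onnx=<model.onnx> --separateProfileRun --dumpProfile --exportProfile=profile.json`. The sketch below assumes the exported JSON is a list whose first entry is a `{"count": N}` record followed by one entry per layer with `"name"` and `"averageMs"` fields, and that NMS layers keep `NonMaxSuppression` in their names after fusion; check both assumptions against your TensorRT version. `layer_time_share` and `load_profile` are hypothetical helper names.

```python
import json
from typing import Dict, List, Union

ProfileEntry = Dict[str, Union[str, float, int]]


def layer_time_share(profile: List[ProfileEntry], keyword: str = "NonMaxSuppression") -> float:
    """Fraction of summed per-layer average time spent in layers whose name contains `keyword`."""
    # Skip entries without a layer name, e.g. the assumed leading {"count": N} record.
    layers = [e for e in profile if "name" in e]
    total = sum(float(e["averageMs"]) for e in layers)
    matched = sum(float(e["averageMs"]) for e in layers if keyword in str(e["name"]))
    return matched / total if total > 0 else 0.0


def load_profile(path: str) -> List[ProfileEntry]:
    """Load a trtexec --exportProfile JSON file (layout assumed as described above)."""
    with open(path) as f:
        return json.load(f)


if __name__ == "__main__":
    # Hypothetical usage: share = layer_time_share(load_profile("profile.json"))
    demo = [{"count": 10}, {"name": "/rpn/NonMaxSuppression", "averageMs": 3.0},
            {"name": "/backbone/Conv", "averageMs": 1.0}]
    print(f"NMS share: {layer_time_share(demo):.0%}")
```

Summing `averageMs` rather than trusting a precomputed percentage field keeps the calculation independent of how trtexec normalizes percentages.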

Environment

TensorRT Version: 10.0, 10.6

NVIDIA GPU: GeForce RTX 4090

NVIDIA Driver Version: 550.54.15

CUDA Version: 12.4

cuDNN Version: unsure

Operating System:

Python Version (if applicable): 3.9

Tensorflow Version (if applicable):

PyTorch Version (if applicable): 2.2

Baremetal or Container (if so, version):

Relevant Files

Model link: https://pytorch.org/vision/main/models/faster_rcnn.html

Steps To Reproduce

  1. Export FasterRCNN to ONNX
  2. Build and benchmark a TensorRT engine from the ONNX model with trtexec
  3. Compare the trtexec latency to an equivalent PyTorch benchmark
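The steps above can be sketched as follows. This is an illustration under stated assumptions, not the exact script used for the numbers in this issue: `export_fasterrcnn`, `trtexec_command`, and `median_latency_ms` are hypothetical helpers, the 800x800 input size is an assumption, and the export call follows the torchvision FasterRCNN documentation example (a list of 3D image tensors, passed as a single positional argument).

```python
import shlex
import statistics
import time
from typing import Callable, List


def export_fasterrcnn(onnx_path: str, opset: int = 17) -> None:
    """Export torchvision's FasterRCNN to ONNX."""
    # Imported lazily so the other helpers work without PyTorch installed.
    import torch
    import torchvision

    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    model.eval()
    images = [torch.rand(3, 800, 800)]  # hypothetical input size; use your real image size
    # Wrap the list in a tuple so the model receives it as its `images` argument.
    torch.onnx.export(model, (images,), onnx_path, opset_version=opset)


def trtexec_command(onnx_path: str, profile_json: str) -> List[str]:
    """Build a trtexec invocation that times the engine and exports a per-layer profile."""
    return [
        "trtexec",
        f"--onnx={onnx_path}",
        "--separateProfileRun",  # profile in a separate run so timing is not perturbed
        "--dumpProfile",  # print per-layer timings
        f"--exportProfile={profile_json}",
    ]


def median_latency_ms(fn: Callable[[], object], sync: Callable[[], None] = lambda: None,
                      warmup: int = 10, iters: int = 100) -> float:
    """Median wall-clock latency of fn() in ms; pass torch.cuda.synchronize as `sync` on GPU."""
    for _ in range(warmup):
        fn()
    sync()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        sync()  # wait for queued GPU work so the measurement covers the whole call
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


if __name__ == "__main__":
    print(shlex.join(trtexec_command("faster_rcnn.onnx", "profile.json")))
```

For the PyTorch side, one would call `median_latency_ms(lambda: model(images), sync=torch.cuda.synchronize)` with the model and images on the GPU inside `torch.inference_mode()`, so that both frameworks are measured as median latency over the same input.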

Commands or scripts:

Have you tried the latest release?: Yes, I have tried TensorRT 10.0 and 10.6.

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt): Yes, it runs on ONNX Runtime.

Metadata

Assignees

No one assigned

Labels

Module:Performance (General performance issues), triaged (Issue has been triaged by maintainers)