
Inconsistency between the outputs of dino onnx and dino trt engine #4404

@Desperado721

Description


Hi team,

I'm using trtexec to convert the DINO ONNX model to a TensorRT engine, but the engine's outputs differ from those of the ONNX model. Here is what I found; I'd appreciate any insights you can share. Thanks!

  • I tested the outputs on 2 images and visualized the results.
  • DINO Torch vs. DINO ONNX: quite similar
    • Confidence scores: almost the same
    • Bounding boxes: shifted slightly, but acceptable
  • DINO ONNX vs. DINO TRT: completely different
    • Confidence scores: decreased by one order of magnitude
    • Bounding boxes: completely different
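The comparison above is visual. To put numbers on it, one option is to dump the raw output tensors from both backends for the same preprocessed input and compute a few summary statistics. Below is a minimal NumPy sketch: `compare_outputs` reports max/mean absolute error, max relative error, and cosine similarity, and `box_iou` computes pairwise IoU between two box sets. Both function names are hypothetical, and the (x1, y1, x2, y2) box layout is an assumption to check against the model's actual output.

```python
import numpy as np


def compare_outputs(ref: np.ndarray, test: np.ndarray) -> dict:
    """Summarize how far `test` (e.g. TensorRT) drifts from `ref` (e.g. ONNX Runtime)."""
    ref = np.asarray(ref, dtype=np.float64).ravel()
    test = np.asarray(test, dtype=np.float64).ravel()
    if ref.shape != test.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {test.shape}")
    abs_diff = np.abs(ref - test)
    # Relative error with a small floor so near-zero reference values do not blow up.
    rel_diff = abs_diff / np.maximum(np.abs(ref), 1e-6)
    denom = np.linalg.norm(ref) * np.linalg.norm(test)
    cosine = float(ref @ test / denom) if denom > 0 else float("nan")
    return {
        "max_abs": float(abs_diff.max()),
        "mean_abs": float(abs_diff.mean()),
        "max_rel": float(rel_diff.max()),
        "cosine": cosine,
    }


def box_iou(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise IoU between boxes a (N, 4) and b (M, 4), assumed (x1, y1, x2, y2)."""
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    union = area_a[:, None] + area_b[None, :] - inter
    return inter / np.maximum(union, 1e-12)


if __name__ == "__main__":
    # Stand-in data for illustration only, not real model output.
    ref_boxes = np.array([[10, 10, 50, 50], [60, 60, 100, 100]], dtype=np.float32)
    trt_boxes = ref_boxes + 2.0
    print(compare_outputs(ref_boxes, trt_boxes))
    print(box_iou(ref_boxes, trt_boxes).max(axis=1))  # best match per reference box
```

A cosine similarity close to 1 with a small max error suggests ordinary numeric drift; scores off by an order of magnitude and low best-match IoU, as described here, point to a genuine divergence somewhere in the graph.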

Some context:

I'm using mmdeploy to convert DINO from Torch -> ONNX -> TensorRT.

    • I had to build the custom op (grid_sampler) separately for ONNX Runtime and TensorRT to support the conversion. Everything looks good to me, and both dynamic libraries built successfully. They were used to complete the conversion and to launch the Triton server for inference. I've attached them for further debugging.
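Since the custom grid_sampler op exists in two separately built libraries (ONNX Runtime and TensorRT), one way to isolate it is to compare each plugin's output against an independent reference on identical inputs. Below is a minimal NumPy sketch of 2-D bilinear grid sampling with zero padding, following the coordinate convention of PyTorch's `torch.nn.functional.grid_sample` (whose defaults are `mode='bilinear'`, `padding_mode='zeros'`, `align_corners=False`). It assumes the mmdeploy op mirrors those semantics; the function name `grid_sample_bilinear` is hypothetical, and capturing the plugin's actual inputs and outputs is left open.

```python
import numpy as np


def grid_sample_bilinear(inp: np.ndarray, grid: np.ndarray, align_corners: bool = False) -> np.ndarray:
    """Reference 2-D bilinear grid sampling with zero padding (hypothetical helper).

    inp:  (N, C, H, W) feature map
    grid: (N, H_out, W_out, 2) sampling points, last dim = (x, y) in [-1, 1]
    returns (N, C, H_out, W_out) as float64
    """
    n, c, h, w = inp.shape
    x = grid[..., 0].astype(np.float64)
    y = grid[..., 1].astype(np.float64)
    # Map normalized [-1, 1] coordinates to pixel-index space.
    if align_corners:
        ix = (x + 1) / 2 * (w - 1)
        iy = (y + 1) / 2 * (h - 1)
    else:
        ix = ((x + 1) * w - 1) / 2
        iy = ((y + 1) * h - 1) / 2
    x0 = np.floor(ix).astype(np.int64)
    y0 = np.floor(iy).astype(np.int64)
    x1, y1 = x0 + 1, y0 + 1
    wx1, wy1 = ix - x0, iy - y0
    wx0, wy0 = 1.0 - wx1, 1.0 - wy1

    out = np.zeros((n, c) + x.shape[1:], dtype=np.float64)
    batch = np.arange(n)[:, None, None]
    corners = ((y0, x0, wy0 * wx0), (y0, x1, wy0 * wx1),
               (y1, x0, wy1 * wx0), (y1, x1, wy1 * wx1))
    for yy, xx, weight in corners:
        # Corners outside the feature map contribute zero ("zeros" padding).
        valid = (xx >= 0) & (xx < w) & (yy >= 0) & (yy < h)
        xc = np.clip(xx, 0, w - 1)
        yc = np.clip(yy, 0, h - 1)
        vals = inp[batch, :, yc, xc]    # (N, H_out, W_out, C)
        vals = np.moveaxis(vals, -1, 1)  # (N, C, H_out, W_out)
        out += vals * (weight * valid)[:, None]
    return out
```

A quick sanity check: sampling a 2x2 map at grid point (0, 0) with `align_corners=False` lands on the center and should return the mean of the four pixels. If the ONNX Runtime plugin, the TensorRT plugin, and this reference agree on captured inputs, the mismatch is more likely elsewhere in the engine.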

I used the following command for the conversion. Since the Torch -> ONNX outputs are consistent, I'm assuming that stage succeeded, so I'm only posting the command for the second stage (ONNX -> TensorRT):

/azureuser/TensorRT-10.3.0.26/targets/x86_64-linux-gnu/bin/trtexec --onnx=mmdeploy_model/dino_trt_aman_fp16_0304_torchscript/end2end.onnx  --saveEngine=mmdeploy_model/dino_trt_aman_fp16_0304_torchscript/end2end_fp16.plan --minShapes=input:1x3x240x240 --optShapes=input:2x3x240x240 --maxShapes=input:4x3x240x240 --useCudaGraph --plugins=mmdeploy/mmdeploy/lib/libmmdeploy_tensorrt_ops.so --verbose
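The output filename ends in `_fp16.plan`, but the command above does not pass `--fp16`, so the build options printed in trtexec.log are the place to confirm which precisions were actually enabled. If reduced precision is in play, one common source of TensorRT-vs-ONNX drift is intermediate values that overflow or lose precision in FP16, whose largest finite value is 65504. The sketch below flags such tensors given a dict of reference activations (for example, dumped from an FP32 run). The function name `fp16_risk_report` and the 1% relative tolerance are illustrative assumptions, not part of trtexec or mmdeploy.

```python
import numpy as np

FP16_MAX = float(np.finfo(np.float16).max)  # 65504.0


def fp16_risk_report(activations: dict, rel_tol: float = 1e-2) -> list:
    """Flag tensors that overflow, or lose noticeable precision, when cast to FP16.

    `activations` maps a tensor name to a reference array (hypothetical input,
    e.g. intermediate outputs captured from an FP32 run).
    Returns a list of (name, reason) tuples.
    """
    findings = []
    for name, arr in activations.items():
        ref = np.asarray(arr, dtype=np.float32)
        if ref.size == 0:
            continue
        peak = float(np.max(np.abs(ref)))
        if peak > FP16_MAX:
            findings.append((name, f"overflow: max |x| = {peak:.4g} > {FP16_MAX:.0f}"))
            continue
        # Round-trip through FP16 and measure the worst relative error.
        roundtrip = ref.astype(np.float16).astype(np.float32)
        denom = np.maximum(np.abs(ref), 1e-6)
        worst = float(np.max(np.abs(roundtrip - ref) / denom))
        if worst > rel_tol:
            findings.append((name, f"precision: worst relative error {worst:.3g}"))
    return findings
```

Tensors flagged this way are candidates for keeping in FP32. trtexec has options for constraining per-layer precision; check `trtexec --help` for the exact flags in TensorRT 10.3.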

GitHub repo (mmdeploy): https://github.com/open-mmlab/mmdeploy/tree/main

My VM:
Azure Standard NC4as T4 v3 (NVIDIA T4 GPU)
CUDA 12.6

Dependencies:
Torch 2.1.0
ONNX 1.19.0
TensorRT 10.3.0
TorchScript (libtorch): https://download.pytorch.org/libtorch/cu121
Triton 24.08

Full logs:

trtexec.log

My mmdeploy PR for building the custom ops:
Desperado721/mmdeploy#1

Visualizations of the DINO ONNX, DINO Torch, and DINO TRT outputs:

[3 images, not reproduced]

The dynamic libraries I built:

dynamic_libraries.zip


Labels: Module:Accuracy (Output mismatch between TensorRT and other frameworks), triaged (Issue has been triaged by maintainers)
