ONNX Runtime GPU is ~2x slower than PyTorch for cellpose model inference #3

@Endeavor-0808

Problem

Hello, thank you for your work on this project.

I ran into a performance issue while deploying the Transformer model and would like to know whether this is a known limitation, or whether there is a recommended deployment approach.

I tested inference performance with a fixed input shape of (4, 3, 256, 256) and observed that ONNX Runtime GPU is significantly slower than PyTorch:

  • PyTorch: 0.17 ~ 0.19 s / batch
  • Python ONNX Runtime GPU: 0.35 ~ 0.40 s / batch
  • C++ ONNX Runtime GPU: 0.35 ~ 0.40 s / batch

In this case, ONNX Runtime GPU is about 2x slower than PyTorch.

Also, Python ORT and C++ ORT show very similar latency, so this does not appear to be caused by Python wrapper overhead.
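For anyone who wants to reproduce the comparison, here is a minimal sketch of a timing harness that would give comparable numbers for both runtimes. It is an illustration under assumptions, not the exact script behind the figures above: `benchmark`, `run_batch`, and `sync` are hypothetical names. Two details matter for a fair comparison. Warm-up iterations should be excluded, and PyTorch CUDA calls are asynchronous, so the clock should only stop after a synchronization.

```python
import statistics
import time
from typing import Callable, Optional


def benchmark(run_batch: Callable[[], object],
              warmup: int = 5,
              iters: int = 20,
              sync: Optional[Callable[[], None]] = None) -> dict:
    """Time run_batch() and return latency statistics in seconds per batch."""
    # Warm-up runs absorb one-time costs (e.g. cuDNN algorithm search,
    # memory-arena growth) so they do not skew the measured iterations.
    for _ in range(warmup):
        run_batch()
    if sync is not None:
        sync()

    times = []
    for _ in range(iters):
        start = time.perf_counter()
        run_batch()
        if sync is not None:
            sync()  # wait for queued GPU work before stopping the clock
        times.append(time.perf_counter() - start)

    return {
        "median_s": statistics.median(times),
        "min_s": min(times),
        "max_s": max(times),
        "iters": iters,
    }


# Hypothetical usage (model, x, session, and feed are placeholders):
#   benchmark(lambda: model(x), sync=torch.cuda.synchronize)
#   benchmark(lambda: session.run(None, feed))
```

With ONNX Runtime, `session.run` returns its outputs as host arrays, so no extra `sync` callback should be needed there. The PyTorch call does need one.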


What I have checked

I have already tried the following:

  • Removed the style output
    • This made almost no difference.
  • Exported the model with static batch and static input shape
    • This also made almost no difference.
  • Verified through profiling that the main computation is running on CUDAExecutionProvider.
  • Checked the main hotspots in profiling:
    • the first Conv
    • later Gemm/MatMul
    • some Reshape/Transpose ops
  • Set cudnn_conv_algo_search=DEFAULT: the log shows that Conv runs in fallback mode, and performance becomes even worse.
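The provider and profiling checks above can be sketched roughly as follows. This is an illustration under assumptions rather than the exact code used: `cuda_providers`, `summarize_profile`, and `profile_once` are hypothetical helper names, and the trace field names (`cat`, `dur`, `args.op_name`, `args.provider`, the `_kernel_time` suffix) reflect my understanding of the ONNX Runtime profile JSON and may differ between versions.

```python
import json
from collections import defaultdict


def cuda_providers(algo_search: str = "EXHAUSTIVE", device_id: int = 0) -> list:
    """Build a providers list for onnxruntime.InferenceSession."""
    allowed = {"EXHAUSTIVE", "HEURISTIC", "DEFAULT"}
    if algo_search not in allowed:
        raise ValueError(f"algo_search must be one of {sorted(allowed)}")
    return [
        ("CUDAExecutionProvider", {
            "device_id": device_id,
            "cudnn_conv_algo_search": algo_search,
        }),
        "CPUExecutionProvider",  # fallback for ops without a CUDA kernel
    ]


def summarize_profile(events: list, top: int = 10) -> list:
    """Sum kernel time per (op type, provider), in ms, largest first."""
    totals = defaultdict(float)
    for ev in events:
        # Assumption: per-node kernel timings are "Node" events whose
        # name ends in "_kernel_time", with "dur" in microseconds.
        if ev.get("cat") != "Node":
            continue
        if not ev.get("name", "").endswith("_kernel_time"):
            continue
        args = ev.get("args", {})
        key = (args.get("op_name", "?"), args.get("provider", "?"))
        totals[key] += ev.get("dur", 0) / 1000.0
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [(op, provider, ms) for (op, provider), ms in ranked[:top]]


def profile_once(model_path: str, feed: dict,
                 algo_search: str = "EXHAUSTIVE") -> list:
    """Run one profiled inference and return the hotspot summary."""
    import onnxruntime as ort  # imported lazily; needs onnxruntime-gpu

    opts = ort.SessionOptions()
    opts.enable_profiling = True
    sess = ort.InferenceSession(model_path, sess_options=opts,
                                providers=cuda_providers(algo_search))
    sess.run(None, feed)
    trace_path = sess.end_profiling()  # path of the JSON trace file
    with open(trace_path) as f:
        return summarize_profile(json.load(f))
```

Grouping by provider as well as op type makes it easy to spot any node that silently fell back to CPUExecutionProvider, and comparing the summary across the three `cudnn_conv_algo_search` values shows whether the Conv hotspot depends on the chosen algorithm.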

Questions

I would like to ask:

  1. Have you tested this Transformer model on ONNX Runtime GPU and compared its performance against PyTorch?
  2. Is there a recommended ONNX export method or deployment configuration for this model?
  3. In your experience, is this model better suited for TensorRT than for ONNX Runtime CUDAExecutionProvider?

Additional information

If needed, I can also provide:

  • ONNX export code
  • ONNX Runtime profiling results
  • a minimal reproducible script

Thank you.
