
trtexec can't compile ONNX model with !n->candidateRequirements.empty() failed. No supported formats for Unsqueeze #3688

@fxmarty

Description

As reported in huggingface/optimum#1735, a valid ONNX model fails to build an engine with the latest TensorRT release:

[02/29/2024-10:22:33] [V] [TRT] After concat removal: 18 layers
[02/29/2024-10:22:33] [V] [TRT] Trying to split Reshape and strided tensor
[02/29/2024-10:22:33] [I] [TRT] Graph optimization time: 1.62121 seconds.
[02/29/2024-10:22:33] [V] [TRT] Building graph using backend strategy 2
[02/29/2024-10:22:33] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[02/29/2024-10:22:33] [V] [TRT] Constructing optimization profile number 0 [1/1].
[02/29/2024-10:22:33] [V] [TRT] Applying generic optimizations to the graph for inference.
[02/29/2024-10:22:33] [E] Error[2]: Assertion !n->candidateRequirements.empty() failed. No supported formats for /model/layers.0/self_attn/rotary_emb/Unsqueeze_1
[02/29/2024-10:22:33] [E] Error[2]: [optimizer.cpp::getFormatRequirements::3154] Error Code 2: Internal Error (Assertion !n->candidateRequirements.empty() failed. No supported formats for /model/layers.0/self_attn/rotary_emb/Unsqueeze_1)
[02/29/2024-10:22:33] [E] Engine could not be created from network
[02/29/2024-10:22:33] [E] Building engine failed
[02/29/2024-10:22:33] [E] Failed to create engine from model or file.
[02/29/2024-10:22:33] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8601] # trtexec --onnx=model_quantized.onnx --saveEngine=model.plan --minShapes=input_ids:1x400,attention_mask:1x400,position_ids:1x400 --optShapes=input_ids:16x400,attention_mask:16x400,position_ids:16x400 --maxShapes=input_ids:32x400,attention_mask:32x400,position_ids:32x400 --verbose --int8

I tried both int32 and int64 input dtypes, and it does not seem to matter.

Environment

TensorRT Version: nvcr.io/nvidia/tensorrt:24.01-py3

NVIDIA GPU: A100-80GB

NVIDIA Driver Version: CUDA_DRIVER_VERSION=545.23.08

CUDA Version: CUDA_VERSION=12.3.2.001

CUDNN Version: CUDNN_VERSION=8.9.7.29+cuda12.2

Relevant Files

The model is 125 MB, larger than the 25 MB attachment limit here, so it is hosted at: https://huggingface.co/fxmarty/tiny-gemma-onnx-quantized-trt

Please download it with: git clone https://huggingface.co/fxmarty/tiny-gemma-onnx-quantized-trt

Steps To Reproduce

Download the above model and run:

Commands or scripts: trtexec --onnx=model_quantized.onnx --saveEngine=model.plan --minShapes=input_ids:1x400,attention_mask:1x400,position_ids:1x400 --optShapes=input_ids:16x400,attention_mask:16x400,position_ids:16x400 --maxShapes=input_ids:32x400,attention_mask:32x400,position_ids:32x400 --verbose --int8
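The repeated batch-size pattern in the --minShapes/--optShapes/--maxShapes arguments above can be generated rather than hand-written. The helper below is a hypothetical convenience sketch (shape_flags is not part of trtexec); it builds the three flags from a mapping of input name to (min, opt, max) batch sizes at a fixed sequence length:

```python
def shape_flags(inputs: dict, seq_len: int) -> list:
    """Build trtexec --minShapes/--optShapes/--maxShapes flags.

    inputs: dict mapping input name to (min_batch, opt_batch, max_batch).
    seq_len: the fixed sequence length shared by all inputs.
    """
    flags = []
    for kind, idx in (("min", 0), ("opt", 1), ("max", 2)):
        spec = ",".join(
            f"{name}:{sizes[idx]}x{seq_len}" for name, sizes in inputs.items()
        )
        flags.append(f"--{kind}Shapes={spec}")
    return flags


# Reproduces the shape arguments of the failing command:
# shape_flags(
#     {"input_ids": (1, 16, 32),
#      "attention_mask": (1, 16, 32),
#      "position_ids": (1, 16, 32)},
#     seq_len=400,
# )
```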

Can this model run on other frameworks? For example, running the ONNX model with ONNX Runtime (polygraphy run <model.onnx> --onnxrt): yes, it works.

Labels

triaged: Issue has been triaged by maintainers