
Onnx GroupNorm Op21 failed in TensorRT 10.7 #4336

@toothache

Description


ONNX GroupNormalization was introduced in opset 18 and updated in opset 21. The update changes the shape of the scale and bias inputs from (G), one value per group, to (C), one value per channel, to comply with the original paper and the PyTorch implementation.

https://onnx.ai/onnx/operators/text_diff_GroupNormalization_18_21.html

TensorRT can run GroupNormalization at opset 18, but it fails on the opset-21 version.
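For reference, the opset-21 semantics can be sketched in NumPy (a minimal illustrative sketch, not taken from the attached models; shapes and names are assumptions): the input is normalized per group, but the affine parameters are applied per channel, which is why scale and bias must now have shape (C) rather than (G).

```python
import numpy as np

def group_norm_op21(x, scale, bias, num_groups, eps=1e-5):
    """Sketch of ONNX GroupNormalization opset-21 semantics.

    x: (N, C, H, W); scale, bias: shape (C,) -- opset 18 expected (G,).
    """
    n, c, h, w = x.shape
    xg = x.reshape(n, num_groups, -1)
    # statistics are computed per group, as in both opset versions
    mean = xg.mean(axis=2, keepdims=True)
    var = xg.var(axis=2, keepdims=True)
    xg = (xg - mean) / np.sqrt(var + eps)
    x_norm = xg.reshape(n, c, h, w)
    # the affine step is per channel, hence the (C)-shaped scale/bias
    return x_norm * scale.reshape(1, c, 1, 1) + bias.reshape(1, c, 1, 1)
```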

Environment

TensorRT Version: 10.7.0

NVIDIA GPU: NVIDIA A100 80GB PCIe

NVIDIA Driver Version: 565.57.01

CUDA Version: 12.7

CUDNN Version: 9.6.0

Operating System: Linux

Baremetal or Container (if so, version): nvcr.io/nvidia/tensorrt:24.12-py3

Relevant Files

Model links:

Opset21
Opset18

Steps To Reproduce

The error is raised from INormalizationLayer:

IBuilder::buildSerializedNetwork: Error Code 4: API Usage Error (INormalizationLayer node_of_y: node_of_y: For instance/group normalization, the scale is expected to match the output at the channel dimension 1)
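As a possible interim workaround (a sketch under stated assumptions, not an official fix): an opset-21 GroupNormalization is numerically equivalent to an opset-18-style one with unit scale and zero bias of shape (G), followed by a per-channel Mul and Add with the original (C)-shaped tensors, a form the parser already handles. The equivalence can be checked in NumPy:

```python
import numpy as np

def gn_opset18(x, scale_g, bias_g, num_groups, eps=1e-5):
    # opset-18 GroupNormalization: scale/bias of shape (G), applied per group
    n, c, h, w = x.shape
    xg = x.reshape(n, num_groups, -1)
    xg = (xg - xg.mean(2, keepdims=True)) / np.sqrt(xg.var(2, keepdims=True) + eps)
    xg = xg * scale_g.reshape(1, -1, 1) + bias_g.reshape(1, -1, 1)
    return xg.reshape(n, c, h, w)

def gn_opset21(x, scale_c, bias_c, num_groups, eps=1e-5):
    # opset-21 GroupNormalization: scale/bias of shape (C), applied per channel
    n, c, h, w = x.shape
    xg = x.reshape(n, num_groups, -1)
    xg = (xg - xg.mean(2, keepdims=True)) / np.sqrt(xg.var(2, keepdims=True) + eps)
    x_norm = xg.reshape(n, c, h, w)
    return x_norm * scale_c.reshape(1, -1, 1, 1) + bias_c.reshape(1, -1, 1, 1)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 6, 4, 4)).astype(np.float32)
scale_c = rng.standard_normal(6).astype(np.float32)
bias_c = rng.standard_normal(6).astype(np.float32)

# rewrite: op-21 node == op-18 node with unit (G) params + per-channel Mul/Add
direct = gn_opset21(x, scale_c, bias_c, num_groups=3)
decomposed = (gn_opset18(x, np.ones(3, np.float32), np.zeros(3, np.float32), 3)
              * scale_c.reshape(1, -1, 1, 1) + bias_c.reshape(1, -1, 1, 1))
assert np.allclose(direct, decomposed, atol=1e-5)
```

Applying this at the graph level would mean replacing each opset-21 GroupNormalization node with the opset-18 form plus Mul/Add nodes before handing the model to TensorRT.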

Full logs:

root@cca92a5458f4:/workspace# trtexec --onnx=model.onnx
&&&& RUNNING TensorRT.trtexec [TensorRT v100700] [b23] # trtexec --onnx=model.onnx
[01/24/2025-05:21:16] [I] TF32 is enabled by default. Add --noTF32 flag to further improve accuracy with some performance cost.
[01/24/2025-05:21:16] [I] === Model Options ===
[01/24/2025-05:21:16] [I] Format: ONNX
[01/24/2025-05:21:16] [I] Model: model.onnx
[01/24/2025-05:21:16] [I] Output:
[01/24/2025-05:21:16] [I] === Build Options ===
[01/24/2025-05:21:16] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default, tacticSharedMem: default
[01/24/2025-05:21:16] [I] avgTiming: 8
[01/24/2025-05:21:16] [I] Precision: FP32
[01/24/2025-05:21:16] [I] LayerPrecisions:
[01/24/2025-05:21:16] [I] Layer Device Types:
[01/24/2025-05:21:16] [I] Calibration:
[01/24/2025-05:21:16] [I] Refit: Disabled
[01/24/2025-05:21:16] [I] Strip weights: Disabled
[01/24/2025-05:21:16] [I] Version Compatible: Disabled
[01/24/2025-05:21:16] [I] ONNX Plugin InstanceNorm: Disabled
[01/24/2025-05:21:16] [I] TensorRT runtime: full
[01/24/2025-05:21:16] [I] Lean DLL Path:
[01/24/2025-05:21:16] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[01/24/2025-05:21:16] [I] Exclude Lean Runtime: Disabled
[01/24/2025-05:21:16] [I] Sparsity: Disabled
[01/24/2025-05:21:16] [I] Safe mode: Disabled
[01/24/2025-05:21:16] [I] Build DLA standalone loadable: Disabled
[01/24/2025-05:21:16] [I] Allow GPU fallback for DLA: Disabled
[01/24/2025-05:21:16] [I] DirectIO mode: Disabled
[01/24/2025-05:21:16] [I] Restricted mode: Disabled
[01/24/2025-05:21:16] [I] Skip inference: Disabled
[01/24/2025-05:21:16] [I] Save engine:
[01/24/2025-05:21:16] [I] Load engine:
[01/24/2025-05:21:16] [I] Profiling verbosity: 0
[01/24/2025-05:21:16] [I] Tactic sources: Using default tactic sources
[01/24/2025-05:21:16] [I] timingCacheMode: local
[01/24/2025-05:21:16] [I] timingCacheFile:
[01/24/2025-05:21:16] [I] Enable Compilation Cache: Enabled
[01/24/2025-05:21:16] [I] Enable Monitor Memory: Disabled
[01/24/2025-05:21:16] [I] errorOnTimingCacheMiss: Disabled
[01/24/2025-05:21:16] [I] Preview Features: Use default preview flags.
[01/24/2025-05:21:16] [I] MaxAuxStreams: -1
[01/24/2025-05:21:16] [I] BuilderOptimizationLevel: -1
[01/24/2025-05:21:16] [I] MaxTactics: -1
[01/24/2025-05:21:16] [I] Calibration Profile Index: 0
[01/24/2025-05:21:16] [I] Weight Streaming: Disabled
[01/24/2025-05:21:16] [I] Runtime Platform: Same As Build
[01/24/2025-05:21:16] [I] Debug Tensors:
[01/24/2025-05:21:16] [I] Input(s)s format: fp32:CHW
[01/24/2025-05:21:16] [I] Output(s)s format: fp32:CHW
[01/24/2025-05:21:16] [I] Input build shapes: model
[01/24/2025-05:21:16] [I] Input calibration shapes: model
[01/24/2025-05:21:16] [I] === System Options ===
[01/24/2025-05:21:16] [I] Device: 0
[01/24/2025-05:21:16] [I] DLACore:
[01/24/2025-05:21:16] [I] Plugins:
[01/24/2025-05:21:16] [I] setPluginsToSerialize:
[01/24/2025-05:21:16] [I] dynamicPlugins:
[01/24/2025-05:21:16] [I] ignoreParsedPluginLibs: 0
[01/24/2025-05:21:16] [I]
[01/24/2025-05:21:16] [I] === Inference Options ===
[01/24/2025-05:21:16] [I] Batch: Explicit
[01/24/2025-05:21:16] [I] Input inference shapes: model
[01/24/2025-05:21:16] [I] Iterations: 10
[01/24/2025-05:21:16] [I] Duration: 3s (+ 200ms warm up)
[01/24/2025-05:21:16] [I] Sleep time: 0ms
[01/24/2025-05:21:16] [I] Idle time: 0ms
[01/24/2025-05:21:16] [I] Inference Streams: 1
[01/24/2025-05:21:16] [I] ExposeDMA: Disabled
[01/24/2025-05:21:16] [I] Data transfers: Enabled
[01/24/2025-05:21:16] [I] Spin-wait: Disabled
[01/24/2025-05:21:16] [I] Multithreading: Disabled
[01/24/2025-05:21:16] [I] CUDA Graph: Disabled
[01/24/2025-05:21:16] [I] Separate profiling: Disabled
[01/24/2025-05:21:16] [I] Time Deserialize: Disabled
[01/24/2025-05:21:16] [I] Time Refit: Disabled
[01/24/2025-05:21:16] [I] NVTX verbosity: 0
[01/24/2025-05:21:16] [I] Persistent Cache Ratio: 0
[01/24/2025-05:21:16] [I] Optimization Profile Index: 0
[01/24/2025-05:21:16] [I] Weight Streaming Budget: 100.000000%
[01/24/2025-05:21:16] [I] Inputs:
[01/24/2025-05:21:16] [I] Debug Tensor Save Destinations:
[01/24/2025-05:21:16] [I] === Reporting Options ===
[01/24/2025-05:21:16] [I] Verbose: Disabled
[01/24/2025-05:21:16] [I] Averages: 10 inferences
[01/24/2025-05:21:16] [I] Percentiles: 90,95,99
[01/24/2025-05:21:16] [I] Dump refittable layers:Disabled
[01/24/2025-05:21:16] [I] Dump output: Disabled
[01/24/2025-05:21:16] [I] Profile: Disabled
[01/24/2025-05:21:16] [I] Export timing to JSON file:
[01/24/2025-05:21:16] [I] Export output to JSON file:
[01/24/2025-05:21:16] [I] Export profile to JSON file:
[01/24/2025-05:21:16] [I]
[01/24/2025-05:21:16] [I] === Device Information ===
[01/24/2025-05:21:16] [I] Available Devices:
[01/24/2025-05:21:16] [I]   Device 0: "NVIDIA A100 80GB PCIe" UUID: GPU-096e51a8-82c0-c28f-256f-1ed51e782f60
[01/24/2025-05:21:17] [I] Selected Device: NVIDIA A100 80GB PCIe
[01/24/2025-05:21:17] [I] Selected Device ID: 0
[01/24/2025-05:21:17] [I] Selected Device UUID: GPU-096e51a8-82c0-c28f-256f-1ed51e782f60
[01/24/2025-05:21:17] [I] Compute Capability: 8.0
[01/24/2025-05:21:17] [I] SMs: 108
[01/24/2025-05:21:17] [I] Device Global Memory: 81155 MiB
[01/24/2025-05:21:17] [I] Shared Memory per SM: 164 KiB
[01/24/2025-05:21:17] [I] Memory Bus Width: 5120 bits (ECC enabled)
[01/24/2025-05:21:17] [I] Application Compute Clock Rate: 1.41 GHz
[01/24/2025-05:21:17] [I] Application Memory Clock Rate: 1.512 GHz
[01/24/2025-05:21:17] [I]
[01/24/2025-05:21:17] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[01/24/2025-05:21:17] [I]
[01/24/2025-05:21:17] [I] TensorRT version: 10.7.0
[01/24/2025-05:21:17] [I] Loading standard plugins
[01/24/2025-05:21:17] [I] [TRT] [MemUsageChange] Init CUDA: CPU +1, GPU +0, now: CPU 23, GPU 426 (MiB)
[01/24/2025-05:21:19] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +2038, GPU +374, now: CPU 2218, GPU 800 (MiB)
[01/24/2025-05:21:19] [I] Start parsing network model.
[01/24/2025-05:21:19] [I] [TRT] ----------------------------------------------------------------
[01/24/2025-05:21:19] [I] [TRT] Input filename:   model.onnx
[01/24/2025-05:21:19] [I] [TRT] ONNX IR version:  0.0.10
[01/24/2025-05:21:19] [I] [TRT] Opset version:    21
[01/24/2025-05:21:19] [I] [TRT] Producer name:    backend-test
[01/24/2025-05:21:19] [I] [TRT] Producer version:
[01/24/2025-05:21:19] [I] [TRT] Domain:
[01/24/2025-05:21:19] [I] [TRT] Model version:    0
[01/24/2025-05:21:19] [I] [TRT] Doc string:
[01/24/2025-05:21:19] [I] [TRT] ----------------------------------------------------------------
[01/24/2025-05:21:19] [I] Finished parsing network model. Parse time: 0.00107378
[01/24/2025-05:21:19] [E] Error[4]: IBuilder::buildSerializedNetwork: Error Code 4: API Usage Error (INormalizationLayer node_of_y: node_of_y: For instance/group normalization, the scale is expected to match the output at the channel dimension 1)
[01/24/2025-05:21:19] [E] Engine could not be created from network
[01/24/2025-05:21:19] [E] Building engine failed
[01/24/2025-05:21:19] [E] Failed to create engine from model or file.
[01/24/2025-05:21:19] [E] Engine set up failed

Have you tried the latest release?: Yes

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt): Yes

Metadata

Labels

Module:ONNX: Issues relating to ONNX usage and import
internal-bug-tracked: Tracked internally, will be fixed in a future release.
triaged: Issue has been triaged by maintainers
