Description
ONNX GroupNormalization was introduced in opset 18 and updated in opset 21. The update changes the shape of the scale and bias inputs from (G) (one value per group) to (C) (one value per channel), to comply with the original paper and the PyTorch implementation.
https://onnx.ai/onnx/operators/text_diff_GroupNormalization_18_21.html
TensorRT can run the opset-18 GroupNormalization, but it fails on the latest opset version.
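For reference, the semantic difference between the two opsets can be sketched in NumPy (shapes and names here are illustrative, not taken from the failing model):

```python
import numpy as np

def group_norm(x, scale, bias, num_groups, eps=1e-5):
    """Opset-21 semantics: scale and bias are per-channel, shape (C)."""
    n, c, h, w = x.shape
    g = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = g.mean(axis=(2, 3, 4), keepdims=True)
    var = g.var(axis=(2, 3, 4), keepdims=True)
    y = ((g - mean) / np.sqrt(var + eps)).reshape(n, c, h, w)
    return y * scale.reshape(1, c, 1, 1) + bias.reshape(1, c, 1, 1)

def expand_group_params(p, c, num_groups):
    """Opset-18 scale/bias had shape (G); repeating each group value
    across its channels yields the equivalent (C)-shaped parameter."""
    return np.repeat(p, c // num_groups)
```

So an opset-18 model with scale `(G,)` is expressible in opset 21 by expanding the parameters per channel, but TensorRT's INormalizationLayer appears to still expect the old shape when parsing the opset-21 op.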
Environment
TensorRT Version: 10.7.0
NVIDIA GPU: NVIDIA A100 80GB PCIe
NVIDIA Driver Version: 565.57.01
CUDA Version: 12.7
CUDNN Version: 9.6.0
Operating System: Linux
Baremetal or Container (if so, version): nvcr.io/nvidia/tensorrt:24.12-py3
Relevant Files
Model link:
Steps To Reproduce
The error is raised by INormalizationLayer:
IBuilder::buildSerializedNetwork: Error Code 4: API Usage Error (INormalizationLayer node_of_y: node_of_y: For instance/group normalization, the scale is expected to match the output at the channel dimension 1)
Full logs:
root@cca92a5458f4:/workspace# trtexec --onnx=model.onnx
&&&& RUNNING TensorRT.trtexec [TensorRT v100700] [b23] # trtexec --onnx=model.onnx
[01/24/2025-05:21:16] [I] TF32 is enabled by default. Add --noTF32 flag to further improve accuracy with some performance cost.
[01/24/2025-05:21:16] [I] === Model Options ===
[01/24/2025-05:21:16] [I] Format: ONNX
[01/24/2025-05:21:16] [I] Model: model.onnx
[01/24/2025-05:21:16] [I] Output:
[01/24/2025-05:21:16] [I] === Build Options ===
[01/24/2025-05:21:16] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default, tacticSharedMem: default
[01/24/2025-05:21:16] [I] avgTiming: 8
[01/24/2025-05:21:16] [I] Precision: FP32
[01/24/2025-05:21:16] [I] LayerPrecisions:
[01/24/2025-05:21:16] [I] Layer Device Types:
[01/24/2025-05:21:16] [I] Calibration:
[01/24/2025-05:21:16] [I] Refit: Disabled
[01/24/2025-05:21:16] [I] Strip weights: Disabled
[01/24/2025-05:21:16] [I] Version Compatible: Disabled
[01/24/2025-05:21:16] [I] ONNX Plugin InstanceNorm: Disabled
[01/24/2025-05:21:16] [I] TensorRT runtime: full
[01/24/2025-05:21:16] [I] Lean DLL Path:
[01/24/2025-05:21:16] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[01/24/2025-05:21:16] [I] Exclude Lean Runtime: Disabled
[01/24/2025-05:21:16] [I] Sparsity: Disabled
[01/24/2025-05:21:16] [I] Safe mode: Disabled
[01/24/2025-05:21:16] [I] Build DLA standalone loadable: Disabled
[01/24/2025-05:21:16] [I] Allow GPU fallback for DLA: Disabled
[01/24/2025-05:21:16] [I] DirectIO mode: Disabled
[01/24/2025-05:21:16] [I] Restricted mode: Disabled
[01/24/2025-05:21:16] [I] Skip inference: Disabled
[01/24/2025-05:21:16] [I] Save engine:
[01/24/2025-05:21:16] [I] Load engine:
[01/24/2025-05:21:16] [I] Profiling verbosity: 0
[01/24/2025-05:21:16] [I] Tactic sources: Using default tactic sources
[01/24/2025-05:21:16] [I] timingCacheMode: local
[01/24/2025-05:21:16] [I] timingCacheFile:
[01/24/2025-05:21:16] [I] Enable Compilation Cache: Enabled
[01/24/2025-05:21:16] [I] Enable Monitor Memory: Disabled
[01/24/2025-05:21:16] [I] errorOnTimingCacheMiss: Disabled
[01/24/2025-05:21:16] [I] Preview Features: Use default preview flags.
[01/24/2025-05:21:16] [I] MaxAuxStreams: -1
[01/24/2025-05:21:16] [I] BuilderOptimizationLevel: -1
[01/24/2025-05:21:16] [I] MaxTactics: -1
[01/24/2025-05:21:16] [I] Calibration Profile Index: 0
[01/24/2025-05:21:16] [I] Weight Streaming: Disabled
[01/24/2025-05:21:16] [I] Runtime Platform: Same As Build
[01/24/2025-05:21:16] [I] Debug Tensors:
[01/24/2025-05:21:16] [I] Input(s)s format: fp32:CHW
[01/24/2025-05:21:16] [I] Output(s)s format: fp32:CHW
[01/24/2025-05:21:16] [I] Input build shapes: model
[01/24/2025-05:21:16] [I] Input calibration shapes: model
[01/24/2025-05:21:16] [I] === System Options ===
[01/24/2025-05:21:16] [I] Device: 0
[01/24/2025-05:21:16] [I] DLACore:
[01/24/2025-05:21:16] [I] Plugins:
[01/24/2025-05:21:16] [I] setPluginsToSerialize:
[01/24/2025-05:21:16] [I] dynamicPlugins:
[01/24/2025-05:21:16] [I] ignoreParsedPluginLibs: 0
[01/24/2025-05:21:16] [I]
[01/24/2025-05:21:16] [I] === Inference Options ===
[01/24/2025-05:21:16] [I] Batch: Explicit
[01/24/2025-05:21:16] [I] Input inference shapes: model
[01/24/2025-05:21:16] [I] Iterations: 10
[01/24/2025-05:21:16] [I] Duration: 3s (+ 200ms warm up)
[01/24/2025-05:21:16] [I] Sleep time: 0ms
[01/24/2025-05:21:16] [I] Idle time: 0ms
[01/24/2025-05:21:16] [I] Inference Streams: 1
[01/24/2025-05:21:16] [I] ExposeDMA: Disabled
[01/24/2025-05:21:16] [I] Data transfers: Enabled
[01/24/2025-05:21:16] [I] Spin-wait: Disabled
[01/24/2025-05:21:16] [I] Multithreading: Disabled
[01/24/2025-05:21:16] [I] CUDA Graph: Disabled
[01/24/2025-05:21:16] [I] Separate profiling: Disabled
[01/24/2025-05:21:16] [I] Time Deserialize: Disabled
[01/24/2025-05:21:16] [I] Time Refit: Disabled
[01/24/2025-05:21:16] [I] NVTX verbosity: 0
[01/24/2025-05:21:16] [I] Persistent Cache Ratio: 0
[01/24/2025-05:21:16] [I] Optimization Profile Index: 0
[01/24/2025-05:21:16] [I] Weight Streaming Budget: 100.000000%
[01/24/2025-05:21:16] [I] Inputs:
[01/24/2025-05:21:16] [I] Debug Tensor Save Destinations:
[01/24/2025-05:21:16] [I] === Reporting Options ===
[01/24/2025-05:21:16] [I] Verbose: Disabled
[01/24/2025-05:21:16] [I] Averages: 10 inferences
[01/24/2025-05:21:16] [I] Percentiles: 90,95,99
[01/24/2025-05:21:16] [I] Dump refittable layers:Disabled
[01/24/2025-05:21:16] [I] Dump output: Disabled
[01/24/2025-05:21:16] [I] Profile: Disabled
[01/24/2025-05:21:16] [I] Export timing to JSON file:
[01/24/2025-05:21:16] [I] Export output to JSON file:
[01/24/2025-05:21:16] [I] Export profile to JSON file:
[01/24/2025-05:21:16] [I]
[01/24/2025-05:21:16] [I] === Device Information ===
[01/24/2025-05:21:16] [I] Available Devices:
[01/24/2025-05:21:16] [I] Device 0: "NVIDIA A100 80GB PCIe" UUID: GPU-096e51a8-82c0-c28f-256f-1ed51e782f60
[01/24/2025-05:21:17] [I] Selected Device: NVIDIA A100 80GB PCIe
[01/24/2025-05:21:17] [I] Selected Device ID: 0
[01/24/2025-05:21:17] [I] Selected Device UUID: GPU-096e51a8-82c0-c28f-256f-1ed51e782f60
[01/24/2025-05:21:17] [I] Compute Capability: 8.0
[01/24/2025-05:21:17] [I] SMs: 108
[01/24/2025-05:21:17] [I] Device Global Memory: 81155 MiB
[01/24/2025-05:21:17] [I] Shared Memory per SM: 164 KiB
[01/24/2025-05:21:17] [I] Memory Bus Width: 5120 bits (ECC enabled)
[01/24/2025-05:21:17] [I] Application Compute Clock Rate: 1.41 GHz
[01/24/2025-05:21:17] [I] Application Memory Clock Rate: 1.512 GHz
[01/24/2025-05:21:17] [I]
[01/24/2025-05:21:17] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[01/24/2025-05:21:17] [I]
[01/24/2025-05:21:17] [I] TensorRT version: 10.7.0
[01/24/2025-05:21:17] [I] Loading standard plugins
[01/24/2025-05:21:17] [I] [TRT] [MemUsageChange] Init CUDA: CPU +1, GPU +0, now: CPU 23, GPU 426 (MiB)
[01/24/2025-05:21:19] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +2038, GPU +374, now: CPU 2218, GPU 800 (MiB)
[01/24/2025-05:21:19] [I] Start parsing network model.
[01/24/2025-05:21:19] [I] [TRT] ----------------------------------------------------------------
[01/24/2025-05:21:19] [I] [TRT] Input filename: model.onnx
[01/24/2025-05:21:19] [I] [TRT] ONNX IR version: 0.0.10
[01/24/2025-05:21:19] [I] [TRT] Opset version: 21
[01/24/2025-05:21:19] [I] [TRT] Producer name: backend-test
[01/24/2025-05:21:19] [I] [TRT] Producer version:
[01/24/2025-05:21:19] [I] [TRT] Domain:
[01/24/2025-05:21:19] [I] [TRT] Model version: 0
[01/24/2025-05:21:19] [I] [TRT] Doc string:
[01/24/2025-05:21:19] [I] [TRT] ----------------------------------------------------------------
[01/24/2025-05:21:19] [I] Finished parsing network model. Parse time: 0.00107378
[01/24/2025-05:21:19] [E] Error[4]: IBuilder::buildSerializedNetwork: Error Code 4: API Usage Error (INormalizationLayer node_of_y: node_of_y: For instance/group normalization, the scale is expected to match the output at the channel dimension 1)
[01/24/2025-05:21:19] [E] Engine could not be created from network
[01/24/2025-05:21:19] [E] Building engine failed
[01/24/2025-05:21:19] [E] Failed to create engine from model or file.
[01/24/2025-05:21:19] [E] Engine set up failed
Have you tried the latest release?: Yes
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt): Yes