Description
I am running an image restoration model with TensorRT and I observe noticeable output differences between:
- A pure FP32 TensorRT engine, and
- An engine built with --fp16 (and additional layer-precision constraints).
Through debugging, I found that some layers are very sensitive to precision and must stay in FP32 to match the original PyTorch/ONNX behavior, while other layers are fine with FP16.
My goal is to build a mixed-precision engine where most layers run in FP32 and only a few selected layers run in FP16. However, I’m running into some limitations/confusion with trtexec and --layerPrecisions.
Environment
- TensorRT version: 10.12
- CUDA version: 12.8
- GPU: NVIDIA GeForce RTX 3080 Laptop GPU
- OS: Windows 11
- Command-line tool: trtexec.exe
- Model: image restoration network (BasicVSR++-style), exported to ONNX
What I am doing
- Baseline FP32 engine

  I first build a pure FP32 engine (no --fp16) and use it as my reference:

  trtexec.exe --onnx="path/to/model.onnx" --saveEngine="model_fp32.trt" --noTF32 --profilingVerbosity=detailed

  The outputs of this engine match my PyTorch FP32 baseline reasonably well.
- FP16 engine

  Then I build an FP16 engine:

  trtexec.exe --onnx="path/to/model.onnx" --saveEngine="model_fp16.trt" --fp16 --noTF32 --profilingVerbosity=detailed

  For my image restoration task, the FP16 engine’s outputs show visible differences compared to the FP32 engine (see the comparison sketch at the end of this section).
- Trying to force most layers to FP32 with --layerPrecisions

  I tried to keep global FP16 enabled (to allow FP16 kernels) but force all layers to FP32 using:

  trtexec.exe --onnx="path/to/model.onnx" --saveEngine="model_mixed.trt" --fp16 --precisionConstraints=obey --layerPrecisions="*:fp32" --layerOutputTypes="*:fp32" --noTF32 --profilingVerbosity=detailed

  However, the outputs of this “mixed” engine are still different from the pure FP32 engine.
At the same time, if I do not pass --fp16, trtexec does not allow me to set specific layers to FP16 using --layerPrecisions=<layer>:fp16 (which I understand, because FP16 is not globally enabled).
So I am stuck between:
- enabling --fp16 → but then even with --layerPrecisions=*:fp32 I cannot reproduce the FP32 behavior, or
- disabling --fp16 → then I cannot make only a few layers FP16.
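To make “visible differences” concrete, this is roughly how I compare two engines’ outputs (a minimal sketch, assuming each engine’s output tensor has already been dumped to a .npy file; the file names below are placeholders):

```python
# Minimal comparison sketch (assumption: each engine's output tensor has been
# dumped to a .npy file beforehand; the file names below are placeholders).
import numpy as np

def psnr(ref: np.ndarray, test: np.ndarray, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio of `test` against `ref`."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(data_range ** 2 / mse)

ref = np.load("out_fp32_engine.npy")    # pure FP32 engine (reference)
test = np.load("out_fp16_engine.npy")   # --fp16 (or "mixed") engine under test

print("max abs diff :", np.abs(ref - test).max())
print("mean abs diff:", np.abs(ref - test).mean())
print("PSNR vs FP32 :", psnr(ref, test), "dB")
```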
Question
My questions are:
1. Is there an officially supported way to build an engine where:
   - the default precision for all layers is FP32, and
   - only a small set of explicitly listed layers run in FP16,

   using trtexec only (no C++/Python builder API)? In other words, something conceptually like “global FP32 engine, but allow FP16 kernels, and only these layers use FP16”, e.g.
   --layerPrecisions=/reconstruction/main/main.2/main.2.0/conv1/Conv:fp16,...
2. For the current behavior:
   - With --fp16 and --layerPrecisions=*:fp32, should I expect the outputs to be numerically very close to a pure FP32 engine?
   - If yes, is --layerPrecisions="*:fp32" a valid and supported syntax in TensorRT 10.12?
   - Are there internal layers (e.g. reformat, shuffle, concatenation) or tactics that may still use FP16/TF32 even if all visible ONNX layers are set to FP32 via --layerPrecisions / --layerOutputTypes?
3. Is the following strategy recommended/supported for my use case (see the generation sketch after this list):
   - Build with --fp16 and --precisionConstraints=obey.
   - Enumerate all real layer names from --dumpLayerInfo (e.g. Name: /reconstruction/main/...).
   - Programmatically generate long strings like
     --layerPrecisions=/layer1:fp32,/layer2:fp32,...
     --layerOutputTypes=/layer1:fp32,/layer2:fp32,...
   - Change :fp32 to :fp16 for only a few selected layers.

   Would you expect this to give results extremely close to the pure FP32 engine (except for those selected FP16 layers)?
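To make question 3 concrete, this is roughly how I would generate those strings (a minimal sketch; it assumes the layer names have already been collected from --dumpLayerInfo output into a plain text file with one name per line, and the file name and FP16 layer list are placeholders):

```python
# Minimal sketch: build --layerPrecisions / --layerOutputTypes specs from a list
# of layer names (one name per line, collected from --dumpLayerInfo output).
# The file name and the FP16 layer set below are hypothetical placeholders.

FP16_LAYERS = {
    "/reconstruction/main/main.2/main.2.0/conv1/Conv",  # layers I want in FP16
}

with open("layer_names.txt", encoding="utf-8") as f:
    layer_names = [line.strip() for line in f if line.strip()]

def precision_for(name: str) -> str:
    return "fp16" if name in FP16_LAYERS else "fp32"

layer_precisions = ",".join(f"{n}:{precision_for(n)}" for n in layer_names)
layer_output_types = ",".join(f"{n}:{precision_for(n)}" for n in layer_names)

cmd = (
    'trtexec.exe --onnx="path/to/model.onnx" --saveEngine="model_mixed.trt" '
    "--fp16 --noTF32 --precisionConstraints=obey "
    f'--layerPrecisions="{layer_precisions}" '
    f'--layerOutputTypes="{layer_output_types}" '
    "--profilingVerbosity=detailed"
)
print(cmd)  # paste the printed command into the shell (or pass it to subprocess)
```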
Why I care
For my image restoration model, small numeric differences can become visually noticeable in the output images. I have identified (by experiment) that:
- some sensitive layers must stay in FP32 to keep the output quality, while
- many other layers can safely run in FP16 for performance reasons.
So I would like a clear, documented way to:
- start from an “FP32-by-default” engine, and
- selectively downgrade only a few layers to FP16,
using either trtexec command-line options or a recommended workflow.
Any guidance, clarification on the expected behavior of --fp16 + --layerPrecisions, or examples of the correct way to achieve this mixed-precision setup would be greatly appreciated.
Thank you!