Description
I am running an image restoration model with TensorRT and I observe noticeable output differences between:
- A pure FP32 TensorRT engine, and
- An engine built with --fp16 (and additional layer-precision constraints).
Through debugging, I found that some layers are very sensitive to precision and must stay in FP32 to match the original PyTorch/ONNX behavior, while other layers are fine with FP16.
My goal is to build a mixed-precision engine where most layers run in FP32 and only a few selected layers run in FP16. However, I’m running into some limitations/confusion with trtexec and --layerPrecisions.
Environment
- TensorRT version: 10.12
- CUDA version: 12.8
- GPU: NVIDIA GeForce RTX 3080 Laptop GPU
- OS: Windows 11
- Command-line tool: trtexec.exe
- Model: image restoration network (BasicVSR++-style), exported to ONNX
What I am doing
- Baseline FP32 engine

  I first build a pure FP32 engine (no --fp16) and use it as my reference:

  trtexec.exe --onnx="path/to/model.onnx" --saveEngine="model_fp32.trt" --noTF32 --profilingVerbosity=detailed

  The outputs of this engine match my PyTorch FP32 baseline reasonably well.
- FP16 engine

  Then I build an FP16 engine:

  trtexec.exe --onnx="path/to/model.onnx" --saveEngine="model_fp16.trt" --fp16 --noTF32 --profilingVerbosity=detailed

  For my image restoration task, the FP16 engine’s outputs show visible differences compared to the FP32 engine (see the comparison sketch at the end of this section).
- Trying to force most layers to FP32 with --layerPrecisions

  I tried to keep global FP16 enabled (to allow FP16 kernels) but force all layers to FP32 using:

  trtexec.exe --onnx="path/to/model.onnx" --saveEngine="model_mixed.trt" --fp16 --precisionConstraints=obey --layerPrecisions="*:fp32" --layerOutputTypes="*:fp32" --noTF32 --profilingVerbosity=detailed

  However, the outputs of this “mixed” engine are still different from the pure FP32 engine.
At the same time, if I do not pass --fp16, trtexec does not allow me to set specific layers to FP16 using --layerPrecisions=<layer>:fp16 (which I understand, because FP16 is not globally enabled).
So I am stuck between:
- enabling --fp16 → but then even with --layerPrecisions=*:fp32 I cannot reproduce the FP32 behavior, or
- disabling --fp16 → then I cannot make only a few layers FP16.
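To make “visible differences” concrete, this is roughly how I compare two engines’ outputs (a minimal sketch, assuming each engine’s output tensor has already been dumped to a .npy file; the file names below are placeholders):

```python
# Minimal comparison sketch (assumption: each engine's output tensor has been
# dumped to a .npy file beforehand; the file names below are placeholders).
import numpy as np

def psnr(ref: np.ndarray, test: np.ndarray, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio of `test` against `ref`."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(data_range ** 2 / mse)

ref = np.load("out_fp32_engine.npy")    # pure FP32 engine (reference)
test = np.load("out_fp16_engine.npy")   # --fp16 (or "mixed") engine under test

print("max abs diff :", np.abs(ref - test).max())
print("mean abs diff:", np.abs(ref - test).mean())
print("PSNR vs FP32 :", psnr(ref, test), "dB")
```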
Question
My questions are:
1. Is there an officially supported way to build an engine where:
   - the default precision for all layers is FP32, and
   - only a small set of explicitly listed layers run in FP16,

   using trtexec only (no C++/Python builder API)? In other words, something conceptually like “global FP32 engine, but allow FP16 kernels, and only these layers use FP16”, e.g.
   --layerPrecisions=/reconstruction/main/main.2/main.2.0/conv1/Conv:fp16,...
2. For the current behavior:
   - With --fp16 and --layerPrecisions=*:fp32, should I expect the outputs to be numerically very close to a pure FP32 engine?
   - If yes, is --layerPrecisions="*:fp32" a valid and supported syntax in TensorRT 10.12?
   - Are there internal layers (e.g. reformat, shuffle, concatenation) or tactics that may still use FP16/TF32 even if all visible ONNX layers are set to FP32 via --layerPrecisions / --layerOutputTypes?
3. Is the following strategy recommended/supported for my use case (see the generation sketch after this list):
   - Build with --fp16 and --precisionConstraints=obey.
   - Enumerate all real layer names from --dumpLayerInfo (e.g. Name: /reconstruction/main/...).
   - Programmatically generate long strings like
     --layerPrecisions=/layer1:fp32,/layer2:fp32,...
     --layerOutputTypes=/layer1:fp32,/layer2:fp32,...
   - Change :fp32 to :fp16 for only a few selected layers.

   Would you expect this to give results extremely close to the pure FP32 engine (except for those selected FP16 layers)?
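To make question 3 concrete, this is roughly how I would generate those strings (a minimal sketch; it assumes the layer names have already been collected from --dumpLayerInfo output into a plain text file with one name per line, and the file name and FP16 layer list are placeholders):

```python
# Minimal sketch: build --layerPrecisions / --layerOutputTypes specs from a list
# of layer names (one name per line, collected from --dumpLayerInfo output).
# The file name and the FP16 layer set below are hypothetical placeholders.

FP16_LAYERS = {
    "/reconstruction/main/main.2/main.2.0/conv1/Conv",  # layers I want in FP16
}

with open("layer_names.txt", encoding="utf-8") as f:
    layer_names = [line.strip() for line in f if line.strip()]

def precision_for(name: str) -> str:
    return "fp16" if name in FP16_LAYERS else "fp32"

layer_precisions = ",".join(f"{n}:{precision_for(n)}" for n in layer_names)
layer_output_types = ",".join(f"{n}:{precision_for(n)}" for n in layer_names)

cmd = (
    'trtexec.exe --onnx="path/to/model.onnx" --saveEngine="model_mixed.trt" '
    "--fp16 --noTF32 --precisionConstraints=obey "
    f'--layerPrecisions="{layer_precisions}" '
    f'--layerOutputTypes="{layer_output_types}" '
    "--profilingVerbosity=detailed"
)
print(cmd)  # paste the printed command into the shell (or pass it to subprocess)
```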
Why I care
For my image restoration model, small numeric differences can become visually noticeable in the output images. I have identified (by experiment) that:
- some sensitive layers must stay in FP32 to keep the output quality, while
- many other layers can safely run in FP16 for performance reasons.
So I would like a clear, documented way to:
- start from an “FP32-by-default” engine, and
- selectively downgrade only a few layers to FP16,
using either trtexec command-line options or a recommended workflow.
Any guidance, clarification on the expected behavior of --fp16 + --layerPrecisions, or examples of the correct way to achieve this mixed-precision setup would be greatly appreciated.
Thank you!