Precision/behavior failure of TensorRT 10.12 when running mixed FP32/FP16 image restoration on an RTX 3080 Laptop GPU #4645

@rushdawn

Description

I am running an image restoration model with TensorRT and I observe noticeable output differences between:

  1. A pure FP32 TensorRT engine, and
  2. An engine built with --fp16 (and additional layer precision constraints).

Through debugging, I found that some layers are very sensitive to precision and must stay in FP32 to match the original PyTorch/ONNX behavior, while other layers are fine with FP16.

My goal is to build a mixed-precision engine where most layers run in FP32 and only a few selected layers run in FP16. However, I’m running into some limitations/confusion with trtexec and --layerPrecisions.


Environment

  • TensorRT version: 10.12
  • CUDA version: 12.8
  • GPU: NVIDIA GeForce RTX 3080 Laptop GPU
  • OS: Windows 11
  • Command line tool: trtexec.exe
  • Model: image restoration network (BasicVSR++-style), exported to ONNX

What I am doing

  1. Baseline FP32 engine

    I first build a pure FP32 engine (no --fp16) and use it as my reference:

    trtexec.exe ^
      --onnx="path/to/model.onnx" ^
      --saveEngine="model_fp32.trt" ^
      --noTF32 ^
      --profilingVerbosity=detailed

    The outputs of this engine match my PyTorch FP32 baseline reasonably well.

  2. FP16 engine

    Then I build an FP16 engine:

    trtexec.exe ^
      --onnx="path/to/model.onnx" ^
      --saveEngine="model_fp16.trt" ^
      --fp16 ^
      --noTF32 ^
      --profilingVerbosity=detailed

    For my image restoration task, the FP16 engine’s outputs show visible differences compared to the FP32 engine.
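
    For reference, I compare the two engines’ host-side outputs with a small NumPy helper along these lines (a minimal sketch; out_fp32 and out_fp16 are hypothetical names for the outputs I collect from each engine for the same input, with pixel values in [0, 1]):

    import numpy as np

    def compare_outputs(out_fp32: np.ndarray, out_fp16: np.ndarray) -> None:
        """Report simple error metrics between two engines' outputs (values assumed in [0, 1])."""
        diff = np.abs(out_fp32.astype(np.float32) - out_fp16.astype(np.float32))
        mse = float(np.mean(diff ** 2))
        psnr = 10.0 * np.log10(1.0 / mse) if mse > 0.0 else float("inf")
        print(f"max abs diff : {diff.max():.6f}")
        print(f"mean abs diff: {diff.mean():.6f}")
        print(f"PSNR vs FP32 : {psnr:.2f} dB")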

  3. Trying to force most layers to FP32 with --layerPrecisions

    I tried to keep global FP16 enabled (to allow FP16 kernels) but force all layers to FP32 using:

    trtexec.exe ^
      --onnx="path/to/model.onnx" ^
      --saveEngine="model_mixed.trt" ^
      --fp16 ^
      --precisionConstraints=obey ^
      --layerPrecisions="*:fp32" ^
      --layerOutputTypes="*:fp32" ^
      --noTF32 ^
      --profilingVerbosity=detailed

    However, the outputs of this “mixed” engine are still different from those of the pure FP32 engine.

At the same time, if I do not pass --fp16, trtexec does not allow me to set specific layers to FP16 via --layerPrecisions=<layerName>:fp16 (which I understand, because FP16 is not globally enabled).

So I am stuck between:

  • enabling --fp16 → but then even with --layerPrecisions=*:fp32 I cannot reproduce the FP32 behavior, or
  • disabling --fp16 → then I cannot make only a few selected layers run in FP16.

Question

My questions are:

  1. Is there an officially supported way, using trtexec only (no C++/Python builder API), to build an engine where:

    • the default precision for all layers is FP32, and
    • only a small set of explicitly listed layers run in FP16?

    In other words, something conceptually like “global FP32 engine, but allow FP16 kernels, and only these layers use FP16”, e.g.:

    --layerPrecisions=/reconstruction/main/main.2/main.2.0/conv1/Conv:fp16,...

  2. About the current behavior: with --fp16 and --layerPrecisions=*:fp32, should I expect the outputs to be numerically very close to those of a pure FP32 engine? If yes, is --layerPrecisions="*:fp32" a valid and supported syntax in TensorRT 10.12?

  3. Are there internal layers (e.g. reformat, shuffle, concatenation) or tactics that may still use FP16/TF32 even if all visible ONNX layers are set to FP32 via --layerPrecisions / --layerOutputTypes?

  4. Is the following strategy recommended/supported for my use case (a script sketch follows this list):

    • Build with --fp16 and --precisionConstraints=obey.
    • Enumerate all real layer names from --dumpLayerInfo (e.g. Name: /reconstruction/main/...).
    • Programmatically generate a long string like
      --layerPrecisions=/layer1:fp32,/layer2:fp32,...
      --layerOutputTypes=/layer1:fp32,/layer2:fp32,...
    • Change :fp32 to :fp16 only for the few selected layers.

    Would you expect this to give results extremely close to the pure FP32 engine (except for those selected FP16 layers)?
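
To make question 4 concrete, this is the kind of helper script I have in mind (a sketch only: it assumes the JSON written by trtexec’s --exportLayerInfo option has a top-level "Layers" list whose entries carry a "Name" field, and FP16_LAYERS is a hypothetical set holding the few layer names I want in FP16):

    import json

    # Hypothetical: the few layers I want to downgrade to FP16 (names taken from the layer info dump).
    FP16_LAYERS = {
        "/reconstruction/main/main.2/main.2.0/conv1/Conv",
    }

    def build_precision_flags(layer_info_path: str) -> str:
        """Turn the exported layer list into --layerPrecisions/--layerOutputTypes flag strings."""
        with open(layer_info_path, "r", encoding="utf-8") as f:
            info = json.load(f)

        # Assumption about the JSON layout: a top-level "Layers" list whose entries
        # are dicts with a "Name" field (or plain name strings, depending on version).
        names = [l["Name"] if isinstance(l, dict) else l for l in info["Layers"]]

        spec = ",".join(
            f"{name}:fp16" if name in FP16_LAYERS else f"{name}:fp32" for name in names
        )
        return f'--layerPrecisions="{spec}" --layerOutputTypes="{spec}"'

    if __name__ == "__main__":
        # layer_info.json would come from a build such as:
        #   trtexec.exe --onnx="path/to/model.onnx" --profilingVerbosity=detailed --exportLayerInfo=layer_info.json
        print(build_precision_flags("layer_info.json"))

The printed flags would then be appended to the trtexec command from step 3.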

Why I care

For my image restoration model, small numeric differences can become visually noticeable in the output images. I have found, by experiment, that:

  • some sensitive layers must stay in FP32 to preserve output quality, while
  • many other layers can safely run in FP16 for performance.

So I would like a clear, documented way to:

  • start from an “FP32-by-default” engine, and
  • selectively downgrade only a few layers to FP16,

using either trtexec command-line options or a recommended workflow. For completeness, the Python builder-API sketch below expresses what I mean conceptually, although I would prefer a trtexec-only solution.
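
A rough sketch of that intent with the TensorRT Python builder API (the layer name in FP16_LAYERS is a made-up example, error handling is omitted, and in a real script I would skip layers whose outputs are not floating point):

    import tensorrt as trt

    # Hypothetical example; in practice this would hold my experimentally chosen FP16-safe layers.
    FP16_LAYERS = {"/reconstruction/main/main.2/main.2.0/conv1/Conv"}

    logger = trt.Logger(trt.Logger.INFO)
    builder = trt.Builder(logger)
    network = builder.create_network(0)
    parser = trt.OnnxParser(network, logger)
    with open("path/to/model.onnx", "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError("ONNX parse failed")

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)                        # allow FP16 kernels at all
    config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)  # enforce the per-layer settings

    for i in range(network.num_layers):
        layer = network.get_layer(i)
        if layer.name in FP16_LAYERS:
            layer.precision = trt.float16
            layer.set_output_type(0, trt.float16)
        else:
            layer.precision = trt.float32
            layer.set_output_type(0, trt.float32)

    engine = builder.build_serialized_network(network, config)
    with open("model_mixed.trt", "wb") as f:
        f.write(engine)
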
Any guidance, clarification on the expected behavior of --fp16 + --layerPrecisions, or examples of the correct way to achieve this mixed-precision setup would be greatly appreciated.

Thank you!
