Hello,
I was testing FP8 inference on convolution layers using Torch-TensorRT with TensorRT 10.12.0.36 on an H100 GPU.
Even though the model meets the FP8 convolution requirements (channel counts are multiples of 32), TensorRT still did not select FP8 convolution kernels, whereas FP8 works fine for Linear layers in the same environment.
Environment:
- TensorRT 10.12.0.36
- Torch-TensorRT 2.8.0
- CUDA 12.6
Here is the Python script:
fp8conv.py
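
For context, a minimal sketch along the lines of the attached script (assuming the standard nvidia-modelopt FP8 PTQ flow for Torch-TensorRT; the toy model below is a stand-in, not the exact contents of fp8conv.py):

```python
import torch
import torch.nn as nn
import torch_tensorrt
import modelopt.torch.quantization as mtq
from modelopt.torch.quantization.utils import export_torch_mode

# Toy model; all channel counts are multiples of 32, as required for FP8 conv
model = nn.Sequential(
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
).cuda().eval()

dummy_input = torch.randn(1, 32, 224, 224, device="cuda")

# FP8 post-training quantization: insert quantizers, then calibrate
def calibrate_loop(m):
    m(dummy_input)

mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop=calibrate_loop)

with torch.no_grad():
    with export_torch_mode():
        exported = torch.export.export(model, (dummy_input,))
        trt_model = torch_tensorrt.dynamo.compile(
            exported,
            inputs=[dummy_input],
            enabled_precisions={torch.float8_e4m3fn},
            min_block_size=1,
        )
        out = trt_model(dummy_input)
```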
TensorBoard also shows that no Tensor Cores are being used.
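
The same check can be done without TensorBoard via the PyTorch profiler; FP8 Tensor Core kernels typically carry fp8/e4m3 substrings in their CUDA kernel names (a rough sketch, reusing trt_model and dummy_input from above):

```python
from torch.profiler import profile, ProfilerActivity

# List the CUDA kernels TensorRT actually dispatched
with profile(activities=[ProfilerActivity.CUDA]) as prof:
    trt_model(dummy_input)

print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=20))
```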
