MHA FP8 Fusion with TensorRT 10.8 #4516

## Description

I tried to reproduce the FP8 MHA fusion with TensorRT 10.8 by following this [example](https://docs.nvidia.com/deeplearning/tensorrt/10.8.0/performance/best-practices.html#example-workflow-fp8-mha-fusion), but the output logs suggest that the MHA is executed in Half precision rather than FP8.
Here are the logs produced by this command:
```
trtexec --loadEngine=vit_base_patch8_224_Opset17.engine \
    --profilingVerbosity=detailed --dumpLayerInfo --skipInference &> output.log
```

[tensorrt_mha_fp8.log](https://github.com/user-attachments/files/21147460/tensorrt_mha_fp8.log)
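
To make the check easier to repeat, here is a minimal Python sketch that lists the tensor formats of the layers whose name contains `mha`. It assumes the layer information is available as JSON (for example, exported with trtexec's `--exportLayerInfo` option instead of being read from the text log) and that each layer entry has `Name`, `Inputs`, and `Outputs` fields whose tensors carry a `Format/Datatype` string. The function names, the sample layer names, and the sample format strings are hypothetical and only illustrate the assumed layout; the real field contents may differ.

```python
import json


def load_layer_info(path):
    """Load a layer-info JSON file (assumed to come from trtexec --exportLayerInfo)."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def summarize_mha_layers(layer_info, keyword="mha"):
    """Return (layer name, sorted tensor format strings) for layers matching keyword.

    Assumes the layout {"Layers": [{"Name": ..., "Inputs": [...], "Outputs": [...]}]},
    where each tensor entry has a "Format/Datatype" string.
    """
    summary = []
    for layer in layer_info.get("Layers", []):
        # Without detailed profiling verbosity, entries may be plain strings; skip them.
        if not isinstance(layer, dict):
            continue
        name = layer.get("Name", "")
        if keyword.lower() not in name.lower():
            continue
        tensors = layer.get("Inputs", []) + layer.get("Outputs", [])
        formats = sorted({t.get("Format/Datatype", "unknown") for t in tensors})
        summary.append((name, formats))
    return summary


# Hypothetical sample that mimics the assumed layout; not taken from the attached log.
sample = {
    "Layers": [
        {
            "Name": "example_mha_layer",
            "Inputs": [{"Name": "q", "Format/Datatype": "Half"}],
            "Outputs": [{"Name": "out", "Format/Datatype": "Half"}],
        },
        {
            "Name": "example_conv_layer",
            "Inputs": [{"Name": "x", "Format/Datatype": "Half"}],
            "Outputs": [],
        },
    ]
}

for name, formats in summarize_mha_layers(sample):
    print(f"{name}: {formats}")
```

For a real engine, `summarize_mha_layers(load_layer_info("layers.json"))` would be called on the exported file (the file name here is a placeholder). If the fusion had taken effect, I would expect an FP8 format to show up on the MHA layer's tensors instead of only Half.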

Is this expected?
