MHA FP8 Fusion with TensorRT 10.8 #4516

@david-PHR

Description

I tried to reproduce the FP8 MHA fusion with TensorRT 10.8 from this example, but the output logs suggest that the MHA is executed in half precision (FP16) rather than FP8.
Here are the logs from this command:

 trtexec --loadEngine=vit_base_patch8_224_Opset17.engine \
--profilingVerbosity=detailed --dumpLayerInfo --skipInference &> output.log

tensorrt_mha_fp8.log

Is this expected?
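
For anyone who wants to check the same thing on their own engine, here is a minimal sketch of how the dumped layer information could be scanned for the precision of attention-related layers. It is only a text heuristic, not part of trtexec: the function name `summarize_attention_dtypes`, the name hints (`mha`, `attention`), and the datatype spellings it searches for are all assumptions and may need adjusting to match what actually appears in the log.

```python
import re
from collections import Counter

# Assumption: lines describing the attention block contain one of these
# substrings (case-insensitive). Adjust to the layer names in your own log.
LAYER_HINTS = ("mha", "attention")

# Assumption: datatype tokens as they might be spelled in the layer dump.
DTYPE_PATTERN = re.compile(r"\b(FP8|Half|Float|Int8|BFloat16)\b", re.IGNORECASE)


def summarize_attention_dtypes(log_text):
    """Count datatype tokens on lines that look attention-related."""
    counts = Counter()
    for line in log_text.splitlines():
        lowered = line.lower()
        # Skip lines that do not mention any of the attention name hints.
        if not any(hint in lowered for hint in LAYER_HINTS):
            continue
        # Tally every datatype token found on the matching line.
        for token in DTYPE_PATTERN.findall(line):
            counts[token.lower()] += 1
    return counts
```

Reading `output.log` into a string and passing it to `summarize_attention_dtypes` returns a `Counter` keyed by lowercase datatype name. If `half` dominates and `fp8` never appears on the attention lines, that would be consistent with the MHA running in half precision, although a heuristic like this cannot prove which kernel was selected.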
