Open
Labels
Module:Performance (General performance issues)
Module:Runtime (Other generic runtime issues that does not fall into other modules)
Description
I tried to reproduce the FP8 MHA fusion with TensorRT 10.8 by following this example, but the output logs suggest that the MHA is executed in half precision (FP16) rather than FP8.
Here are the logs from this command:
trtexec --loadEngine=vit_base_patch8_224_Opset17.engine \
--profilingVerbosity=detailed --dumpLayerInfo --skipInference &> output.log
Is this expected?
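For anyone who wants to check their own engine, here is a minimal sketch for narrowing down the relevant lines in `output.log`. It only does a case-insensitive keyword search; the keywords (`mha`, `fp8`, `half`) and the helper name `find_matching_lines` are my own assumptions for illustration, not part of trtexec or its log format, so adjust them to whatever your layer names look like.

```python
import re
from pathlib import Path

# Assumed keywords: change these to match the layer names in your own log.
KEYWORDS = ("mha", "fp8", "half")


def find_matching_lines(lines, keywords=KEYWORDS):
    """Return (line_number, text) pairs for lines containing any keyword.

    Matching is case-insensitive and purely textual; no log structure is assumed.
    """
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if pattern.search(line)
    ]


if __name__ == "__main__":
    log_path = Path("output.log")  # file written by the trtexec command above
    if log_path.exists():
        with log_path.open(errors="replace") as log_file:
            for number, text in find_matching_lines(log_file):
                print(f"{number}: {text}")
```

This does not interpret the layer information; it just makes it quicker to find the lines that mention the attention layers and their reported precision.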
MatthieuToulemont