[Usage]: Is there any fp8_blockscale_gemm performance comparison data between nvcc and nvrtc? #10307

@zzJingHong

Is there any performance comparison data for fp8_blockscale_gemm between NVCC and NVRTC?

From DeepGEMM: "Note that there is some perf drop when using NVRTC due to a known bug of NVRTC which leads to extra instructions (but in the m=4096,n=2112,k=7168 case, NVRTC version was faster, which was a bit strange)."

Has this bug been fixed?
