[QST] cutlass profiler -DCUTLASS_LIBRARY_KERNELS=cutlass3x*f32xe4m3_*f32xe4m3* but for fp16/bf16 scales? #2925

@aidando73

RE: https://github.com/NVIDIA/cutlass/tree/main/examples/81_blackwell_gemm_blockwise#kernel-selection-and-profiling

I want to profile all blockwise kernels that take e4m3 operands with fp16/bf16 scale factors. For example, I configured the build with:

cmake $FIREWORKS_DIR/flashinfer/3rdparty/cutlass \
-DCMAKE_BUILD_TYPE=Release \
-DCUTLASS_LIBRARY_KERNELS="cutlass3x*f16xe4m3_*f16xe4m3*,cutlass3x*f16xf8_*f16xf8*" \
-DCUTLASS_NVCC_ARCHS="100a" \
-DCUTLASS_LIBRARY_INSTANTIATION_LEVEL="max" \
-DCUTLASS_UNITY_BUILD_ENABLED=ON

But this doesn't seem to generate any kernels. These are the logs:

-- The C compiler identification is GNU 13.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Found Python3: /usr/bin/python3.12 (found version "3.12.3") found components: Interpreter
-- Configuring cublas ...
-- cuBLAS Disabled.
-- Configuring cuBLAS ... done.
-- Generating /shared/data2/cutlass-build/tools/library/cutlass_library_objs.unity.0d5bb3ba97a2.cu
-- Generating /shared/data2/cutlass-build/tools/library/cutlass_library_objs.unity.2517cc6c388e.cu
-- Generating /shared/data2/cutlass-build/tools/library/cutlass_library_objs.unity.2ddb221102ee.cu
-- Completed generation of library instances. See /shared/data2/cutlass-build/tools/library/library_instance_generation.log for more information.
-- Found Python3: /usr/bin/python3.12 (found suitable version "3.12.3", minimum required is "3.5") found components: Interpreter
-- Generating /shared/data2/cutlass-build/tools/profiler/cutlass_profiler.unity.1f651d759988.cu
-- Generating /shared/data2/cutlass-build/tools/profiler/cutlass_profiler.unity.11cb496d74c1.cu
-- Enable device reference verification in conv unit tests
-- Generating /shared/data2/cutlass-build/test/unit/conv/device/cutlass_test_unit_conv_device_simt.unity.0d84bd511b07.cu
-- Generating /shared/data2/cutlass-build/test/unit/conv/device/cutlass_test_unit_conv_device_tensorop_s32.unity.232fdc355d9a.cu
-- Generating /shared/data2/cutlass-build/test/unit/conv/device/cutlass_test_unit_conv_device_tensorop_s32_interleaved.unity.c1d226553156.cu
-- Generating /shared/data2/cutlass-build/test/self_contained_includes/test_self_contained_includes.unity.cb57df17b268.cu
-- Generating /shared/data2/cutlass-build/test/self_contained_includes/test_self_contained_includes.unity.666008298a32.cu
-- Generating /shared/data2/cutlass-build/test/self_contained_includes/test_self_contained_includes.unity.b5eae352ba84.cu
-- Generating /shared/data2/cutlass-build/test/self_contained_includes/test_self_contained_includes.unity.85f9a268eeb5.cu
-- Generating /shared/data2/cutlass-build/test/self_contained_includes/test_self_contained_includes.unity.12b2fc38db24.cu
-- Generating /shared/data2/cutlass-build/test/self_contained_includes/test_self_contained_includes.unity.bba2b4d3885b.cu
-- Generating /shared/data2/cutlass-build/test/self_contained_includes/test_self_contained_includes.unity.7045c36c97bc.cu
-- Generating /shared/data2/cutlass-build/test/self_contained_includes/test_self_contained_includes.unity.ebc324739c8f.cu
-- Generating /shared/data2/cutlass-build/test/self_contained_includes/test_self_contained_includes.unity.e6170f5dd982.cu
-- Generating /shared/data2/cutlass-build/test/self_contained_includes/test_self_contained_includes.unity.66b59cac99ee.cu
-- Generating /shared/data2/cutlass-build/test/self_contained_includes/test_self_contained_includes.unity.93734a6e108c.cu
-- Configuring done (77.9s)
-- Generating done (0.5s)
-- Build files have been written to: /shared/data2/cutlass-build

Whereas with cutlass3x*f32xe4m3_*f32xe4m3*,cutlass3x*f32xf8_*f32xf8* it works, and kernel-specific sources are generated:

-- Generating /home/aidando/fireworks/flashinfer/3rdparty/cutlass/build/tools/library/cutlass_library_gemm_sm100_gemm_f32xe4m3_f32xe4m3_objs.unity.bc19166c1270.cu
-- Generating /home/aidando/fireworks/flashinfer/3rdparty/cutlass/build/tools/library/cutlass_library_gemm_sm100_gemm_f32xe4m3_f32xe4m3_objs.unity.0be5a06bb241.cu
-- Generating /home/aidando/fireworks/flashinfer/3rdparty/cutlass/build/tools/library/cutlass_library_gemm_sm100_gemm_f32xe4m3_f32xe4m3_objs.unity.921d582bd6f1.cu
-- Generating /home/aidando/fireworks/flashinfer/3rdparty/cutlass/build/tools/library/cutlass_library_gemm_sm100_gemm_f32xe4m3_f32xe4m3_objs.unity.42ede50788f1.cu
-- Generating
...
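
The difference between the two configurations can be checked without reading the full CMake output. This is a minimal sketch, assuming the generated unity sources land under tools/library in the build directory with the data-type tag in the file name, as the f32 log above suggests. The helper name count_kernel_units is hypothetical and is not part of CUTLASS.

```shell
#!/usr/bin/env bash
# Count generated unity sources in a CUTLASS build tree whose file name
# contains a given data-type tag (e.g. "f16xe4m3" or "f32xe4m3").
# Assumption: the tools/library layout and the *.unity.*.cu naming are taken
# from the CMake logs in this issue; count_kernel_units is a hypothetical helper.
count_kernel_units() {
  local build_dir="$1" tag="$2"
  # find prints one path per matching file; wc -l counts them.
  # tr strips the padding that some wc implementations add.
  find "$build_dir/tools/library" -type f -name "*${tag}*.unity.*.cu" 2>/dev/null \
    | wc -l | tr -d ' '
}
```

For example, `count_kernel_units /shared/data2/cutlass-build f16xe4m3` should print 0 if only the generic cutlass_library_objs.unity.*.cu files listed in the first log were produced, while the f32 build tree should give a non-zero count for the tag f32xe4m3. The library_instance_generation.log file named in the CMake output may say more about which kernels the filter matched.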

Am I doing anything wrong? If not, are there any workarounds? cc @depaulmillz
