Skip to content

FP8 Flash Attention SM120+ Errors #194

@sempervictus

Description

@sempervictus

SM121 built with cutlass feature running Qwen3Coder FP8 throws:

vllm-rs-svc0  | 2026-01-20T05:51:19.618284Z  WARN vllm_rs::core::engine: [Stream] New request [Seq_id 0, 2209 tokens] received! (session_id: None)
vllm-rs-svc0  | 
vllm-rs-svc0  | 2026-01-20T05:51:19.618714Z  INFO vllm_rs::core::block_manager: Prefix cache miss seq 0 (2209 tokens)
vllm-rs-svc0  | /root/.cargo/git/checkouts/candle-629ca89aaea24b43/dfa48cd/candle-flash-attn/cutlass/include/cutlass/gemm/kernel/sm90_gemm_array_tma_warpspecialized_cooperative.hpp:422: void cutlass::gemm::kernel::GemmUniversal<ProblemShape_, CollectiveMainloop_, CollectiveEpilogue_, TileScheduler_, std::enable_if<std::is_base_of_v<cutlass::gemm::KernelPtrArrayTmaWarpSpecializedCooperative, CollectiveMainloop_::DispatchPolicy::Schedule>, void>::type>::operator()(const cutlass::gemm::kernel::GemmUniversal<ProblemShape_, CollectiveMainloop_, CollectiveEpilogue_, TileScheduler_, std::enable_if<std::is_base_of_v<cutlass::gemm::KernelPtrArrayTmaWarpSpecializedCooperative, CollectiveMainloop_::DispatchPolicy::Schedule>, void>::type>::Params &, char *) [with ProblemShape_ = cutlass::gemm::GroupProblemShape<cute::tuple<signed int, signed int, signed int>>; CollectiveMainloop_ = cutlass::gemm::collective::CollectiveMma<cutlass::gemm::MainloopSm120ArrayTmaWarpSpecializedBlockwiseScaling<2, 2, cute::tuple<cute::C<1>, cute::C<1>, cute::C<1>>, cutlass::gemm::KernelPtrArrayTmaWarpSpecializedCooperativeBlockwiseScalingSm120<2>>, cute::tuple<cute::C<128>, cute::C<128>, cute::C<128>>, cutlass::float_e4m3_t, cute::tuple<cute::tuple<signed long, cute::C<1>, cute::C<0>> *, cute::Layout<cute::tuple<cute::tuple<cute::C<1>, signed int>, cute::tuple<cute::C<128>, signed int>, signed int>, cute::tuple<cute::tuple<cute::C<0>, cute::C<1>>, cute::tuple<cute::C<0>, signed int>, signed int>> *>, cutlass::float_e4m3_t, cute::tuple<cute::tuple<signed long, cute::C<1>, cute::C<0>> *, cute::Layout<cute::tuple<cute::tuple<cute::C<128>, signed int>, cute::tuple<cute::C<128>, signed int>, signed int>, cute::tuple<cute::tuple<cute::C<0>, cute::C<1>>, cute::tuple<cute::C<0>, signed int>, signed int>> *>, cute::TiledMMA<cute::MMA_Atom<cute::SM120_16x8x32_TN<cutlass::float_e4m3_t, cutlass::float_e4m3_t, float>>, cute::Layout<cute::tuple<cute::C<4>, cute::C<2>, cute::C<1>>, cute::tuple<cute::C<1>, cute::C<4>, cute::C<0>>>, cute::tuple<cute::C<128>, cute::C<32>, cute::C<32>>>, cute::SM90_TMA_LOAD, cute::ComposedLayout<cute::Swizzle<3, 4, 3>, cute::smem_ptr_flag_bits<8>, cute::Layout<cute::tuple<cute::C<8>, cute::C<128>>, cute::tuple<cute::C<128>, cute::C<1>>>>, cute::Copy_Atom<cute::SM75_U32x4_LDSM_N, unsigned char>, cute::identity, cute::SM90_TMA_LOAD, cute::ComposedLayout<cute::Swizzle<3, 4, 3>, cute::smem_ptr_flag_bits<8>, cute::Layout<cute::tuple<cute::C<8>, cute::C<128>>, cute::tuple<cute::C<128>, cute::C<1>>>>, cute::Copy_Atom<cute::SM75_U32x4_LDSM_N, unsigned char>, cute::identity>; CollectiveEpilogue_ = cutlass::epilogue::collective::CollectiveEpilogue<cutlass::epilogue::Sm90PtrArrayTmaWarpSpecialized<2, 2, 4, false, true, 2>, cute::tuple<cute::C<128>, cute::C<128>, cute::C<128>>, cute::tuple<cute::C<64>, cute::C<32>>, void, cute::tuple<signed long, cute::C<1>, cute::C<0>> *, cutlass::bfloat16_t, cute::tuple<signed long, cute::C<1>, cute::C<0>> *, cutlass::epilogue::fusion::FusionCallbacks<cutlass::epilogue::Sm120PtrArrayTmaWarpSpecialized<2, 2, 4, false, true, 2>, cutlass::epilogue::fusion::LinearCombination<cutlass::bfloat16_t, float, void, float, cutlass::FloatRoundStyle::round_to_nearest>, cute::tuple<cute::C<128>, cute::C<128>, cute::C<128>>, cute::tuple<cute::C<64>, cute::C<32>>>, cute::SM90_TMA_LOAD, cute::ComposedLayout<cute::Swizzle<2, 4, 3>, cute::smem_ptr_flag_bits<16>, cute::Layout<cute::tuple<cute::C<8>, cute::C<32>>, cute::tuple<cute::C<32>, cute::C<1>>>>, cute::SM75_U32x2_LDSM_N, cute::SM90_TMA_STORE, cute::ComposedLayout<cute::Swizzle<2, 4, 3>, cute::smem_ptr_flag_bits<16>, cute::Layout<cute::tuple<cute::C<8>, cute::C<32>>, cute::tuple<cute::C<32>, cute::C<1>>>>, cute::SM90_U32x2_STSM_N, cute::Copy_Atom<cute::SM90_U32x2_STSM_N, cutlass::half_t>, void>; TileScheduler_ = void]: block: [41,0,0], thread: [96,0,0] Assertion `0 && "ERROR : Arch conditional MMA instruction used without targeting appropriate compute capability. Aborting.\n"` failed.
vll```

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions