Commit 09d57aa

[Executorch][llm] Make mask tensor float only for sdpa
Now that we support quantized SDPA, the query tensor can be quantized while the attention mask remains float (the only allowed type), so this check no longer makes sense.

Differential Revision: [D77516821](https://our.internmc.facebook.com/intern/diff/D77516821/)
ghstack-source-id: 293661338
Pull Request resolved: #12131
Parent: 7b9ab92

1 file changed: +2 -2

extension/llm/custom_ops/op_sdpa.cpp

@@ -59,8 +59,8 @@ bool validate_flash_attention_args(
 
   ET_CHECK_OR_RETURN_FALSE(
       !attn_mask.has_value() ||
-          attn_mask.value().scalar_type() == query.scalar_type(),
-      "Attention mask must be a 2D tensor");
+          attn_mask.value().scalar_type() == ScalarType::Float,
+      "Attention mask must be a Float tensor");
 
   ET_CHECK_OR_RETURN_FALSE(
       is_contiguous_dim_order(query.dim_order().data(), query.dim()),
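
For context, here is a minimal standalone sketch of the reasoning behind the change. It is not the ExecuTorch API; `Tensor` and `ScalarType` below are simplified stand-ins. It illustrates why the old dtype-equality check rejects the now-valid combination of a quantized (int8) query with a float attention mask, while the new float-only check accepts it:

```cpp
#include <cassert>

// Hypothetical stand-ins for the ExecuTorch tensor types used in op_sdpa.cpp.
enum class ScalarType { Float, Char /* int8, used here for quantized tensors */ };

struct Tensor {
  ScalarType type;
  ScalarType scalar_type() const { return type; }
};

int main() {
  Tensor query{ScalarType::Char};      // quantized query tensor (int8)
  Tensor attn_mask{ScalarType::Float}; // the mask is always float

  // Old check: requires the mask dtype to match the query dtype, so it
  // rejects this valid quantized-query + float-mask combination.
  bool old_ok = attn_mask.scalar_type() == query.scalar_type();
  assert(!old_ok);

  // New check: only requires the mask itself to be float, independent of
  // the query dtype, so quantized SDPA inputs pass validation.
  bool new_ok = attn_mask.scalar_type() == ScalarType::Float;
  assert(new_ok);
  return 0;
}
```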
