[webgpu] Fix the wrong fallback in Attention (#26608)

qjia7 · web-flow · commit 81a04ca45d5c · 2025-11-20T08:59:16.000+08:00
Attention input handling updates:

* Corrected the input index checked for `past` in the fallback logic
from `input[5]` to `input[4]`, so the check matches the operator's
actual input order.

With this change, the Attention ops in phi-4-mm-vision.onnx can run on
the GPU (WebGPU EP) instead of falling back to the CPU.
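
For reviewers, here is a minimal, self-contained sketch of the intended check. It is not the actual `FALLBACK_TO_CPU_IF_EXIST_INPUT` macro: `InputDef`, `InputProvided`, and `ShouldFallbackToCpu` are hypothetical names used only for illustration. The enum assumes the `com.microsoft` Attention contrib op's input order, with `attention_bias` at index 5. Under that assumption, the old check on index 5 would presumably have sent any node that supplies `attention_bias` to the CPU.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a node input definition. An optional input that
// the model leaves empty still occupies its slot but does not "exist".
struct InputDef {
  bool exists;
};

// Assumed input order of the com.microsoft Attention contrib op.
enum AttentionInput : std::size_t {
  kInput = 0,
  kWeights = 1,
  kBias = 2,
  kMaskIndex = 3,
  kPast = 4,
  kAttentionBias = 5,
  kPastSequenceLength = 6,
};

// True when the input at `index` is both listed and actually provided.
inline bool InputProvided(const std::vector<InputDef>& inputs, std::size_t index) {
  return index < inputs.size() && inputs[index].exists;
}

// Mirrors the fixed logic: fall back only for mask_index, past, and
// past_sequence_length. attention_bias (index 5) no longer triggers it.
inline bool ShouldFallbackToCpu(const std::vector<InputDef>& inputs) {
  return InputProvided(inputs, kMaskIndex) ||
         InputProvided(inputs, kPast) ||
         InputProvided(inputs, kPastSequenceLength);
}
```

With this sketch, a node that provides only `input`, `weights`, `bias`, and `attention_bias` stays on the GPU, while a node that provides `past` still falls back.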
diff --git a/onnxruntime/core/providers/webgpu/webgpu_execution_provider.cc b/onnxruntime/core/providers/webgpu/webgpu_execution_provider.cc
@@ -878,9 +878,9 @@ std::vector<std::unique_ptr<ComputeCapability>> WebGpuExecutionProvider::GetCapa
       const auto& inputs = node.InputDefs();
       const auto& outputs = node.OutputDefs();
 
-      // Current implementation does not support mask_index(input[3]), past(input[5]) and past_seq_len(input[6])
+      // Current implementation does not support mask_index(input[3]), past(input[4]) and past_seq_len(input[6])
       FALLBACK_TO_CPU_IF_EXIST_INPUT(3);
-      FALLBACK_TO_CPU_IF_EXIST_INPUT(5);
+      FALLBACK_TO_CPU_IF_EXIST_INPUT(4);
       FALLBACK_TO_CPU_IF_EXIST_INPUT(6);
 
       // Current implementation does not support present(output[1])