Commit 81a04ca

[webgpu] Fix the wrong fallback in Attention (#26608)
Attention input handling updates:

* Corrected the input index for `past` from `input[5]` to `input[4]` in the fallback logic, so that the code reflects the actual input order.

With this change, the Attention ops in phi-4-mm-vision.onnx can run on the GPU instead of falling back to the CPU.
1 parent e6023b0 commit 81a04ca

1 file changed: +2 -2 lines changed

onnxruntime/core/providers/webgpu/webgpu_execution_provider.cc

Lines changed: 2 additions & 2 deletions
@@ -878,9 +878,9 @@ std::vector<std::unique_ptr<ComputeCapability>> WebGpuExecutionProvider::GetCapa
       const auto& inputs = node.InputDefs();
       const auto& outputs = node.OutputDefs();
 
-      // Current implementation does not support mask_index(input[3]), past(input[5]) and past_seq_len(input[6])
+      // Current implementation does not support mask_index(input[3]), past(input[4]) and past_seq_len(input[6])
       FALLBACK_TO_CPU_IF_EXIST_INPUT(3);
-      FALLBACK_TO_CPU_IF_EXIST_INPUT(5);
+      FALLBACK_TO_CPU_IF_EXIST_INPUT(4);
       FALLBACK_TO_CPU_IF_EXIST_INPUT(6);
 
       // Current implementation does not support present(output[1])
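
To illustrate why the index matters, here is a minimal, self-contained sketch of this kind of "fall back if an optional input exists" check. It is not the onnxruntime implementation: `InputDef`, `HasInput`, `MustFallBackToCpu`, `kBeforeFix` and `kAfterFix` are hypothetical names introduced for illustration, and the body of the real `FALLBACK_TO_CPU_IF_EXIST_INPUT` macro is not shown in this diff. The sketch assumes that an omitted optional input is modelled as an empty name.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a node input definition. An empty name models
// an optional input that the model does not supply.
struct InputDef {
  std::string name;
  bool Exists() const { return !name.empty(); }
};

// True if the optional input at `index` is actually supplied by the node.
bool HasInput(const std::vector<InputDef>& inputs, std::size_t index) {
  return index < inputs.size() && inputs[index].Exists();
}

// Input indices checked before and after the fix, taken from the diff:
// mask_index is input[3], past is input[4], past_seq_len is input[6].
constexpr std::array<std::size_t, 3> kBeforeFix = {3, 5, 6};
constexpr std::array<std::size_t, 3> kAfterFix = {3, 4, 6};

// True if any input listed in `unsupported` is present, in which case the
// node would be left to the CPU.
bool MustFallBackToCpu(const std::vector<InputDef>& inputs,
                       const std::array<std::size_t, 3>& unsupported) {
  for (std::size_t index : unsupported) {
    if (HasInput(inputs, index)) {
      return true;
    }
  }
  return false;
}
```

Under these assumptions, a node that supplies `input[5]` but no `past` is rejected by the `kBeforeFix` check and accepted by `kAfterFix`, which would be consistent with the commit message's note about phi-4-mm-vision.onnx. Conversely, a node that does supply `past` at `input[4]` slips through the old check and is only caught by the corrected one.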
