[CPU] Support dynamic attention by tiling K1 when needed. #23304

hanhanW · 2026-01-27T23:55:53Z

The K1 dimension (head_dim) in attention was unconditionally left untiled, which leads to large stack allocations when the dimension is dynamic.

K1 is typically small (64/128 per the AttentionOpDetail docs), so the original heuristic of leaving it untiled was reasonable. This revision sets a tile size for K1 if the dimension is dynamic or is not within the typical range (<= 128).
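The decision described above can be sketched as a small standalone function. This is only an illustration, not the code in KernelDispatch.cpp: `chooseK1TileSize`, `kDynamicDim`, `kTypicalK1Limit`, and the `defaultTile` parameter are hypothetical names, and the sketch assumes the common MLIR convention that a tile size of 0 leaves a dimension untiled.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Local stand-in for a "dynamic dimension" marker (assumption for this
// sketch; it plays the role that ShapedType::kDynamic plays in MLIR).
constexpr int64_t kDynamicDim = std::numeric_limits<int64_t>::min();

// Upper bound of the typical head_dim range quoted in the description.
constexpr int64_t kTypicalK1Limit = 128;

// Hypothetical helper: returns the tile size to use for K1.
// A result of 0 means "leave K1 untiled".
// `defaultTile` is an illustrative value, not one taken from the PR.
int64_t chooseK1TileSize(int64_t k1Size, int64_t defaultTile) {
  assert(defaultTile > 0 && "tile size must be positive");
  const bool isDynamic = (k1Size == kDynamicDim);
  // Tile only when the size is unknown at compile time or unusually large.
  if (isDynamic || k1Size > kTypicalK1Limit) {
    return defaultTile;
  }
  // Small static head_dim: keep the original untiled behavior.
  return 0;
}
```

With these assumptions, static sizes of 64 and 128 stay untiled, while a static size of 256 or a dynamic size gets the chosen tile.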

E2E tests are added; they use the same inputs and expected outputs as attention.mlir (the static version). Some backends, e.g., AMDGPU, do not support dynamic attention, so the tests live in a new file. The test is enabled on the CPU and VMVX backends in this revision.

Fixes #23277

The K1 dimension (head_dim) in attention was unconditionally left untiled, which leads to large stack allocations when the dimension is dynamic. K1 is typically small (64/128 per the AttentionOpDetail docs), so the original heuristic of leaving it untiled was reasonable. This revision sets a tile size for K1 if the dimension is dynamic or is not within the typical range (<= 128). An e2e test is added. Signed-off-by: hanhanW <[email protected]>

Groverkss · 2026-01-28T16:53:31Z

compiler/src/iree/compiler/Codegen/LLVMCPU/KernelDispatch.cpp

-  // Due to the way attention works, K1 dimensions cannot be tiled. Mark k1
-  // reduction dimensions not to distribute.

I wonder what I was thinking when I wrote this comment; of course you can serially tile it.
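The point that a reduction dimension can be tiled serially can be illustrated with a toy example. The sketch below is not IREE code: `dotUntiled` and `dotTiled` are hypothetical helpers that compute one Q·K^T entry, whose contraction runs over K1. Integers are used so the two results can be compared exactly; with floating point the summation order could change rounding.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One Q.K^T entry computed over the whole K1 range in a single pass.
long dotUntiled(const std::vector<long>& q, const std::vector<long>& k) {
  assert(q.size() == k.size());
  long acc = 0;
  for (std::size_t i = 0; i < q.size(); ++i) {
    acc += q[i] * k[i];
  }
  return acc;
}

// The same entry with K1 walked serially in chunks of `tile` elements.
// The accumulator is carried across chunks, so the result is unchanged.
// A tile of 0 is treated as "untiled" in this sketch.
long dotTiled(const std::vector<long>& q, const std::vector<long>& k,
              std::size_t tile) {
  assert(q.size() == k.size());
  if (tile == 0) {
    return dotUntiled(q, k);
  }
  long acc = 0;
  for (std::size_t start = 0; start < q.size(); start += tile) {
    // The last chunk may be shorter when K1 is not a multiple of `tile`.
    const std::size_t end = std::min(start + tile, q.size());
    for (std::size_t i = start; i < end; ++i) {
      acc += q[i] * k[i];
    }
  }
  return acc;
}
```

Because each chunk touches at most `tile` elements, any per-chunk scratch space has a fixed upper bound even when the length of K1 is only known at run time.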

amd-eochoalo · 2026-01-28T19:36:33Z

@hanhanW this PR produces a build error while generating tests for Android:

FAILED: [code=1] tests/e2e/linalg_ext_ops/check_llvm-cpu_local-task_dynamic_attention.mlir_module.vmfb /home/runner/work/iree/iree/build-android-arm_64/tests/e2e/linalg_ext_ops/check_llvm-cpu_local-task_dynamic_attention.mlir_module.vmfb 
cd /home/runner/work/iree/iree/build-android-arm_64/tests/e2e/linalg_ext_ops && /home/runner/work/iree/iree/.venv/bin/iree-compile --output-format=vm-bytecode --mlir-print-op-on-diagnostic=false --iree-hal-target-backends=llvm-cpu --iree-llvmcpu-target-cpu=generic /home/runner/work/iree/iree/tests/e2e/linalg_ext_ops/dynamic_attention.mlir -o check_llvm-cpu_local-task_dynamic_attention.mlir_module.vmfb --iree-hal-executable-object-search-path=\"/home/runner/work/iree/iree/build-android-arm_64\" --iree-llvmcpu-target-triple=aarch64-none-linux-android29
/home/runner/work/iree/iree/tests/e2e/linalg_ext_ops/dynamic_attention.mlir:26:13: error: Yield operand #1 is not equivalent to the corresponding iter bbArg
  %result = iree_linalg_ext.attention {
            ^
/home/runner/work/iree/iree/tests/e2e/linalg_ext_ops/dynamic_attention.mlir:26:13: error: Yield operand #1 is not equivalent to the corresponding iter bbArg
  %result = iree_linalg_ext.attention {
            ^
/home/runner/work/iree/iree/tests/e2e/linalg_ext_ops/dynamic_attention.mlir:26:13: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"llvm-cpu", "embedded-elf-arm_64", {cpu = "generic", cpu_features = "+reserve-x18", data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128-Fn32", iree.encoding.resolver = #iree_cpu.cpu_encoding_resolver<>, max_stack_allocation_size = 32768 : i64, native_vector_size = 16 : i64, target_triple = "aarch64-unknown-unknown-eabi-elf"}>
  %result = iree_linalg_ext.attention {
            ^

hanhanW · 2026-01-28T19:41:02Z

It looks like it triggers a bug; I'm looking into it. Let's revert it for now.

…3304)" This reverts commit 761bd9c.

…23313) Reverts #23304. It triggers a bug in the Android build. To repro: `iree-compile --output-format=vm-bytecode --iree-hal-target-device=local --iree-hal-local-target-device-backends=llvm-cpu --iree-llvmcpu-target-cpu=generic --iree-llvmcpu-target-triple=aarch64-none-linux-android29 tests/e2e/linalg_ext_ops/dynamic_attention.mlir`

hanhanW · 2026-01-29T00:35:53Z

The issue happens when masking is disabled. It traces back to an older issue: #16956

I have a local fix and I'm polishing it.

hanhanW · 2026-01-29T01:50:41Z

#23318 fixes the issue. I'll re-land this PR once that change lands.

hanhanW requested a review from Groverkss January 27, 2026 23:55

hanhanW requested a review from bjacob as a code owner January 27, 2026 23:55

hanhanW requested a review from MaheshRavishankar January 28, 2026 00:05

Groverkss reviewed Jan 28, 2026

Groverkss approved these changes Jan 28, 2026

bjacob approved these changes Jan 28, 2026

hanhanW merged commit 761bd9c into main Jan 28, 2026
59 of 62 checks passed

hanhanW deleted the users/hanhanW/cpu-dynamic-attention branch January 28, 2026 17:56

hanhanW added a commit that referenced this pull request Jan 28, 2026

Revert "[CPU] Support dynamic attention by tiling K1 when needed. (#2…

2d02525

…3304)" This reverts commit 761bd9c.

hanhanW mentioned this pull request Jan 28, 2026

Revert "[CPU] Support dynamic attention by tiling K1 when needed." #23313

Merged

hanhanW mentioned this pull request Jan 30, 2026

Compiler crash in LLVMCPUSelectLoweringStrategy with dynamic-shape iree_linalg_ext.attention #23277

Open
