Add with_output version AppendAttention #3302

Lmywl · 2025-08-11T03:20:55Z

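Background: tensor address management during cudagraph capture.
Purpose: allocate the attention module's output ahead of the op call and pass it in, which simplifies tensor address handling during cudagraph capture.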
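A minimal sketch of the idea, in plain Python rather than Paddle. The function names and the doubling "computation" are stand-ins invented for illustration, and `id()` is used only as an analogue of a device address. It shows that a caller-owned output buffer keeps the same identity across calls, while a returning variant hands back a new object each time.

```python
from typing import List


def attention_returning(q: List[float]) -> List[float]:
    """Stand-in for an op that allocates and returns a fresh output on each call."""
    return [2.0 * x for x in q]


def attention_with_output(q: List[float], out: List[float]) -> None:
    """Stand-in for a with_output op: writes into a buffer owned by the caller."""
    for i, x in enumerate(q):
        out[i] = 2.0 * x


# The caller allocates the output once, before any "capture" would happen.
out_buffer = [0.0] * 4
addr_before = id(out_buffer)

attention_with_output([1.0, 2.0, 3.0, 4.0], out_buffer)
first = list(out_buffer)  # snapshot of the first result
attention_with_output([0.5, 0.5, 0.5, 0.5], out_buffer)

# The buffer identity (the analogue of a fixed tensor address) never changes.
same_address = id(out_buffer) == addr_before

# The returning variant yields a distinct object on every call.
r1 = attention_returning([1.0, 2.0])
r2 = attention_returning([1.0, 2.0])
fresh_each_call = r1 is not r2
```

With a fixed output buffer, a captured graph can be replayed without tracking where each new output landed.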

paddle-bot · 2025-08-11T03:21:01Z

Thanks for your contribution!

custom_ops/gpu_ops/append_attention.cu

yuanlehome · 2025-08-11T07:06:44Z

fastdeploy/model_executor/layers/attention/append_attn_backend.py

-            self.causal,
-            self.speculative_method is not None,
-        )[0]
+        if self.use_output:


Changing only this file is not enough. Do a global search for every place that calls append_attention.

fastdeploy/model_executor/layers/attention/ops/append_attention.py

fastdeploy/model_executor/layers/attention/append_attn_backend.py

gongshaotian · 2025-08-11T07:16:21Z

Please expand the PR description to explain the background and goals of this change.

test/layers/test_append_attention.py

lizhenyun01 · 2025-08-12T05:35:23Z

custom_ops/gpu_ops/append_attention.cu

@@ -763,3 +1061,64 @@ PD_BUILD_STATIC_OP(append_attention)
    .SetKernelFn(PD_KERNEL(AppendAttention))
    .SetInferShapeFn(PD_INFER_SHAPE(AppendAttentionInferShape))
    .SetInferDtypeFn(PD_INFER_DTYPE(AppendAttentionInferDtype));
+
+PD_BUILD_STATIC_OP(append_attention_with_output)


Is a new operator really necessary? Wouldn't changing the append_attn operator directly be cleaner, without adding operator complexity?

The intent here is to keep the original append_attn operator. vLLM also keeps two attention operators: https://github.com/vllm-project/vllm/blob/ebf7605b0dd58ff5d572d1918e52ca732025eee0/vllm/attention/layer.py#L238

gongshaotian · 2025-08-12T06:51:55Z

fastdeploy/model_executor/layers/attention/append_attn_backend.py

@@ -85,6 +86,7 @@ def __init__(
        head_dim: int,
        encoder_block_shape_q: int = -1,
        decoder_block_shape_q: int = -1,
+        use_output: bool = True,


Later on, please read this from config instead.
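A rough sketch of what reading the flag from config could look like. `ConfigSketch`, its `use_output` field, and `AppendAttentionBackendSketch` are hypothetical names for illustration only; the real FastDeploy config layout is not shown in this thread and may differ.

```python
from dataclasses import dataclass


@dataclass
class ConfigSketch:
    # Hypothetical field; the actual config attribute name is an assumption.
    use_output: bool = True


class AppendAttentionBackendSketch:
    """Illustrative backend that takes the flag from config, not a ctor argument."""

    def __init__(
        self,
        fd_config: ConfigSketch,
        head_dim: int,
        encoder_block_shape_q: int = -1,
        decoder_block_shape_q: int = -1,
    ) -> None:
        self.head_dim = head_dim
        self.encoder_block_shape_q = encoder_block_shape_q
        self.decoder_block_shape_q = decoder_block_shape_q
        # Single source of truth: the flag is read from the config object.
        self.use_output = fd_config.use_output


backend = AppendAttentionBackendSketch(ConfigSketch(use_output=False), head_dim=128)
```

Reading the flag from one config object avoids threading a `use_output` argument through every call site that constructs the backend.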

gongshaotian · 2025-08-12T07:11:20Z

fastdeploy/model_executor/layers/attention/ops/append_attention.py

+
+

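Leave a TODO here: once the CudaGraph subgraph capture feature is finished, merge the with_output operator and the original operator to reduce the compiled binary size.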
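One possible shape for such a merge, sketched in plain Python under the assumption that a single entry point accepts an optional output buffer. `append_attention_merged` and the doubling "computation" are hypothetical stand-ins, not the actual operator.

```python
from typing import List, Optional


def append_attention_merged(
    q: List[float], out: Optional[List[float]] = None
) -> List[float]:
    """Hypothetical merged entry point: `out` is optional.

    If the caller supplies `out`, results are written in place (with_output
    behaviour); otherwise a new buffer is allocated (original behaviour).
    """
    if out is None:
        out = [0.0] * len(q)
    if len(out) != len(q):
        raise ValueError("output buffer size must match the input size")
    for i, x in enumerate(q):
        out[i] = 2.0 * x  # placeholder for the real attention computation
    return out


# with_output style: the caller owns the buffer and gets the same object back.
buf = [0.0, 0.0]
res = append_attention_merged([1.0, 2.0], out=buf)

# Original style: the function allocates the result itself.
fresh = append_attention_merged([3.0])
```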

gongshaotian requested review from yuanlehome and zhoutianzi666 August 11, 2025 06:58

gongshaotian reviewed Aug 11, 2025

custom_ops/gpu_ops/append_attention.cu (resolved)

yuanlehome reviewed Aug 11, 2025

gongshaotian reviewed Aug 11, 2025

fastdeploy/model_executor/layers/attention/ops/append_attention.py (outdated, resolved)

gongshaotian reviewed Aug 11, 2025

fastdeploy/model_executor/layers/attention/append_attn_backend.py (resolved)

fastdeploy/model_executor/layers/attention/append_attn_backend.py (resolved)

Lmywl force-pushed the append_attn_pr branch from 68d80b1 to ecbc5fb Compare August 11, 2025 07:37

gongshaotian reviewed Aug 11, 2025

test/layers/test_append_attention.py (outdated, resolved)

Lmywl force-pushed the append_attn_pr branch from ecbc5fb to b977c0e Compare August 11, 2025 10:54

lizhenyun01 reviewed Aug 12, 2025

gongshaotian reviewed Aug 12, 2025

get use_output from fd_config

8572b8a

Lmywl force-pushed the append_attn_pr branch from 01a0957 to 8572b8a Compare August 12, 2025 07:31

add clear TODO description

19109e4
