
Add with_output version AppendAttention #3302


Open · wants to merge 2 commits into develop

Conversation


@Lmywl Lmywl commented Aug 11, 2025

Background: tensor address management during CUDA Graph capture.
Goal: hoist the attention module's output so it is allocated up front, which simplifies tensor address handling during CUDA Graph capture.
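A CUDA Graph replays the exact device addresses recorded at capture time, so an op that allocates a fresh output on every call is hard to capture, while an out-variant that writes into a caller-owned buffer keeps a stable address across replays. A minimal, framework-agnostic sketch of the difference (NumPy stands in for device tensors; all names here are illustrative, not from this PR):

```python
import numpy as np

def attention_alloc(q):
    # Allocating variant: a new output buffer is created on every call,
    # so its address changes between calls.
    return q * 2.0

def attention_with_output(q, out):
    # Out-variant: writes into the caller-provided buffer `out`,
    # whose address stays fixed across calls.
    np.multiply(q, 2.0, out=out)
    return out

q = np.ones(4, dtype=np.float32)
out = np.empty_like(q)

# Two calls to the allocating variant: both results are alive at once,
# so they necessarily live at different addresses.
a1 = attention_alloc(q)
a2 = attention_alloc(q)
addr_changes = a1.__array_interface__['data'][0] != a2.__array_interface__['data'][0]

# Two calls to the out-variant: the output address is identical,
# which is what a captured graph relies on at replay time.
b1 = attention_with_output(q, out)
b2 = attention_with_output(q, out)
addr_stable = b1.__array_interface__['data'][0] == b2.__array_interface__['data'][0]
```

This is only an address-stability analogy; the real kernels run on device memory, but the pre-allocated-output contract is the same.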


paddle-bot bot commented Aug 11, 2025

Thanks for your contribution!

self.causal,
self.speculative_method is not None,
)[0]
if self.use_output:
Collaborator

Modifying only this file is not enough; do a global search for every call site of append_attention and update those as well.

Author

OK.

@gongshaotian
Collaborator

Please flesh out the PR description to explain the background and goal of this change.

@@ -763,3 +1061,64 @@ PD_BUILD_STATIC_OP(append_attention)
.SetKernelFn(PD_KERNEL(AppendAttention))
.SetInferShapeFn(PD_INFER_SHAPE(AppendAttentionInferShape))
.SetInferDtypeFn(PD_INFER_DTYPE(AppendAttentionInferDtype));

PD_BUILD_STATIC_OP(append_attention_with_output)
Collaborator

Is a new operator really necessary? Wouldn't modifying the append_attn operator directly be cleaner, without adding operator complexity?

Author

The intent here is to keep the original append_attn operator unchanged; vLLM likewise keeps two attention operators: https://github.com/vllm-project/vllm/blob/ebf7605b0dd58ff5d572d1918e52ca732025eee0/vllm/attention/layer.py#L238

@@ -85,6 +86,7 @@ def __init__(
head_dim: int,
encoder_block_shape_q: int = -1,
decoder_block_shape_q: int = -1,
use_output: bool = True,
Collaborator

Later, let's change this to read the value from the config instead.
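The `use_output` flag shown in the diff above could be wired roughly as follows; this is a hedged, self-contained sketch with NumPy stubs standing in for the two registered custom ops (the op names follow the diff, but the class, method, and stub bodies are assumptions for illustration):

```python
import numpy as np

def append_attention(q):
    # Stub for the original op: allocates its output and, like the real
    # op in the diff, returns it as the first element of a list.
    return [q + 1.0]

def append_attention_with_output(q, out):
    # Stub for the out-variant: fills the caller-provided buffer.
    np.add(q, 1.0, out=out)
    return out

class AppendAttentionBackend:
    """Hypothetical backend that dispatches on `use_output`."""

    def __init__(self, head_dim: int, use_output: bool = True):
        self.head_dim = head_dim
        self.use_output = use_output
        self._out = None  # lazily pre-allocated output buffer

    def forward(self, q):
        if self.use_output:
            if self._out is None:
                # Allocate once, then reuse: the buffer's address stays
                # stable, which is what CUDA Graph capture needs.
                self._out = np.empty_like(q)
            return append_attention_with_output(q, self._out)
        # Fall back to the original allocating op.
        return append_attention(q)[0]
```

Repeated `forward` calls with `use_output=True` return the same pre-allocated buffer object, mirroring how the with_output op lets the capture machinery pin the output address.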

Comment on lines +145 to +146


Collaborator

Leave a TODO here: once the CudaGraph subgraph-capture feature is fully developed, merge the with_output variant back into the original operator to reduce the compiled binary size.

4 participants