How to replace the attention mechanism in TransformerConv with FlashAttention or causal attention? #7457
HelloWorldLTY started this conversation in Ideas
Replies: 1 comment 1 reply
-
I tried, but I still got different output.
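One likely source of the mismatch, sketched with a toy example below (the tensor shapes and the self-loop-only mask are made up for illustration): TransformerConv normalises attention only over each node's incoming edges, while a plain `F.scaled_dot_product_attention` call without a neighbourhood mask runs the softmax over all nodes, so the two cannot agree unless the graph is fully connected.

```python
# Toy comparison, not from the thread: dense vs. neighbourhood-masked attention.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
q = torch.randn(1, 1, 4, 8)                      # (batch, heads, nodes, head_dim)
k, v = torch.randn_like(q), torch.randn_like(q)

# Dense attention: every node attends to every node.
dense = F.scaled_dot_product_attention(q, k, v)

# Graph-restricted attention: a boolean mask limits the softmax support
# (here a toy mask with self-loops only).
mask = torch.eye(4, dtype=torch.bool)
masked = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)

print(torch.allclose(dense, masked))             # False: different softmax supports
```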
-
Hi, I think there should be ways to optimize the attention computation in the TransformerConv code. How can it be replaced with the multi-head attention provided by PyTorch, so that it benefits from FlashAttention acceleration, or with a functional approach like causal attention? Thanks a lot.
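A minimal sketch of one possible direction, assuming PyTorch >= 2.0 and torch_geometric-style inputs (`x` of shape `[num_nodes, in_channels]`, `edge_index` of shape `[2, num_edges]`). The module name `GraphSDPA`, the dense boolean mask built from `edge_index`, and the added self-loops are assumptions for illustration, not the PyG implementation of `TransformerConv`:

```python
# A sketch, not the TransformerConv implementation: multi-head attention over
# nodes, restricted to the graph's edges via a boolean mask and dispatched
# through torch.nn.functional.scaled_dot_product_attention.
import torch
import torch.nn.functional as F
from torch import nn


class GraphSDPA(nn.Module):
    def __init__(self, in_channels: int, out_channels: int, heads: int = 4):
        super().__init__()
        self.heads, self.dim = heads, out_channels
        self.q = nn.Linear(in_channels, heads * out_channels)
        self.k = nn.Linear(in_channels, heads * out_channels)
        self.v = nn.Linear(in_channels, heads * out_channels)

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        n = x.size(0)
        # Reshape to (1, heads, num_nodes, head_dim) as expected by SDPA.
        q = self.q(x).view(n, self.heads, self.dim).transpose(0, 1).unsqueeze(0)
        k = self.k(x).view(n, self.heads, self.dim).transpose(0, 1).unsqueeze(0)
        v = self.v(x).view(n, self.heads, self.dim).transpose(0, 1).unsqueeze(0)

        # Boolean mask: entry (i, j) is True if edge j -> i exists, so the softmax
        # runs only over each node's incoming edges, as in TransformerConv.
        mask = torch.zeros(n, n, dtype=torch.bool, device=x.device)
        mask[edge_index[1], edge_index[0]] = True
        # Self-loops keep isolated nodes from producing an all-masked (NaN) row.
        idx = torch.arange(n, device=x.device)
        mask[idx, idx] = True

        out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
        return out.squeeze(0).transpose(0, 1).reshape(n, self.heads * self.dim)


# Usage with random data:
conv = GraphSDPA(16, 32, heads=4)
x = torch.randn(10, 16)
edge_index = torch.randint(0, 10, (2, 40))
out = conv(x, edge_index)                        # shape: (10, 128)
```

Two caveats: `scaled_dot_product_attention` only dispatches to the FlashAttention kernel for mask-free or `is_causal=True` calls (so a causal pattern can use the fast path, but an arbitrary `attn_mask` falls back to the memory-efficient/math kernels), and the dense `N x N` mask costs quadratic memory, so this only pays off on small or dense graphs. It will also not match `TransformerConv` numerically, since the original layer adds a root/skip transformation, optional edge features, and attention dropout on top of the masked softmax.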