Description:
While applying NATTEN to speaker verification tasks, I found that using surrounding padding improves performance (see "Neighborhood Attention Transformer with Progressive Channel Fusion").
To implement this properly, the padded positions must be masked by adding -inf to the corresponding attention scores before softmax.
Currently, I work around this by computing the QK and AV steps separately instead of using the fused kernel. However, this approach increases memory consumption.
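For reference, here is a minimal NumPy sketch of the unfused approach I mean (the function name, shapes, and clipped-window edge handling are my own simplifications, not NATTEN's actual API): compute the neighborhood QK scores, set scores for padded key positions to -inf, apply softmax, then do the AV step.

```python
import numpy as np

def na1d_with_mask(q, k, v, kernel_size, valid_len):
    # q, k, v: (L, d). Each query attends to a window of kernel_size keys
    # around it; key positions >= valid_len are padding and are masked out.
    L, d = q.shape
    half = kernel_size // 2
    out = np.zeros_like(q)
    for i in range(L):
        # Clip the window to sequence bounds (simplified edge handling).
        lo = max(0, min(i - half, L - kernel_size))
        idx = np.arange(lo, lo + kernel_size)
        scores = q[i] @ k[idx].T / np.sqrt(d)   # QK step
        scores[idx >= valid_len] = -np.inf      # mask padded keys
        w = np.exp(scores - scores.max())       # softmax over the window
        w /= w.sum()
        out[i] = w @ v[idx]                     # AV step
    return out
```

Because the masked positions receive zero weight after softmax, the output for valid queries is independent of whatever values sit in the padded slots. The fused kernel avoids materializing the per-window `scores`/`w` tensors, which is exactly what this workaround gives up.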
Feature Request:
Support for an attention mask (specifically, adding -inf at padded positions).
Ideally, attention masking would also be supported in the fused kernel implementation.
A 1D implementation of the paper is available at https://github.com/ChenNan1996/PCF-NAT
Question:
Do you have any plans to implement attention mask support in the future?