
Support for Attention Mask in NATTEN #263

@roadroller0501

Description

While applying NATTEN to speaker verification tasks, I found that using surrounding padding improves performance (see PCF-NAT: Neighborhood Attention Transformer with Progressive Channel Fusion).
To implement this properly, it is necessary to add -inf to the attention logits at the padded positions before applying softmax, effectively masking them out.

Currently, I am handling this by running qk and av as separate (unfused) kernels so I can mask the logits in between. However, this approach increases memory consumption.
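For reference, the unfused workaround can be sketched in plain PyTorch. This is an illustrative re-implementation, not NATTEN's API: the function name `masked_na1d`, the window gathering via `unfold`, and the padding-mask convention (`True` = padded) are all assumptions for the sketch.

```python
import torch
import torch.nn.functional as F

def masked_na1d(q, k, v, kernel_size, pad_mask):
    """Unfused 1D neighborhood attention with a padding mask (sketch).

    q, k, v: (B, T, D); pad_mask: (B, T) bool, True marks padded positions.
    Each query attends to a sliding window of `kernel_size` neighbors
    (odd kernel_size assumed); padded/out-of-range slots get -inf logits.
    """
    B, T, D = q.shape
    w = kernel_size // 2
    # Gather each position's neighborhood of keys/values: (B, T, K, D)
    k_pad = F.pad(k, (0, 0, w, w))  # zero-pad along time
    v_pad = F.pad(v, (0, 0, w, w))
    k_nb = k_pad.unfold(1, kernel_size, 1).permute(0, 1, 3, 2)
    v_nb = v_pad.unfold(1, kernel_size, 1).permute(0, 1, 3, 2)
    # "qk" step: attention logits per neighborhood, shape (B, T, K)
    logits = torch.einsum('btd,btkd->btk', q, k_nb) / D ** 0.5
    # Build the neighborhood mask from the padding mask; the artificial
    # border slots introduced by padding are invalid as well.
    m_pad = F.pad(pad_mask, (w, w), value=True)
    m_nb = m_pad.unfold(1, kernel_size, 1)  # (B, T, K)
    logits = logits.masked_fill(m_nb, float('-inf'))
    attn = logits.softmax(dim=-1)
    # "av" step: weighted sum over the neighborhood
    return torch.einsum('btk,btkd->btd', attn, v_nb)
```

Because the masked logits are exactly -inf, padded positions receive zero attention weight after softmax, so their values cannot leak into valid outputs. This materializes the (B, T, K) logit tensor, which is the extra memory cost the fused kernel would avoid.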

Feature Request:

Support for an attention mask (specifically, adding -inf to logits at padded positions).

Ideally, attention masking would also be supported in the fused kernel implementation.

A 1D implementation of the paper is available at https://github.com/ChenNan1996/PCF-NAT

Question:
Do you have any plans to implement attention mask support in the future?
