[FEA] Jagged Arbitrary Masked Self Attention support #276

@JacoCheung

Description

mcore/TE does not support jagged attention with an arbitrary mask (it supports only causal masking), but this is a desirable feature in our scenario. In our sid_gr example, we currently use padded dense attention with an arbitrary mask, which is memory- and compute-intensive.
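To make the cost of padding concrete, here is a rough sketch that counts attention-score entries for a padded dense layout versus a jagged layout. The sequence lengths and the helper names `padded_score_elems` and `jagged_score_elems` are hypothetical and chosen only for illustration; they are not taken from sid_gr.

```python
# Hypothetical sequence lengths for one batch (illustrative, not from sid_gr).
seq_lens = [5, 120, 33, 512]


def padded_score_elems(lens):
    """Score entries when every sequence is padded to the batch maximum."""
    max_len = max(lens)
    return len(lens) * max_len * max_len


def jagged_score_elems(lens):
    """Score entries when each sequence attends only within its own length."""
    return sum(n * n for n in lens)


padded = padded_score_elems(seq_lens)
jagged = jagged_score_elems(seq_lens)
print(f"padded: {padded}, jagged: {jagged}, ratio: {padded / jagged:.2f}x")
```

The gap grows with the spread between the longest sequence and the typical sequence, which is why a jagged kernel is attractive when lengths vary widely.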

Describe the solution you'd like

We can either wait for FA3 to support arbitrary masks or write our own implementation, and then adapt it into the sid example.
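As a starting point for an own implementation, the following is a minimal, unfused NumPy reference for jagged attention with a per-sequence arbitrary mask. It is a sketch for checking numerical results, not a performant kernel and not the mcore/TE or FA3 API. The function name `jagged_masked_attention`, the `offsets` layout (cumulative token offsets), and the single-head shapes are all assumptions made for this example.

```python
import numpy as np


def jagged_masked_attention(q, k, v, offsets, masks):
    """Reference (unfused) single-head attention over packed sequences.

    q, k, v : (total_tokens, head_dim) float arrays, sequences concatenated.
    offsets : length B+1 cumulative token offsets (assumed layout).
    masks   : list of B boolean (len_i, len_i) arrays; True means "may attend".
    """
    out = np.zeros_like(v)
    scale = 1.0 / np.sqrt(q.shape[-1])
    for i, mask in enumerate(masks):
        s, e = offsets[i], offsets[i + 1]
        scores = (q[s:e] @ k[s:e].T) * scale
        scores = np.where(mask, scores, -np.inf)
        # Subtract the row maximum for stability; fully masked rows use 0.
        row_max = np.max(scores, axis=-1, keepdims=True)
        row_max = np.where(np.isfinite(row_max), row_max, 0.0)
        weights = np.exp(scores - row_max)
        denom = weights.sum(axis=-1, keepdims=True)
        # Fully masked rows have denom == 0 and produce an all-zero output row.
        weights = np.divide(
            weights, denom, out=np.zeros_like(weights), where=denom > 0
        )
        out[s:e] = weights @ v[s:e]
    return out


# Usage with two hypothetical sequences of lengths 3 and 5.
rng = np.random.default_rng(0)
lens = [3, 5]
offsets = np.concatenate([[0], np.cumsum(lens)])
total, dim = int(offsets[-1]), 8
q, k, v = (rng.standard_normal((total, dim)) for _ in range(3))
masks = [rng.random((n, n)) < 0.5 for n in lens]
out = jagged_masked_attention(q, k, v, offsets, masks)
print(out.shape)  # (8, 8)
```

No padding tokens are materialized here: each sequence builds only its own `len_i x len_i` score block, and tokens never attend across sequence boundaries. A fused kernel would need to reproduce these outputs while avoiding the Python loop.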

Metadata

Labels

enhancement (Improvement for existing feature), feature (New feature)
