Open
Labels
enhancement, feature
Description
mcore/TE supports jagged attention only with a causal mask, not with an arbitrary mask. In our scenario, arbitrary-mask support is a desirable feature. In our sid_gr example we currently use padded dense attention with an arbitrary mask, which is memory- and compute-intensive.
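To make the cost of padding concrete, here is a rough sketch that counts attention-score entries per head for a padded dense batch versus a jagged batch. The sequence lengths are hypothetical and are not taken from sid_gr. The function name is made up for illustration.

```python
def attention_score_elements(seq_lens):
    """Count attention-score entries per head for one batch.

    padded: every sequence is padded to the longest one, so the score
            tensor holds batch * max_len * max_len entries.
    jagged: each sequence only pays for its own len * len block.
    """
    batch = len(seq_lens)
    max_len = max(seq_lens)
    padded = batch * max_len * max_len
    jagged = sum(n * n for n in seq_lens)
    return padded, jagged


# Hypothetical, highly skewed batch (assumption for illustration only).
seq_lens = [512, 64, 32, 16]
padded, jagged = attention_score_elements(seq_lens)
print(padded)  # 1048576
print(jagged)  # 267520
```

With lengths this skewed, the padded layout stores and computes roughly four times as many score entries as the jagged one. The gap grows as the length distribution gets more uneven.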
Describe the solution you'd like
We can either wait for FA3 to support arbitrary masks or write our own implementation, and then adapt it into the sid_gr example.
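As a starting point for the "own implementation" route, below is a slow pure-Python reference of what jagged attention with an arbitrary mask should compute. It is only a sketch of the semantics for checking a future kernel against, not the mcore/TE or FA3 API. The function name, the `offsets` layout (sequence `i` occupies rows `offsets[i]:offsets[i+1]`), the per-sequence boolean masks, and the choice to return zeros for a fully masked query row are all assumptions.

```python
import math


def jagged_masked_attention(q, k, v, offsets, masks):
    """Single-head reference attention over a jagged (unpadded) batch.

    q, k, v : lists of token rows (lists of floats), all sequences concatenated.
    offsets : batch + 1 ints; sequence i owns rows offsets[i]:offsets[i + 1].
    masks   : one L_i x L_i boolean matrix per sequence; masks[i][r][c] is
              True when query r may attend to key c (both local indices).
    """
    dim = len(q[0]) if q else 0
    scale = 1.0 / math.sqrt(dim) if dim else 1.0
    out = []
    for seq, mask in enumerate(masks):
        start, end = offsets[seq], offsets[seq + 1]
        for r in range(start, end):
            # Keys never cross sequence boundaries, so no padding is needed.
            allowed = [c for c in range(start, end) if mask[r - start][c - start]]
            if not allowed:
                # Assumption: a fully masked query row yields zeros.
                out.append([0.0] * dim)
                continue
            scores = [scale * sum(a * b for a, b in zip(q[r], k[c])) for c in allowed]
            top = max(scores)  # subtract the max for a numerically stable softmax
            weights = [math.exp(s - top) for s in scores]
            total = sum(weights)
            out.append([
                sum(w * v[c][j] for w, c in zip(weights, allowed)) / total
                for j in range(dim)
            ])
    return out


# Two sequences of lengths 2 and 1, packed back to back.
q = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
offsets = [0, 2, 3]
masks = [
    [[True, False], [True, True]],  # arbitrary per-sequence mask (here lower-triangular)
    [[True]],
]
print(jagged_masked_attention(q, k, v, offsets, masks))
# [[1.0, 0.0], [0.5, 0.5], [2.0, 2.0]]
```

Any boolean pattern can be passed in `masks`. The lower-triangular one above is just the causal case that is already supported, used here so the result is easy to check by hand.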