Description
Although a comment in the moba_attn_varlen() function in moba_efficient.py (line 280) mentions that Triton kernels are used in the MoBA implementation, I could not find any Triton kernels anywhere in the code. Could the authors please clarify whether there is an implementation of MoBA optimized with Triton kernels?
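For reference, Triton kernels are usually easy to spot because they are decorated with `@triton.jit` and use `triton.language` primitives. The sketch below is a generic vector-add kernel (not code from this repository) showing the pattern I searched for and could not find:

```python
# Illustrative example only: a minimal Triton kernel, NOT part of the MoBA repo.
# A Triton-optimized implementation would contain functions like this,
# decorated with @triton.jit and launched with a grid.
import torch
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                       # which program instance we are
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                       # guard against out-of-bounds lanes
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)


def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)                    # one program per BLOCK_SIZE chunk
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

Grepping the repository for `@triton.jit` or `import triton` turns up nothing, which is why I am asking.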
Also, as far as I know, Kimi K2 and other open- and closed-source models have not adopted MoBA to improve long-context performance or model efficiency. Is there a particular reason why sparse-attention methods like MoBA have not been incorporated into base models?