Open
Hey all. I am convinced by the paper. I am about to use it in my application and see how it does, but I am wondering, in your view, why this is not more widely used than normal concatenated MHA. It seems like any big LLM company should be using this for parameter efficiency. Are they using it and just not saying so?
QishuaiWen