Description
I am working on adding support for lite-whisper: #1886, efeslab/LiteASR#7
However, the existing fused-layer logic does not work for the low-rank qkv matrices used in lite-whisper (see the paper for context). A low-rank linear layer for any one of the qkv projection matrices can fall back to the full layer if the compression algorithm can't compress that layer without sacrificing accuracy. As a result, the qkv layers in each lite-whisper encoder layer may be a mix of Linear and LowRankLinear layers, which prevents them from being fused because the two layer types are executed differently.
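To make the mismatch concrete, here is a minimal NumPy sketch, not the project's actual code. It assumes a dense projection computes `x @ W.T` and a low-rank projection computes `(x @ A.T) @ B.T`. The names `W_q`, `A_k`, `B_k`, the sizes, and the omission of biases are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, rank = 8, 2                      # illustrative sizes, not lite-whisper's
x = rng.standard_normal((3, d_model))     # 3 time steps of encoder input

# Dense projections, weight shape (out_features, in_features).
W_q = rng.standard_normal((d_model, d_model))
W_k = rng.standard_normal((d_model, d_model))
W_v = rng.standard_normal((d_model, d_model))

# All-dense case: concatenating the weights along the output axis gives one
# matmul whose result can be split back into q, k, v.
W_fused = np.concatenate([W_q, W_k, W_v], axis=0)
q, k, v = np.split(x @ W_fused.T, 3, axis=1)
fusion_matches = (
    np.allclose(q, x @ W_q.T)
    and np.allclose(k, x @ W_k.T)
    and np.allclose(v, x @ W_v.T)
)

# Mixed case: a hypothetical low-rank key projection holds two factors
# instead of one (d_model, d_model) weight, so there is no single matrix
# of the right shape to concatenate with W_q and W_v.
A_k = rng.standard_normal((rank, d_model))   # down-projection factor
B_k = rng.standard_normal((d_model, rank))   # up-projection factor
k_low_rank = (x @ A_k.T) @ B_k.T             # two matmuls, not one

print(fusion_matches)                    # dense-only fusion reproduces q, k, v
print(W_q.shape, A_k.shape, B_k.shape)   # the factor shapes differ from the dense weight
```

The dense layers share one weight shape and one matmul, while the low-rank layer needs two smaller matmuls, which is why a simple weight concatenation cannot cover both.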
Example encoder layer with a mix of Linear and LowRankLinear layers:

Would it cause problems if this fused-layer logic were removed? Would it be better to remove it only when lite-whisper runs? Or is there another workaround that would let me keep the fused-layer concatenation logic even though the layer types differ?
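One possible workaround, sketched here in NumPy purely as an assumption and not as something the project supports, would be to multiply the two low-rank factors back into a dense weight at conversion time so that all three projections have the same shape again. The names `A_k` and `B_k` are hypothetical and biases are ignored.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, rank = 8, 2                      # illustrative sizes
x = rng.standard_normal((3, d_model))

W_q = rng.standard_normal((d_model, d_model))   # dense query weight
W_v = rng.standard_normal((d_model, d_model))   # dense value weight
A_k = rng.standard_normal((rank, d_model))      # hypothetical low-rank factors
B_k = rng.standard_normal((d_model, rank))      # for the key projection

# Densify: (x @ A_k.T) @ B_k.T == x @ (B_k @ A_k).T, so the product of the
# factors behaves like an ordinary dense weight.
W_k_dense = B_k @ A_k

# With three same-shaped weights, the usual concatenation applies again.
W_fused = np.concatenate([W_q, W_k_dense, W_v], axis=0)
q, k, v = np.split(x @ W_fused.T, 3, axis=1)

densified_matches = np.allclose(k, (x @ A_k.T) @ B_k.T)

# Cost of this approach: the densified layer stores d_model * d_model values
# instead of 2 * d_model * rank, so the compression benefit is lost for it.
low_rank_params = A_k.size + B_k.size
dense_params = W_k_dense.size
print(densified_matches, low_rank_params, dense_params)
```

This keeps the fused path unchanged, but it gives up the memory and compute savings of every densified layer, so it may defeat the purpose of lite-whisper; skipping fusion only for mixed layers is the alternative.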