Development issues with `fused_layers`

I am working on adding lite-whisper support to CTranslate2 (PR: https://github.com/OpenNMT/CTranslate2/pull/1886, related discussion: https://github.com/efeslab/LiteASR/issues/7).

However, the existing fused layer logic does not work for the low-rank QKV matrices used in lite-whisper (see the [paper](https://arxiv.org/abs/2502.20583) for context). For any one of the Q, K, or V projections, the low-rank layer may fall back to a full-rank layer if the compression algorithm cannot compress it without sacrificing accuracy. As a result, the QKV projections in each lite-whisper encoder layer are a mix of Linear and LowRankLinear layers. They cannot be fused, because the two layer types are executed differently.
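To make the conflict concrete, here is a minimal NumPy sketch with toy sizes. It assumes that fusing means concatenating the dense Q, K, and V weight matrices along the output dimension so that one matmul produces all three. It is not CTranslate2's actual conversion code. A low-rank layer stores two factors instead of one matrix, so it can only join the concatenation if its product is materialized, which undoes the compression. The factor names `u_k`/`v_k` and the helper `low_rank_forward` are illustrative assumptions, not lite-whisper's API.

```python
import numpy as np

# Hypothetical toy sizes; real Whisper/lite-whisper dimensions are much larger.
d_model, rank, n_tokens = 8, 2, 4
rng = np.random.default_rng(0)

# Dense projections: y = x @ W.T, with W shaped (out_features, in_features).
w_q = rng.standard_normal((d_model, d_model))
w_k = rng.standard_normal((d_model, d_model))
w_v = rng.standard_normal((d_model, d_model))

# Fusion: stack the three weights along the output dimension so a single
# matmul computes Q, K and V at once.
w_qkv = np.concatenate([w_q, w_k, w_v], axis=0)  # (3 * d_model, d_model)

x = rng.standard_normal((n_tokens, d_model))
q, k, v = np.split(x @ w_qkv.T, 3, axis=-1)
assert np.allclose(q, x @ w_q.T) and np.allclose(v, x @ w_v.T)

# Low-rank projection: W_k is approximated by U @ V and only the two factors
# are stored, so there is no single (d_model, d_model) matrix to concatenate.
u_k = rng.standard_normal((d_model, rank))  # (out_features, rank)
v_k = rng.standard_normal((rank, d_model))  # (rank, in_features)


def low_rank_forward(x, u, v):
    # Two thin matmuls replace one dense matmul.
    return (x @ v.T) @ u.T


# Fusing anyway means materializing U @ V as a dense matrix. That gives the
# right result but discards the parameter savings that motivated compression.
w_qkv_mixed = np.concatenate([w_q, u_k @ v_k, w_v], axis=0)
_, k_mixed, _ = np.split(x @ w_qkv_mixed.T, 3, axis=-1)
assert np.allclose(k_mixed, low_rank_forward(x, u_k, v_k))

dense_params = w_k.size                # 64
low_rank_params = u_k.size + v_k.size  # 32
```

The parameter counts at the end show the trade-off. Materializing the low-rank product keeps the fused path, but the K projection goes back to its full size.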

Example encoder layer with a mix of Linear and LowRankLinear layers:
<img width="925" alt="Image" src="https://github.com/user-attachments/assets/65d810e9-e6d6-474e-a943-6afc81a91e52" />

Would it cause problems if this fused layer logic were removed? Would it be better to remove it only for lite-whisper models? Or is there another workaround that keeps the fused layer concatenation logic even though the layer types differ?
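A middle ground between these options is to fuse only when all three projections are dense and otherwise keep them as separate layers. The sketch below uses hypothetical `DenseSpec`/`LowRankSpec` containers and a hypothetical `maybe_fuse_qkv` helper. These are not CTranslate2 classes, and the runtime would still need an unfused attention path for the layers that fall back.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

import numpy as np


@dataclass
class DenseSpec:
    """Hypothetical container for a full Linear layer."""
    weight: np.ndarray                 # (out_features, in_features)
    bias: Optional[np.ndarray] = None


@dataclass
class LowRankSpec:
    """Hypothetical container for a LowRankLinear layer (W ~= u @ v)."""
    u: np.ndarray                      # (out_features, rank)
    v: np.ndarray                      # (rank, in_features)
    bias: Optional[np.ndarray] = None


Layer = Union[DenseSpec, LowRankSpec]


def maybe_fuse_qkv(layers: List[Layer]) -> List[Layer]:
    """Return one fused DenseSpec if every projection is dense, else the layers unchanged."""
    if not all(isinstance(layer, DenseSpec) for layer in layers):
        # Mixed Linear/LowRankLinear: keep Q, K, V as separate projections.
        return list(layers)

    weight = np.concatenate([layer.weight for layer in layers], axis=0)
    if all(layer.bias is None for layer in layers):
        return [DenseSpec(weight)]
    # Some projections (e.g. Whisper's k_proj) may lack a bias; use zeros there.
    bias = np.concatenate([
        layer.bias if layer.bias is not None else np.zeros(layer.weight.shape[0])
        for layer in layers
    ])
    return [DenseSpec(weight, bias)]


# Usage with toy sizes.
rng = np.random.default_rng(0)
d = 8


def dense(with_bias: bool = True) -> DenseSpec:
    bias = rng.standard_normal(d) if with_bias else None
    return DenseSpec(rng.standard_normal((d, d)), bias)


low_rank_k = LowRankSpec(rng.standard_normal((d, 2)), rng.standard_normal((2, d)))

all_dense = maybe_fuse_qkv([dense(), dense(with_bias=False), dense()])
mixed = maybe_fuse_qkv([dense(), low_rank_k, dense()])
print(len(all_dense), all_dense[0].weight.shape)  # 1 (24, 8)
print(len(mixed))                                 # 3
```

This falls back per layer instead of disabling fusion for the whole model. Fully dense encoder layers keep the fused path, and only the mixed ones pay the cost of three separate matmuls. Whether the runtime can mix fused and unfused attention layers within one model would need to be confirmed.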

@minhthuc2502 
