[Enhancement] KV Caching for inference speed #110

@soran-ghaderi

Description

Is your feature request related to a problem? Please describe.

During autoregressive decoding, the self-attention key and value matrices are recomputed from scratch at every step, which adds redundant computation and slows inference. Caching the key and value matrices across steps would reduce this computational complexity and improve inference speed.

Describe the solution you'd like

Add a KV cache to the decoder's self-attention: at each decoding step, compute only the new token's key and value vectors and append them to the cache, rather than recomputing the full key and value matrices in every iteration. This removes the redundant per-step work and leads to faster inference.
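To make the request concrete, here is a minimal NumPy sketch of the idea (an assumption for illustration, not the repo's TensorFlow implementation): a single-head cache that appends each new token's key/value and attends over the accumulated matrices, producing the same result as a full recompute.

```python
import numpy as np

def attention(q, K, V):
    # Scaled dot-product attention for a single query vector q (d,)
    # over cached keys K (t, d) and values V (t, d).
    d = q.shape[-1]
    scores = (K @ q) / np.sqrt(d)            # (t,)
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ V                       # (d,)

class KVCache:
    """Appends each step's key/value instead of recomputing the full matrices."""
    def __init__(self, d_model):
        self.K = np.zeros((0, d_model))
        self.V = np.zeros((0, d_model))

    def step(self, q, k, v):
        # Only the new token's key/value are computed upstream; here we
        # append them and attend over the whole cache.
        self.K = np.vstack([self.K, k[None, :]])
        self.V = np.vstack([self.V, v[None, :]])
        return attention(q, self.K, self.V)
```

At each step the cached output matches recomputing attention over all tokens so far, while the per-step cost drops to a single key/value projection plus one attention pass over the cache.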

Metadata

Assignees

Labels

- enhancement: New feature or request
- help wanted: Extra attention is needed
- tensorflow: Related to Tensorflow
- tests: Related to tests

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
