[Enhancement] KV Caching for inference speed #110

@soran-ghaderi

Description

Is your feature request related to a problem? Please describe.

During autoregressive decoding, the self-attention key and value matrices are recomputed from scratch at every step, which adds redundant computation and slows inference. Caching the key and value matrices across steps would reduce this computational complexity and improve inference speed.

Describe the solution you'd like

Add a KV cache to the decoder's self-attention: at each decoding step, compute only the new token's key and value vectors and append them to the cache, rather than recomputing the full key and value matrices in every iteration. This removes the redundant per-step work and leads to faster inference.
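To make the request concrete, here is a minimal NumPy sketch of the idea (an assumption for illustration, not the repo's TensorFlow implementation): a single-head cache that appends each new token's key/value and attends over the accumulated matrices, producing the same result as a full recompute.

```python
import numpy as np

def attention(q, K, V):
    # Scaled dot-product attention for a single query vector q (d,)
    # over cached keys K (t, d) and values V (t, d).
    d = q.shape[-1]
    scores = (K @ q) / np.sqrt(d)            # (t,)
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ V                       # (d,)

class KVCache:
    """Appends each step's key/value instead of recomputing the full matrices."""
    def __init__(self, d_model):
        self.K = np.zeros((0, d_model))
        self.V = np.zeros((0, d_model))

    def step(self, q, k, v):
        # Only the new token's key/value are computed upstream; here we
        # append them and attend over the whole cache.
        self.K = np.vstack([self.K, k[None, :]])
        self.V = np.vstack([self.V, v[None, :]])
        return attention(q, self.K, self.V)
```

At each step the cached output matches recomputing attention over all tokens so far, while the per-step cost drops to a single key/value projection plus one attention pass over the cache.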

Metadata

Assignees

Labels

- enhancement: New feature or request
- help wanted: Extra attention is needed
- tensorflow: Related to Tensorflow
- tests: Related to tests

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
