Open
Labels
enhancement, help wanted, tensorflow, tests
Description
Is your feature request related to a problem? Please describe.
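Yes. Without caching, autoregressive decoding recomputes the key and value matrices for every previous token at each step. Caching the key and value matrices in the self-attention mechanism would remove this redundant computation and improve inference speed.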
Describe the solution you'd like
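Add a key-value (KV) cache to the self-attention layers. At each decoding step, compute the keys and values only for the newly generated token and append them to the cached ones, so the full key and value matrices are not recomputed in every iteration of the decoding process. This leads to faster inference.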
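To make the request concrete, here is a minimal single-head sketch of the caching idea in plain NumPy. It is not this project's API: `KVCache`, `attend_step`, `attend_full`, and the weight names are hypothetical, and batching, multiple heads, and the output projection are left out. The final check compares the cached, step-by-step result with a full causal recomputation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class KVCache:
    """Hypothetical helper: keeps keys/values of tokens decoded so far."""

    def __init__(self, d_model):
        self.keys = np.zeros((0, d_model))
        self.values = np.zeros((0, d_model))

    def append(self, k, v):
        # Only the new token's key/value rows are added; old rows are reused.
        self.keys = np.concatenate([self.keys, k], axis=0)
        self.values = np.concatenate([self.values, v], axis=0)
        return self.keys, self.values


def attend_step(x_new, w_q, w_k, w_v, cache):
    """One decoding step. x_new has shape (1, d_model)."""
    q = x_new @ w_q
    k_all, v_all = cache.append(x_new @ w_k, x_new @ w_v)
    scores = q @ k_all.T / np.sqrt(k_all.shape[-1])
    return softmax(scores) @ v_all


def attend_full(x, w_q, w_k, w_v):
    """Reference: causal self-attention recomputed over the whole sequence."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    return softmax(scores) @ v


rng = np.random.default_rng(0)
d, T = 8, 5  # assumed toy sizes
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
x = rng.normal(size=(T, d))

cache = KVCache(d)
cached_out = np.concatenate(
    [attend_step(x[t:t + 1], w_q, w_k, w_v, cache) for t in range(T)]
)
full_out = attend_full(x, w_q, w_k, w_v)
print(np.allclose(cached_out, full_out))
```

With the cache, each step projects one token and attends over the stored rows, instead of projecting the whole prefix again. In a TensorFlow implementation, `tf.concat` could play the role of `np.concatenate`; a preallocated buffer would avoid repeated copies, at the cost of fixing a maximum sequence length.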