Open
Labels
enhancement, good first issue, help wanted
Description
In the current implementation, on every external request, the indexer performs the following:
```go
// 1. get available tokens of longest prefix
tokens := k.tokensIndexer.FindLongestContainedTokens(prompt, modelName)
...
// 2. get block keys
blockKeys := k.tokensProcessor.TokensToKVBlockKeys(tokens, modelName)
...
// 3. query kvblock indexer for pods
strBlockKeys, keyToPods, err := k.kvBlockIndexer.GetPodsForKeys(ctx, blockKeys, sets.New(podIdentifiers...))
...
```
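It is possible to cache the result of step 2 (the block keys) together with the result of step 1 (the tokens), so that the tokens indexer returns the block keys directly and the token-to-block-key conversion is not repeated on every request. This only works if the cached entry can be self-contained.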
[This should be profiling-driven.]
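A minimal Go sketch of caching step 2 into step 1, under two loudly stated assumptions: the cache is keyed on the exact (model, prompt) pair rather than on the longest contained prefix that `FindLongestContainedTokens` actually looks up, and the block keys are assumed to depend only on the tokens and the model name (one possible reading of "self-contained"). `BlockKey`, `prefixEntry`, `prefixCache`, `toyFindTokens`, and `toyBlockKeys` are hypothetical names for illustration, not part of the indexer's API; the toy functions only stand in for steps 1 and 2.

```go
package main

import (
	"fmt"
	"sync"
)

// BlockKey is a hypothetical stand-in for the KV-block key type.
type BlockKey uint64

// prefixEntry (hypothetical) stores step 2's block keys next to step 1's
// tokens, so a cache hit skips the token-to-block-key conversion entirely.
type prefixEntry struct {
	tokens    []uint32
	blockKeys []BlockKey
}

// prefixCache is a concurrency-safe map from (model, prompt) to an entry.
// Exact-prompt keying is a simplification of longest-prefix lookup.
type prefixCache struct {
	mu      sync.RWMutex
	entries map[string]prefixEntry
}

func newPrefixCache() *prefixCache {
	return &prefixCache{entries: make(map[string]prefixEntry)}
}

// cacheKey joins model and prompt with a NUL byte so that different
// (model, prompt) splits of the same string do not collide.
func cacheKey(prompt, modelName string) string {
	return modelName + "\x00" + prompt
}

// Get returns cached tokens and block keys, computing both on a miss.
// The bool reports a hit. Returned slices are shared: do not mutate them.
// Two concurrent misses for the same key may both compute; last write wins.
func (c *prefixCache) Get(
	prompt, modelName string,
	findTokens func(prompt, modelName string) []uint32,
	toBlockKeys func(tokens []uint32, modelName string) []BlockKey,
) ([]uint32, []BlockKey, bool) {
	key := cacheKey(prompt, modelName)

	c.mu.RLock()
	e, ok := c.entries[key]
	c.mu.RUnlock()
	if ok {
		return e.tokens, e.blockKeys, true
	}

	// Miss: run steps 1 and 2 once, then store both results together.
	tokens := findTokens(prompt, modelName)
	e = prefixEntry{tokens: tokens, blockKeys: toBlockKeys(tokens, modelName)}

	c.mu.Lock()
	c.entries[key] = e
	c.mu.Unlock()
	return e.tokens, e.blockKeys, false
}

// conversions counts how often the (toy) step 2 actually runs.
var conversions int

// toyFindTokens is a toy tokenizer: one token per byte of the prompt.
func toyFindTokens(prompt, _ string) []uint32 {
	toks := make([]uint32, len(prompt))
	for i := 0; i < len(prompt); i++ {
		toks[i] = uint32(prompt[i])
	}
	return toks
}

// toyBlockKeys emits one key per full block of 4 tokens (the block's sum).
// This is not a real hash; it only stands in for TokensToKVBlockKeys.
func toyBlockKeys(tokens []uint32, _ string) []BlockKey {
	conversions++
	const blockSize = 4
	var keys []BlockKey
	for i := 0; i+blockSize <= len(tokens); i += blockSize {
		var k BlockKey
		for _, t := range tokens[i : i+blockSize] {
			k += BlockKey(t)
		}
		keys = append(keys, k)
	}
	return keys
}

func main() {
	c := newPrefixCache()
	_, keys1, hit1 := c.Get("hello world!", "model-a", toyFindTokens, toyBlockKeys)
	_, keys2, hit2 := c.Get("hello world!", "model-a", toyFindTokens, toyBlockKeys)
	fmt.Println(len(keys1), hit1, hit2, conversions, len(keys2))
}
```

Running it prints `3 false true 1 3`: the second request for the same prompt hits the cache, and the block-key conversion runs only once. Whether saving that conversion is worth the extra memory is exactly the kind of question profiling should answer.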