
Commit 2d3b68d

docs: Add a setup documentation about examples/kv-cache-index (#38)
* add docs about kv-cache-index setup, and allow the example code base to parse Redis-related environment variables
* move docs to a deployment subfolder for brevity
* use the lower-case version of the vLLM model label in the vLLM deployment metadata, to prevent Kubernetes issues with model names that contain uppercase letters
* implement suggested changes according to the review
* update the release name for the vLLM Helm deployment, to align it with the purpose of the deployment
* exit in case of errors in the example
* fix linter
* apply suggestions from code review
* fix duplicate package
* remove old file

Signed-off-by: Burak Sekili <[email protected]>
Co-authored-by: Maroon Ayoub <[email protected]>
1 parent 3599c3e commit 2d3b68d

5 files changed: +179 −22 lines changed

docs/deployment/setup.md

Lines changed: 116 additions & 0 deletions
@@ -0,0 +1,116 @@
# KV-Cache Manager Setup Guide

This guide provides a complete walkthrough for setting up and testing the example llm-d-kv-cache-manager system. You will deploy vLLM with LMCache and Redis, then run an example application that demonstrates KV-cache indexing capabilities.

By following this guide, you will:

1. **Deploy the Infrastructure**: Use Helm to set up:
   - vLLM nodes with LMCache CPU offloading (4 replicas) serving the Llama 3.1 8B Instruct model
   - a Redis server
2. **Test with the Example Application**: Run a Go application that:
   - connects to your deployed vLLM and Redis infrastructure
   - demonstrates KV-cache indexing by processing a sample prompt

The demonstrated KV-cache indexer is used for AI-aware routing, accelerating inference across the system by minimizing redundant computation.
## vLLM Deployment

The llm-d-kv-cache-manager repository includes a Helm chart for deploying vLLM with CPU offloading (LMCache) and KV-events indexing (Redis). This section describes how to use this Helm chart for a complete deployment.

*Note*: Ensure that the Kubernetes node designated for running vLLM supports GPU workloads.

### Prerequisites

- Kubernetes cluster with GPU support
- Helm 3.x
- HuggingFace token for accessing models
- kubectl configured to access your cluster
### Installation

1. Set environment variables:

```bash
export HF_TOKEN=<your-huggingface-token>
export NAMESPACE=<your-namespace>
export MODEL_NAME="meta-llama/Llama-3.1-8B-Instruct"
export VLLM_POOLLABEL="vllm-model-pool"
```

> Note that both the Helm deployment and the example application use the same `MODEL_NAME` environment variable,
> ensuring alignment between the vLLM deployment configuration and the KV cache indexer.
> Set this variable once during initial setup and both components will use the same model configuration.
2. Deploy using Helm:

```bash
helm upgrade --install vllm-stack ./vllm-setup-helm \
  --namespace $NAMESPACE \
  --create-namespace \
  --set secret.create=true \
  --set secret.hfTokenValue=$HF_TOKEN \
  --set vllm.model.name=$MODEL_NAME \
  --set vllm.poolLabelValue=$VLLM_POOLLABEL \
  -f ./vllm-setup-helm/values.yaml
```

**Note:**

- Adjust the resource requests and limits for vLLM and Redis in `values.yaml` to match your cluster's capacity.
- By default, the chart uses a `PersistentVolume` to cache the model. To disable this, set `.persistence.enabled` to `false`.
3. Verify the deployment:

```bash
kubectl get deployments -n $NAMESPACE
```

You should see:

- vLLM pods (default: 4 replicas)
- a Redis lookup server pod
### Configuration Options

The Helm chart supports various configuration options. See [values.yaml](../../vllm-setup-helm/values.yaml) for all available options.

Key configuration parameters:

- `vllm.model.name`: The HuggingFace model to use (default: `meta-llama/Llama-3.1-8B-Instruct`)
- `vllm.replicaCount`: Number of vLLM replicas (default: 4)
- `vllm.poolLabelValue`: Label value for the inference pool (used by the scheduler)
- `redis.enabled`: Whether to deploy Redis for KV cache indexing (default: `true`)
- `persistence.enabled`: Enable persistent storage for the model cache (default: `true`)
- `secret.create`: Create the HuggingFace token secret (default: `true`)
## Using the KV Cache Indexer Example

### Prerequisites

Ensure you have a running deployment with vLLM and Redis as described above.

### Running the Example

The vLLM node can be tested with the prompt found in `examples/kv-cache-index/main.go`.

First, download the tokenizer bindings required by the `kvcache.Indexer` for prompt tokenization:

```bash
make download-tokenizer
```
Then, set the required environment variables and run the example:

```bash
export HF_TOKEN=<token>
export REDIS_ADDR=<redis://$user:$pass@localhost:6379/$db> # optional, defaults to localhost:6379
export MODEL_NAME=<model_name_used_in_vllm_deployment> # optional, defaults to meta-llama/Llama-3.1-8B-Instruct

go run -ldflags="-extldflags '-L$(pwd)/lib'" examples/kv-cache-index/main.go
```

Environment variables:

- `HF_TOKEN` (required): HuggingFace access token
- `REDIS_ADDR` (optional): Redis address; defaults to `localhost:6379`
- `MODEL_NAME` (optional): The model name used in the vLLM deployment; defaults to `meta-llama/Llama-3.1-8B-Instruct`. Use the same value you set during the Helm deployment.

examples/kv-cache-index/README.md

Lines changed: 3 additions & 1 deletion
@@ -1,3 +1,5 @@
 # KVCacheIndex Use Example
 
-The code in main.go showcases how to configure and use a KVCacheIndex module.
+The code in main.go showcases how to configure and use a KVCacheIndex module.
+
+For instructions on setting up an example environment for this demonstration, please refer to [docs/deployment/setup.md](../../docs/deployment/setup.md).

examples/kv-cache-index/main.go

Lines changed: 58 additions & 20 deletions
@@ -18,16 +18,19 @@ package main
 
 import (
 	"context"
+	"fmt"
 	"os"
 	"time"
 
+	"github.com/redis/go-redis/v9"
+
 	"k8s.io/klog/v2"
 
 	"github.com/llm-d/llm-d-kv-cache-manager/pkg/kvcache"
 )
 
 /*
-Refer to docs/phase1-setup.md
+Refer to docs/deployment/setup.md
 
 In Redis:
 1) "meta-llama/Llama-3.1-8B-Instruct@33c26f4ed679005e733e382beeb8df69d8362c07400bb07fec69712413cb4310"
@@ -37,58 +40,93 @@ In Redis:
 */
 
 //nolint:lll // need prompt as-is, chunking to string concatenation is too much of a hassle
-const prompt = `lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur pretium tincidunt lacus. Nulla gravida orci a odio. Nullam varius, turpis et commodo pharetra, est eros bibendum elit, nec luctus magna felis sollicitudin mauris. Integer in mauris eu nibh euismod gravida. Duis ac tellus et risus vulputate vehicula. Donec lobortis risus a elit. Etiam tempor. Ut ullamcorper, ligula eu tempor congue, eros est euismod turpis, id tincidunt sapien risus a quam. Maecenas fermentum consequat mi. Donec fermentum. Pellentesque malesuada nulla a mi. Duis sapien sem, aliquet nec, commodo eget, consequat quis, neque. Aliquam faucibus, elit ut dictum aliquet, felis nisl adipiscing sapien, sed malesuada diam lacus eget erat. Cras mollis scelerisque nunc. Nullam arcu. Aliquam consequat. Curabitur augue lorem, dapibus quis, laoreet et, pretium ac, nisi. Aenean magna nisl, mollis quis, molestie eu, feugiat in, orci. In hac habitasse platea dictumst.`
-const modelName = "ibm-granite/granite-3.3-8b-instruct"
+const (
+	prompt = `lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur pretium tincidunt lacus. Nulla gravida orci a odio. Nullam varius, turpis et commodo pharetra, est eros bibendum elit, nec luctus magna felis sollicitudin mauris. Integer in mauris eu nibh euismod gravida. Duis ac tellus et risus vulputate vehicula. Donec lobortis risus a elit. Etiam tempor. Ut ullamcorper, ligula eu tempor congue, eros est euismod turpis, id tincidunt sapien risus a quam. Maecenas fermentum consequat mi. Donec fermentum. Pellentesque malesuada nulla a mi. Duis sapien sem, aliquet nec, commodo eget, consequat quis, neque. Aliquam faucibus, elit ut dictum aliquet, felis nisl adipiscing sapien, sed malesuada diam lacus eget erat. Cras mollis scelerisque nunc. Nullam arcu. Aliquam consequat. Curabitur augue lorem, dapibus quis, laoreet et, pretium ac, nisi. Aenean magna nisl, mollis quis, molestie eu, feugiat in, orci. In hac habitasse platea dictumst.`
+	defaultModelName = "meta-llama/Llama-3.1-8B-Instruct"
+
+	envRedisAddr = "REDIS_ADDR"
+	envHFToken   = "HF_TOKEN"
+	envModelName = "MODEL_NAME"
+)
 
-func getKVCacheIndexerConfig() *kvcache.Config {
+func getKVCacheIndexerConfig() (*kvcache.Config, error) {
 	config := kvcache.NewDefaultConfig()
 
 	// For sample running with mistral (tokenizer), a huggingface token is needed
-	huggingFaceToken := os.Getenv("HF_TOKEN")
+	huggingFaceToken := os.Getenv(envHFToken)
 	if huggingFaceToken != "" {
 		config.TokenizersPoolConfig.HuggingFaceToken = huggingFaceToken
 	}
 
-	return config
+	redisAddr := os.Getenv(envRedisAddr)
+	if redisAddr != "" {
+		redisOpt, err := redis.ParseURL(redisAddr)
+		if err != nil {
+			return nil, fmt.Errorf("failed to parse redis host: %w", err)
+		}
+
+		config.KVBlockIndexConfig.RedisConfig.RedisOpt = redisOpt
+	}
+
+	return config, nil
+}
+
+func getModelName() string {
+	modelName := os.Getenv(envModelName)
+	if modelName != "" {
+		return modelName
+	}
+
+	return defaultModelName
 }
 
 func main() {
-	ctx, cancel := context.WithCancel(context.Background())
+	ctx := context.Background()
 	logger := klog.FromContext(ctx)
 
-	kvCacheIndexer, err := kvcache.NewKVCacheIndexer(getKVCacheIndexerConfig())
+	if err := kvCacheIndexer(ctx, logger); err != nil {
+		logger.Error(err, "failed to run kv-cache-indexer")
+		os.Exit(1)
+	}
+}
+
+func kvCacheIndexer(ctx context.Context, logger klog.Logger) error {
+	config, err := getKVCacheIndexerConfig()
 	if err != nil {
-		logger.Error(err, "failed to init Indexer")
+		return err
 	}
 
-	logger.Info("created Indexer")
+	//nolint:contextcheck // NewKVCacheIndexer does not accept context parameter
+	kvCacheIndexer, err := kvcache.NewKVCacheIndexer(config)
+	if err != nil {
+		return err
+	}
+
+	logger.Info("Created Indexer")
 
 	go kvCacheIndexer.Run(ctx)
-	logger.Info("started Indexer")
+	modelName := getModelName()
+	logger.Info("Started Indexer", "model", modelName)
 
 	// Get pods for the prompt
 	pods, err := kvCacheIndexer.GetPodScores(ctx, prompt, modelName, nil)
 	if err != nil {
-		logger.Error(err, "failed to get pod scores")
-		return
+		return err
 	}
 
 	// Print the pods - should be empty because no tokenization
-	logger.Info("got pods", "pods", pods)
+	logger.Info("Got pods", "pods", pods)
 
 	// Sleep 3 secs
 	time.Sleep(3 * time.Second)
 
 	// Get pods for the prompt
 	pods, err = kvCacheIndexer.GetPodScores(ctx, prompt, modelName, nil)
 	if err != nil {
-		logger.Error(err, "failed to get pod scores")
-		return
+		return err
 	}
 
 	// Print the pods - should be empty because no tokenization
-	logger.Info("got pods", "pods", pods)
-
-	// Cancel the context
-	cancel()
+	logger.Info("Got pods", "pods", pods)
+	return nil
 }
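The refactor in this diff moves the program body out of `main` into a function that returns an `error`, so the example exits with a non-zero status on failure instead of silently returning. A tiny stdlib-only sketch of that pattern (illustrative, not the project's code):

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// run contains the program logic and propagates errors to the caller
// instead of logging and returning, so main can set the exit code.
func run(fail bool) error {
	if fail {
		return errors.New("example failure")
	}
	fmt.Println("ok")
	return nil
}

func main() {
	if err := run(false); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1) // a non-zero exit signals failure to callers and CI
	}
}
```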

vllm-setup-helm/README.md

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ Deploying (repo root as working directory):
 ```
 helm upgrade --install vllm-p2p ./vllm-setup-helm \
   --namespace $NAMESPACE \
+  --create-namespace \
   --set secret.create=true \
   --set secret.hfTokenValue=$HF_TOKEN \
   --set vllm.poolLabelValue="vllm-llama3-8b-instruct"

vllm-setup-helm/templates/deployment.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 apiVersion: apps/v1
 kind: Deployment
 metadata:
-  name: {{ .Release.Name }}-vllm-{{ .Values.vllm.model.label }}
+  name: {{ .Release.Name }}-vllm-{{ lower .Values.vllm.model.label }}
   namespace: {{ .Release.Namespace | default .Values.namespace }}
   labels:
     {{- include "chart.labels" . | nindent 4 }}
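The `lower` change above exists because Kubernetes object names must be valid DNS-1123 subdomains (lowercase alphanumerics, `-`, and `.`), so a model label containing uppercase letters would produce an invalid Deployment name. A small Go sketch using a simplified form of that rule (the label value here is hypothetical, and the real API server check is stricter):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// subdomain is a simplified version of the DNS-1123 subdomain rule:
// lowercase alphanumerics, '-', and '.', starting and ending alphanumeric.
var subdomain = regexp.MustCompile(`^[a-z0-9]([-a-z0-9.]*[a-z0-9])?$`)

func validName(s string) bool {
	return subdomain.MatchString(s)
}

func main() {
	label := "Llama-3.1-8B-Instruct" // hypothetical model label
	fmt.Println(validName(label))                  // false: contains uppercase letters
	fmt.Println(validName(strings.ToLower(label))) // true after lowering
}
```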
