docs: Add a setup documentation about examples/kv-cache-index (#38)
* add docs about kv-cache-index setup, and allow the example code base to parse Redis-related environment variables
Signed-off-by: Burak Sekili <[email protected]>
move docs to deployment subfolder for brevity
Signed-off-by: Burak Sekili <[email protected]>
use the lowercase version of the vLLM model label in the vLLM deployment metadata, to prevent Kubernetes issues with models that contain uppercase letters in their names
Signed-off-by: Burak Sekili <[email protected]>
implement suggested changes according to the review
Signed-off-by: Burak Sekili <[email protected]>
update release name for vllm helm deployment, to make it align with the purpose of the deployment
Signed-off-by: Burak Sekili <[email protected]>
exit in case of errors in example
Signed-off-by: Burak Sekili <[email protected]>
fix linter
Signed-off-by: Burak Sekili <[email protected]>
Merge
Signed-off-by: Burak Sekili <[email protected]>
add docs about kv-cache-index setup, and allow the example code base to parse Redis-related environment variables
Signed-off-by: Burak Sekili <[email protected]>
Apply suggestions from code review
Co-authored-by: Maroon Ayoub <[email protected]>
Signed-off-by: Burak Sekili <[email protected]>
fix duplicate package
Signed-off-by: Burak Sekili <[email protected]>
* remove old file
Signed-off-by: Burak Sekili <[email protected]>
---------
Signed-off-by: Burak Sekili <[email protected]>
This guide walks through setting up and testing the example llm-d-kv-cache-manager system. You will deploy vLLM with LMCache and Redis, then run an example application that demonstrates KV-cache indexing.

By following this guide, you will:

1. **Deploy the Infrastructure**: Use Helm to set up:
   - vLLM nodes with LMCache CPU offloading (4 replicas) serving the Llama 3.1 8B Instruct model
   - A Redis server
2. **Test with Example Application**: Run a Go application that:
   - Connects to your deployed vLLM and Redis infrastructure
   - Demonstrates KV-cache indexing by processing a sample prompt

The demonstrated KV-cache indexer is used for AI-aware routing, which accelerates inference across the system by minimizing redundant computation.

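To make the routing idea concrete, here is a minimal, self-contained sketch of prefix-aware scoring. It is a toy illustration and not the llm-d-kv-cache-manager implementation: the block size, the `blockKeys` and `scorePods` helpers, the in-memory `index` map, and the pod names are all hypothetical. The assumption is that a prompt's tokens are split into fixed-size blocks, each block is identified by a hash chained over all preceding blocks, and a pod scores higher the more leading blocks it already holds in its KV cache.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// blockSize is a hypothetical number of tokens per KV-cache block.
const blockSize = 4

// blockKeys hashes each full block of tokens, chaining in the previous
// block's hash so that a key identifies the block plus its entire prefix.
// Trailing tokens that do not fill a block are ignored.
func blockKeys(tokens []uint32) []string {
	var keys []string
	prev := []byte{}
	for i := 0; i+blockSize <= len(tokens); i += blockSize {
		h := sha256.New()
		h.Write(prev)
		for _, t := range tokens[i : i+blockSize] {
			var b [4]byte
			binary.LittleEndian.PutUint32(b[:], t)
			h.Write(b[:])
		}
		prev = h.Sum(nil)
		keys = append(keys, fmt.Sprintf("%x", prev))
	}
	return keys
}

// scorePods returns, for each pod, the number of consecutive leading
// blocks it holds. index maps a block key to the pods caching that block.
func scorePods(keys []string, index map[string][]string) map[string]int {
	scores := map[string]int{}
	active := map[string]bool{} // pods that matched every block so far
	for i, k := range keys {
		next := map[string]bool{}
		for _, pod := range index[k] {
			if i == 0 || active[pod] {
				next[pod] = true
				scores[pod] = i + 1
			}
		}
		if len(next) == 0 {
			break // no pod extends the prefix any further
		}
		active = next
	}
	return scores
}

func main() {
	tokens := []uint32{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
	keys := blockKeys(tokens)

	// Hypothetical index state: pod-a caches two blocks, pod-b only the first.
	index := map[string][]string{
		keys[0]: {"pod-a", "pod-b"},
		keys[1]: {"pod-a"},
	}
	fmt.Println(scorePods(keys, index)) // prints: map[pod-a:2 pod-b:1]
}
```

A router built on such scores would prefer `pod-a` for this prompt, since it can reuse two cached blocks instead of recomputing them.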
## vLLM Deployment

The llm-d-kv-cache-manager repository includes a Helm chart for deploying vLLM with CPU offloading (LMCache) and KV-events indexing (Redis). This section describes how to use this Helm chart for a complete deployment.

*Note*: Ensure that the Kubernetes node designated for running vLLM supports GPU workloads.

Ensure you have a running deployment with vLLM and Redis as described above.

### Running the Example

The vLLM node can be tested with the prompt found in `examples/kv-cache-index/main.go`.

First, download the tokenizer bindings required by the `kvcache.Indexer` for prompt tokenization:

```bash
make download-tokenizer
```

Then, set the required environment variables and run the example:

```bash
export HF_TOKEN=<token>
export REDIS_ADDR=<redis://$user:$pass@localhost:6379/$db> # optional, defaults to localhost:6379
export MODEL_NAME=<model_name_used_in_vllm_deployment> # optional, defaults to meta-llama/Llama-3.1-8B-Instruct

go run -ldflags="-extldflags '-L$(pwd)/lib'" examples/kv-cache-index/main.go
```

Environment variables:

- `HF_TOKEN` (required): Hugging Face access token.
- `REDIS_ADDR` (optional): Redis address; defaults to `localhost:6379`.
- `MODEL_NAME` (optional): The model name used in the vLLM deployment; defaults to `meta-llama/Llama-3.1-8B-Instruct`. Use the same value you set during the Helm deployment.

The code in `main.go` showcases how to configure and use a KVCacheIndex module.

For instructions on setting up an example environment for this demonstration, please refer to [docs/deployment/setup.md](../../docs/deployment/setup.md).
//nolint:lll // need prompt as-is, chunking to string concatenation is too much of a hassle
-const prompt = `lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur pretium tincidunt lacus. Nulla gravida orci a odio. Nullam varius, turpis et commodo pharetra, est eros bibendum elit, nec luctus magna felis sollicitudin mauris. Integer in mauris eu nibh euismod gravida. Duis ac tellus et risus vulputate vehicula. Donec lobortis risus a elit. Etiam tempor. Ut ullamcorper, ligula eu tempor congue, eros est euismod turpis, id tincidunt sapien risus a quam. Maecenas fermentum consequat mi. Donec fermentum. Pellentesque malesuada nulla a mi. Duis sapien sem, aliquet nec, commodo eget, consequat quis, neque. Aliquam faucibus, elit ut dictum aliquet, felis nisl adipiscing sapien, sed malesuada diam lacus eget erat. Cras mollis scelerisque nunc. Nullam arcu. Aliquam consequat. Curabitur augue lorem, dapibus quis, laoreet et, pretium ac, nisi. Aenean magna nisl, mollis quis, molestie eu, feugiat in, orci. In hac habitasse platea dictumst.`
+prompt = `lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur pretium tincidunt lacus. Nulla gravida orci a odio. Nullam varius, turpis et commodo pharetra, est eros bibendum elit, nec luctus magna felis sollicitudin mauris. Integer in mauris eu nibh euismod gravida. Duis ac tellus et risus vulputate vehicula. Donec lobortis risus a elit. Etiam tempor. Ut ullamcorper, ligula eu tempor congue, eros est euismod turpis, id tincidunt sapien risus a quam. Maecenas fermentum consequat mi. Donec fermentum. Pellentesque malesuada nulla a mi. Duis sapien sem, aliquet nec, commodo eget, consequat quis, neque. Aliquam faucibus, elit ut dictum aliquet, felis nisl adipiscing sapien, sed malesuada diam lacus eget erat. Cras mollis scelerisque nunc. Nullam arcu. Aliquam consequat. Curabitur augue lorem, dapibus quis, laoreet et, pretium ac, nisi. Aenean magna nisl, mollis quis, molestie eu, feugiat in, orci. In hac habitasse platea dictumst.`