Commit 926f5b2

Enhance Configurability (#55)
* upgrade configurability
  Signed-off-by: Maroon Ayoub <[email protected]>
* doc enhancements
  Signed-off-by: Maroon Ayoub <[email protected]>
* update main README
  Signed-off-by: Maroon Ayoub <[email protected]>

---------

Signed-off-by: Maroon Ayoub <[email protected]>
1 parent 0ac45c6 commit 926f5b2

12 files changed: +269 -30 lines changed

README.md

Lines changed: 1 addition & 0 deletions
@@ -59,6 +59,7 @@ graph TD
 5. **Index Update**: The **Event Subscriber** consumes these events and updates the **KV-Block Index** in near-real-time

 * For a more detailed breakdown, please see the high-level [Architecture Document](docs/architecture.md).
+* For configuration details, see the [Configuration Document](docs/configuration.md).

 -----

docs/configuration.md

Lines changed: 233 additions & 0 deletions
@@ -0,0 +1,233 @@
+# Configuration Documentation
+
+This document describes all configuration options available in the llm-d KV Cache Manager.
+All configurations are JSON-serializable and can be provided via configuration files or environment variables.
+
+## Main Configuration
+
+This package consists of two components:
+1. **KV Cache Indexer**: Manages the KV cache index, allowing efficient retrieval of cached blocks.
+2. **KV Event Processing**: Handles events from vLLM to update the cache index.
+
+The two components are configured separately, but share the index backend for storing KV block localities.
+The index backend is configured via the `kvBlockIndexConfig` field in the KV Cache Indexer configuration.
+
+### Indexer Configuration (`Config`)
+
+The main configuration structure for the KV Cache Indexer module.
+
+```json
+{
+  "prefixStoreConfig": { ... },
+  "tokenProcessorConfig": { ... },
+  "kvBlockIndexConfig": { ... },
+  "tokenizersPoolConfig": { ... }
+}
+```
+
+| Field | Type | Description | Default |
+|-------|------|-------------|---------|
+| `prefixStoreConfig` | [LRUStoreConfig](#lru-store-configuration-lrustoreconfig) | Configuration for the prefix store | See defaults |
+| `tokenProcessorConfig` | [TokenProcessorConfig](#token-processor-configuration-tokenprocessorconfig) | Configuration for token processing | See defaults |
+| `kvBlockIndexConfig` | [IndexConfig](#index-configuration-indexconfig) | Configuration for KV block indexing | See defaults |
+| `tokenizersPoolConfig` | [Config](#tokenization-pool-configuration-config) | Configuration for tokenization pool | See defaults |
+
+
+## Complete Example Configuration
+
+Here's a complete configuration example with all options:
+
+```json
+{
+  "prefixStoreConfig": {
+    "cacheSize": 500000,
+    "blockSize": 256
+  },
+  "tokenProcessorConfig": {
+    "blockSize": 16,
+    "hashSeed": "12345"
+  },
+  "kvBlockIndexConfig": {
+    "inMemoryConfig": {
+      "size": 100000000,
+      "podCacheSize": 10
+    },
+    "enableMetrics": true
+  },
+  "tokenizersPoolConfig": {
+    "workersCount": 8,
+    "huggingFaceToken": "your_hf_token_here",
+    "tokenizersCacheDir": "/tmp/tokenizers"
+  }
+}
+```
+
+## KV-Block Index Configuration
+
+### Index Configuration (`IndexConfig`)
+
+Configures the KV-block index backend. Multiple backends can be configured, but only the first available one will be used.
+
+```json
+{
+  "inMemoryConfig": { ... },
+  "redisConfig": { ... },
+  "enableMetrics": false
+}
+```
+
+| Field | Type | Description | Default |
+|-------|------|-------------|---------|
+| `inMemoryConfig` | [InMemoryIndexConfig](#in-memory-index-configuration-inmemoryindexconfig) | In-memory index configuration | See defaults |
+| `redisConfig` | [RedisIndexConfig](#redis-index-configuration-redisindexconfig) | Redis index configuration | `null` |
+| `enableMetrics` | `boolean` | Enable admissions/evictions/hits/misses recording | `false` |
+
+### In-Memory Index Configuration (`InMemoryIndexConfig`)
+
+Configures the in-memory KV block index implementation.
+
+```json
+{
+  "size": 100000000,
+  "podCacheSize": 10
+}
+```
+
+| Field | Type | Description | Default |
+|-------|------|-------------|---------|
+| `size` | `integer` | Maximum number of keys that can be stored | `100000000` |
+| `podCacheSize` | `integer` | Maximum number of pod entries per key | `10` |
+
+### Redis Index Configuration (`RedisIndexConfig`)
+
+Configures the Redis-backed KV block index implementation.
+
+```json
+{
+  "address": "redis://127.0.0.1:6379"
+}
+```
+
+| Field | Type | Description | Default |
+|-------|------|-------------|---------|
+| `address` | `string` | Redis server address (can include auth: `redis://user:pass@host:port/db`) | `"redis://127.0.0.1:6379"` |
+
+## Token Processing Configuration
+
+### Token Processor Configuration (`TokenProcessorConfig`)
+
+Configures how tokens are converted to KV block keys.
+
+```json
+{
+  "blockSize": 16,
+  "hashSeed": ""
+}
+```
+
+| Field | Type | Description | Default |
+|-------|------|-------------|---------|
+| `blockSize` | `integer` | Number of tokens per block | `16` |
+| `hashSeed` | `string` | Seed for hash generation (should align with vLLM's PYTHONHASHSEED) | `""` |
+
+## Prefix Store Configuration
+
+### LRU Store Configuration (`LRUStoreConfig`)
+
+Configures the LRU-based prefix token store.
+
+```json
+{
+  "cacheSize": 500000,
+  "blockSize": 256
+}
+```
+
+| Field | Type | Description | Default |
+|-------|------|-------------|---------|
+| `cacheSize` | `integer` | Maximum number of blocks the LRU cache can store | `500000` |
+| `blockSize` | `integer` | Number of tokens per block in the prefix cache | `256` |
+
+## Tokenization Configuration
+
+### Tokenization Pool Configuration (`Config`)
+
+Configures the tokenization worker pool.
+
+```json
+{
+  "workersCount": 5,
+  "huggingFaceToken": "",
+  "tokenizersCacheDir": ""
+}
+```
+
+| Field | Type | Description | Default |
+|-------|------|-------------|---------|
+| `workersCount` | `integer` | Number of tokenization workers | `5` |
+| `huggingFaceToken` | `string` | HuggingFace authentication token | `""` |
+| `tokenizersCacheDir` | `string` | Directory for caching tokenizers | `""` |
+
+### HuggingFace Tokenizer Configuration (`HFTokenizerConfig`)
+
+Configures the HuggingFace tokenizer backend.
+
+```json
+{
+  "huggingFaceToken": "",
+  "tokenizersCacheDir": ""
+}
+```
+
+| Field | Type | Description | Default |
+|-------|------|-------------|---------|
+| `huggingFaceToken` | `string` | HuggingFace API token for accessing models | `""` |
+| `tokenizersCacheDir` | `string` | Local directory for caching downloaded tokenizers | `"./bin"` |
+
+## KV-Event Processing Configuration
+
+### KV-Event Pool Configuration (`Config`)
+
+Configures the ZMQ event processing pool for handling KV cache events.
+
+```json
+{
+  "zmqEndpoint": "tcp://*:5557",
+  "topicFilter": "kv@",
+  "concurrency": 4
+}
+```
+
+| Field | Type | Description | Default |
+|-------|------|-------------|---------|
+| `zmqEndpoint` | `string` | ZMQ address to connect to | `"tcp://*:5557"` |
+| `topicFilter` | `string` | ZMQ subscription filter | `"kv@"` |
+| `concurrency` | `integer` | Number of parallel workers | `4` |
+
+## Event Processing Configuration Example
+
+For the ZMQ event processing pool:
+
+```json
+{
+  "zmqEndpoint": "tcp://indexer:5557",
+  "topicFilter": "kv@",
+  "concurrency": 8
+}
+```
+
+---
+## Notes
+
+1. **Hash Seed Alignment**: The `hashSeed` in `TokenProcessorConfig` should be aligned with vLLM's `PYTHONHASHSEED` environment variable to ensure consistent hashing across the system.
+
+2. **Memory Considerations**: The `size` parameter in `InMemoryIndexConfig` directly affects memory usage. Each key-value pair consumes memory proportional to the number of associated pods.
+
+3. **Performance Tuning**:
+   - Increase `workersCount` in the tokenization config for higher tokenization throughput
+   - Adjust `concurrency` in event processing for better event handling performance
+   - Tune cache sizes based on available memory and expected workload
+
+4. **Cache Directories**: Ensure the `tokenizersCacheDir` has sufficient disk space and appropriate permissions for the application to read/write tokenizer files.
+
+5. **Redis Configuration**: When using the Redis backend, ensure the Redis server is accessible and has sufficient memory. The `address` field supports full Redis URLs including authentication: `redis://user:pass@host:port/db`.
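
Review note: the JSON field names documented above correspond to the struct tags this commit adds. The following is a minimal sketch of decoding such a document, assuming local mirror types; `indexerConfig`, `indexConfig`, `inMemoryIndexConfig`, `redisIndexConfig`, `tokenProcessorConfig`, `parseIndexerConfig` and `sampleConfig` are hypothetical names for illustration and are not the repository's own types or loader.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical mirrors of the repository's config structs. Only the JSON
// field names are taken from docs/configuration.md.
type inMemoryIndexConfig struct {
	Size         int `json:"size"`
	PodCacheSize int `json:"podCacheSize"`
}

type redisIndexConfig struct {
	Address string `json:"address,omitempty"`
}

type indexConfig struct {
	InMemoryConfig *inMemoryIndexConfig `json:"inMemoryConfig"`
	RedisConfig    *redisIndexConfig    `json:"redisConfig"`
	EnableMetrics  bool                 `json:"enableMetrics"`
}

type tokenProcessorConfig struct {
	BlockSize int    `json:"blockSize"`
	HashSeed  string `json:"hashSeed"`
}

type indexerConfig struct {
	TokenProcessorConfig *tokenProcessorConfig `json:"tokenProcessorConfig"`
	KVBlockIndexConfig   *indexConfig          `json:"kvBlockIndexConfig"`
}

// sampleConfig is a trimmed version of the complete example in the document.
const sampleConfig = `{
  "tokenProcessorConfig": {"blockSize": 16, "hashSeed": "12345"},
  "kvBlockIndexConfig": {
    "inMemoryConfig": {"size": 100000000, "podCacheSize": 10},
    "enableMetrics": true
  }
}`

// parseIndexerConfig decodes a JSON document into the mirror structs.
func parseIndexerConfig(data []byte) (*indexerConfig, error) {
	var cfg indexerConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, fmt.Errorf("failed to parse indexer config: %w", err)
	}
	return &cfg, nil
}

func main() {
	cfg, err := parseIndexerConfig([]byte(sampleConfig))
	if err != nil {
		panic(err)
	}
	fmt.Println("blockSize:", cfg.TokenProcessorConfig.BlockSize)
	fmt.Println("index size:", cfg.KVBlockIndexConfig.InMemoryConfig.Size)
	fmt.Println("redis configured:", cfg.KVBlockIndexConfig.RedisConfig != nil)
}
```

In this sketch a section that is absent from the JSON (here `redisConfig`) stays a nil pointer, so callers would need a nil check before dereferencing it.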

examples/kv_cache_index/main.go

Lines changed: 1 addition & 1 deletion
@@ -55,7 +55,7 @@ func getKVCacheIndexerConfig() (*kvcache.Config, error) {
 			return nil, fmt.Errorf("failed to parse redis host: %w", err)
 		}

-		config.KVBlockIndexConfig.RedisConfig.RedisOpt = redisOpt
+		config.KVBlockIndexConfig.RedisConfig.Address = redisOpt.Addr
 	} // Otherwise defaults to in-memory indexer

 	return config, nil

pkg/kvcache/indexer.go

Lines changed: 5 additions & 5 deletions
@@ -33,11 +33,11 @@ import (
 // The configuration cover the different components found in the Indexer
 // module.
 type Config struct {
-	PrefixStoreConfig    *prefixstore.Config
-	TokenProcessorConfig *kvblock.TokenProcessorConfig
-	KVBlockIndexConfig   *kvblock.IndexConfig
-	KVBLockScorerConfig  *KVBlockScorerConfig
-	TokenizersPoolConfig *tokenization.Config
+	PrefixStoreConfig    *prefixstore.Config           `json:"prefixStoreConfig"`
+	TokenProcessorConfig *kvblock.TokenProcessorConfig `json:"tokenProcessorConfig"`
+	KVBlockIndexConfig   *kvblock.IndexConfig          `json:"kvBlockIndexConfig"`
+	KVBLockScorerConfig  *KVBlockScorerConfig          // not exported
+	TokenizersPoolConfig *tokenization.Config          `json:"tokenizersPoolConfig"`
 }

 // NewDefaultConfig returns a default configuration for the Indexer module.

pkg/kvcache/kvblock/in_memory.go

Lines changed: 2 additions & 2 deletions
@@ -36,9 +36,9 @@ const (
 // InMemoryIndexConfig holds the configuration for the InMemoryIndex.
 type InMemoryIndexConfig struct {
 	// Size is the maximum number of keys that can be stored in the index.
-	Size int
+	Size int `json:"size"`
 	// PodCacheSize is the maximum number of pod entries per key.
-	PodCacheSize int
+	PodCacheSize int `json:"podCacheSize"`
 }

 // DefaultInMemoryIndexConfig returns a default configuration for the InMemoryIndex.

pkg/kvcache/kvblock/index.go

Lines changed: 3 additions & 3 deletions
@@ -28,12 +28,12 @@ import (
 // If multiple backends are configured, only the first one will be used.
 type IndexConfig struct {
 	// InMemoryConfig holds the configuration for the in-memory index.
-	InMemoryConfig *InMemoryIndexConfig
+	InMemoryConfig *InMemoryIndexConfig `json:"inMemoryConfig"`
 	// RedisConfig holds the configuration for the Redis index.
-	RedisConfig *RedisIndexConfig
+	RedisConfig *RedisIndexConfig `json:"redisConfig"`
 	// EnableMetrics toggles whether admissions/evictions/hits/misses are
 	// recorded.
-	EnableMetrics bool
+	EnableMetrics bool `json:"enableMetrics"`
 }

 // DefaultIndexConfig returns a default configuration for the KV-block index.

pkg/kvcache/kvblock/redis.go

Lines changed: 14 additions & 8 deletions
@@ -30,15 +30,12 @@ import (

 // RedisIndexConfig holds the configuration for the RedisIndex.
 type RedisIndexConfig struct {
-	RedisOpt *redis.Options
+	Address string `json:"address,omitempty"` // Redis server address
 }

 func DefaultRedisIndexConfig() *RedisIndexConfig {
 	return &RedisIndexConfig{
-		RedisOpt: &redis.Options{
-			Addr: "localhost:6379",
-			DB:   0,
-		},
+		Address: "redis://127.0.0.1:6379",
 	}
 }

@@ -48,11 +45,20 @@ func NewRedisIndex(config *RedisIndexConfig) (Index, error) {
 		config = DefaultRedisIndexConfig()
 	}

-	redisClient := redis.NewClient(config.RedisOpt)
+	if !strings.HasPrefix(config.Address, "redis://") &&
+		!strings.HasPrefix(config.Address, "rediss://") &&
+		!strings.HasPrefix(config.Address, "unix://") {
+		config.Address = "redis://" + config.Address
+	}

-	_, err := redisClient.Ping(context.Background()).Result()
+	redisOpt, err := redis.ParseURL(config.Address)
 	if err != nil {
-		return nil, fmt.Errorf("could not connect to Redis: %w", err)
+		return nil, fmt.Errorf("failed to parse redisURL: %w", err)
+	}
+
+	redisClient := redis.NewClient(redisOpt)
+	if err := redisClient.Ping(context.Background()).Err(); err != nil {
+		return nil, fmt.Errorf("failed to connect to Redis: %w", err)
 	}

 	return &RedisIndex{
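
Review note: the scheme check added to `NewRedisIndex` can be exercised in isolation. The sketch below is an illustration under stated assumptions: `normalizeRedisAddress` is a hypothetical helper that is not part of the commit, and the standard library's `url.Parse` stands in for `redis.ParseURL` so the example runs without the Redis client dependency.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// normalizeRedisAddress mirrors the prefix check in the diff: an address with
// none of the recognised schemes gets "redis://" prepended.
func normalizeRedisAddress(addr string) string {
	for _, scheme := range []string{"redis://", "rediss://", "unix://"} {
		if strings.HasPrefix(addr, scheme) {
			return addr
		}
	}
	return "redis://" + addr
}

func main() {
	addrs := []string{
		"localhost:6379",
		"rediss://cache.internal:6380/1",
		"unix:///tmp/redis.sock",
	}
	for _, addr := range addrs {
		normalized := normalizeRedisAddress(addr)
		// url.Parse is only a stand-in here; the commit uses redis.ParseURL.
		u, err := url.Parse(normalized)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("%s -> scheme=%s host=%s path=%s\n", addr, u.Scheme, u.Host, u.Path)
	}
}
```

Unlike the code in the diff, which rewrites `config.Address` in place, this helper returns a new string and leaves the caller's configuration untouched.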

pkg/kvcache/kvblock/token_processor.go

Lines changed: 2 additions & 3 deletions
@@ -33,13 +33,12 @@ const defaultBlockSize = 16

 // TokenProcessorConfig holds the configuration for the token processor.
 type TokenProcessorConfig struct {
-	BlockSize int
+	BlockSize int `json:"blockSize"`
 	// HashSeed is used to prefix initial hash chunks, similarly to vLLM's NONE_HASH.
 	// This should be aligned with vLLM's `PYTHONHASHSEED` environment variable.
 	// The system's deployer is responsible for aligning the vLLM deployments
 	// with the same seed value.
-	HashSeed string
-
+	HashSeed string `json:"hashSeed"`
 	initHash *uint64 // cache once
 }
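
Review note: to illustrate why `blockSize` and `hashSeed` have to match across the deployment, here is a rough sketch of seeded, chained block keys. Everything in it is an assumption made for illustration: `chunkTokens` and `blockKeys` are hypothetical helpers, dropping a trailing partial block is a guess, and FNV-1a is a stand-in hash that is not the function used by this repository or by vLLM.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// chunkTokens splits tokens into consecutive blocks of blockSize tokens.
// Assumption: a trailing partial block is dropped.
func chunkTokens(tokens []uint32, blockSize int) [][]uint32 {
	if blockSize <= 0 {
		return nil
	}
	var blocks [][]uint32
	for start := 0; start+blockSize <= len(tokens); start += blockSize {
		blocks = append(blocks, tokens[start:start+blockSize])
	}
	return blocks
}

// blockKeys derives one key per block by chaining: each key hashes the
// previous key (initially a hash of the seed) together with the block's
// tokens. FNV-1a is a stand-in, not the project's real hash.
func blockKeys(tokens []uint32, blockSize int, hashSeed string) []uint64 {
	seed := fnv.New64a()
	seed.Write([]byte(hashSeed))
	prev := seed.Sum64()

	var keys []uint64
	buf := make([]byte, 8)
	for _, block := range chunkTokens(tokens, blockSize) {
		h := fnv.New64a()
		binary.LittleEndian.PutUint64(buf, prev)
		h.Write(buf)
		for _, tok := range block {
			binary.LittleEndian.PutUint32(buf[:4], tok)
			h.Write(buf[:4])
		}
		prev = h.Sum64()
		keys = append(keys, prev)
	}
	return keys
}

func main() {
	tokens := make([]uint32, 40)
	for i := range tokens {
		tokens[i] = uint32(i)
	}
	fmt.Println("full blocks:", len(chunkTokens(tokens, 16)))
	fmt.Println("keys with empty seed:", blockKeys(tokens, 16, ""))
	fmt.Println("keys with seed 12345:", blockKeys(tokens, 16, "12345"))
}
```

Because every key depends on the seed through the chain, two processes would only agree on block keys in this sketch if they use the same seed and the same block size.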

pkg/kvcache/kvevents/pool.go

Lines changed: 3 additions & 3 deletions
@@ -17,11 +17,11 @@ import (
 // Config holds the configuration for the event processing pool.
 type Config struct {
 	// ZMQEndpoint is the ZMQ address to connect to (e.g., "tcp://indexer:5557").
-	ZMQEndpoint string
+	ZMQEndpoint string `json:"zmqEndpoint"`
 	// TopicFilter is the ZMQ subscription filter (e.g., "kv.").
-	TopicFilter string
+	TopicFilter string `json:"topicFilter"`
 	// Concurrency is the number of parallel workers to run.
-	Concurrency int
+	Concurrency int `json:"concurrency"`
 }

 // DefaultConfig returns a default configuration for the event processing pool.
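
Review note: with JSON tags in place, one common way to combine defaults with a partial configuration file is to unmarshal over a pre-populated struct. The sketch below assumes a hypothetical mirror type `poolConfig` and hypothetical helpers `defaultPoolConfig` and `loadPoolConfig`; the default values are the ones listed in docs/configuration.md, and this is not necessarily how the repository loads its configuration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// poolConfig is a hypothetical mirror of the event pool Config in the diff.
type poolConfig struct {
	ZMQEndpoint string `json:"zmqEndpoint"`
	TopicFilter string `json:"topicFilter"`
	Concurrency int    `json:"concurrency"`
}

// defaultPoolConfig returns the defaults documented in docs/configuration.md.
func defaultPoolConfig() *poolConfig {
	return &poolConfig{
		ZMQEndpoint: "tcp://*:5557",
		TopicFilter: "kv@",
		Concurrency: 4,
	}
}

// loadPoolConfig overlays the JSON document on top of the defaults.
// Fields missing from the JSON keep their default values.
func loadPoolConfig(data []byte) (*poolConfig, error) {
	cfg := defaultPoolConfig()
	if err := json.Unmarshal(data, cfg); err != nil {
		return nil, fmt.Errorf("failed to parse event pool config: %w", err)
	}
	return cfg, nil
}

func main() {
	cfg, err := loadPoolConfig([]byte(`{"concurrency": 8}`))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", *cfg)
}
```

One limitation of this pattern is that a field explicitly set to its zero value in the JSON (for example `"concurrency": 0`) overwrites the default, so validation would still be needed after loading.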

pkg/tokenization/pool.go

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ const defaultWorkers = 5

 // Config holds the configuration for the TokenizationPool.
 type Config struct {
-	WorkersCount int
+	WorkersCount int `json:"workersCount"`
 	*HFTokenizerConfig
 }
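
Review note: the embedded `*HFTokenizerConfig` explains why the documentation shows `huggingFaceToken` and `tokenizersCacheDir` at the same JSON level as `workersCount`, since `encoding/json` promotes the fields of an embedded struct. A minimal sketch follows, assuming hypothetical mirror types; the JSON tags on `HFTokenizerConfig` are inferred from docs/configuration.md because that struct is not shown in this diff, and `TokenizationConfig` and `parseTokenizationConfig` are illustrative names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// HFTokenizerConfig mirrors the documented HuggingFace tokenizer fields.
// The tags are an assumption based on the field names in the docs.
type HFTokenizerConfig struct {
	HuggingFaceToken   string `json:"huggingFaceToken"`
	TokenizersCacheDir string `json:"tokenizersCacheDir"`
}

// TokenizationConfig is a hypothetical mirror of the tokenization pool Config.
type TokenizationConfig struct {
	WorkersCount int `json:"workersCount"`
	*HFTokenizerConfig
}

// parseTokenizationConfig decodes a flat JSON object into the nested struct.
func parseTokenizationConfig(data []byte) (*TokenizationConfig, error) {
	var cfg TokenizationConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, fmt.Errorf("failed to parse tokenization config: %w", err)
	}
	return &cfg, nil
}

func main() {
	cfg, err := parseTokenizationConfig([]byte(
		`{"workersCount": 8, "tokenizersCacheDir": "/tmp/tokenizers"}`))
	if err != nil {
		panic(err)
	}
	// The embedded pointer is allocated because a promoted field was present.
	fmt.Println(cfg.WorkersCount, cfg.TokenizersCacheDir)
}
```

If the JSON contains none of the promoted fields, the embedded pointer stays nil in this sketch, and reading `cfg.TokenizersCacheDir` would panic, so callers would want to check `cfg.HFTokenizerConfig != nil` first.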
