Enhance Configurability (#55)

vMaroon · web-flow · commit 926f5b2e1bf7 · 2025-07-18T00:04:06.000+03:00
* upgrade configurability

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* doc enhancements

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

* update main README

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>

---------

Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
diff --git a/README.md b/README.md
@@ -59,6 +59,7 @@ graph TD
 5.  **Index Update**: The **Event Subscriber** consumes these events and updates the **KV-Block Index** in near-real-time
 
 * For a more detailed breakdown, please see the high-level [Architecture Document](docs/architecture.md).
+* For configuration details, see the [Configuration Document](docs/configuration.md).
 
 -----
 
diff --git a/docs/configuration.md b/docs/configuration.md
@@ -0,0 +1,233 @@
+# Configuration Documentation
+
+This document describes all configuration options available in the llm-d KV Cache Manager. 
+All configurations are JSON-serializable and can be provided via configuration files or environment variables.
+
+## Main Configuration
+
+This package consists of two components:
+1. **KV Cache Indexer**: Manages the KV cache index, allowing efficient retrieval of cached blocks.
+2. **KV Event Processing**: Handles events from vLLM to update the cache index.
+
+The two components are configured separately but share the index backend that stores KV-block localities.
+That index backend is configured via the `kvBlockIndexConfig` field in the KV Cache Indexer configuration.
+
+### Indexer Configuration (`Config`)
+
+The main configuration structure for the KV Cache Indexer module.
+
+```json
+{
+  "prefixStoreConfig": { ... },
+  "tokenProcessorConfig": { ... },
+  "kvBlockIndexConfig": { ... },
+  "tokenizersPoolConfig": { ... }
+}
+```
+
+| Field | Type | Description | Default |
+|-------|------|-------------|---------|
+| `prefixStoreConfig` | [LRUStoreConfig](#lru-store-configuration-lrustoreconfig) | Configuration for the prefix store | See defaults |
+| `tokenProcessorConfig` | [TokenProcessorConfig](#token-processor-configuration-tokenprocessorconfig) | Configuration for token processing | See defaults |
+| `kvBlockIndexConfig` | [IndexConfig](#index-configuration-indexconfig) | Configuration for KV block indexing | See defaults |
+| `tokenizersPoolConfig` | [Config](#tokenization-pool-configuration-config) | Configuration for tokenization pool | See defaults |
+
+
+## Complete Example Configuration
+
+Here's a complete configuration example for the in-memory backend (`redisConfig` is omitted):
+
+```json
+{
+  "prefixStoreConfig": {
+    "cacheSize": 500000,
+    "blockSize": 256
+  },
+  "tokenProcessorConfig": {
+    "blockSize": 16,
+    "hashSeed": "12345"
+  },
+  "kvBlockIndexConfig": {
+    "inMemoryConfig": {
+      "size": 100000000,
+      "podCacheSize": 10
+    },
+    "enableMetrics": true
+  },
+  "tokenizersPoolConfig": {
+    "workersCount": 8,
+    "huggingFaceToken": "your_hf_token_here",
+    "tokenizersCacheDir": "/tmp/tokenizers"
+  }
+}
+```
+
+## KV-Block Index Configuration
+
+### Index Configuration (`IndexConfig`)
+
+Configures the KV-block index backend. Multiple backends can be configured, but only the first available one will be used.
+
+```json
+{
+  "inMemoryConfig": { ... },
+  "redisConfig": { ... },
+  "enableMetrics": false
+}
+```
+
+| Field | Type                                                  | Description | Default |
+|-------|-------------------------------------------------------|-------------|---------|
+| `inMemoryConfig` | [InMemoryIndexConfig](#in-memory-index-configuration) | In-memory index configuration | See defaults |
+| `redisConfig` | [RedisIndexConfig](#redis-index-configuration)        | Redis index configuration | `null` |
+| `enableMetrics` | `boolean`                                             | Enable admissions/evictions/hits/misses recording | `false` |
+
+### In-Memory Index Configuration (`InMemoryIndexConfig`)
+
+Configures the in-memory KV block index implementation.
+
+```json
+{
+  "size": 100000000,
+  "podCacheSize": 10
+}
+```
+
+| Field | Type | Description | Default |
+|-------|------|-------------|---------|
+| `size` | `integer` | Maximum number of keys that can be stored | `100000000` |
+| `podCacheSize` | `integer` | Maximum number of pod entries per key | `10` |
+
+### Redis Index Configuration (`RedisIndexConfig`)
+
+Configures the Redis-backed KV block index implementation.
+
+```json
+{
+  "address": "redis://127.0.0.1:6379"
+}
+```
+
+| Field | Type | Description | Default |
+|-------|------|-------------|---------|
+| `address` | `string` | Redis server address (can include auth: `redis://user:pass@host:port/db`) | `"redis://127.0.0.1:6379"` |
+
+## Token Processing Configuration
+
+### Token Processor Configuration (`TokenProcessorConfig`)
+
+Configures how tokens are converted to KV block keys.
+
+```json
+{
+  "blockSize": 16,
+  "hashSeed": ""
+}
+```
+
+| Field | Type | Description | Default |
+|-------|------|-------------|---------|
+| `blockSize` | `integer` | Number of tokens per block | `16` |
+| `hashSeed` | `string` | Seed for hash generation (should align with vLLM's PYTHONHASHSEED) | `""` |
+
+## Prefix Store Configuration
+
+### LRU Store Configuration (`LRUStoreConfig`)
+
+Configures the LRU-based prefix token store.
+
+```json
+{
+  "cacheSize": 500000,
+  "blockSize": 256
+}
+```
+
+| Field | Type | Description | Default |
+|-------|------|-------------|---------|
+| `cacheSize` | `integer` | Maximum number of blocks the LRU cache can store | `500000` |
+| `blockSize` | `integer` | Number of tokens per block in the prefix cache | `256` |
+
+## Tokenization Configuration
+
+### Tokenization Pool Configuration (`Config`)
+
+Configures the tokenization worker pool.
+
+```json
+{
+  "workersCount": 5,
+  "huggingFaceToken": "",
+  "tokenizersCacheDir": ""
+}
+```
+
+| Field | Type | Description | Default |
+|-------|------|-------------|--------|
+| `workersCount` | `integer` | Number of tokenization workers | `5` |
+| `huggingFaceToken` | `string` | HuggingFace authentication token | `""` |
+| `tokenizersCacheDir` | `string` | Directory for caching tokenizers | `""` |
+
+### HuggingFace Tokenizer Configuration (`HFTokenizerConfig`)
+
+Configures the HuggingFace tokenizer backend.
+
+```json
+{
+  "huggingFaceToken": "",
+  "tokenizersCacheDir": ""
+}
+```
+
+| Field | Type | Description | Default |
+|-------|------|-------------|---------|
+| `huggingFaceToken` | `string` | HuggingFace API token for accessing models | `""` |
+| `tokenizersCacheDir` | `string` | Local directory for caching downloaded tokenizers | `"./bin"` |
+
+## KV-Event Processing Configuration
+
+### KV-Event Pool Configuration (`Config`)
+
+Configures the ZMQ event processing pool for handling KV cache events.
+
+```json
+{
+  "zmqEndpoint": "tcp://*:5557",
+  "topicFilter": "kv@",
+  "concurrency": 4
+}
+```
+
+| Field | Type | Description | Default |
+|-------|------|-------------|---------|
+| `zmqEndpoint` | `string` | ZMQ address to connect to | `"tcp://*:5557"` |
+| `topicFilter` | `string` | ZMQ subscription filter | `"kv@"` |
+| `concurrency` | `integer` | Number of parallel workers | `4` |
+
+## Event Processing Configuration Example
+
+For the ZMQ event processing pool:
+
+```json
+{
+  "zmqEndpoint": "tcp://indexer:5557",
+  "topicFilter": "kv@",
+  "concurrency": 8
+}
+```
+
+---
+## Notes
+
+1. **Hash Seed Alignment**: The `hashSeed` in `TokenProcessorConfig` should be aligned with vLLM's `PYTHONHASHSEED` environment variable to ensure consistent hashing across the system.
+
+2. **Memory Considerations**: The `size` parameter in `InMemoryIndexConfig` directly affects memory usage. Each key-value pair consumes memory proportional to the number of associated pods.
+
+3. **Performance Tuning**:
+   - Increase `workersCount` in the tokenization config for higher tokenization throughput
+   - Adjust `concurrency` in the event processing config for better event handling performance
+   - Tune cache sizes based on available memory and expected workload
+
+4. **Cache Directories**: Ensure the `tokenizersCacheDir` has sufficient disk space and appropriate permissions for the application to read/write tokenizer files.
+
+5. **Redis Configuration**: When using the Redis backend, ensure the Redis server is accessible and has sufficient memory. The `address` field supports full Redis URLs including authentication: `redis://user:pass@host:port/db`.
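Reviewer note on `docs/configuration.md`: the documented JSON layout can be checked against the new struct tags with a small decoder. The sketch below is only an illustration. The lower-case types are hypothetical local mirrors of the structures in this PR (the real ones live in the `kvcache`, `kvblock`, and `prefixstore` packages), and `tokenizersPoolConfig` is left out for brevity.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical mirrors of the documented structures, using the same JSON keys.
type lruStoreConfig struct {
	CacheSize int `json:"cacheSize"`
	BlockSize int `json:"blockSize"`
}

type tokenProcessorConfig struct {
	BlockSize int    `json:"blockSize"`
	HashSeed  string `json:"hashSeed"`
}

type inMemoryIndexConfig struct {
	Size         int `json:"size"`
	PodCacheSize int `json:"podCacheSize"`
}

type indexConfig struct {
	InMemoryConfig *inMemoryIndexConfig `json:"inMemoryConfig"`
	EnableMetrics  bool                 `json:"enableMetrics"`
}

type indexerConfig struct {
	PrefixStoreConfig    *lruStoreConfig       `json:"prefixStoreConfig"`
	TokenProcessorConfig *tokenProcessorConfig `json:"tokenProcessorConfig"`
	KVBlockIndexConfig   *indexConfig          `json:"kvBlockIndexConfig"`
}

// exampleJSON repeats part of the "Complete Example Configuration" section.
const exampleJSON = `{
  "prefixStoreConfig": {"cacheSize": 500000, "blockSize": 256},
  "tokenProcessorConfig": {"blockSize": 16, "hashSeed": "12345"},
  "kvBlockIndexConfig": {
    "inMemoryConfig": {"size": 100000000, "podCacheSize": 10},
    "enableMetrics": true
  }
}`

// parseIndexerConfig decodes a JSON document into the mirrored structure.
func parseIndexerConfig(raw []byte) (*indexerConfig, error) {
	cfg := &indexerConfig{}
	if err := json.Unmarshal(raw, cfg); err != nil {
		return nil, fmt.Errorf("failed to parse indexer config: %w", err)
	}
	return cfg, nil
}

func main() {
	cfg, err := parseIndexerConfig([]byte(exampleJSON))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(cfg.TokenProcessorConfig.BlockSize, cfg.KVBlockIndexConfig.InMemoryConfig.Size)
}
```

Keys that are absent from the JSON leave the corresponding pointer fields `nil` in this sketch, so a real loader would probably start from the package defaults rather than from an empty struct.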
diff --git a/examples/kv_cache_index/main.go b/examples/kv_cache_index/main.go
@@ -55,7 +55,7 @@ func getKVCacheIndexerConfig() (*kvcache.Config, error) {
 			return nil, fmt.Errorf("failed to parse redis host: %w", err)
 		}
 
-		config.KVBlockIndexConfig.RedisConfig.RedisOpt = redisOpt
+		config.KVBlockIndexConfig.RedisConfig.Address = redisOpt.Addr
 	} // Otherwise defaults to in-memory indexer
 
 	return config, nil
diff --git a/pkg/kvcache/indexer.go b/pkg/kvcache/indexer.go
@@ -33,11 +33,11 @@ import (
 // The configuration covers the different components found in the Indexer
 // module.
 type Config struct {
-	PrefixStoreConfig    *prefixstore.Config
-	TokenProcessorConfig *kvblock.TokenProcessorConfig
-	KVBlockIndexConfig   *kvblock.IndexConfig
-	KVBLockScorerConfig  *KVBlockScorerConfig
-	TokenizersPoolConfig *tokenization.Config
+	PrefixStoreConfig    *prefixstore.Config           `json:"prefixStoreConfig"`
+	TokenProcessorConfig *kvblock.TokenProcessorConfig `json:"tokenProcessorConfig"`
+	KVBlockIndexConfig   *kvblock.IndexConfig          `json:"kvBlockIndexConfig"`
+	KVBLockScorerConfig  *KVBlockScorerConfig          // not exported
+	TokenizersPoolConfig *tokenization.Config          `json:"tokenizersPoolConfig"`
 }
 
 // NewDefaultConfig returns a default configuration for the Indexer module.
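Reviewer note on `KVBLockScorerConfig`: the `// not exported` comment presumably means the field is not meant to be part of the JSON configuration. With the standard `encoding/json` package, however, an exported field without a tag is still encoded under its Go field name, and only a `json:"-"` tag excludes it. The sketch below uses hypothetical types (`scorerConfig`, `taggedConfig`) to show the difference.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// scorerConfig is a hypothetical stand-in for KVBlockScorerConfig.
type scorerConfig struct {
	Strategy string
}

// taggedConfig mixes a tagged field, an untagged field, and an excluded field.
type taggedConfig struct {
	Tagged   int           `json:"tagged"`
	Untagged *scorerConfig // no tag: still encoded, under the name "Untagged"
	Hidden   *scorerConfig `json:"-"` // explicitly excluded from JSON
}

// encode marshals the config and returns the JSON text.
func encode(cfg taggedConfig) (string, error) {
	out, err := json.Marshal(cfg)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	out, err := encode(taggedConfig{Tagged: 1})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(out) // prints {"tagged":1,"Untagged":null}
}
```

If the intent is to keep the scorer configuration out of serialized configs, a `json:"-"` tag would express that directly.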
diff --git a/pkg/kvcache/kvblock/in_memory.go b/pkg/kvcache/kvblock/in_memory.go
@@ -36,9 +36,9 @@ const (
 // InMemoryIndexConfig holds the configuration for the InMemoryIndex.
 type InMemoryIndexConfig struct {
 	// Size is the maximum number of keys that can be stored in the index.
-	Size int
+	Size int `json:"size"`
 	// PodCacheSize is the maximum number of pod entries per key.
-	PodCacheSize int
+	PodCacheSize int `json:"podCacheSize"`
 }
 
 // DefaultInMemoryIndexConfig returns a default configuration for the InMemoryIndex.
diff --git a/pkg/kvcache/kvblock/index.go b/pkg/kvcache/kvblock/index.go
@@ -28,12 +28,12 @@ import (
 // If multiple backends are configured, only the first one will be used.
 type IndexConfig struct {
 	// InMemoryConfig holds the configuration for the in-memory index.
-	InMemoryConfig *InMemoryIndexConfig
+	InMemoryConfig *InMemoryIndexConfig `json:"inMemoryConfig"`
 	// RedisConfig holds the configuration for the Redis index.
-	RedisConfig *RedisIndexConfig
+	RedisConfig *RedisIndexConfig `json:"redisConfig"`
 	// EnableMetrics toggles whether admissions/evictions/hits/misses are
 	// recorded.
-	EnableMetrics bool
+	EnableMetrics bool `json:"enableMetrics"`
 }
 
 // DefaultIndexConfig returns a default configuration for the KV-block index.
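Reviewer note on backend selection: the `IndexConfig` comment says that only the first configured backend is used. A minimal sketch of that rule follows. It assumes the order matches the struct field order (in-memory first, then Redis), which this diff does not confirm. The types and `selectBackend` are hypothetical and not part of the package.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical mirrors of the index configuration types.
type inMemoryIndexConfig struct {
	Size         int
	PodCacheSize int
}

type redisIndexConfig struct {
	Address string
}

type indexConfig struct {
	InMemoryConfig *inMemoryIndexConfig
	RedisConfig    *redisIndexConfig
}

// selectBackend returns the name of the first configured backend.
// The precedence (in-memory before Redis) is an assumption.
func selectBackend(cfg *indexConfig) (string, error) {
	switch {
	case cfg == nil:
		return "", errors.New("index config is nil")
	case cfg.InMemoryConfig != nil:
		return "in-memory", nil
	case cfg.RedisConfig != nil:
		return "redis", nil
	default:
		return "", errors.New("no index backend configured")
	}
}

func main() {
	both := &indexConfig{
		InMemoryConfig: &inMemoryIndexConfig{Size: 1000, PodCacheSize: 10},
		RedisConfig:    &redisIndexConfig{Address: "redis://127.0.0.1:6379"},
	}
	name, err := selectBackend(both)
	fmt.Println(name, err)
}
```

Under this assumed precedence, a config that sets `redisConfig` while the default `inMemoryConfig` is still present would not select Redis, so callers may need to clear the in-memory entry explicitly.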
diff --git a/pkg/kvcache/kvblock/redis.go b/pkg/kvcache/kvblock/redis.go
@@ -30,15 +30,12 @@ import (
 
 // RedisIndexConfig holds the configuration for the RedisIndex.
 type RedisIndexConfig struct {
-	RedisOpt *redis.Options
+	Address string `json:"address,omitempty"` // Redis server address
 }
 
 func DefaultRedisIndexConfig() *RedisIndexConfig {
 	return &RedisIndexConfig{
-		RedisOpt: &redis.Options{
-			Addr: "localhost:6379",
-			DB:   0,
-		},
+		Address: "redis://127.0.0.1:6379",
 	}
 }
 
@@ -48,11 +45,20 @@ func NewRedisIndex(config *RedisIndexConfig) (Index, error) {
 		config = DefaultRedisIndexConfig()
 	}
 
-	redisClient := redis.NewClient(config.RedisOpt)
+	if !strings.HasPrefix(config.Address, "redis://") &&
+		!strings.HasPrefix(config.Address, "rediss://") &&
+		!strings.HasPrefix(config.Address, "unix://") {
+		config.Address = "redis://" + config.Address
+	}
 
-	_, err := redisClient.Ping(context.Background()).Result()
+	redisOpt, err := redis.ParseURL(config.Address)
 	if err != nil {
-		return nil, fmt.Errorf("could not connect to Redis: %w", err)
+		return nil, fmt.Errorf("failed to parse redisURL: %w", err)
+	}
+
+	redisClient := redis.NewClient(redisOpt)
+	if err := redisClient.Ping(context.Background()).Err(); err != nil {
+		return nil, fmt.Errorf("failed to connect to Redis: %w", err)
 	}
 
 	return &RedisIndex{
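Reviewer note on address handling: `NewRedisIndex` now prefixes bare `host:port` values with `redis://` before calling `redis.ParseURL`. The standalone sketch below reproduces only the prefixing step and, so that it runs without the Redis client, uses `net/url` from the standard library as a stand-in parser. `normalizeRedisAddress` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// normalizeRedisAddress adds the redis:// scheme when the address has none of
// the schemes checked in NewRedisIndex (redis://, rediss://, unix://).
func normalizeRedisAddress(addr string) string {
	for _, scheme := range []string{"redis://", "rediss://", "unix://"} {
		if strings.HasPrefix(addr, scheme) {
			return addr
		}
	}
	return "redis://" + addr
}

func main() {
	inputs := []string{
		"localhost:6379",
		"redis://user:pass@host:6379/1",
		"rediss://secure-host:6380",
	}
	for _, in := range inputs {
		normalized := normalizeRedisAddress(in)
		// net/url is used here only to show the URL components.
		u, err := url.Parse(normalized)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("%s -> scheme=%s host=%s path=%s\n", in, u.Scheme, u.Host, u.Path)
	}
}
```

The helper returns a new string rather than writing back into the config, which avoids mutating the caller's `RedisIndexConfig`. The PR itself assigns the result to `config.Address`.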
diff --git a/pkg/kvcache/kvblock/token_processor.go b/pkg/kvcache/kvblock/token_processor.go
@@ -33,13 +33,12 @@ const defaultBlockSize = 16
 
 // TokenProcessorConfig holds the configuration for the token processor.
 type TokenProcessorConfig struct {
-	BlockSize int
+	BlockSize int `json:"blockSize"`
 	// HashSeed is used to prefix initial hash chunks, similarly to vLLM's NONE_HASH.
 	// This should be aligned with vLLM's `PYTHONHASHSEED` environment variable.
 	// The system's deployer is responsible for aligning the vLLM deployments
 	// with the same seed value.
-	HashSeed string
-
+	HashSeed string  `json:"hashSeed"`
 	initHash *uint64 // cache once
 }
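Reviewer note on `blockSize` and `hashSeed`: the sketch below shows why both values have to match the vLLM deployment. It splits tokens into blocks of `blockSize` and derives a chained key per block, starting from a seed-dependent root. The FNV-1a hash, the dropping of a trailing partial block, and the function names are all assumptions made for illustration. This is not the hashing scheme used by the token processor or by vLLM.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// chunkTokens splits tokens into full blocks of blockSize tokens.
// A trailing partial block is dropped (an assumption for this sketch).
func chunkTokens(tokens []uint32, blockSize int) [][]uint32 {
	if blockSize <= 0 {
		return nil
	}
	blocks := make([][]uint32, 0, len(tokens)/blockSize)
	for start := 0; start+blockSize <= len(tokens); start += blockSize {
		blocks = append(blocks, tokens[start:start+blockSize])
	}
	return blocks
}

// rootHash derives the initial parent hash from the seed (stand-in hash).
func rootHash(seed string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(seed))
	return h.Sum64()
}

// blockKeys chains each block's key to the previous one, so a key identifies
// the whole prefix up to and including that block.
func blockKeys(tokens []uint32, blockSize int, seed string) []uint64 {
	parent := rootHash(seed)
	buf := make([]byte, 8)
	var keys []uint64
	for _, block := range chunkTokens(tokens, blockSize) {
		h := fnv.New64a()
		binary.LittleEndian.PutUint64(buf, parent)
		h.Write(buf)
		for _, tok := range block {
			binary.LittleEndian.PutUint32(buf[:4], tok)
			h.Write(buf[:4])
		}
		parent = h.Sum64()
		keys = append(keys, parent)
	}
	return keys
}

func main() {
	tokens := make([]uint32, 40)
	for i := range tokens {
		tokens[i] = uint32(i)
	}
	fmt.Println(len(blockKeys(tokens, 16, "12345"))) // 40 tokens, 16 per block: 2 full blocks
}
```

Because every key depends on the root, a seed mismatch between the indexer and the serving pods would change all keys and no lookups would match.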
 
diff --git a/pkg/kvcache/kvevents/pool.go b/pkg/kvcache/kvevents/pool.go
@@ -17,11 +17,11 @@ import (
 // Config holds the configuration for the event processing pool.
 type Config struct {
 	// ZMQEndpoint is the ZMQ address to connect to (e.g., "tcp://indexer:5557").
-	ZMQEndpoint string
+	ZMQEndpoint string `json:"zmqEndpoint"`
 	// TopicFilter is the ZMQ subscription filter (e.g., "kv.").
-	TopicFilter string
+	TopicFilter string `json:"topicFilter"`
 	// Concurrency is the number of parallel workers to run.
-	Concurrency int
+	Concurrency int `json:"concurrency"`
 }
 
 // DefaultConfig returns a default configuration for the event processing pool.
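Reviewer note on partial configs: with the new JSON tags, a user-supplied document can be decoded on top of the defaults so that omitted keys keep their default values. The sketch below assumes the defaults listed in `docs/configuration.md` (`tcp://*:5557`, `kv@`, `4`). `poolConfig`, `defaultPoolConfig`, and `loadPoolConfig` are hypothetical names, and the concurrency check is an added assumption.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// poolConfig is a hypothetical mirror of the kvevents Config struct.
type poolConfig struct {
	ZMQEndpoint string `json:"zmqEndpoint"`
	TopicFilter string `json:"topicFilter"`
	Concurrency int    `json:"concurrency"`
}

// defaultPoolConfig returns the defaults listed in the configuration document.
func defaultPoolConfig() *poolConfig {
	return &poolConfig{
		ZMQEndpoint: "tcp://*:5557",
		TopicFilter: "kv@",
		Concurrency: 4,
	}
}

// loadPoolConfig decodes raw JSON over the defaults; keys absent from the
// JSON keep their default values because Unmarshal only sets matched fields.
func loadPoolConfig(raw []byte) (*poolConfig, error) {
	cfg := defaultPoolConfig()
	if err := json.Unmarshal(raw, cfg); err != nil {
		return nil, fmt.Errorf("failed to parse event pool config: %w", err)
	}
	if cfg.Concurrency <= 0 {
		return nil, errors.New("concurrency must be positive")
	}
	return cfg, nil
}

func main() {
	cfg, err := loadPoolConfig([]byte(`{"zmqEndpoint": "tcp://indexer:5557", "concurrency": 8}`))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%+v\n", *cfg)
}
```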
diff --git a/pkg/tokenization/pool.go b/pkg/tokenization/pool.go
@@ -31,7 +31,7 @@ const defaultWorkers = 5
 
 // Config holds the configuration for the TokenizationPool.
 type Config struct {
-	WorkersCount int
+	WorkersCount int `json:"workersCount"`
 	*HFTokenizerConfig
 }
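Reviewer note on the embedded `*HFTokenizerConfig`: because the pointer is embedded without a tag, `encoding/json` treats its fields as if they belonged to the outer struct. That is why the documentation lists `huggingFaceToken` and `tokenizersCacheDir` next to `workersCount`. The sketch below uses local mirror types (`PoolConfig` is a hypothetical name) to show the decoding behaviour.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// HFTokenizerConfig mirrors the struct in pkg/tokenization/tokenizer.go.
type HFTokenizerConfig struct {
	HuggingFaceToken   string `json:"huggingFaceToken"`
	TokenizersCacheDir string `json:"tokenizersCacheDir"`
}

// PoolConfig mirrors the tokenization pool Config with its embedded pointer.
type PoolConfig struct {
	WorkersCount int `json:"workersCount"`
	*HFTokenizerConfig
}

// decodePoolConfig decodes a flat JSON object into the nested Go structure.
func decodePoolConfig(raw []byte) (*PoolConfig, error) {
	cfg := &PoolConfig{}
	if err := json.Unmarshal(raw, cfg); err != nil {
		return nil, fmt.Errorf("failed to parse tokenization config: %w", err)
	}
	return cfg, nil
}

func main() {
	cfg, err := decodePoolConfig([]byte(`{"workersCount": 8, "tokenizersCacheDir": "/tmp/tokenizers"}`))
	if err != nil {
		fmt.Println(err)
		return
	}
	// The embedded pointer was allocated because one of its keys was present.
	fmt.Println(cfg.WorkersCount, cfg.TokenizersCacheDir)
}
```

When none of the embedded keys appear in the JSON, the embedded pointer stays `nil` in this sketch, so reading `cfg.TokenizersCacheDir` would panic. Decoding over a defaults struct avoids that.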
 
diff --git a/pkg/tokenization/prefixstore/lru_store.go b/pkg/tokenization/prefixstore/lru_store.go
@@ -35,8 +35,8 @@ const (
 
 // LRUStoreConfig contains initialization settings for LRUTokenStore (block size and cache size).
 type LRUStoreConfig struct {
-	CacheSize int
-	BlockSize int
+	CacheSize int `json:"cacheSize"`
+	BlockSize int `json:"blockSize"` // number of tokens per block
 }
 
 // defaultLRUStoreConfig returns an LRUStoreConfig instance with default configuration.
diff --git a/pkg/tokenization/tokenizer.go b/pkg/tokenization/tokenizer.go
@@ -37,8 +37,8 @@ type Tokenizer interface {
 
 // HFTokenizerConfig holds the configuration for the HuggingFace tokenizer.
 type HFTokenizerConfig struct {
-	HuggingFaceToken   string
-	TokenizersCacheDir string
+	HuggingFaceToken   string `json:"huggingFaceToken"`
+	TokenizersCacheDir string `json:"tokenizersCacheDir"` // Directory for caching tokenizers
 }
 
 // DefaultHFTokenizerConfig returns a default configuration for the HuggingFace
