Commit 086c2e0

refactor: Integrate prefix-cache configuration into a single knob (#237)
Simplifies the configuration structure of the prefix-cache-scorer plugin by unifying all mode-specific parameters into a single configuration type.

The prefix-cache-scorer now supports a mode option:
- When set to estimate (the default), it uses the GIE prefix cache scorer based on estimation from previous requests.
- When set to cache_tracking, it creates a prefix cache scorer based on KV-events from vLLM.

Signed-off-by: Kfir Toledo <[email protected]>
1 parent f8d4bc9 commit 086c2e0

11 files changed, +285 -31 lines changed

deploy/components/inference-gateway/deployments.yaml

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ spec:
       containers:
       - name: epp
         image: ghcr.io/llm-d/llm-d-inference-scheduler:latest
-        imagePullPolicy: IfNotPresent
+        imagePullPolicy: Always
         args:
         - -poolName
         - "${POOL_NAME}"

deploy/config/epp-kvcache-load-config.yaml renamed to deploy/config/epp-prefix-cache-tracking-config.yaml

Lines changed: 5 additions & 2 deletions
@@ -5,14 +5,17 @@ kind: EndpointPickerConfig
 plugins:
 - type: single-profile-handler
 - type: decode-filter
-- type: kvcache-aware-scorer
+- type: prefix-cache-scorer
+  parameters:
+    mode: cache_tracking
+    kvCacheRedisAddr: ${REDIS_HOST}:${REDIS_PORT}
 - type: load-aware-scorer
 - type: max-score-picker
 schedulingProfiles:
 - name: default
   plugins:
   - pluginRef: decode-filter
-  - pluginRef: kvcache-aware-scorer
+  - pluginRef: prefix-cache-scorer
     weight: 2.0
   - pluginRef: load-aware-scorer
     weight: 1.0
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+# Sample EPP configuration for running with prefix cache in estimate mode and load-aware scorers
+#
+apiVersion: inference.networking.x-k8s.io/v1alpha1
+kind: EndpointPickerConfig
+plugins:
+- type: single-profile-handler
+- type: decode-filter
+- type: prefix-cache-scorer
+- type: load-aware-scorer
+- type: max-score-picker
+schedulingProfiles:
+- name: default
+  plugins:
+  - pluginRef: decode-filter
+  - pluginRef: prefix-cache-scorer
+    weight: 2.0
+  - pluginRef: load-aware-scorer
+    weight: 1.0
+  - pluginRef: max-score-picker
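
The diff does not spell out how the two weights in this profile interact. A minimal sketch, assuming each scorer returns a value in [0, 1] and that the picker selects the pod with the highest weighted sum. Both are assumptions made for illustration, and `weightedScorer` and `pickMax` are hypothetical names, not scheduler APIs:

```go
package main

import "fmt"

// weightedScorer pairs a hypothetical per-pod scoring function with the
// weight assigned to it in a scheduling profile.
type weightedScorer struct {
	name   string
	weight float64
	score  func(pod string) float64 // assumed to return a value in [0, 1]
}

// pickMax returns the pod with the highest weighted sum of scores.
// On ties the first pod in the slice wins. This is an illustrative
// assumption about how weights combine, not the scheduler's actual code.
func pickMax(pods []string, scorers []weightedScorer) (string, float64) {
	best, bestTotal := "", -1.0
	for _, pod := range pods {
		total := 0.0
		for _, s := range scorers {
			total += s.weight * s.score(pod)
		}
		if total > bestTotal {
			best, bestTotal = pod, total
		}
	}
	return best, bestTotal
}

func main() {
	// Made-up scores for two hypothetical pods.
	prefixScores := map[string]float64{"pod-a": 0.75, "pod-b": 0.25}
	loadScores := map[string]float64{"pod-a": 0.25, "pod-b": 1.0}

	scorers := []weightedScorer{
		{name: "prefix-cache-scorer", weight: 2.0, score: func(p string) float64 { return prefixScores[p] }},
		{name: "load-aware-scorer", weight: 1.0, score: func(p string) float64 { return loadScores[p] }},
	}

	pod, total := pickMax([]string{"pod-a", "pod-b"}, scorers)
	fmt.Println(pod, total) // prints "pod-a 1.75"
}
```

Under these assumptions, a weight of 2.0 lets a strong prefix-cache match outweigh a lightly loaded pod that has little of the prompt cached.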

deploy/environments/dev/kubernetes-kgateway/patch-deployments.yaml

Lines changed: 0 additions & 2 deletions
@@ -24,8 +24,6 @@ spec:
         - --configFile
         - "/etc/epp/epp-config.yaml"
         env:
-        - name: KVCACHE_INDEXER_REDIS_ADDR
-          value: ${REDIS_HOST}:${REDIS_PORT}
         - name: HF_TOKEN
           valueFrom:
             secretKeyRef:

docs/architecture.md

Lines changed: 24 additions & 8 deletions
@@ -211,14 +211,30 @@ with a value of `prefill`.<br>
 *Type:* prefill-filter<br>
 *Parameters:* None<br>
 
-**KvCacheAwareScorer**<br>
-Scores based on real KV-cache state on vLLM. It is more accurate than either the SessionAffinity
-or PrefixCachePlugin, but requires extra computation and cycles to track the current cache state<br>
-*Type:* kvcache-aware-scorer<br>
-*Parameters:* Due to the sensitivity of the parameters of this plugin, the following
-environment variables are used to configure the scorer:<br>
-`KVCACHE_INDEXER_REDIS_ADDR` - the address of the Redis server used<br>
-`HF_TOKEN` - the Hugginface token to be used.<br>
+**PrefixCacheScorer**<br>
+The `prefix-cache-scorer` scores a request based on KV-cache locality.
+It supports two modes: `estimate` and `cache_tracking`.<br>
+
+**`estimate` mode** (default):<br>
+This mode uses the default GIE prefix scorer and scores pods based on how much of the prompt is estimated to be present in the pod’s KV cache.<br>
+*Type*: `prefix-cache-scorer`<br>
+*Parameters:*<br>
+
+- `hashBlockSize`: Specifies the size of the blocks used to split the input **prompt** when calculating block hashes. Defaults to `64` if not specified.<br>
+- `maxPrefixBlocksToMatch`: Specifies the maximum number of prefix blocks to match. Defaults to `256` if not specified.<br>
+- `lruCapacityPerServer`: Specifies the capacity of the LRU indexer, in number of entries per server (pod). Defaults to `31250` if not specified.<br>
+
+**Note:** `mode: estimate` is not required, as it is the default.
+
+**`cache_tracking` mode**:<br>
+This mode scores requests based on the actual KV cache state in vLLM. It is more accurate than both `SessionAffinity` and `PrefixCachePlugin` in `estimate` mode,
+but incurs additional computation overhead to track the current cache state.<br>
+*Type*: `prefix-cache-scorer`<br>
+*Parameters:*<br>
+- `mode: cache_tracking`<br>
+- `kvCacheRedisAddr`: The address of the Redis instance used for cache tracking.
+Due to the sensitivity of this plugin’s parameters, the following environment variable is required when using `cache_tracking` mode:
+`HF_TOKEN`: The Hugging Face token to be used.
 
 **LoadAwareScorer**<br>
 Scores pods based on their load, based on the number of requests concurrently being processed.
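
The documented `estimate` defaults can be illustrated with a small sketch. It assumes the defaults are applied by pre-populating a struct before decoding, which is one common approach and not necessarily what the GIE plugin does. `estimateParams` and `parseEstimateParams` are hypothetical names; only the JSON keys and default values come from the documentation above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// estimateParams is a hypothetical stand-in for the estimate-mode parameters
// described in the docs; it is not the actual GIE prefix.Config type.
type estimateParams struct {
	HashBlockSize          int `json:"hashBlockSize"`
	MaxPrefixBlocksToMatch int `json:"maxPrefixBlocksToMatch"`
	LRUCapacityPerServer   int `json:"lruCapacityPerServer"`
}

// parseEstimateParams starts from the documented defaults and lets any
// fields present in raw override them. Fields absent from the JSON keep
// their default because json.Unmarshal leaves them untouched.
func parseEstimateParams(raw []byte) (estimateParams, error) {
	p := estimateParams{
		HashBlockSize:          64,
		MaxPrefixBlocksToMatch: 256,
		LRUCapacityPerServer:   31250,
	}
	if len(raw) == 0 {
		return p, nil
	}
	if err := json.Unmarshal(raw, &p); err != nil {
		return estimateParams{}, fmt.Errorf("parse estimate parameters: %w", err)
	}
	return p, nil
}

func main() {
	p, err := parseEstimateParams([]byte(`{"hashBlockSize": 16}`))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", p)
}
```

Here only `hashBlockSize` is overridden; the other two fields keep the documented defaults of 256 and 31250.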

go.mod

Lines changed: 12 additions & 0 deletions
@@ -5,6 +5,7 @@ go 1.24.1
 toolchain go1.24.2
 
 require (
+	github.com/alicebob/miniredis/v2 v2.34.0
 	github.com/go-logr/logr v1.4.3
 	github.com/google/go-cmp v0.7.0
 	github.com/llm-d/llm-d-kv-cache-manager v0.1.1
@@ -19,6 +20,7 @@ require (
 
 require (
 	cel.dev/expr v0.23.0 // indirect
+	github.com/alicebob/gopher-json v0.0.0-20230218143504-906a9b012302 // indirect
 	github.com/antlr4-go/antlr/v4 v4.13.0 // indirect
 	github.com/beorn7/perks v1.0.1 // indirect
 	github.com/blang/semver/v4 v4.0.0 // indirect
@@ -40,20 +42,27 @@ require (
 	github.com/go-openapi/jsonpointer v0.21.0 // indirect
 	github.com/go-openapi/jsonreference v0.21.0 // indirect
 	github.com/go-openapi/swag v0.23.0 // indirect
+	github.com/go-task/slim-sprig/v3 v3.0.0 // indirect
 	github.com/gogo/protobuf v1.3.2 // indirect
 	github.com/google/btree v1.1.3 // indirect
 	github.com/google/cel-go v0.23.2 // indirect
 	github.com/google/gnostic-models v0.6.9 // indirect
+	github.com/google/pprof v0.0.0-20250403155104-27863c87afa6 // indirect
 	github.com/google/uuid v1.6.0 // indirect
+	github.com/gorilla/websocket v1.5.4-0.20250319132907-e064f32e3674 // indirect
 	github.com/grpc-ecosystem/grpc-gateway/v2 v2.24.0 // indirect
 	github.com/hashicorp/golang-lru/v2 v2.0.7 // indirect
 	github.com/inconshreveable/mousetrap v1.1.0 // indirect
 	github.com/josharian/intern v1.0.0 // indirect
 	github.com/json-iterator/go v1.1.12 // indirect
 	github.com/mailru/easyjson v0.7.7 // indirect
+	github.com/moby/spdystream v0.5.0 // indirect
 	github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
 	github.com/modern-go/reflect2 v1.0.2 // indirect
 	github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
+	github.com/mxk/go-flowrate v0.0.0-20140419014527-cca7078d478f // indirect
+	github.com/onsi/ginkgo/v2 v2.23.4 // indirect
+	github.com/onsi/gomega v1.37.0 // indirect
 	github.com/pkg/errors v0.9.1 // indirect
 	github.com/planetscale/vtprotobuf v0.6.1-0.20240319094008-0393e58bdf10 // indirect
 	github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect
@@ -65,6 +74,7 @@ require (
 	github.com/spf13/pflag v1.0.6 // indirect
 	github.com/stoewer/go-strcase v1.3.0 // indirect
 	github.com/x448/float16 v0.8.4 // indirect
+	github.com/yuin/gopher-lua v1.1.1 // indirect
 	go.opentelemetry.io/auto/sdk v1.1.0 // indirect
 	go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.58.0 // indirect
 	go.opentelemetry.io/otel v1.35.0 // indirect
@@ -74,6 +84,7 @@ require (
 	go.opentelemetry.io/otel/sdk v1.35.0 // indirect
 	go.opentelemetry.io/otel/trace v1.35.0 // indirect
 	go.opentelemetry.io/proto/otlp v1.4.0 // indirect
+	go.uber.org/automaxprocs v1.6.0 // indirect
 	go.uber.org/multierr v1.11.0 // indirect
 	go.uber.org/zap v1.27.0 // indirect
 	go.yaml.in/yaml/v2 v2.4.2 // indirect
@@ -85,6 +96,7 @@ require (
 	golang.org/x/term v0.32.0 // indirect
 	golang.org/x/text v0.25.0 // indirect
 	golang.org/x/time v0.9.0 // indirect
+	golang.org/x/tools v0.31.0 // indirect
 	gomodules.xyz/jsonpatch/v2 v2.4.0 // indirect
 	google.golang.org/genproto/googleapis/api v0.0.0-20250324211829-b45e905df463 // indirect
 	google.golang.org/genproto/googleapis/rpc v0.0.0-20250428153025-10db94c68c34 // indirect

go.sum

Lines changed: 10 additions & 0 deletions
@@ -1,7 +1,13 @@
 cel.dev/expr v0.23.0 h1:wUb94w6OYQS4uXraxo9U+wUAs9jT47Xvl4iPgAwM2ss=
 cel.dev/expr v0.23.0/go.mod h1:hLPLo1W4QUmuYdA72RBX06QTs6MXw941piREPl3Yfiw=
+github.com/alicebob/gopher-json v0.0.0-20230218143504-906a9b012302 h1:uvdUDbHQHO85qeSydJtItA4T55Pw6BtAejd0APRJOCE=
+github.com/alicebob/gopher-json v0.0.0-20230218143504-906a9b012302/go.mod h1:SGnFV6hVsYE877CKEZ6tDNTjaSXYUk6QqoIK6PrAtcc=
+github.com/alicebob/miniredis/v2 v2.34.0 h1:mBFWMaJSNL9RwdGRyEDoAAv8OQc5UlEhLDQggTglU/0=
+github.com/alicebob/miniredis/v2 v2.34.0/go.mod h1:kWShP4b58T1CW0Y5dViCd5ztzrDqRWqM3nksiyXk5s8=
 github.com/antlr4-go/antlr/v4 v4.13.0 h1:lxCg3LAv+EUK6t1i0y1V6/SLeUi0eKEKdhQAlS8TVTI=
 github.com/antlr4-go/antlr/v4 v4.13.0/go.mod h1:pfChB/xh/Unjila75QW7+VU4TSnWnnk9UTnmpPaOR2g=
+github.com/armon/go-socks5 v0.0.0-20160902184237-e75332964ef5 h1:0CwZNZbxp69SHPdPJAN/hZIm0C4OItdklCFmMRWYpio=
+github.com/armon/go-socks5 v0.0.0-20160902184237-e75332964ef5/go.mod h1:wHh0iHkYZB8zMSxRWpUBQtwG5a7fFgvEO+odwuTv2gs=
 github.com/beorn7/perks v1.0.1 h1:VlbKKnNfV8bJzeqoa4cOKqO6bYr3WgKZxO8Z16+hsOM=
 github.com/beorn7/perks v1.0.1/go.mod h1:G2ZrVWU2WbWT9wwq4/hrbKbnv/1ERSJQ0ibhJ6rlkpw=
 github.com/blang/semver/v4 v4.0.0 h1:1PFHFE6yCCTv8C1TeyNNarDzntLi7wMI5i/pzqYIsAM=
@@ -124,6 +130,8 @@ github.com/planetscale/vtprotobuf v0.6.1-0.20240319094008-0393e58bdf10/go.mod h1
 github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
 github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 h1:Jamvg5psRIccs7FGNTlIRMkT8wgtp5eCXdBlqhYGL6U=
 github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
+github.com/prashantv/gostub v1.1.0 h1:BTyx3RfQjRHnUWaGF9oQos79AlQ5k8WNktv7VGvVH4g=
+github.com/prashantv/gostub v1.1.0/go.mod h1:A5zLQHz7ieHGG7is6LLXLz7I8+3LZzsrV0P1IAHhP5U=
 github.com/prometheus/client_golang v1.22.0 h1:rb93p9lokFEsctTys46VnV1kLCDpVZ0a/Y92Vm0Zc6Q=
 github.com/prometheus/client_golang v1.22.0/go.mod h1:R7ljNsLXhuQXYZYtw6GAE9AZg8Y7vEW5scdCXrWRXC0=
 github.com/prometheus/client_model v0.6.2 h1:oBsgwpGs7iVziMvrGhE53c/GrLUsZdHnqNwqPLxwZyk=
@@ -158,6 +166,8 @@ github.com/x448/float16 v0.8.4 h1:qLwI1I70+NjRFUR3zs1JPUCgaCXSh3SW62uAKT1mSBM=
 github.com/x448/float16 v0.8.4/go.mod h1:14CWIYCyZA/cWjXOioeEpHeN/83MdbZDRQHoFcYsOfg=
 github.com/yuin/goldmark v1.1.27/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74=
 github.com/yuin/goldmark v1.2.1/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74=
+github.com/yuin/gopher-lua v1.1.1 h1:kYKnWBjvbNP4XLT3+bPEwAXJx262OhaHDWDVOPjL46M=
+github.com/yuin/gopher-lua v1.1.1/go.mod h1:GBR0iDaNXjAgGg9zfCvksxSRnQx76gclCIb7kdAd1Pw=
 go.opentelemetry.io/auto/sdk v1.1.0 h1:cH53jehLUN6UFLY71z+NDOiNJqDdPRaXzTel0sJySYA=
 go.opentelemetry.io/auto/sdk v1.1.0/go.mod h1:3wSPjt5PWp2RhlCcmmOial7AvC4DQqZb7a7wCow3W8A=
 go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.58.0 h1:yd02MEjBdJkG3uabWP9apV+OuWRIXGDuJEUJbOHmCFU=

pkg/plugins/register.go

Lines changed: 2 additions & 1 deletion
@@ -2,6 +2,7 @@ package plugins
 
 import (
 	"sigs.k8s.io/gateway-api-inference-extension/pkg/epp/plugins"
+	"sigs.k8s.io/gateway-api-inference-extension/pkg/epp/scheduling/framework/plugins/multi/prefix"
 
 	"github.com/llm-d/llm-d-inference-scheduler/pkg/plugins/filter"
 	prerequest "github.com/llm-d/llm-d-inference-scheduler/pkg/plugins/pre-request"
@@ -17,7 +18,7 @@ func RegisterAllPlugins() {
 	plugins.Register(filter.PrefillFilterType, filter.PrefillFilterFactory)
 	plugins.Register(prerequest.PrefillHeaderHandlerType, prerequest.PrefillHeaderHandlerFactory)
 	plugins.Register(profile.PdProfileHandlerType, profile.PdProfileHandlerFactory)
-	plugins.Register(scorer.KvCacheAwareScorerType, scorer.KvCacheAwareScorerFactory)
+	plugins.Register(prefix.PrefixCachePluginType, scorer.PrefixCachePluginFactory)
 	plugins.Register(scorer.LoadAwareScorerType, scorer.LoadAwareScorerFactory)
 	plugins.Register(scorer.SessionAffinityScorerType, scorer.SessionAffinityScorerFactory)
 }

pkg/plugins/scorer/kvcache_aware.go

Lines changed: 59 additions & 14 deletions
@@ -3,6 +3,7 @@ package scorer
 import (
 	"context"
 	"encoding/json"
+	"errors"
 	"fmt"
 	"os"
 	"strings"
@@ -13,41 +14,85 @@ import (
 	"sigs.k8s.io/controller-runtime/pkg/log"
 	"sigs.k8s.io/gateway-api-inference-extension/pkg/epp/plugins"
 	"sigs.k8s.io/gateway-api-inference-extension/pkg/epp/scheduling/framework"
+	"sigs.k8s.io/gateway-api-inference-extension/pkg/epp/scheduling/framework/plugins/multi/prefix"
 	"sigs.k8s.io/gateway-api-inference-extension/pkg/epp/scheduling/types"
 	logutil "sigs.k8s.io/gateway-api-inference-extension/pkg/epp/util/logging"
 )
 
-const (
-	// KvCacheAwareScorerType is the type of the KvCacheAwareScorer
-	KvCacheAwareScorerType = "kvcache-aware-scorer"
+// PrefixCachePluginMode defines the mode of the prefix cache plugin. It can be either `estimate` or `cache_tracking`.
+type PrefixCachePluginMode string
 
-	kvCacheRedisEnvVar = "KVCACHE_INDEXER_REDIS_ADDR"
+const (
+	// PrefixCachePluginModeEstimate is the mode in which the plugin estimates prefix-cache hits from previous requests.
+	PrefixCachePluginModeEstimate PrefixCachePluginMode = "estimate"
+	// PrefixCachePluginModeCacheTracking is the mode in which the plugin tracks the cache state using KV-events from vLLM.
+	PrefixCachePluginModeCacheTracking PrefixCachePluginMode = "cache_tracking"
+	// huggingFaceTokenEnvVar is the environment variable that holds the Hugging Face token.
 	huggingFaceTokenEnvVar = "HF_TOKEN"
 )
 
+// PrefixCachePluginConfig holds the configuration for the PrefixCachePlugin.
+type PrefixCachePluginConfig struct {
+	// Mode defines the mode of the prefix cache plugin.
+	Mode PrefixCachePluginMode `json:"mode"` // "estimate" or "cache_tracking"
+	// Config holds the configuration for the prefix cache plugin.
+	prefix.Config
+	// KVCacheRedisAddr is the address of the Redis instance used for cache tracking.
+	KVCacheRedisAddr string `json:"kvCacheRedisAddr"`
+}
+
 // compile-time type assertion
 var _ framework.Scorer = &KVCacheAwareScorer{}
 
-// KvCacheAwareScorerFactory defines the factory function for the KVCacheAwareScorer
-func KvCacheAwareScorerFactory(name string, _ json.RawMessage, handle plugins.Handle) (plugins.Plugin, error) {
-	plugin, err := NewKVCacheAwareScorer(handle.Context())
-	if err != nil {
-		return nil, err
+// PrefixCachePluginFactory creates a new instance of the PrefixCachePlugin based on the provided configuration.
+func PrefixCachePluginFactory(name string, rawParameters json.RawMessage, handle plugins.Handle) (plugins.Plugin, error) {
+	var cfg PrefixCachePluginConfig
+
+	logger := log.FromContext(handle.Context()).WithName("PrefixCachePluginFactory").V(logutil.DEFAULT)
+	// Fallback to empty JSON if parameters are missing
+	if rawParameters == nil {
+		rawParameters = []byte(`{}`)
+	}
+	// Unmarshal directly into the flat config struct
+	if err := json.Unmarshal(rawParameters, &cfg); err != nil {
+		return nil, fmt.Errorf("failed to parse %s plugin config: %w", prefix.PrefixCachePluginType, err)
+	}
+
+	mode := cfg.Mode
+	if mode == "" {
+		mode = PrefixCachePluginModeEstimate
+	}
+
+	switch mode {
+	case PrefixCachePluginModeEstimate:
+		logger.Info("Creating PrefixCachePlugin in estimate mode", "parameters", rawParameters)
+		return prefix.PrefixCachePluginFactory(name, rawParameters, handle)
+
+	case PrefixCachePluginModeCacheTracking:
+		logger.Info("Creating PrefixCachePlugin in cache tracking mode", "parameters", rawParameters)
+
+		plugin, err := NewKVCacheAwareScorer(handle.Context(), &cfg)
+		if err != nil {
+			return nil, fmt.Errorf("failed to create %s plugin: %w", prefix.PrefixCachePluginType, err)
+		}
+		return plugin.WithName(name), nil
+
+	default:
+		return nil, fmt.Errorf("unknown mode for %s plugin: %s", prefix.PrefixCachePluginType, mode)
 	}
-	return plugin.WithName(name), nil
 }
 
 // NewKVCacheAwareScorer creates a new KVCacheAwareScorer instance.
 // It initializes the KVCacheIndexer from environment variables.
 //
 // If the environment variables are not set, or if the indexer
 // fails to initialize, an error is returned.
-func NewKVCacheAwareScorer(ctx context.Context) (*KVCacheAwareScorer, error) {
+func NewKVCacheAwareScorer(ctx context.Context, cfg *PrefixCachePluginConfig) (*KVCacheAwareScorer, error) {
 	config := kvcache.NewDefaultConfig()
 
-	redisAddr := os.Getenv(kvCacheRedisEnvVar)
+	redisAddr := cfg.KVCacheRedisAddr
 	if redisAddr == "" {
-		return nil, fmt.Errorf("environment variable '%s' is not set", kvCacheRedisEnvVar)
+		return nil, errors.New("kvCacheRedisAddr parameter is not set")
 	}
 
 	// to keep compatibility with deployments only specifying hostname:port: need to add protocol to front to enable parsing
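
The code that follows the compatibility comment is elided from this hunk. The idea it describes, accepting a bare `hostname:port` by prepending a protocol, can be sketched as below. `normalizeRedisAddr` is a hypothetical helper and the `redis://` scheme is an assumption for illustration; the actual implementation may check for specific schemes instead.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// normalizeRedisAddr prepends a scheme when the address is a bare
// "hostname:port". The "redis://" scheme is assumed here; the elided
// code in the diff may handle this differently.
func normalizeRedisAddr(addr string) string {
	if strings.Contains(addr, "://") {
		return addr // already carries a protocol, leave untouched
	}
	return "redis://" + addr
}

func main() {
	for _, addr := range []string{"lookup-server:8100", "redis://lookup-server:8100"} {
		u, err := url.Parse(normalizeRedisAddr(addr))
		if err != nil {
			panic(err)
		}
		// Both inputs resolve to the same scheme and host.
		fmt.Println(u.Scheme, u.Host)
	}
}
```

Normalizing once up front lets existing deployments keep their `hostname:port` values while URL-style parsers still receive a well-formed address.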
@@ -76,7 +121,7 @@ func NewKVCacheAwareScorer(ctx context.Context) (*KVCacheAwareScorer, error) {
 	go kvCacheIndexer.Run(ctx)
 
 	return &KVCacheAwareScorer{
-		typedName:      plugins.TypedName{Type: KvCacheAwareScorerType},
+		typedName:      plugins.TypedName{Type: prefix.PrefixCachePluginType},
 		kvCacheIndexer: kvCacheIndexer,
 	}, nil
 }

scripts/kubernetes-dev-env.sh

Lines changed: 3 additions & 3 deletions
@@ -83,7 +83,7 @@ export PD_ENABLED="\"${PD_ENABLED:-false}\""
 # Token length threshold to trigger P/D logic
 export PD_PROMPT_LEN_THRESHOLD="\"${PD_PROMPT_LEN_THRESHOLD:-10}\""
 
-export EPP_CONFIG="${EPP_CONFIG:-deploy/config/epp-kvcache-load-config.yaml}"
+export EPP_CONFIG="${EPP_CONFIG:-deploy/config/epp-prefix-cache-tracking-config.yaml}"
 
 # Redis deployment name
 export REDIS_DEPLOYMENT_NAME="${REDIS_DEPLOYMENT_NAME:-lookup-server}"
@@ -92,7 +92,7 @@ export REDIS_DEPLOYMENT_NAME="${REDIS_DEPLOYMENT_NAME:-lookup-server}"
 export REDIS_SVC_NAME="${REDIS_SVC_NAME:-${REDIS_DEPLOYMENT_NAME}-service}"
 
 # Redis FQDN for internal Kubernetes communication
-export REDIS_HOST="${REDIS_HOST:-${REDIS_SVC_NAME}.${NAMESPACE}.svc.cluster.local}"
+export REDIS_HOST="${REDIS_HOST:-vllm-${REDIS_SVC_NAME}.${NAMESPACE}.svc.cluster.local}"
 
 # Redis port
 export REDIS_PORT="${REDIS_PORT:-8100}"
@@ -191,7 +191,7 @@ helm upgrade --install "$VLLM_HELM_RELEASE_NAME" "$VLLM_CHART_DIR" \
   --set redis.service.port="$REDIS_PORT"
 
 echo "INFO: Deploying Gateway Environment in namespace ${NAMESPACE}, ${POOL_NAME}"
-kubectl -n "${NAMESPACE}" create configmap epp-config --from-file=epp-config.yaml=${EPP_CONFIG}
+kubectl -n "${NAMESPACE}" create configmap epp-config --from-file=epp-config.yaml=<(envsubst < "${EPP_CONFIG}") --dry-run=client -o yaml | kubectl apply -f -
 kustomize build deploy/environments/dev/kubernetes-kgateway | envsubst | kubectl -n "${NAMESPACE}" apply -f -
 echo "INFO: Waiting for resources in namespace ${NAMESPACE} to become ready"
 # Wait for gateway resources
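
The script now pipes the EPP config through `envsubst` before creating the ConfigMap, so the `${REDIS_HOST}:${REDIS_PORT}` placeholder in the config file is resolved at deploy time. The substitution step can be approximated in Go with `os.Expand`. This is an illustrative sketch, not part of the commit: `renderConfig` is a hypothetical helper, the `default` namespace in the host name is made up, and `os.Expand` differs from `envsubst` in some edge cases.

```go
package main

import (
	"fmt"
	"os"
)

// renderConfig replaces ${VAR} and $VAR references in tmpl using vars.
// Variables missing from vars expand to the empty string.
func renderConfig(tmpl string, vars map[string]string) string {
	return os.Expand(tmpl, func(name string) string { return vars[name] })
}

func main() {
	tmpl := "kvCacheRedisAddr: ${REDIS_HOST}:${REDIS_PORT}"
	out := renderConfig(tmpl, map[string]string{
		// Hypothetical values; the namespace is an assumption.
		"REDIS_HOST": "vllm-lookup-server-service.default.svc.cluster.local",
		"REDIS_PORT": "8100",
	})
	fmt.Println(out)
}
```

Without a substitution step, the plugin would receive the literal placeholder text as its Redis address, which is presumably why the ConfigMap creation changed together with the move from an environment variable to a config parameter.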
