Ensure EPP flags are configurable via Helm chart #1302

Status: Open — wants to merge 3 commits into base: main

Changes shown from 1 commit.
14 changes: 14 additions & 0 deletions config/charts/inferencepool/templates/epp-deployment.yaml
@@ -44,6 +44,20 @@ spec:
- "--model-server-metrics-path={{ .Values.inferenceExtension.modelServerMetricsPath }}"
- "--model-server-metrics-scheme={{ .Values.inferenceExtension.modelServerMetricsScheme }}"
- "--model-server-metrics-https-insecure-skip-verify={{ .Values.inferenceExtension.modelServerMetricsHttpsInsecureSkipVerify }}"
- "--model-server-metrics-port={{ .Values.inferenceExtension.modelServerMetricsPort }}"
- "--secure-serving={{ .Values.inferenceExtension.secureServing }}"
- "--health-checking={{ .Values.inferenceExtension.healthChecking }}"
- "--cert-path={{ .Values.inferenceExtension.certPath }}"
- "--destination-endpoint-hint-key={{ .Values.inferenceExtension.destinationEndpointHintKey }}"
- "--destination-endpoint-hint-metadata-namespace={{ .Values.inferenceExtension.destinationEndpointHintMetadataNamespace }}"
- "--fairness-id-header-key={{ .Values.inferenceExtension.fairnessIDHeaderKey }}"
Contributor: these three flags should be removed (see PR #1296).

- "--total-queued-requests-metric={{ .Values.inferenceExtension.totalQueuedRequestsMetric }}"
- "--kv-cache-usage-percentage-metric={{ .Values.inferenceExtension.kvCacheUsagePercentageMetric }}"
- "--lora-info-metric={{ .Values.inferenceExtension.loraInfoMetric }}"
- "--refresh-metrics-interval={{ .Values.inferenceExtension.refreshMetricsInterval }}"
- "--refresh-prometheus-metrics-interval={{ .Values.inferenceExtension.refreshPrometheusMetricsInterval }}"
- "--metrics-staleness-threshold={{ .Values.inferenceExtension.metricsStalenessThreshold }}"
- "--config-text={{ .Values.inferenceExtension.configText }}"
{{- if eq (.Values.inferencePool.modelServerType | default "vllm") "triton-tensorrt-llm" }}
- --total-queued-requests-metric
- "nv_trt_llm_request_metrics{request_type=waiting}"
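The flags templated above resolve from chart values at render time, so each one can be overridden from a user values file. A minimal sketch, assuming the value names shown in this diff (the file name and the override values themselves are illustrative):

```yaml
# my-values.yaml (illustrative) -- override a few of the newly templated EPP flags
inferenceExtension:
  secureServing: false                # rendered as --secure-serving=false
  metricsStalenessThreshold: "5s"     # rendered as --metrics-staleness-threshold=5s
  refreshMetricsInterval: "100ms"     # rendered as --refresh-metrics-interval=100ms
```

Note that when `modelServerType` is `triton-tensorrt-llm`, the conditional block above appends its own `--total-queued-requests-metric` after the templated one; depending on how the EPP binary parses repeated flags, the later occurrence may take precedence.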
34 changes: 28 additions & 6 deletions config/charts/inferencepool/values.yaml
@@ -7,12 +7,34 @@ inferenceExtension:
pullPolicy: Always
extProcPort: 9002
env: []
enablePprof: true  # Enable pprof handlers for profiling and debugging
modelServerMetricsPath: "/metrics"
modelServerMetricsScheme: "http"
modelServerMetricsHttpsInsecureSkipVerify: true
grpcPort: 9002
grpcHealthPort: 9003
metricsPort: 9090
destinationEndpointHintMetadataNamespace: "envoy.lb"
destinationEndpointHintKey: "x-gateway-destination-endpoint"
fairnessIDHeaderKey: "x-gateway-inference-fairness-id"
Contributor: ditto.

poolName: ""
poolNamespace: "default"
refreshMetricsInterval: "50ms"
refreshPrometheusMetricsInterval: "5s"
secureServing: true
healthChecking: false
totalQueuedRequestsMetric: "vllm:num_requests_waiting"
kvCacheUsagePercentageMetric: "vllm:gpu_cache_usage_perc"
loraInfoMetric: "vllm:lora_requests_info"
certPath: ""
configFile: ""
configText: ""
metricsStalenessThreshold: "2s"

pluginsConfigFile: "default-plugins.yaml"
logVerbosity: 1

# This is the plugins configuration file.
# pluginsCustomConfig:
# custom-plugins.yaml: |
# apiVersion: inference.networking.x-k8s.io/v1alpha1
@@ -34,18 +56,18 @@ inferenceExtension:
# Example environment variables:
# env:
# KV_CACHE_SCORE_WEIGHT: "1"

# Define additional container ports
modelServerMetricsPort: 0
extraContainerPorts: []
# Define additional service ports
extraServicePorts: []
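The `extraContainerPorts` and `extraServicePorts` lists presumably pass through to the container and Service port specs, which is the usual chart convention. A hedged sketch of exposing the model-server metrics port (port names and numbers are illustrative; the default `modelServerMetricsPort: 0` above is taken to mean unset):

```yaml
inferenceExtension:
  modelServerMetricsPort: 9400   # illustrative
  extraContainerPorts:
    - name: ms-metrics
      containerPort: 9400
      protocol: TCP
  extraServicePorts:
    - name: ms-metrics
      port: 9400
      targetPort: 9400
```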

inferencePool:
targetPortNumber: 8000
modelServerType: vllm # vllm, triton-tensorrt-llm
# modelServers: # REQUIRED
Contributor: revert this change please, we should not default this, it should be explicitly set.

Collaborator: Was this comment addressed?
# matchLabels:
# app: vllm-llama3-8b-instruct
modelServers:
matchLabels:
app: vllm-llama3-8b-instruct

provider:
name: none
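Per the review thread, `modelServers` should be supplied explicitly by the user rather than defaulted in the chart. A sketch of a user-supplied values file doing so (the label value is illustrative):

```yaml
# user-values.yaml (illustrative)
inferencePool:
  modelServers:
    matchLabels:
      app: my-vllm-deployment
```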