Commit 81e1af7

update README.md
1 parent a8b406a commit 81e1af7

4 files changed: 48 additions and 35 deletions

config/charts/inferencepool/README.md

Lines changed: 41 additions & 28 deletions
Original file line number | Diff line number | Diff line change
@@ -166,31 +166,34 @@ $ helm uninstall pool-1
166166

167167
The following table lists the configurable parameters of the chart.
168168

169-
| **Parameter Name** | **Description** |
170-
|---------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------|
171-
| `inferencePool.apiVersion` | The API version of the InferencePool resource. Defaults to `inference.networking.k8s.io/v1`. This can be changed to `inference.networking.x-k8s.io/v1alpha2` to support older API versions. |
172-
| `inferencePool.targetPortNumber` | Target port number for the vllm backends, will be used to scrape metrics by the inference extension. Defaults to 8000. |
173-
| `inferencePool.modelServerType` | Type of the model servers in the pool, valid options are [vllm, triton-tensorrt-llm], default is vllm. |
174-
| `inferencePool.modelServers.matchLabels` | Label selector to match vllm backends managed by the inference pool. |
175-
| `inferenceExtension.replicas` | Number of replicas for the endpoint picker extension service. If More than one replica is used, EPP will run in HA active-passive mode. Defaults to `1`. |
176-
| `inferenceExtension.image.name` | Name of the container image used for the endpoint picker. |
177-
| `inferenceExtension.image.hub` | Registry URL where the endpoint picker image is hosted. |
178-
| `inferenceExtension.image.tag` | Image tag of the endpoint picker. |
179-
| `inferenceExtension.image.pullPolicy` | Image pull policy for the container. Possible values: `Always`, `IfNotPresent`, or `Never`. Defaults to `Always`. |
180-
| `inferenceExtension.env` | List of environment variables to set in the endpoint picker container as free-form YAML. Defaults to `[]`. |
181-
| `inferenceExtension.extraContainerPorts` | List of additional container ports to expose. Defaults to `[]`. |
182-
| `inferenceExtension.extraServicePorts` | List of additional service ports to expose. Defaults to `[]`. |
183-
| `inferenceExtension.flags` | List of flags which are passed through to endpoint picker. Example flags, enable-pprof, grpc-port etc. Refer [runner.go](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/cmd/epp/runner/runner.go) for complete list. |
184-
| `inferenceExtension.affinity` | Affinity for the endpoint picker. Defaults to `{}`. |
185-
| `inferenceExtension.tolerations` | Tolerations for the endpoint picker. Defaults to `[]`. | |
186-
| `inferenceExtension.monitoring.interval` | Metrics scraping interval for monitoring. Defaults to `10s`. |
187-
| `inferenceExtension.monitoring.secret.name` | Name of the service account token secret for metrics authentication. Defaults to `inference-gateway-sa-metrics-reader-secret`. |
188-
| `inferenceExtension.monitoring.prometheus.enabled` | Enable Prometheus ServiceMonitor creation for EPP metrics collection. Defaults to `false`. |
189-
| `inferenceExtension.monitoring.gke.enabled` | Enable GKE monitoring resources (`PodMonitoring` and RBAC). Defaults to `false`. |
190-
| `inferenceExtension.pluginsCustomConfig` | Custom config that is passed to EPP as inline yaml. |
191-
| `inferenceExtension.trace.enabled` | Enables or disables OpenTelemetry tracing globally for the EndpointPicker. |
192-
| `provider.name` | Name of the Inference Gateway implementation being used. Possible values: [`none`, `gke`, or `istio`]. Defaults to `none`. |
193-
| `provider.gke.autopilot` | Set to `true` if the cluster is a GKE Autopilot cluster. This is only used if `provider.name` is `gke`. Defaults to `false`. |
169+
| **Parameter Name** | **Description** |
170+
|----------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
171+
| `inferencePool.apiVersion` | The API version of the InferencePool resource. Defaults to `inference.networking.k8s.io/v1`. This can be changed to `inference.networking.x-k8s.io/v1alpha2` to support older API versions. |
172+
| `inferencePool.targetPortNumber` | Target port number for the vllm backends, will be used to scrape metrics by the inference extension. Defaults to 8000. |
173+
| `inferencePool.modelServerType` | Type of the model servers in the pool, valid options are [vllm, triton-tensorrt-llm], default is vllm. |
174+
| `inferencePool.modelServers.matchLabels` | Label selector to match vllm backends managed by the inference pool. |
175+
| `inferenceExtension.replicas` | Number of replicas for the endpoint picker extension service. If More than one replica is used, EPP will run in HA active-passive mode. Defaults to `1`. |
176+
| `inferenceExtension.image.name` | Name of the container image used for the endpoint picker. |
177+
| `inferenceExtension.image.hub` | Registry URL where the endpoint picker image is hosted. |
178+
| `inferenceExtension.image.tag` | Image tag of the endpoint picker. |
179+
| `inferenceExtension.image.pullPolicy` | Image pull policy for the container. Possible values: `Always`, `IfNotPresent`, or `Never`. Defaults to `Always`. |
180+
| `inferenceExtension.env` | List of environment variables to set in the endpoint picker container as free-form YAML. Defaults to `[]`. |
181+
| `inferenceExtension.extraContainerPorts` | List of additional container ports to expose. Defaults to `[]`. |
182+
| `inferenceExtension.extraServicePorts` | List of additional service ports to expose. Defaults to `[]`. |
183+
| `inferenceExtension.flags` | List of flags which are passed through to endpoint picker. Example flags, enable-pprof, grpc-port etc. Refer [runner.go](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/cmd/epp/runner/runner.go) for complete list. |
184+
| `inferenceExtension.affinity` | Affinity for the endpoint picker. Defaults to `{}`. |
185+
| `inferenceExtension.tolerations` | Tolerations for the endpoint picker. Defaults to `[]`. |
186+
| `inferenceExtension.monitoring.interval` | Metrics scraping interval for monitoring. Defaults to `10s`. |
187+
| `inferenceExtension.monitoring.secret.name` | Name of the service account token secret for metrics authentication. Defaults to `inference-gateway-sa-metrics-reader-secret`. |
188+
| `inferenceExtension.monitoring.prometheus.enabled` | Enable Prometheus ServiceMonitor creation for EPP metrics collection. Defaults to `false`. |
189+
| `inferenceExtension.monitoring.gke.enabled` | Enable GKE monitoring resources (`PodMonitoring` and RBAC). Defaults to `false`. |
190+
| `inferenceExtension.pluginsCustomConfig` | Custom config that is passed to EPP as inline yaml. |
191+
| `inferenceExtension.tracing.enabled` | Enables or disables OpenTelemetry tracing globally for the EndpointPicker. |
192+
| `inferenceExtension.tracing.otelExporterEndpoint` | OpenTelemetry collector endpoint. |
193+
| `inferenceExtension.tracing.sampling.sampler` | The trace sampler to use. Currently, only `parentbased_traceidratio` is supported. This sampler respects the parent span’s sampling decision when present, and applies the configured ratio for root spans. |
194+
| `inferenceExtension.tracing.sampling.samplerArg` | Sampler-specific argument. For `parentbased_traceidratio`, this defines the base sampling rate for new traces (root spans), as a float string in the range [0.0, 1.0]. For example, "0.1" enables 10% sampling. |
195+
| `provider.name` | Name of the Inference Gateway implementation being used. Possible values: [`none`, `gke`, or `istio`]. Defaults to `none`. |
196+
| `provider.gke.autopilot` | Set to `true` if the cluster is a GKE Autopilot cluster. This is only used if `provider.name` is `gke`. Defaults to `false`. |
194197

195198
### Provider Specific Configuration
196199

@@ -215,10 +218,20 @@ These are the options available to you with `provider.name` set to `istio`:
215218
| `istio.destinationRule.host` | Custom host value for the destination rule. If not set, this uses the default value, which is derived from the EPP service name and release namespace to generate a valid service address. |
216219
| `istio.destinationRule.trafficPolicy.connectionPool` | Configure the connectionPool level settings of the traffic policy |
217220

218-
## OpenTelemetry
221+
#### OpenTelemetry
219222

220-
The EndpointPicker supports OpenTelemetry-based tracing. To enable it, use `--set inferenceExtension.trace.enabled=true`
221-
and configure the correct OpenTelemetry collector endpoint via the environment variable `OTEL_EXPORTER_OTLP_ENDPOINT` in `inferenceExtension.env`.
223+
The EndpointPicker supports OpenTelemetry-based tracing. To enable trace collection, use the following configuration:
224+
```yaml
225+
inferenceExtension:
226+
tracing:
227+
enabled: true
228+
otelExporterEndpoint: "http://localhost:4317"
229+
sampling:
230+
sampler: "parentbased_traceidratio"
231+
samplerArg: "0.1"
232+
```
233+
Make sure that the `otelExporterEndpoint` points to your OpenTelemetry collector endpoint.
234+
Currently, only the `parentbased_traceidratio` sampler is supported. You can adjust the base sampling ratio with `samplerArg` (e.g., `"0.1"` means 10% of new root traces will be sampled).
222235

223236
## Notes
224237

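The README text above describes `samplerArg` as a float string in the range [0.0, 1.0]. As a minimal sketch of what that constraint means, the Go snippet below parses and range-checks such a string. The helper name `parseSamplerArg` is hypothetical and is not part of this commit or of the chart; it only illustrates the documented contract.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseSamplerArg is a hypothetical helper (not from this commit). It turns a
// samplerArg string such as "0.1" into a ratio and rejects anything that is
// not a float in [0.0, 1.0], which is the range the README documents.
func parseSamplerArg(arg string) (float64, error) {
	ratio, err := strconv.ParseFloat(arg, 64)
	if err != nil {
		return 0, fmt.Errorf("samplerArg %q is not a float: %w", arg, err)
	}
	// Written as a negated conjunction so that NaN is rejected as well.
	if !(ratio >= 0.0 && ratio <= 1.0) {
		return 0, fmt.Errorf("samplerArg %q is outside [0.0, 1.0]", arg)
	}
	return ratio, nil
}

func main() {
	for _, arg := range []string{"0.1", "1.5", "ten"} {
		ratio, err := parseSamplerArg(arg)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Printf("%s: sample roughly %.0f%% of new root traces\n", arg, ratio*100)
	}
}
```

With the OpenTelemetry Go SDK, a ratio like this would typically be passed to `TraceIDRatioBased` wrapped in `ParentBased`; whether `pkg/common/telemetry.go` does exactly that is not shown in this diff.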
config/charts/inferencepool/templates/epp-deployment.yaml

Lines changed: 5 additions & 5 deletions
@@ -63,7 +63,7 @@ spec:
6363
- "{{ .value }}"
6464
{{- end }}
6565
- "--tracing"
66-
{{- if .Values.inferenceExtension.trace.enabled }}
66+
{{- if .Values.inferenceExtension.tracing.enabled }}
6767
- "true"
6868
{{- else }}
6969
- "false"
@@ -107,11 +107,11 @@ spec:
107107
valueFrom:
108108
fieldRef:
109109
fieldPath: metadata.namespace
110-
{{- if .Values.inferenceExtension.trace.enabled }}
110+
{{- if .Values.inferenceExtension.tracing.enabled }}
111111
- name: OTEL_SERVICE_NAME
112112
value: "gateway-api-inference-extension"
113113
- name: OTEL_EXPORTER_OTLP_ENDPOINT
114-
value: {{ .Values.inferenceExtension.trace.otelExporterEndpoint | default "http://localhost:4317" | quote }}
114+
value: {{ .Values.inferenceExtension.tracing.otelExporterEndpoint | default "http://localhost:4317" | quote }}
115115
- name: OTEL_TRACES_EXPORTER
116116
value: "otlp"
117117
- name: OTEL_RESOURCE_ATTRIBUTES_NODE_NAME
@@ -127,9 +127,9 @@ spec:
127127
- name: OTEL_RESOURCE_ATTRIBUTES
128128
value: 'k8s.namespace.name=$(NAMESPACE),k8s.node.name=$(OTEL_RESOURCE_ATTRIBUTES_NODE_NAME),k8s.pod.name=$(OTEL_RESOURCE_ATTRIBUTES_POD_NAME)'
129129
- name: OTEL_TRACES_SAMPLER
130-
value: {{ .Values.inferenceExtension.trace.sampling.sampler | default "parentbased_traceidratio" | quote }}
130+
value: {{ .Values.inferenceExtension.tracing.sampling.sampler | default "parentbased_traceidratio" | quote }}
131131
- name: OTEL_TRACES_SAMPLER_ARG
132-
value: {{ .Values.inferenceExtension.trace.sampling.samplerArg | default "0.1" | quote }}
132+
value: {{ .Values.inferenceExtension.tracing.sampling.samplerArg | default "0.1" | quote }}
133133
{{- end }}
134134
{{- if .Values.inferenceExtension.env }}
135135
{{- toYaml .Values.inferenceExtension.env | nindent 8 }}

config/charts/inferencepool/values.yaml

Lines changed: 1 addition & 1 deletion
@@ -53,7 +53,7 @@ inferenceExtension:
5353

5454
gke:
5555
enabled: false
56-
trace:
56+
tracing:
5757
enabled: false
5858
otelExporterEndpoint: "http://localhost:4317"
5959
sampling:

pkg/common/telemetry.go

Lines changed: 1 addition & 1 deletion
@@ -60,7 +60,7 @@ func InitTracing(ctx context.Context, logger logr.Logger) error {
6060
traceExporter, err := initTraceExporter(ctx, logger)
6161
if err != nil {
6262
loggerWrap.Handle(fmt.Errorf("%s: %v", "init trace exporter fail", err))
63-
return nil
63+
return err
6464
}
6565

6666
// Go SDK doesn't have an automatic sampler, handle manually
