feat: generic metrics scorer and prometheus extractor #2237
base: main
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: LukeAVanDrie. Needs approval from an approver in each of these files.
/test all
This perhaps belongs in ./docs/... instead of ./site-src/...; I think it is more useful for project maintainers than for users. Or maybe it is not helpful enough to warrant the future maintenance burden, in which case I can remove it and keep it in my personal notes.
Introduces the Generic Metrics Scorer feature, enabling EPP to schedule pods based on arbitrary Prometheus metrics (e.g. `vllm:num_requests_running`) via configuration.

Key components:
- `prometheus-metric` extractor: implements series selection (label matching) to parse raw metrics.
- `metric-scorer` plugin: supports Minimize (Softmin)/Maximize modes with Linear/Softmax normalization.
- Documentation: adds a comprehensive guide for custom metrics configuration.
```go
logger.Info("Missing custom metric for endpoint", "endpoint", endpoint.GetMetadata().NamespacedName, "metric", s.config.MetricName)
// Apply worst-case value for missing metrics.
if s.config.OptimizationMode == OptimizationModeMinimize {
	val = maxVal
```
If a user configures Softmax + Minimize (the default) but omits the max parameter (since Softmax doesn't mathematically require a range), s.config.Max defaults to 0.0.
A pod with missing metrics will be assigned a value of 0.0. If the metric is something like running requests, a real value might be 50. Since we are minimizing, the 0.0 (missing) is seen as better than 50 (healthy), causing the scheduler to funnel traffic to pods that are failing to report metrics.
I will change this fallback logic to use math.MaxFloat64 when Max is not provided to ensure missing metrics result in a worst-case score.
This PR is a DRAFT as I still want to benchmark this on a real cluster (I only have hermetic test validation right now), and I am considering splitting up this change.
```go
// Custom holds custom metrics scraped from the model server.
// The key is the metric name and the value is the metric value.
Custom map[string]float64
```
There are other data types than float. Let's define something like:

```go
type MetricValue struct {
	floatValue  float64
	intValue    int
	stringValue string
}
```
```go
// PrometheusMetricPlugin extracts a specific metric from Prometheus format data.
type PrometheusMetricPlugin struct {
	typedName fwkplugin.TypedName
	spec      *Spec
```
From a UX perspective, allowing a list of metrics would provide a better experience.
```go
// Produces returns the dynamic metric key this plugin produces.
func (p *PrometheusMetricPlugin) Produces() map[string]any {
	return map[string]any{
		p.spec.Name: float64(0),
```
Debatable, but I think the "internal" name of the data/metric should be decoupled from the external metric naming, allowing a unified internal view of the data regardless of the external data source (e.g., different model servers have different metric names).
```go
MetricName string `json:"metricName"`
// Min is the minimum expected value for the metric (used for normalization).
Min float64 `json:"min"`
// Max is the maximum expected value for the metric (used for normalization).
```
Sometimes we don't have good values for the min/max, e.g., what's the max of running requests?
Even if we know the theoretical range, the actual operating state could be within a much smaller range.
I think we need to score relative to the running max/min similar to what we do with the queue scorer. It's not ideal but it removes the UX barrier.
I need to do a better job documenting this on the config surface. This is sometimes required for normalization, but not always: Softmax uses a dynamic range, while Linear needs it.
Falling back to the running max and min when no value is provided also works as a default strategy. I would probably send that as a follow-up PR though.
PR needs rebase.
What type of PR is this?
/kind feature
What this PR does / why we need it:
Introduces a generic metrics scorer feature, enabling the EPP to schedule pods based on arbitrary Prometheus metrics (e.g., `vllm:num_requests_running`) without code modifications.

This PR implements a "Source -> Extractor -> Consumer" architecture:
- `metrics-data-source` (existing plugin): fetches raw Prometheus text.
- `prometheus-metric` (new plugin): extracts a specific metric series from the raw text.
- `metric-scorer` (new plugin): scores endpoints using `Linear` and `Softmax` (distribution-aware) algorithms, decoupling `Linear` vs `Softmax` from the optimization mode (`Minimize` vs `Maximize`). `Linear` normalizes via `(val - min) / (max - min)`; `Softmax` with `Minimize` effectively implements Softmin (`exp(-x)`).

Which issue(s) this PR fixes:
Fixes #2201
Does this PR introduce a user-facing change?: