
Conversation

Contributor

@LukeAVanDrie commented Jan 30, 2026

What type of PR is this?

/kind feature

What this PR does / why we need it:

Introduces a generic metrics scorer feature, enabling the EPP to schedule pods based on arbitrary Prometheus metrics (e.g., vllm:num_requests_running) without code modifications.

This PR implements a "Source -> Extractor -> Consumer" architecture:

  1. Source: metrics-data-source (Existing plugin, fetches raw Prometheus text).
  2. Extractor: prometheus-metric (New plugin).
    • Parses raw text from the data source.
    • Implements series selection (label matching) to extract specific vector values from a metric family.
  3. Consumer: metric-scorer (New plugin).
    • Normalization: decouples the algorithm (Linear vs. Softmax) from the optimization mode (Minimize vs. Maximize).
    • Linear: clamped range normalization, (val - min) / (max - min).
    • Softmax: distribution-aware scoring. When paired with Minimize, it effectively implements Softmin (exp(-x)), acting as a soft preference for the least-loaded endpoint. (A sketch of both normalization modes follows this list.)
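
The scoring math above, as a rough Go sketch (illustrative only; the function names, package name, and the numerical-stability trick are mine, not the PR's implementation):

package scoringsketch

import "math"

// linearScore clamps (val - min) / (max - min) into [0, 1].
// For Minimize mode the caller would invert the result (1 - score).
func linearScore(val, min, max float64) float64 {
	if max <= min {
		return 0
	}
	s := (val - min) / (max - min)
	return math.Max(0, math.Min(1, s))
}

// softmaxScores turns raw metric values into a distribution over
// endpoints. With minimize=true the inputs are negated first, which is
// Softmin: exp(-x) weighting, so smaller raw values score higher.
func softmaxScores(vals []float64, minimize bool) []float64 {
	scores := make([]float64, len(vals))
	if len(vals) == 0 {
		return scores
	}
	signed := make([]float64, len(vals))
	maxSigned := math.Inf(-1)
	for i, v := range vals {
		if minimize {
			v = -v
		}
		signed[i] = v
		if v > maxSigned {
			maxSigned = v
		}
	}
	var sum float64
	for i, v := range signed {
		// Subtracting the largest signed value keeps exp() from overflowing.
		scores[i] = math.Exp(v - maxSigned)
		sum += scores[i]
	}
	for i := range scores {
		scores[i] /= sum
	}
	return scores
}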

Which issue(s) this PR fixes:

Fixes #2201

Does this PR introduce a user-facing change?:

Added generic metrics scorer to support scheduling based on arbitrary Prometheus metrics purely through config.

@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot added the do-not-merge/work-in-progress (indicates that a PR should not merge because it is a work in progress) and kind/feature (categorizes issue or PR as related to a new feature) labels on Jan 30, 2026

netlify bot commented Jan 30, 2026

Deploy Preview for gateway-api-inference-extension ready!

🔨 Latest commit: 0992c12
🔍 Latest deploy log: https://app.netlify.com/projects/gateway-api-inference-extension/deploys/697c0ffc90cb870008723947
😎 Deploy Preview: https://deploy-preview-2237--gateway-api-inference-extension.netlify.app

@k8s-ci-robot added the cncf-cla: yes (indicates the PR's author has signed the CNCF CLA) label on Jan 30, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: LukeAVanDrie
Once this PR has been reviewed and has the lgtm label, please assign kfswain for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the size/XXL (denotes a PR that changes 1000+ lines, ignoring generated files) label on Jan 30, 2026
@LukeAVanDrie
Contributor Author

/test all

Contributor Author

This perhaps belongs in ./docs/... instead of ./site-src/.... I think this is more useful for project maintainers than users. Or, maybe this is not helpful enough to warrant the future maintenance burden. If so, I can remove this and keep this in my personal notes.

Introduces the Generic Metrics Scorer feature, enabling EPP to schedule
pods based on arbitrary Prometheus metrics (e.g.
`vllm:num_requests_running`) via configuration.

Key components:
- `prometheus-metric` Extractor: Implements Series Selection (label
  matching) to parse raw metrics.
- `metric-scorer` Plugin: Supports Minimize (Softmin)/Maximize modes
  with Linear/Softmax normalization.
- Documentation: Added comprehensive guide for Custom Metrics
  configuration.
@LukeAVanDrie force-pushed the feat/generic-metric-scorer branch from 151fd02 to 0992c12 on January 30, 2026 at 01:57
logger.Info("Missing custom metric for endpoint", "endpoint", endpoint.GetMetadata().NamespacedName, "metric", s.config.MetricName)
// Apply worst-case value for missing metrics.
if s.config.OptimizationMode == OptimizationModeMinimize {
	val = maxVal
Contributor Author

If a user configures Softmax + Minimize (the default) but omits the max parameter (since Softmax doesn't mathematically require a range), s.config.Max defaults to 0.0.

A pod with missing metrics will be assigned a value of 0.0. If the metric is something like running requests, a real value might be 50. Since we are minimizing, the 0.0 (missing) is seen as better than 50 (healthy), causing the scheduler to funnel traffic to pods that are failing to report metrics.

I will change this fallback logic to use math.MaxFloat64 when Max is not provided to ensure missing metrics result in a worst-case score.
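
Roughly like this (a hypothetical helper; the real plugin reads the mode, Min, and Max from its config struct, so the names here are illustrative):

package scoringsketch

import "math"

// worstCaseValue picks the value assigned to an endpoint that failed to
// report the metric, so that missing data never outranks real data.
func worstCaseValue(minimize bool, configuredMin, configuredMax float64) float64 {
	if minimize {
		// When minimizing, "worst" means as large as possible. If Max was
		// left at its zero default, fall back to math.MaxFloat64.
		if configuredMax == 0 {
			return math.MaxFloat64
		}
		return configuredMax
	}
	// When maximizing, the worst case is the smallest expected value.
	return configuredMin
}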

@LukeAVanDrie
Contributor Author

This PR is a DRAFT: I still want to benchmark this on a real cluster (I only have hermetic test validation right now), and I am considering splitting up this change.


// Custom holds custom metrics scraped from the model server.
// The key is the metric name and the value is the metric value.
Custom map[string]float64
Contributor

There are data types other than float. Let's define something like:

type MetricValue struct {
   floatValue float64
   intValue int
   stringValue string
}

// PrometheusMetricPlugin extracts a specific metric from Prometheus format data.
type PrometheusMetricPlugin struct {
	typedName fwkplugin.TypedName
	spec      *Spec
Contributor

From a UX perspective, allowing a list of metrics would be better; see the sketch below.
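
For illustration, a possible shape (field names are hypothetical, not the PR's API):

// Spec sketch that extracts several metrics with one plugin instance.
type Spec struct {
	// Metrics lists every Prometheus metric family this extractor
	// should publish into the data layer.
	Metrics []MetricSpec `json:"metrics"`
}

type MetricSpec struct {
	// MetricName is the Prometheus metric family to extract,
	// e.g. "vllm:num_requests_running".
	MetricName string `json:"metricName"`
	// Labels optionally selects a single series within the family.
	Labels map[string]string `json:"labels,omitempty"`
}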

// Produces returns the dynamic metric key this plugin produces.
func (p *PrometheusMetricPlugin) Produces() map[string]any {
	return map[string]any{
		p.spec.Name: float64(0),
Contributor

Debatable, but I think the "internal" name of the data/metric should be decoupled from the external metric naming, allowing a unified internal view of the data regardless of the external data source (e.g., different model servers have different metric names). A hypothetical shape is sketched below.
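
For instance, each extracted metric could carry both an internal key and the external family name (again, hypothetical field names, not the PR's API):

// Each entry maps an external metric family onto a stable internal key
// that downstream plugins (e.g. the scorer) consume.
type MetricSpec struct {
	// Name is the internal key published into the data layer,
	// e.g. "requests-running".
	Name string `json:"name"`
	// MetricName is whatever the specific model server exposes,
	// e.g. "vllm:num_requests_running".
	MetricName string `json:"metricName"`
}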

MetricName string `json:"metricName"`
// Min is the minimum expected value for the metric (used for normalization).
Min float64 `json:"min"`
// Max is the maximum expected value for the metric (used for normalization).
Contributor

Sometimes we don't have good values for the min/max, e.g., what's the max of running requests?

Even if we know the theoretical range, the actual operating state could be within a much smaller range.

I think we need to score relative to the running max/min, similar to what we do with the queue scorer. It's not ideal, but it removes the UX barrier.

Contributor Author

I need to do a better job documenting this on the config surface. These values are sometimes required for normalization, but not always: Softmax uses a dynamic range, whereas Linear does need them.

Running max and min also works as a default strategy when no value is provided; I would probably send that as a follow-up PR (a sketch follows below).
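
That follow-up could look roughly like this: normalize against the range observed across the candidate endpoints in the current scheduling cycle (hypothetical helper, similar in spirit to the queue scorer):

package scoringsketch

// scoreAgainstObservedRange normalizes each endpoint's value against the
// min/max seen in this cycle, avoiding static Min/Max configuration.
func scoreAgainstObservedRange(vals []float64, minimize bool) []float64 {
	scores := make([]float64, len(vals))
	if len(vals) == 0 {
		return scores
	}
	lo, hi := vals[0], vals[0]
	for _, v := range vals {
		if v < lo {
			lo = v
		}
		if v > hi {
			hi = v
		}
	}
	for i, v := range vals {
		if hi == lo {
			scores[i] = 1 // all endpoints report the same value
			continue
		}
		s := (v - lo) / (hi - lo)
		if minimize {
			s = 1 - s
		}
		scores[i] = s
	}
	return scores
}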

@k8s-ci-robot added the needs-rebase (indicates a PR cannot be merged because it has merge conflicts with HEAD) label on Feb 2, 2026
@k8s-ci-robot
Contributor

PR needs rebase.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
