feat: generic metrics scorer and prometheus extractor #2237
base: main
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: LukeAVanDrie. Needs approval from an approver in each of these files.
/test all
This perhaps belongs in ./docs/... instead of ./site-src/...; I think it is more useful for project maintainers than for users. Or maybe it is not helpful enough to warrant the future maintenance burden, in which case I can remove it and keep it in my personal notes.
Introduces the Generic Metrics Scorer feature, enabling EPP to schedule pods based on arbitrary Prometheus metrics (e.g. `vllm:num_requests_running`) via configuration.

Key components:
- `prometheus-metric` extractor: implements series selection (label matching) to parse raw metrics.
- `metric-scorer` plugin: supports Minimize (Softmin)/Maximize modes with Linear/Softmax normalization.
- Documentation: adds a comprehensive guide for custom metrics configuration.
```go
logger.Info("Missing custom metric for endpoint", "endpoint", endpoint.GetMetadata().NamespacedName, "metric", s.config.MetricName)
// Apply worst-case value for missing metrics.
if s.config.OptimizationMode == OptimizationModeMinimize {
	val = maxVal
```
If a user configures Softmax + Minimize (the default) but omits the max parameter (since Softmax doesn't mathematically require a range), s.config.Max defaults to 0.0.
A pod with missing metrics will be assigned a value of 0.0. If the metric is something like running requests, a real value might be 50. Since we are minimizing, the 0.0 (missing) is seen as better than 50 (healthy), causing the scheduler to funnel traffic to pods that are failing to report metrics.
I will change this fallback logic to use math.MaxFloat64 when Max is not provided to ensure missing metrics result in a worst-case score.
This PR is a DRAFT as I still want to benchmark this on a real cluster (I only have hermetic test validation right now), and I am considering splitting up this change.
```go
// Custom holds custom metrics scraped from the model server.
// The key is the metric name and the value is the metric value.
Custom map[string]float64
```
There are other data types than float. Let's define something like:

```go
type MetricValue struct {
	floatValue  float64
	intValue    int
	stringValue string
}
```
```go
// PrometheusMetricPlugin extracts a specific metric from Prometheus format data.
type PrometheusMetricPlugin struct {
	typedName fwkplugin.TypedName
	spec      *Spec
```
From a UX perspective, allowing a list of metrics would provide a better experience.
```go
// Produces returns the dynamic metric key this plugin produces.
func (p *PrometheusMetricPlugin) Produces() map[string]any {
	return map[string]any{
		p.spec.Name: float64(0),
```
Debatable, but I think the "internal" name of the data/metric should be decoupled from the external metric naming, allowing a unified internal view of the data regardless of the external data source (e.g., different model servers have different metric names).
```go
MetricName string `json:"metricName"`
// Min is the minimum expected value for the metric (used for normalization).
Min float64 `json:"min"`
// Max is the maximum expected value for the metric (used for normalization).
```
Sometimes we don't have good values for the min/max, e.g., what's the max of running requests?
Even if we know the theoretical range, the actual operating state could be within a much smaller range.
I think we need to score relative to the running max/min similar to what we do with the queue scorer. It's not ideal but it removes the UX barrier.
I need to do a better job documenting this on the config surface. This is sometimes required for normalization, but not always: Softmax uses a dynamic range, while Linear needs it.
Falling back to the running max and min when no value is provided also works as a default strategy. I would probably send that as a follow-up PR though.
PR needs rebase.
What type of PR is this?
/kind feature
What this PR does / why we need it:
Introduces a generic metrics scorer feature, enabling the EPP to schedule pods based on arbitrary Prometheus metrics (e.g., `vllm:num_requests_running`) without code modifications.

This PR implements a "Source -> Extractor -> Consumer" architecture:
- `metrics-data-source` (existing plugin): fetches raw Prometheus text.
- `prometheus-metric` (new plugin): extracts a specific metric series from the raw text.
- `metric-scorer` (new plugin): scores endpoints using `Linear` and `Softmax` (distribution-aware) algorithms, decoupling `Linear` vs `Softmax` from the optimization mode (`Minimize` vs `Maximize`). `Linear` normalizes via `(val - min) / (max - min)`; `Softmax` with `Minimize` effectively implements Softmin (`exp(-x)`).

Which issue(s) this PR fixes:
Fixes #2201
Does this PR introduce a user-facing change?: