
Conversation

@timgrein
Contributor

Ideally we want to know whether an inference request was re-routed on the transport layer and which node ultimately performed the inference, for the following reasons:

  • Make sure we really approximate a uniform distribution of requests
  • Be able to set up alerting if we detect unexpected and potentially faulty distributions

Note: I've also refactored some of the methods in InferenceStats to distinguish explicitly between model, routing and response attributes (cc @jonathan-buttner, as we discussed this yesterday during a sync).
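
To illustrate the kind of split I mean, here is a rough sketch; the attribute keys, parameter types and method shapes are simplified assumptions for illustration, not the actual InferenceStats code:

import java.util.HashMap;
import java.util.Map;

final class InferenceAttributeSketch {

    // Attributes describing the model that served the request.
    static Map<String, Object> modelAttributes(String service, String taskType, String modelId) {
        return Map.of("service", service, "task_type", taskType, "model_id", modelId);
    }

    // Attributes describing how the request moved across the transport layer.
    static Map<String, Object> routingAttributes(boolean rerouted, String nodeId) {
        return Map.of("rerouted", rerouted, "node_id", nodeId);
    }

    // Attributes describing the outcome of the request.
    static Map<String, Object> responseAttributes(Integer statusCode) {
        return statusCode == null ? Map.of() : Map.of("status_code", statusCode);
    }

    // Combined attribute map recorded together with the duration metric.
    static Map<String, Object> allAttributes(Map<String, Object> model, Map<String, Object> routing, Map<String, Object> response) {
        Map<String, Object> all = new HashMap<>(model);
        all.putAll(routing);
        all.putAll(response);
        return all;
    }
}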

@timgrein changed the title [Inference API] Record inference API re-routing attributes as part of request metrics → [Inference API] Record re-routing attributes as part of inference request metrics Feb 12, 2025
@elasticsearchmachine added the needs:triage (Requires assignment of a team area label) label Feb 12, 2025
inferenceStats.inferenceDuration().record(timer.elapsedMillis(), responseAttributes(model, unwrapCause(t)));
Map<String, Object> metricAttributes = new HashMap<>();
metricAttributes.putAll(modelAttributes(model));
metricAttributes.putAll(routingAttributes(request, localNodeId));
Contributor

I’m a bit concerned that the metric cardinality will grow extremely rapidly as you add the node_id. I'm not 100% sure if this is a problem for Elasticsearch (overview cluster).

Contributor Author

I think the attribute will have a high cardinality and not the metric, right?

I don't think that should be inherently a problem; compare Prometheus, for example, which automatically creates a time series for each unique metric/attribute combination. We'll only do (manual) aggregations on node id over a limited time window, and the number of unique node ids in a (serverless) cluster shouldn't be very high over a timeframe of, say, 10 minutes to 1 day. I've also checked other Elasticsearch metrics, and a lot of them include the node id as an attribute, so if it's not a problem for them it shouldn't be one for us.

Member

Summarizing a Slack conversation:
In addition to the cardinality risk during search, there seems to be a risk of high cardinality when the metrics are pushed from the node, where each attribute variation creates a new element that consumes capacity in an outbound queue. The queue has some max capacity and is flushed periodically.

We think (hope) the risk is relatively low, given that each node has a single id and the routing attributes should only take a handful of values.
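
For a rough sense of scale (illustrative numbers only, not a measurement): with on the order of 10 local node ids, 2 values for a re-routed flag and a handful of target node ids, the routing attributes would contribute at most a few hundred distinct combinations, e.g. roughly 10 × 2 × 10 = 200 per model.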

@timgrein added the :ml (Machine learning) and Team:ML (Meta label for the ML team) labels and removed the needs:triage (Requires assignment of a team area label) label Feb 12, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

Contributor

@brendan-jugan-elastic left a comment

Overall, this makes sense to me!

Thinking from an AWS CloudWatch perspective, is there a set of default dimensions (maybe attributes is the correct term here) that we can manipulate in our dashboards? For example, when monitoring an EC2 instance with CloudWatch, we would by default have the ability to split by instance ID. Is there an equivalent in our monitoring stack? I poked around the Prometheus docs you mentioned a bit, and I'm wondering whether we have an abstraction layer that provides some of these ES-specific dimensions (or attributes) by default.

Thanks!

@timgrein
Contributor Author

Overall, this makes sense to me!

Thinking from an AWS CloudWatch perspective, is there a set of default dimensions (maybe attributes is the correct term here) that we can manipulate in our dashboards? For example, when monitoring an EC2 instance with CloudWatch, we would by default have the ability to split by instance ID. Is there an equivalent in our monitoring stack? I poked around the Prometheus docs you mentioned a bit, and I'm wondering whether we have an abstraction layer that provides some of these ES-specific dimensions (or attributes) by default.

Thanks!

After taking a look at some of the metrics in Cloud and at the ES code, I assume the APM Java agent and other components in between add some attributes to every metric, but take this with a grain of salt as my source is just me checking metrics in Discover :) Some example attributes I saw on every metric I checked are:

  • agent.id
  • agent.type
  • cluster.id
  • etc.

METERING.md explains a little bit about how gathering metrics and traces works in Elasticsearch; maybe that helps to answer your question.

@prwhelan
Member

prwhelan commented Feb 13, 2025

Make sure we really approximate a uniform distribution of requests
Be able to setup alerting, if we detect unexpected and potentially faulty distributions

If I get this alert, what action am I supposed to take? Is it more of an informational "this code is broken and we need to fix it"? If so, can this be an integration test rather than a metric + alert?

My thinking is: alerts are useful when we can take an action on the cluster to resolve it without a code change, like through an administrative API or a configuration change, whereas testing is useful for validation of functionality.

If we're monitoring even distribution across the cluster (or, alternatively, that we pool as much as possible on one node without overloading it, sending requests to another node only when the first one is at capacity), this may be better served by a saturation metric rather than attributes inserted into the duration metric (similar to how we monitor threadpool capacity).

If we're monitoring to validate that the client (ES) is calling the server (EIS) with as few unique ip:ports as possible, it's probably better to measure this within EIS (VPC flow logs, or check the project-id + ip within the service handler) since this is a requirement of EIS.

Otherwise code looks fine :)

@timgrein
Contributor Author

If I get this alert, what action am I supposed to take? Is it more of an informational "this code is broken and we need to fix it"? If so, can this be an integration test rather than a metric + alert?

Fair point! I have a meeting later with @wwang500 to discuss how we can write integration tests using QAF instead of alerting. Still, I would ideally like to be able to look at some dashboard, especially in the beginning, to gauge the behavior (without any alerting).

If we're monitoring to validate that the client (ES) is calling the server (EIS) with as few unique ip:ports as possible, it's probably better to measure this within EIS (VPC flow logs, or check the project-id + ip within the service handler) since this is a requirement of EIS.

As the solution is not strictly for EIS and we plan to expand it to other services, I guess we shouldn't make the telemetry/metrics specific to EIS, IMHO.

Contributor

@maxjakob left a comment

LGTM but treat this as a soft approval since this codebase still feels foreign

Contributor

@demjened left a comment

LGTM, one question about the stream API

}

private static Stream<Map.Entry<String, Object>> modelAttributeEntries(Model model) {
public static Map<String, Object> modelAttributes(Model model) {
Contributor

Out of curiosity, why not just create a new HashMap and conditionally put the entries in it?

I think the stream API is useful when we want to declaratively process a collection (filtering, transformation, finding one element, etc.), but in this case converting back and forth seems like unnecessary overhead.

Member

It was originally used as a generator/builder to construct a map from multiple different objects in multiple different functions, but now that this change is moving away from that, we can probably just use a HashMap for the conditional cases and Map.of where there aren't any. We can wrap the HashMap in Collections.unmodifiableMap if we want to be safe and/or don't trust APM.
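
Something along these lines, purely as an illustration (hypothetical keys and parameters, not the actual code):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class ConditionalAttributes {

    // Conditional entries: build up a HashMap and wrap it so callers (e.g. APM) can't mutate it.
    static Map<String, Object> modelAttributes(String service, String modelId) {
        Map<String, Object> attrs = new HashMap<>();
        attrs.put("service", service);
        if (modelId != null) {
            attrs.put("model_id", modelId);
        }
        return Collections.unmodifiableMap(attrs);
    }

    // No conditionals: Map.of is enough and is already immutable.
    static Map<String, Object> routingAttributes(boolean rerouted, String nodeId) {
        return Map.of("rerouted", rerouted, "node_id", nodeId);
    }
}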

Contributor Author

Makes sense to adapt it; I'll create a small follow-up PR tomorrow, as the CI is green now and we want the metric attributes to appear in EC Serverless and EC Hosted soon so we can start on the integration tests.

@timgrein merged commit a3313a3 into elastic:main Feb 20, 2025
17 checks passed
afoucret pushed a commit to afoucret/elasticsearch that referenced this pull request Feb 21, 2025
…uest metrics (elastic#122350)

Record inference API re-routing attributes as part of request metrics.

Labels

  • :ml Machine learning
  • >non-issue
  • Team:ML Meta label for the ML team
  • v8.19.0
  • v9.1.0
