
Conversation

@timgrein
Contributor

Ideally we want to know whether an inference request was re-routed on the transport layer and which node ultimately performed the inference, for the following reasons:

  • Make sure we really approximate a uniform distribution of requests
  • Be able to set up alerting if we detect unexpected and potentially faulty distributions

Note: I've also refactored some of the methods in InferenceStats to distinguish explicitly between model, routing and response attributes (cc @jonathan-buttner, as we discussed this yesterday during a sync).
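
To illustrate the kind of split I mean, here is a rough sketch; the attribute keys, parameter types and method shapes are simplified assumptions for illustration, not the actual InferenceStats code:

import java.util.HashMap;
import java.util.Map;

final class InferenceAttributeSketch {

    // Attributes describing the model that served the request.
    static Map<String, Object> modelAttributes(String service, String taskType, String modelId) {
        return Map.of("service", service, "task_type", taskType, "model_id", modelId);
    }

    // Attributes describing how the request moved across the transport layer.
    static Map<String, Object> routingAttributes(boolean rerouted, String nodeId) {
        return Map.of("rerouted", rerouted, "node_id", nodeId);
    }

    // Attributes describing the outcome of the request.
    static Map<String, Object> responseAttributes(Integer statusCode) {
        return statusCode == null ? Map.of() : Map.of("status_code", statusCode);
    }

    // Combined attribute map recorded together with the duration metric.
    static Map<String, Object> allAttributes(Map<String, Object> model, Map<String, Object> routing, Map<String, Object> response) {
        Map<String, Object> all = new HashMap<>(model);
        all.putAll(routing);
        all.putAll(response);
        return all;
    }
}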

@timgrein changed the title [Inference API] Record inference API re-routing attributes as part of request metrics → [Inference API] Record re-routing attributes as part of inference request metrics Feb 12, 2025
@elasticsearchmachine added the needs:triage (Requires assignment of a team area label) label Feb 12, 2025
inferenceStats.inferenceDuration().record(timer.elapsedMillis(), responseAttributes(model, unwrapCause(t)));
Map<String, Object> metricAttributes = new HashMap<>();
metricAttributes.putAll(modelAttributes(model));
metricAttributes.putAll(routingAttributes(request, localNodeId));
Contributor

I’m a bit concerned that the metric cardinality will grow extremely rapidly as you add the node_id. I'm not 100% sure if this is a problem for Elasticsearch (overview cluster).

Contributor Author

I think the attribute will have a high cardinality and not the metric, right?

I don't think that should be inherently a problem; compare Prometheus, for example, which automatically creates a time series for each unique metric/attribute combination. We'll only do (manual) aggregations on node id over a limited time window, and the number of unique node ids in a (serverless) cluster shouldn't be very high over a timeframe of, say, 10 minutes to 1 day. I've also checked other Elasticsearch metrics, and a lot of them include the node id as an attribute, so if it's not a problem for them it shouldn't be one for us.

Member

Summarizing a Slack conversation:
In addition to the cardinality risk during search, there seems to be a risk of high cardinality when the metrics are pushed from the node, where each attribute variation creates a new element that consumes capacity in an outbound queue. The queue has some max capacity and is flushed periodically.

We think (hope) the risk is relatively low, given that each node has a single id and the routing attributes should only take a handful of values.
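
For a rough sense of scale (illustrative numbers only, not a measurement): with on the order of 10 local node ids, 2 values for a re-routed flag and a handful of target node ids, the routing attributes would contribute at most a few hundred distinct combinations, e.g. roughly 10 × 2 × 10 = 200 per model.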

@timgrein added the :ml (Machine learning) and Team:ML (Meta label for the ML team) labels and removed the needs:triage (Requires assignment of a team area label) label Feb 12, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

Contributor

@brendan-jugan-elastic left a comment

Overall, this makes sense to me!

Thinking from an AWS CloudWatch perspective, is there a set of default dimensions (maybe attributes is the correct term here) that we can manipulate in our dashboards? For example, when monitoring an EC2 instance with CloudWatch, we would by default have the ability to split by instance ID. Is there an equivalent in our monitoring stack? I poked around the Prometheus docs you mentioned a bit, and I'm wondering whether we have an abstraction layer that provides some of these ES-specific dimensions (or attributes) by default.

Thanks!

@timgrein
Contributor Author

Overall, this makes sense to me!

Thinking from an AWS CloudWatch perspective, is there a set of default dimensions (maybe attributes is the correct term here) that we can manipulate in our dashboards? For example, when monitoring an EC2 instance with CloudWatch, we would by default have the ability to split by instance ID. Is there an equivalent in our monitoring stack? I poked around the Prometheus docs you mentioned a bit, and I'm wondering whether we have an abstraction layer that provides some of these ES-specific dimensions (or attributes) by default.

Thanks!

After taking a look at some of the metrics in Cloud and at the ES code, I assume the APM Java agent and other components in between add some attributes to every metric, but take this with a grain of salt as my source is just me checking metrics in Discover :) Some example attributes I saw on every metric I checked are:

  • agent.id
  • agent.type
  • cluster.id
  • etc.

METERING.md explains a little bit about how gathering metrics and traces works in Elasticsearch; maybe that helps to answer your question.

@prwhelan
Member

prwhelan commented Feb 13, 2025

Make sure we really approximate a uniform distribution of requests
Be able to setup alerting, if we detect unexpected and potentially faulty distributions

If I get this alert, what action am I supposed to take? Is it more of an informational "this code is broken and we need to fix it"? If so, can this be an integration test rather than a metric + alert?

My thinking is: alerts are useful when we can take an action on the cluster to resolve it without a code change, like through an administrative API or a configuration change, whereas testing is useful for validation of functionality.

If we're monitoring even distribution across the cluster (or, alternatively, that we pool as much as possible on one node without overloading it, sending requests to another node only when the first one is at capacity), this may be better served by a saturation metric rather than attributes inserted into the duration metric (similar to how we monitor threadpool capacity).

If we're monitoring to validate that the client (ES) is calling the server (EIS) with as few unique ip:ports as possible, it's probably better to measure this within EIS (VPC flow logs, or check the project-id + ip within the service handler) since this is a requirement of EIS.

Otherwise code looks fine :)

@timgrein
Contributor Author

If I get this alert, what action am I supposed to take? Is it more of an informational "this code is broken and we need to fix it"? If so, can this be an integration test rather than a metric + alert?

Fair point! I have a meeting later with @wwang500 to discuss how we can write integration tests using QAF instead of alerting. Still, I would ideally like to be able to look at some dashboard, especially in the beginning, to gauge the behavior (without any alerting).

If we're monitoring to validate that the client (ES) is calling the server (EIS) with as few unique ip:ports as possible, it's probably better to measure this within EIS (VPC flow logs, or check the project-id + ip within the service handler) since this is a requirement of EIS.

As the solution is not strictly for EIS and we plan to expand it to other services, I guess we shouldn't make the telemetry/metrics specific to EIS, IMHO.

Contributor

@maxjakob left a comment

LGTM but treat this as a soft approval since this codebase still feels foreign

Contributor

@demjened left a comment

LGTM, one question about the stream API

}

private static Stream<Map.Entry<String, Object>> modelAttributeEntries(Model model) {
public static Map<String, Object> modelAttributes(Model model) {
Contributor

Out of curiosity, why not just create a new HashMap and conditionally put the entries in it?

I think the stream API is useful when we want to declaratively process a collection (filtering, transformation, finding one element, etc.), but in this case converting back and forth seems like unnecessary overhead.

Member

It was originally used as a generator/builder to construct a map from multiple different objects in multiple different functions, but now that this change is moving away from that, we can probably just use a HashMap for the conditional cases and Map.of where there aren't any. We can wrap the HashMap in Collections.unmodifiableMap if we want to be safe and/or don't trust APM.
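
Something along these lines, purely as an illustration (hypothetical keys and parameters, not the actual code):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class ConditionalAttributes {

    // Conditional entries: build up a HashMap and wrap it so callers (e.g. APM) can't mutate it.
    static Map<String, Object> modelAttributes(String service, String modelId) {
        Map<String, Object> attrs = new HashMap<>();
        attrs.put("service", service);
        if (modelId != null) {
            attrs.put("model_id", modelId);
        }
        return Collections.unmodifiableMap(attrs);
    }

    // No conditionals: Map.of is enough and is already immutable.
    static Map<String, Object> routingAttributes(boolean rerouted, String nodeId) {
        return Map.of("rerouted", rerouted, "node_id", nodeId);
    }
}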

Contributor Author

Makes sense to adapt it; I'll create a small follow-up PR tomorrow, as the CI is green now and we want the metric attributes to appear in EC Serverless and EC Hosted soon so we can start on the integration tests.

@timgrein merged commit a3313a3 into elastic:main Feb 20, 2025
17 checks passed
afoucret pushed a commit to afoucret/elasticsearch that referenced this pull request Feb 21, 2025
…uest metrics (elastic#122350)

Record inference API re-routing attributes as part of request metrics.

Labels

  • :ml Machine learning
  • >non-issue
  • Team:ML Meta label for the ML team
  • v8.19.0
  • v9.1.0
