[Inference API] Record re-routing attributes as part of inference request metrics #122350

Conversation
inferenceStats.inferenceDuration().record(timer.elapsedMillis(), responseAttributes(model, unwrapCause(t)));
Map<String, Object> metricAttributes = new HashMap<>();
metricAttributes.putAll(modelAttributes(model));
metricAttributes.putAll(routingAttributes(request, localNodeId));
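For illustration, here is a minimal sketch of what a routingAttributes helper could look like; the parameter type, attribute keys, and hasBeenRerouted() accessor are hypothetical placeholders, not necessarily what the PR actually uses:

```java
import java.util.Map;

// Hypothetical sketch; the request type, attribute keys and accessor are assumptions.
public static Map<String, Object> routingAttributes(InferenceRequest request, String localNodeId) {
    return Map.of(
        "rerouted", request.hasBeenRerouted(), // was the request forwarded on the transport layer?
        "node_id", localNodeId                 // node that ultimately performed the inference
    );
}
```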
I’m a bit concerned that the metric cardinality will grow extremely rapidly as you add the node_id. I'm not 100% sure if this is a problem for Elasticsearch (overview cluster).
I think the attribute will have a high cardinality and not the metric, right?
I don't think that should be an inherent problem, unlike, for example, Prometheus, which automatically creates a time series for each unique metric/attribute combination. We'll only do (manual) aggregations on node id over a limited time window, and the number of unique node ids in a (serverless) cluster shouldn't be very high for a timeframe of, say, 10 minutes to 1 day. I've also checked other Elasticsearch metrics, and a lot of them include the node id as an attribute, so if it's not a problem for them it shouldn't be one for us.
Summarizing a Slack conversation:
In addition to the cardinality risk during search, there seems to be a risk of high cardinality when the metrics are pushed from the node, where each variation creates a new element that consumes capacity in an outbound queue. The queue has some max capacity and is flushed periodically.
We think (hope) the risk is relatively low, given that each node has one id and there should only be a handful of routing ids.
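As a rough, back-of-the-envelope illustration (the node count is an assumption, not a measurement): with a boolean re-routed flag and roughly $N$ distinct node ids seen over the aggregation window, the routing attributes multiply the number of distinct attribute combinations by at most

$$
\text{combinations} \;\le\; \text{combinations}_{\text{model/response}} \times \underbrace{2}_{\text{rerouted}} \times \underbrace{N}_{\text{node\_id}}, \qquad N \approx 10 \;\Rightarrow\; \text{factor} \le 20.
$$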
Pinging @elastic/ml-core (Team:ML)
Overall, this makes sense to me!
Thinking from an AWS CloudWatch perspective, is there a set of default dimensions (maybe "attributes" is the correct term here) that we can manipulate in our dashboards? For example, when monitoring an EC2 instance with CloudWatch, we would by default have the ability to split by instance ID. Is there an equivalent in our monitoring stack? I poked around a bit in the Prometheus docs you mentioned, and I'm wondering whether we have an abstraction layer that provides some of these ES-specific dimensions (or attributes) by default.
Thanks!
After taking a look at some of the metrics in Cloud and at the ES code, I assume the APM Java agent and other components in between add some attributes to every metric, but take this with a grain of salt, as my source is just me checking metrics in Discover :) Some example attributes I saw on every metric I checked are:
METERING.md explains a little bit about how gathering metrics and traces works in Elasticsearch; maybe that helps to answer your question.
If I get this alert, what action am I supposed to take? Is it more of an informational "this code is broken and we need to fix it"? If so, can this be an integration test rather than a metric + alert? My thinking is: alerts are useful when we can take an action on the cluster to resolve the issue without a code change, for example through an administrative API or a configuration change, whereas testing is useful for validating functionality.

If we're monitoring even distribution across the cluster (or, alternatively, that we pool as much as possible on one node without overloading it and only send requests to another node when the first one is at capacity), this may be more useful as a saturation metric rather than folded into the duration metric (similar to how we monitor threadpool capacity).

If we're monitoring to validate that the client (ES) is calling the server (EIS) with as few unique ip:ports as possible, it's probably better to measure this within EIS (VPC flow logs, or checking the project-id + ip within the service handler), since this is a requirement of EIS.

Otherwise the code looks fine :)
Fair point! I have a meeting later with @wwang500 to discuss how we can write an integration test using QAF instead of alerting. Still, I would ideally like to be able to look at some dashboard, especially in the beginning, to gauge the behavior (without any alerting).
As the solution is not strictly EIS-only and we plan to expand it to other services, I don't think we should make the telemetry/metrics specific to EIS, IMHO.
LGTM but treat this as a soft approval since this codebase still feels foreign
LGTM, 1 question about stream API
}

- private static Stream<Map.Entry<String, Object>> modelAttributeEntries(Model model) {
+ public static Map<String, Object> modelAttributes(Model model) {
Out of curiosity, why not just create a new HashMap and conditionally put the entries in it?
I think the stream API is useful when we want to declaratively process a collection (filter, transform, find an element, etc.), but in this case converting back and forth seems like unnecessary overhead.
It was originally used as a generator/builder to construct a map from multiple different objects across multiple functions, but now that this change is moving away from that, we can probably just use a HashMap where we need conditionals and Map.of where we don't. We can wrap the HashMap in Collections.unmodifiableMap if we want to be safe and/or don't trust APM.
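A minimal sketch of the HashMap-based variant suggested here; the attribute keys and the Model accessors are illustrative assumptions, not necessarily the real ones:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Sketch only: attribute keys and Model accessors are assumptions.
public static Map<String, Object> modelAttributes(Model model) {
    Map<String, Object> attributes = new HashMap<>();
    attributes.put("service", model.getConfigurations().getService());
    attributes.put("task_type", model.getTaskType().toString());
    // Conditionally add attributes that may be absent, e.g. a model id.
    String modelId = model.getServiceSettings().modelId();
    if (modelId != null) {
        attributes.put("model_id", modelId);
    }
    // Optionally wrap the map to guard against downstream mutation.
    return Collections.unmodifiableMap(attributes);
}
```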
Makes sense to adapt it. I'll create a small follow-up PR tomorrow, as the CI is green now and we want the metric attributes to show up in EC Serverless and EC Hosted soon, so we can start on the integration tests.
…uest metrics (elastic#122350) Record inference API re-routing attributes as part of request metrics.
Ideally we want to know whether an inference request was re-routed on the transport layer and which node ultimately performed the inference, for the following reasons:
Note: I've also refactored some of the methods in InferenceStats to distinguish explicitly between model, routing, and response attributes (cc @jonathan-buttner, as we discussed yesterday during a sync).
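For reference, a minimal sketch of how the three attribute groups could be combined when recording the duration histogram, following the diff snippet earlier in the conversation (the exact helper signatures, especially responseAttributes, are assumptions):

```java
// Sketch only: helper signatures are assumptions based on the diff snippet above.
Map<String, Object> metricAttributes = new HashMap<>();
metricAttributes.putAll(modelAttributes(model));                  // which service / task type / model
metricAttributes.putAll(routingAttributes(request, localNodeId)); // was it re-routed, which node handled it
metricAttributes.putAll(responseAttributes(unwrapCause(t)));      // success / failure classification
inferenceStats.inferenceDuration().record(timer.elapsedMillis(), metricAttributes);
```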