-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Measure search load per index #122262
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Measure search load per index #122262
Changes from all commits
9d9b6af
02ddf82
bf79ef3
f1c91bd
ba67bff
2c3654a
58d4762
bc38090
ef0447b
fe009d7
a747a40
f3e47ae
2bc0107
f3b3d00
f52789e
ec243b7
f93eb9b
2777916
623bd7b
3fee6af
af3fff9
a41bbad
8115a61
6b2361e
41dcc1c
2df7f62
6b95b0c
eadf8cf
41dc557
e0e1740
1ba2eaa
6820c35
8280559
e82375e
ce4c1c9
ff92f92
c0f4d18
02ac377
42aa647
bd258ab
0d5b0d3
a6f543a
89a6aca
19b0900
9f1af25
9cedf78
2404c81
a23541b
37dec8b
2ce21d8
28c9677
4851a68
6858dfc
fa3a2fd
088c1e5
9cddba5
c1f0b3d
e50af6f
23ad8c6
6a83b07
7af835a
9f0ec33
0d58081
dab0de9
585d8f7
15311a3
5125276
b8a7294
406af11
d324d5f
fc2b041
206454b
f36ed2a
acc28b5
f8d3ce0
c34cc87
e2c36f8
8398854
037eedd
2a8a7d3
83b8fab
b6a0df7
c33891e
0c64b54
e7ba6ee
66453b4
a345f39
4700d59
f563dcd
a8a5efa
3454904
68c78f2
d8477e9
5c574e4
1e47dde
e9dac7a
936a08d
2cb1681
d823c4f
a8debe3
b2cb778
610b0fe
ff01846
d90c490
bfa9305
6652730
23cc8ca
d38a1f5
202aef4
e044b26
887ec8e
1eb1381
c725c0c
7cfcf5c
76cdebc
a2e879e
a08adf4
66011ec
0188bb9
387caa3
491a91e
c71baf9
2d187be
b13c457
02b19f9
e9002db
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| pr: 122262 | ||
| summary: Measure search load per index | ||
| area: Search | ||
| type: feature | ||
| issues: [] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,45 @@ | ||
| /* | ||
| * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one | ||
| * or more contributor license agreements. Licensed under the "Elastic License | ||
| * 2.0", the "GNU Affero General Public License v3.0 only", and the "Server Side | ||
| * Public License v 1"; you may not use this file except in compliance with, at | ||
| * your election, the "Elastic License 2.0", the "GNU Affero General Public | ||
| * License v3.0 only", or the "Server Side Public License, v 1". | ||
| */ | ||
|
|
||
| package org.elasticsearch.cluster.coordination; | ||
|
|
||
| import org.elasticsearch.cluster.ClusterChangedEvent; | ||
| import org.elasticsearch.cluster.ClusterStateListener; | ||
| import org.elasticsearch.index.Index; | ||
| import org.elasticsearch.index.search.stats.ShardSearchPerIndexTimeTrackingMetrics; | ||
|
|
||
| /** | ||
| * Service responsible for cleaning up task execution time tracking for deleted indices. | ||
| * Implements the ClusterStateListener interface to listen for cluster state changes. | ||
| */ | ||
| public class SearchIndexTimeTrackingCleanupService implements ClusterStateListener { | ||
|
|
||
| private ShardSearchPerIndexTimeTrackingMetrics listener; | ||
|
|
||
| /** | ||
| * Constructor. | ||
| * | ||
| * @param listener the listener for shard search time tracking metrics | ||
| */ | ||
| public SearchIndexTimeTrackingCleanupService(ShardSearchPerIndexTimeTrackingMetrics listener) { | ||
| this.listener = listener; | ||
| } | ||
|
|
||
| /** | ||
| * Called when the cluster state changes. Stops tracking execution time for deleted indices. | ||
| * | ||
| * @param event the cluster changed event | ||
| */ | ||
| @Override | ||
| public void clusterChanged(ClusterChangedEvent event) { | ||
| for (Index index : event.indicesDeleted()) { | ||
| listener.stopTrackingIndex(index.getName()); | ||
| } | ||
| } | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,146 @@ | ||
| /* | ||
| * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one | ||
| * or more contributor license agreements. Licensed under the "Elastic License | ||
| * 2.0", the "GNU Affero General Public License v3.0 only", and the "Server Side | ||
| * Public License v 1"; you may not use this file except in compliance with, at | ||
| * your election, the "Elastic License 2.0", the "GNU Affero General Public | ||
| * License v3.0 only", or the "Server Side Public License, v 1". | ||
| */ | ||
|
|
||
| package org.elasticsearch.index.search.stats; | ||
|
|
||
| import org.apache.logging.log4j.LogManager; | ||
| import org.apache.logging.log4j.Logger; | ||
| import org.elasticsearch.common.ExponentiallyWeightedMovingAverage; | ||
| import org.elasticsearch.core.Tuple; | ||
| import org.elasticsearch.index.shard.IndexShard; | ||
| import org.elasticsearch.index.shard.SearchOperationListener; | ||
| import org.elasticsearch.search.internal.SearchContext; | ||
|
|
||
| import java.util.concurrent.ConcurrentHashMap; | ||
| import java.util.concurrent.atomic.LongAdder; | ||
|
|
||
| /** | ||
| * This class implements the {@link SearchOperationListener} interface to track the execution time of search operations | ||
| * on a per-index basis. It uses a {@link ConcurrentHashMap} to store the execution times and an | ||
| * {@link ExponentiallyWeightedMovingAverage} to calculate the exponentially weighted moving average (EWMA) of the | ||
| * execution times. | ||
| */ | ||
| public final class ShardSearchPerIndexTimeTrackingMetrics implements SearchOperationListener { | ||
|
|
||
| private static final Logger logger = LogManager.getLogger(ShardSearchPerIndexTimeTrackingMetrics.class); | ||
|
|
||
| private final ConcurrentHashMap<String, Tuple<LongAdder, ExponentiallyWeightedMovingAverage>> indexExecutionTime; | ||
|
|
||
| private final double ewmaAlpha; | ||
|
|
||
| /** | ||
| * Constructs a new ShardSearchPerIndexTimeTrackingMetrics instance with the specified EWMA alpha value. | ||
| * | ||
| * @param ewmaAlpha the alpha value for the EWMA calculation | ||
| */ | ||
| public ShardSearchPerIndexTimeTrackingMetrics(double ewmaAlpha) { | ||
| this.indexExecutionTime = new ConcurrentHashMap<>(); | ||
| this.ewmaAlpha = ewmaAlpha; | ||
| } | ||
|
|
||
| /** | ||
| * Tracks the execution time of the query phase of a search operation. | ||
| * | ||
| * @param searchContext the search context | ||
| * @param tookInNanos the time taken in nanoseconds | ||
| */ | ||
| @Override | ||
| public void onQueryPhase(SearchContext searchContext, long tookInNanos) { | ||
| trackExecutionTime(searchContext, tookInNanos); | ||
| } | ||
|
|
||
| /** | ||
| * Tracks the execution time of a failed query phase of a search operation. | ||
| * | ||
| * @param searchContext the search context | ||
| * @param tookInNanos the time taken in nanoseconds | ||
| */ | ||
| @Override | ||
| public void onFailedQueryPhase(SearchContext searchContext, long tookInNanos) { | ||
| trackExecutionTime(searchContext, tookInNanos); | ||
| } | ||
|
|
||
| /** | ||
| * Tracks the execution time of the fetch phase of a search operation. | ||
| * | ||
| * @param searchContext the search context | ||
| * @param tookInNanos the time taken in nanoseconds | ||
| */ | ||
| @Override | ||
| public void onFetchPhase(SearchContext searchContext, long tookInNanos) { | ||
| trackExecutionTime(searchContext, tookInNanos); | ||
| } | ||
|
|
||
| /** | ||
| * Tracks the execution time of a failed fetch phase of a search operation. | ||
| * | ||
| * @param searchContext the search context | ||
| * @param tookInNanos the time taken in nanoseconds | ||
| */ | ||
| @Override | ||
| public void onFailedFetchPhase(SearchContext searchContext, long tookInNanos) { | ||
| trackExecutionTime(searchContext, tookInNanos); | ||
| } | ||
|
|
||
| /** | ||
| * Tracks the execution time of a search operation. | ||
| * | ||
| * @param searchContext the search context | ||
| * @param tookInNanos the time taken in nanoseconds | ||
| */ | ||
| private void trackExecutionTime(SearchContext searchContext, long tookInNanos) { | ||
| IndexShard indexShard = searchContext.indexShard(); | ||
| if (indexShard != null && indexShard.isSystem() == false) { | ||
| String indexName = indexShard.shardId().getIndexName(); | ||
| if (indexName != null) { | ||
| Tuple<LongAdder, ExponentiallyWeightedMovingAverage> t = indexExecutionTime.computeIfAbsent( | ||
| indexName, | ||
| k -> new Tuple<>(new LongAdder(), new ExponentiallyWeightedMovingAverage(ewmaAlpha, 0)) | ||
| ); | ||
| t.v1().add(tookInNanos); | ||
| t.v2().addValue(tookInNanos); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * Gets the total execution time for tasks associated with a specific index. | ||
| * | ||
| * @param indexName the name of the index | ||
| * @return the total execution time for the index | ||
| */ | ||
| public long getSearchLoadPerIndex(String indexName) { | ||
| Tuple<LongAdder, ExponentiallyWeightedMovingAverage> t = indexExecutionTime.get(indexName); | ||
| return (t != null) ? t.v1().sum() : 0; | ||
| } | ||
|
|
||
| /** | ||
| * Gets the exponentially weighted moving average (EWMA) of the execution time for tasks associated with a specific index name. | ||
| * | ||
| * @param indexName the name of the index | ||
| * @return the EWMA of the execution time for the index | ||
| */ | ||
| public double getLoadEMWAPerIndex(String indexName) { | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe some more context here: |
||
| Tuple<LongAdder, ExponentiallyWeightedMovingAverage> t = indexExecutionTime.get(indexName); | ||
| return (t != null) ? t.v2().getAverage() : 0; | ||
| } | ||
|
|
||
| /** | ||
| * Stops tracking the execution time for tasks associated with a specific index. | ||
| * | ||
| * @param indexName the name of the index | ||
| */ | ||
| public void stopTrackingIndex(String indexName) { | ||
| if (indexExecutionTime.containsKey(indexName)) { | ||
| indexExecutionTime.remove(indexName); | ||
| } else { | ||
| logger.debug("Trying to stop tracking index [{}] that was never tracked", indexName); | ||
| } | ||
| } | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -30,16 +30,17 @@ default void onPreQueryPhase(SearchContext searchContext) {} | |
| /** | ||
| * Executed if a query phased failed. | ||
| * @param searchContext the current search context | ||
| * @param tookInNanos the number of nanoseconds the query execution took | ||
| */ | ||
| default void onFailedQueryPhase(SearchContext searchContext) {} | ||
| default void onFailedQueryPhase(SearchContext searchContext, long tookInNanos) {} | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We also need to track the execution time when a phase fails. |
||
|
|
||
| /** | ||
| * Executed after the query phase successfully finished. | ||
| * Note: this is not invoked if the query phase execution failed. | ||
| * @param searchContext the current search context | ||
| * @param tookInNanos the number of nanoseconds the query execution took | ||
| * | ||
| * @see #onFailedQueryPhase(SearchContext) | ||
| * @see #onFailedQueryPhase(SearchContext, long) | ||
| */ | ||
| default void onQueryPhase(SearchContext searchContext, long tookInNanos) {} | ||
|
|
||
|
|
@@ -52,16 +53,17 @@ default void onPreFetchPhase(SearchContext searchContext) {} | |
| /** | ||
| * Executed if a fetch phased failed. | ||
| * @param searchContext the current search context | ||
| * @param tookInNanos the number of nanoseconds the query execution took | ||
| */ | ||
| default void onFailedFetchPhase(SearchContext searchContext) {} | ||
| default void onFailedFetchPhase(SearchContext searchContext, long tookInNanos) {} | ||
|
|
||
| /** | ||
| * Executed after the fetch phase successfully finished. | ||
| * Note: this is not invoked if the fetch phase execution failed. | ||
| * @param searchContext the current search context | ||
| * @param tookInNanos the number of nanoseconds the fetch execution took | ||
| * | ||
| * @see #onFailedFetchPhase(SearchContext) | ||
| * @see #onFailedFetchPhase(SearchContext, long) | ||
| */ | ||
| default void onFetchPhase(SearchContext searchContext, long tookInNanos) {} | ||
|
|
||
|
|
@@ -128,10 +130,10 @@ public void onPreQueryPhase(SearchContext searchContext) { | |
| } | ||
|
|
||
| @Override | ||
| public void onFailedQueryPhase(SearchContext searchContext) { | ||
| public void onFailedQueryPhase(SearchContext searchContext, long tookInNanos) { | ||
| for (SearchOperationListener listener : listeners) { | ||
| try { | ||
| listener.onFailedQueryPhase(searchContext); | ||
| listener.onFailedQueryPhase(searchContext, tookInNanos); | ||
| } catch (Exception e) { | ||
| logger.warn(() -> "onFailedQueryPhase listener [" + listener + "] failed", e); | ||
| } | ||
|
|
@@ -161,10 +163,10 @@ public void onPreFetchPhase(SearchContext searchContext) { | |
| } | ||
|
|
||
| @Override | ||
| public void onFailedFetchPhase(SearchContext searchContext) { | ||
| public void onFailedFetchPhase(SearchContext searchContext, long tookInNanos) { | ||
| for (SearchOperationListener listener : listeners) { | ||
| try { | ||
| listener.onFailedFetchPhase(searchContext); | ||
| listener.onFailedFetchPhase(searchContext, tookInNanos); | ||
| } catch (Exception e) { | ||
| logger.warn(() -> "onFailedFetchPhase listener [" + listener + "] failed", e); | ||
| } | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry for saying it in such a straightforward manner, but do we really want to add more logic based on this class?
Especially on a per-shard basis, using the math in
ExponentiallyWeightedMovingAverageseems questionable.We calculate
newValue = alpha * lastValue + (1 - alpha) * currentValue. In a large number of use-cases you may see the fetch and query times be an order of magnitude apart. So now, assuming the query always matches, we will essentially flap between two values constantly for a shard?Why not just use the existing metrics we have in
org.elasticsearch.index.search.stats.ShardSearchStats? EWMA makes no sense here, if anything isn't total query time and it's derivative what we care about?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, we left it in because we where not sure about it and I asked Dimi to leave a comment on the PR about this.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All right, let's check how it can be adapted into the
ShardSearchStats