[ML] Implementing latency improvements for EIS integration #133861
Conversation
Hi @jonathan-buttner, I've created a changelog YAML for you.
Pinging @elastic/ml-core (Team:ML)
Resolved review threads:
- ...c/main/java/org/elasticsearch/xpack/core/inference/action/GetInferenceDiagnosticsAction.java
- ...lasticsearch/xpack/core/inference/action/GetInferenceDiagnosticsActionNodeResponseTests.java (outdated)
- ...ugin/inference/src/main/java/org/elasticsearch/xpack/inference/external/http/HttpClient.java (outdated)
- ...ugin/inference/src/main/java/org/elasticsearch/xpack/inference/external/http/HttpClient.java (outdated)
```diff
 public static final Setting<Integer> MAX_ROUTE_CONNECTIONS = Setting.intSetting(
     "xpack.inference.http.max_route_connections",
-    20, // default
+    200, // default
```
10x the default value is quite a step. Can we explore changing this with overrides in the environments where EIS is available?
After doing some more research: allowing more connections will result in more memory and file descriptors being used.

> Can we explore changing this with overrides in the environments where EIS is available?

I suspect that would mean we'd need to put in a lot of manual overrides. Maybe we leave these defaults as-is for now and add metrics to get a better idea of what typical usage looks like.

If the cluster is located in the same region and provider as EIS, I typically saw ~20 connections being used once connections already existed in the pool. So the first spike of traffic will likely be limited by the 20-connection cap here, and then usage will hopefully go down after that.
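A per-route connection cap behaves roughly like a semaphore sized at `max_route_connections`. The class and method names below (`RoutePoolSketch`, `tryLease`) are illustrative stand-ins, not the Apache HttpClient or plugin API; this is a minimal sketch of why a 20-connection route limit throttles the first traffic spike:

```java
import java.util.concurrent.Semaphore;

// Illustrative model only (not the real connection pool): each in-flight
// request to a route must hold one of max_route_connections permits.
public class RoutePoolSketch {
    private final Semaphore permits;

    public RoutePoolSketch(int maxPerRoute) {
        this.permits = new Semaphore(maxPerRoute);
    }

    /** Returns true if a connection lease was available without waiting. */
    public boolean tryLease() {
        return permits.tryAcquire();
    }

    public void release() {
        permits.release();
    }

    public static void main(String[] args) {
        RoutePoolSketch pool = new RoutePoolSketch(20); // current default cap
        int granted = 0;
        for (int i = 0; i < 50; i++) {   // simulate a 50-request spike
            if (pool.tryLease()) {
                granted++;
            }
        }
        System.out.println(granted);     // only 20 leases are granted; the rest queue
    }
}
```

Raising the cap to 200 admits the whole spike at once, which is the memory/file-descriptor trade-off discussed above.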
```diff
 static final Setting<TimeValue> RETRY_INITIAL_DELAY_SETTING = Setting.timeSetting(
     "xpack.inference.http.retry.initial_delay",
-    TimeValue.timeValueSeconds(1),
+    TimeValue.timeValueMillis(20),
```
1 second is too slow and a bad default value, but I don't know what a good default is. 20ms is a very short delay; perhaps 100ms?
My understanding is that the latency was due to the connection pool configuration and retries weren't really happening. It would be good to limit the scope of the changes in this PR if possible.
Yeah, I can switch to 100ms.
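Assuming the initial delay feeds a standard exponential backoff that doubles per attempt (the plugin's actual retry math may differ), a rough sketch of what the default change buys:

```java
// Hypothetical backoff sketch, assuming delay doubles each attempt.
// With a 1s initial delay, three retries wait 1s + 2s + 4s = 7s total;
// at 100ms the same three retries wait only 700ms total.
public class BackoffSketch {
    static long delayMillis(long initialMillis, int attempt) {
        return initialMillis << attempt; // initial * 2^attempt
    }

    static long totalDelayMillis(long initialMillis, int retries) {
        long total = 0;
        for (int attempt = 0; attempt < retries; attempt++) {
            total += delayMillis(initialMillis, attempt);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalDelayMillis(1000, 3)); // 7000 with the old 1s default
        System.out.println(totalDelayMillis(100, 3));  // 700 with the proposed 100ms
    }
}
```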
```java
        timeToWait = TimeValue.min(endpoint.executeEnqueuedTask(), timeToWait);
    }
    // if we execute a task the timeToWait will be 0 so we'll immediately look for more work
} while (timeToWait.compareTo(TimeValue.ZERO) <= 0);
```
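The drain pattern in that fragment can be sketched in isolation. `DrainLoopSketch` and its methods are hypothetical stand-ins for `RequestExecutorService`, not its real API; the point is that a worker keeps polling while the reported wait time is zero, rather than rescheduling itself for 0 ms:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Simplified sketch of the drain loop (illustrative names, not the plugin API).
public class DrainLoopSketch {
    private final Queue<Runnable> queue = new ArrayDeque<>();
    int executed = 0;

    void enqueue(Runnable task) {
        queue.offer(task);
    }

    /** Returns 0 if a task ran (more work may be ready), else a wait hint in ms. */
    long executeEnqueuedTask(long defaultWaitMillis) {
        Runnable task = queue.poll();
        if (task == null) {
            return defaultWaitMillis; // nothing to do: report how long to sleep
        }
        task.run();
        executed++;
        return 0; // a task ran, so immediately look for more work
    }

    void drain(long defaultWaitMillis) {
        long timeToWait;
        do {
            timeToWait = executeEnqueuedTask(defaultWaitMillis);
        } while (timeToWait <= 0); // exits once the queue reports a positive wait
    }

    public static void main(String[] args) {
        DrainLoopSketch sketch = new DrainLoopSketch();
        for (int i = 0; i < 5; i++) {
            sketch.enqueue(() -> {});
        }
        sketch.drain(50);
        System.out.println(sketch.executed); // 5: all queued tasks ran in one pass
    }
}
```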
Nice, that was a lot easier than we thought it would be.
LGTM
…33861)
* Adding latency improvements
* Update docs/changelog/133861.yaml
* [CI] Auto commit changes from spotless
* Renaming test executor getter and adding response executor
* [CI] Auto commit changes from spotless
* Address feedback

Co-authored-by: elasticsearchmachine <[email protected]>
This PR implements some of the improvements from here: #133263

Notably:
* `inference_response_thread_pool`
* `clientBuilder.disableConnectionState();`
* `EntityUtils.consumeQuietly(response.getEntity());`
* `max_total_connections` to 500 and `max_route_connections` to 200
* `xpack.inference.http.retry.initial_delay` to 20ms from 1 second
* Looping in `RequestExecutorService` when work was accomplished instead of scheduling a new thread for 0 ms