
Conversation

jonathan-buttner
Contributor

@jonathan-buttner jonathan-buttner commented Aug 29, 2025

This PR implements some of the improvements from here: #133263

Notably:

  • A response-specific thread pool, inference_response_thread_pool
  • Allowing reuse of persistent connections for connections that use mTLS, via clientBuilder.disableConnectionState()
  • Ensuring that the response input stream is closed, via EntityUtils.consumeQuietly(response.getEntity())
  • Increasing max_total_connections to 500 and max_route_connections to 200
  • Decreasing xpack.inference.http.retry.initial_delay from 1 second to 20 ms
  • Looping in the RequestExecutorService when work was completed, instead of scheduling a new thread with a 0 ms delay
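As context for the EntityUtils.consumeQuietly change above, here is a minimal stdlib-only sketch of the "consume quietly" pattern: fully drain and close the response stream so the underlying connection can safely return to the pool. The class and method below are hypothetical illustrations, not the Apache HttpCore implementation.

```java
import java.io.IOException;
import java.io.InputStream;

// Illustrative version of the "consume quietly" pattern (hypothetical name,
// not the Apache HttpCore class). Draining the stream lets the persistent
// connection be reused; swallowing IOException keeps cleanup from masking
// the real failure.
public class QuietConsumer {
    public static void consumeQuietly(InputStream content) {
        if (content == null) {
            return;
        }
        try (InputStream in = content) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // discard remaining bytes; an undrained stream can prevent
                // the connection from returning to the pool
            }
        } catch (IOException ignored) {
            // "quietly": suppress errors during cleanup
        }
    }
}
```

The real helper additionally handles the HttpEntity wrapper; this sketch only shows the drain-and-close behavior.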

@jonathan-buttner jonathan-buttner added >bug :ml Machine learning Team:ML Meta label for the ML team v9.2.0 labels Aug 29, 2025
@elasticsearchmachine
Collaborator

Hi @jonathan-buttner, I've created a changelog YAML for you.

@jonathan-buttner jonathan-buttner marked this pull request as ready for review September 4, 2025 15:11
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

public static final Setting<Integer> MAX_ROUTE_CONNECTIONS = Setting.intSetting(
    "xpack.inference.http.max_route_connections",
-   20, // default
+   200, // default
Member

10x the default value is quite a step. Can we explore changing this with overrides in the environments where EIS is available?

Contributor Author

After doing some more research: allowing more connections will result in more memory and file descriptors being used.

Can we explore changing this with overrides in the environments where EIS is available

I suspect that this would mean we'd need to put in a lot of manual overrides. Maybe we leave these defaults as is for now and add metrics to get a better idea of what typical usage looks like.

When the cluster was located in the same region and provider as EIS, I typically saw ~20 connections being used once connections already existed in the pool. So the first spike of traffic will likely be limited by the 20-connection limit here, and then hopefully drop back down after that.
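The ~20-connection observation above relates directly to the per-route cap. As a toy model (not the actual Apache HTTP client pool, which is considerably more involved), per-route limiting behaves roughly like a semaphore per route; all names here are hypothetical:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Toy model of what max_route_connections enforces: at most N concurrent
// connections per destination route. Hypothetical class; the real
// PoolingNHttpClientConnectionManager also handles leasing, keep-alive, etc.
public class RouteLimiter {
    private final int maxPerRoute;
    private final Map<String, Semaphore> routes = new ConcurrentHashMap<>();

    public RouteLimiter(int maxPerRoute) {
        this.maxPerRoute = maxPerRoute;
    }

    // Returns true if a connection slot for this route was acquired.
    public boolean tryAcquire(String route) {
        return routes.computeIfAbsent(route, r -> new Semaphore(maxPerRoute)).tryAcquire();
    }

    public void release(String route) {
        routes.get(route).release();
    }
}
```

With a cap of 20, a burst of requests to a single route beyond 20 would queue (or fail fast in this toy version) until slots are released, which is why the first traffic spike sees the limit.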

static final Setting<TimeValue> RETRY_INITIAL_DELAY_SETTING = Setting.timeSetting(
    "xpack.inference.http.retry.initial_delay",
-   TimeValue.timeValueSeconds(1),
+   TimeValue.timeValueMillis(20),
Member

1 second is too slow and a bad default value, but I don't know what a good default is. 20 ms is a very short delay; perhaps 100 ms?

My understanding is that the latency was due to the connection pool configuration and retries weren't really happening. It would be good to limit the scope of the changes in this PR if possible

Contributor Author

Yeah, I can switch to 100 ms.
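For context on what initial_delay controls: it seeds the wait before the first retry, with subsequent retries typically backing off from there. A sketch assuming the delay doubles per attempt (a common scheme; the names below are hypothetical and the actual Elasticsearch retry policy may differ):

```java
// Hypothetical backoff sketch, assuming the delay doubles per attempt.
// Not the actual Elasticsearch retry implementation.
public class RetryBackoff {
    private final long initialDelayMillis;

    public RetryBackoff(long initialDelayMillis) {
        this.initialDelayMillis = initialDelayMillis;
    }

    // Delay before the Nth retry (0-based): initial * 2^attempt.
    public long delayForAttempt(int attempt) {
        return initialDelayMillis << attempt;
    }
}
```

Under this scheme an initial delay of 20 ms gives 20, 40, 80, ... ms, while 1 second gives 1 s, 2 s, 4 s, ..., which shows why the old default added so much latency when a retry did occur.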

        timeToWait = TimeValue.min(endpoint.executeEnqueuedTask(), timeToWait);
    }
    // if we execute a task, timeToWait will be 0, so we'll immediately look for more work
} while (timeToWait.compareTo(TimeValue.ZERO) <= 0);
Member

nice, that was a lot easier than we thought it would be
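The looping change above can be modeled in isolation: keep draining the queue while an iteration completed work, rather than rescheduling the executor with a 0 ms delay. This is a stdlib-only sketch with hypothetical names; the real RequestExecutorService tracks a TimeValue and multiple endpoints.

```java
import java.util.Queue;

// Minimal model of "loop while work was done". Each pass that executes a
// task sets the wait to 0, so the loop immediately checks for more work
// instead of handing a zero-delay task back to the scheduler.
public class DrainLoop {
    // Runs queued tasks until an iteration completes no work;
    // returns how many tasks were executed in total.
    public static int drain(Queue<Runnable> tasks) {
        int executed = 0;
        long timeToWaitMillis;
        do {
            timeToWaitMillis = Long.MAX_VALUE; // nothing found yet this pass
            Runnable task = tasks.poll();
            if (task != null) {
                task.run();
                executed++;
                timeToWaitMillis = 0; // work was done, look again immediately
            }
        } while (timeToWaitMillis <= 0);
        return executed;
    }
}
```

The win is avoiding a scheduler round trip per task: the thread stays hot and drains the backlog in place.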

Member

@davidkyle davidkyle left a comment

LGTM

@jonathan-buttner jonathan-buttner merged commit 65dcdf2 into elastic:main Sep 9, 2025
33 checks passed
rjernst pushed a commit to rjernst/elasticsearch that referenced this pull request Sep 9, 2025
…33861)

* Adding latency improvements

* Update docs/changelog/133861.yaml

* [CI] Auto commit changes from spotless

* Renaming test executor getter and adding response executor

* [CI] Auto commit changes from spotless

* Address feedback

---------

Co-authored-by: elasticsearchmachine <[email protected]>
Kubik42 pushed a commit to Kubik42/elasticsearch that referenced this pull request Sep 9, 2025