
Conversation

@prwhelan
Member

Maintain parsed Inference Endpoints in memory for reuse. Endpoints are cached on first access and expire after write. This reduces search pressure during inference by bypassing search requests to system indices for repeated model access. When any endpoint is updated or deleted, the whole cache is invalidated and must be reloaded.
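The semantics described above can be sketched with a minimal, hypothetical cache. This is an illustration only, not the PR's actual implementation (which presumably builds on Elasticsearch's own cache utilities): entries are written on first access, expire a fixed time after that write, and any endpoint update or delete invalidates everything.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the cache semantics from the PR description.
final class EndpointCache<K, V> {
    private record Entry<V>(V value, long writtenAtMillis) {}

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long expiryMillis; // "expiry_time": lifetime after the entry is written
    private final int maxWeight;     // "weight": how many endpoints can live in the cache

    EndpointCache(long expiryMillis, int maxWeight) {
        this.expiryMillis = expiryMillis;
        this.maxWeight = maxWeight;
    }

    /** Return the cached value, or load and cache it on first access. */
    V get(K key, java.util.function.Function<K, V> loader, long nowMillis) {
        Entry<V> e = entries.get(key);
        if (e == null || nowMillis - e.writtenAtMillis >= expiryMillis) {
            if (entries.size() >= maxWeight) {
                entries.clear(); // crude stand-in for a real weigher/eviction policy
            }
            e = new Entry<>(loader.apply(key), nowMillis);
            entries.put(key, e);
        }
        return e.value();
    }

    /** Any endpoint update or delete invalidates the whole cache. */
    void invalidateAll() {
        entries.clear();
    }

    int size() {
        return entries.size();
    }
}
```

Repeated lookups within the expiry window return the cached value without touching the loader, which is what removes the per-request search against system indices.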

The cache can be configured with three settings:

  • `xpack.inference.cache.enabled` enables or disables the cache (default enabled).
  • `xpack.inference.cache.weight` controls how many endpoints can live in the cache (default 25).
  • `xpack.inference.cache.expiry_time` controls how long endpoints live in the cache, measured from when they are first accessed (default 15 minutes, minimum 1 minute, maximum 1 hour).
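Assuming these behave like regular node settings configured in `elasticsearch.yml` (the PR text lists only the names and defaults, so the value syntax below is illustrative), overriding the defaults might look like:

```yaml
# Hypothetical elasticsearch.yml fragment; setting names come from the PR
# description, values are examples only.
xpack.inference.cache.enabled: true
xpack.inference.cache.weight: 50          # allow up to 50 cached endpoints
xpack.inference.cache.expiry_time: 30m    # evict 30 minutes after caching
```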

Resolve #133135

@prwhelan added labels `>enhancement`, `:ml` (Machine learning), `Team:ML` (Meta label for the ML team), `v9.2.0` on Aug 29, 2025
@elasticsearchmachine
Collaborator

Hi @prwhelan, I've created a changelog YAML for you.

@prwhelan prwhelan marked this pull request as ready for review September 3, 2025 13:39
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

Contributor

@jonathan-buttner jonathan-buttner left a comment

Looks good, just left a few questions

Member

@davidkyle davidkyle left a comment

Project Aware!!!! nice. Looks good I'll take another pass at it later

Member

@davidkyle davidkyle left a comment

LGTM

This is a really good change, one I've wanted to make for a while

@prwhelan prwhelan enabled auto-merge (squash) September 10, 2025 15:37
@prwhelan prwhelan merged commit 422db0d into elastic:main Sep 10, 2025
34 checks passed
Development

Successfully merging this pull request may close these issues.

[ML] Cache Inference Endpoints