[ML] Cache Inference Endpoints #133860

prwhelan · 2025-08-29T20:50:30Z

Maintain parsed Inference Endpoints in memory for reuse. Endpoints are cached on first access and expire a fixed time after they are written to the cache. This reduces search pressure during inference by bypassing search requests to system indices when the same model is accessed repeatedly. When any endpoint is updated or deleted, the whole cache is invalidated and must be reloaded.
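To make the behaviour described above concrete, here is a minimal sketch of a load-on-first-access cache with expire-after-write and whole-cache invalidation. It is not the registry code from this PR: the class `EndpointCacheSketch`, its constructor parameters, the `loader` function standing in for the system index search, and the `loads` counter are all hypothetical names chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical sketch only; not the InferenceEndpointRegistry implementation. */
final class EndpointCacheSketch<V> {

    /** A cached value plus the time it was written to the cache. */
    private static final class Cached<T> {
        final T value;
        final long writtenAtMillis;

        Cached(T value, long writtenAtMillis) {
            this.value = value;
            this.writtenAtMillis = writtenAtMillis;
        }
    }

    private final int maxEntries;
    private final long expiryMillis;
    private final LongSupplier clock;          // injectable so expiry can be tested
    private final Function<String, V> loader;  // stands in for the index search (assumption)
    private final LinkedHashMap<String, Cached<V>> entries;
    int loads = 0;                             // counts how often the loader ran

    EndpointCacheSketch(int maxEntries, long expiryMillis, LongSupplier clock, Function<String, V> loader) {
        this.maxEntries = maxEntries;
        this.expiryMillis = expiryMillis;
        this.clock = clock;
        this.loader = loader;
        // Insertion-ordered map that drops the oldest write once the size limit is exceeded.
        this.entries = new LinkedHashMap<String, Cached<V>>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Cached<V>> eldest) {
                return size() > EndpointCacheSketch.this.maxEntries;
            }
        };
    }

    /** Returns the cached value, loading it on first access or after it has expired. */
    synchronized V get(String id) {
        long now = clock.getAsLong();
        Cached<V> cached = entries.get(id);
        if (cached != null && now - cached.writtenAtMillis < expiryMillis) {
            return cached.value; // hit: no load needed
        }
        V value = loader.apply(id);
        loads++;
        entries.remove(id); // re-insert so the entry counts as the newest write
        entries.put(id, new Cached<>(value, now));
        return value;
    }

    /** Called when any endpoint is updated or deleted: drop everything. */
    synchronized void invalidateAll() {
        entries.clear();
    }
}
```

Clearing the whole map on any update or delete is coarse, but it avoids tracking which cached entries a change might affect. The next access to each endpoint pays for one reload.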

The cache can be configured with three settings:

- `xpack.inference.cache.enabled` enables or disables the cache (default enabled).
- `xpack.inference.cache.weight` controls how many endpoints can live in the cache (default 25).
- `xpack.inference.cache.expiry_time` controls how long endpoints live in the cache, measured from when they are first accessed (default 15 minutes, minimum 1 minute, maximum 1 hour).
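As an illustration of the bounds listed for `xpack.inference.cache.expiry_time`, the following sketch checks a requested expiry against the stated default, minimum, and maximum. The class `CacheSettingsSketch` and the method `validateExpiry` are hypothetical, and rejecting out-of-range values with an exception (rather than clamping them) is an assumption, not something this PR description states.

```java
import java.time.Duration;

/** Hypothetical sketch of the documented bounds; not the plugin's settings code. */
final class CacheSettingsSketch {
    static final int DEFAULT_WEIGHT = 25;                          // default number of cached endpoints
    static final Duration DEFAULT_EXPIRY = Duration.ofMinutes(15); // default expiry_time
    static final Duration MIN_EXPIRY = Duration.ofMinutes(1);      // minimum expiry_time
    static final Duration MAX_EXPIRY = Duration.ofHours(1);        // maximum expiry_time

    /** Returns the expiry to use, falling back to the default when none is given. */
    static Duration validateExpiry(Duration requested) {
        if (requested == null) {
            return DEFAULT_EXPIRY;
        }
        // Assumption: values outside [1 minute, 1 hour] are rejected, not clamped.
        if (requested.compareTo(MIN_EXPIRY) < 0 || requested.compareTo(MAX_EXPIRY) > 0) {
            throw new IllegalArgumentException(
                "expiry_time must be between " + MIN_EXPIRY + " and " + MAX_EXPIRY + ", got " + requested);
        }
        return requested;
    }
}
```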

Resolve #133135

elasticsearchmachine · 2025-08-29T20:50:54Z

Hi @prwhelan, I've created a changelog YAML for you.

elasticsearchmachine · 2025-09-03T13:40:02Z

Pinging @elastic/ml-core (Team:ML)

jonathan-buttner

Looks good, just left a few questions.

.../main/java/org/elasticsearch/xpack/inference/registry/ClearInferenceEndpointCacheAction.java

x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/inference/SerializableStats.java

x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/InferencePlugin.java

...ence/src/main/java/org/elasticsearch/xpack/inference/registry/InferenceEndpointRegistry.java

...src/test/java/org/elasticsearch/xpack/inference/registry/InferenceEndpointRegistryTests.java

.../main/java/org/elasticsearch/xpack/inference/registry/ClearInferenceEndpointCacheAction.java

davidkyle

Project Aware!!!! Nice. Looks good, I'll take another pass at it later.

x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/inference/SerializableStats.java

x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/InferencePlugin.java

...ence/src/main/java/org/elasticsearch/xpack/inference/registry/InferenceEndpointRegistry.java

davidkyle

LGTM

This is a really good change, one I've wanted to make for a while.

...c/main/java/org/elasticsearch/xpack/core/inference/action/GetInferenceDiagnosticsAction.java

...ce-tests/src/javaRestTest/java/org/elasticsearch/xpack/inference/CreateFromDeploymentIT.java

.../main/java/org/elasticsearch/xpack/inference/registry/ClearInferenceEndpointCacheAction.java

prwhelan added the >enhancement, :ml (Machine learning), Team:ML (Meta label for the ML team), and v9.2.0 labels Aug 29, 2025

prwhelan and others added 11 commits August 29, 2025 16:50

Update docs/changelog/133860.yaml

6e38b32

Update transport version

85ba05a

fix tests; add permissions

99a3b14

use BWC

6d405ea

Merge branch 'main' of github.com:prwhelan/elasticsearch into fix/133135-2-1

c5c1bfa

Merge branch 'main' of github.com:prwhelan/elasticsearch into fix/133135-2-1

be4b635

Add writeable entry to ML tests

a1c3cdc

[CI] Auto commit changes from spotless

e0d442d

Merge branch 'main' into fix/133135-2-1

6459d80

Merge branch 'main' of github.com:prwhelan/elasticsearch into fix/133135-2-1

2b8f2a9

Merge branch 'main' into fix/133135-2-1

91ff83b

prwhelan marked this pull request as ready for review September 3, 2025 13:39

jonathan-buttner reviewed Sep 4, 2025

davidkyle reviewed Sep 8, 2025

prwhelan added 3 commits September 9, 2025 10:34

address comments

97da396

Merge branch 'main' of github.com:prwhelan/elasticsearch into fix/133135-2-1

463089a

Merge branch 'main' into fix/133135-2-1

378b65a

davidkyle approved these changes Sep 9, 2025

jonathan-buttner approved these changes Sep 9, 2025

prwhelan added 3 commits September 10, 2025 11:21

Address comments

2b1ca15

Update javadoc with edge cases

86c366d

Merge branch 'main' into fix/133135-2-1

111db1e

prwhelan enabled auto-merge (squash) September 10, 2025 15:37

prwhelan merged commit 422db0d into elastic:main Sep 10, 2025
34 checks passed

tlrx mentioned this pull request Sep 16, 2025

[ML] ClearInferenceEndpointCacheAction breaks rolling upgrade tests #134809

Closed
