[ML] Disable EIS rate limiting within the inference API #133845
Conversation
Pinging @elastic/ml-core (Team:ML)
Nothing major, just some clean-up suggestions. I do like the idea of returning something to the user to indicate that rate limiting is disabled rather than just not returning any rate limit settings at all.
```java
private static final RateLimitSettings DEFAULT_RATE_LIMIT_SETTINGS = new RateLimitSettings(720L);

public static ElasticInferenceServiceCompletionServiceSettings fromMap(Map<String, Object> map, ConfigurationParseContext context) {
```
With these changes, the `context` argument is no longer used, so it can be removed. This also applies to the other `*ServiceSettings` classes.
Yeah great point. I'm going to remove them from the `*Settings` classes but leave them in the models just in case we need the context for a settings class in the future. That way we don't have to plumb it through again.
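A toy sketch of that split, using hypothetical stand-in class names rather than the real inference plugin types: the settings parser drops the unused parameter, while the model-level factory keeps accepting the context so it can be plumbed back into a settings class later without touching every caller again.

```java
import java.util.Map;

// Hypothetical stand-in for a *ServiceSettings class.
class ExampleServiceSettings {
    // The ConfigurationParseContext parameter is gone because nothing in the
    // settings parsing uses it any more.
    static ExampleServiceSettings fromMap(Map<String, Object> map) {
        return new ExampleServiceSettings();
    }
}

// Hypothetical stand-in for a model class.
class ExampleModel {
    enum ParseContext { REQUEST, PERSISTENT }

    private final ExampleServiceSettings settings;

    ExampleModel(ExampleServiceSettings settings) {
        this.settings = settings;
    }

    // The model keeps the context argument so a future settings class can use
    // it without re-plumbing the whole call chain.
    static ExampleModel fromMap(Map<String, Object> map, ParseContext context) {
        return new ExampleModel(ExampleServiceSettings.fromMap(map));
    }
}
```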
```java
);
var serviceSettings = ElasticInferenceServiceSparseEmbeddingsServiceSettings.fromMap(map, ConfigurationParseContext.REQUEST);

assertThat(map, is(Map.of()));
```
Nitpick, but the `anEmptyMap()` matcher is probably a better choice here. There are a few other places in this PR that make a similar assertion which could also be changed.
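A minimal, self-contained illustration of the suggested matcher swap (the class name here is made up; in the PR the assertion lives inside the existing `*ServiceSettingsTests` classes):

```java
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.anEmptyMap;
import static org.hamcrest.Matchers.is;

import java.util.HashMap;
import java.util.Map;

public class EmptyMapAssertionExample {
    public static void main(String[] args) {
        Map<String, Object> map = new HashMap<>();

        // Current style in the PR: compares against an empty immutable map.
        assertThat(map, is(Map.of()));

        // Suggested alternative: reads as "is an empty map" and gives a clearer
        // failure message when the map unexpectedly still has entries.
        assertThat(map, anEmptyMap());
    }
}
```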
```java
assertThat(map, is(Map.of()));
assertThat(serviceSettings, is(new ElasticInferenceServiceSparseEmbeddingsServiceSettings(modelId, null)));
assertThat(serviceSettings.rateLimitSettings(), sameInstance(RateLimitSettings.DISABLED_INSTANCE));
assertThat(serviceSettings.rateLimitSettings().isEnabled(), is(false));
```
Nitpick, but this assertion is redundant, since we already confirmed that the rate limit settings returned are `RateLimitSettings.DISABLED_INSTANCE`. If we want to verify that `RateLimitSettings.DISABLED_INSTANCE.isEnabled()` returns `false`, that test would probably be better placed in `RateLimitSettingsTests`, since it's testing the implementation of the `RateLimitSettings` class.

There are a few other tests which make similar assertions, namely the ones added in `*ServiceSettingsTests` classes.
Yeah good point, and I believe I already have a test for it in `RateLimitSettingsTests`.
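A sketch of what that test could look like in `RateLimitSettingsTests`, assuming the `DISABLED_INSTANCE` constant and `isEnabled()` accessor introduced by this PR:

```java
// Inside RateLimitSettingsTests: pin down the behavior of the shared disabled
// instance once, instead of re-asserting it in every *ServiceSettingsTests class.
public void testDisabledInstance_IsNotEnabled() {
    assertFalse(RateLimitSettings.DISABLED_INSTANCE.isEnabled());
}
```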
```java
assertThat(xContentResult, is(Strings.format("""
    {"model_id":"%s","rate_limit":{"requests_per_minute":1000}}""", modelId)));
assertThat(xContentResult, is(XContentHelper.stripWhitespace(Strings.format("""
```
Is the `stripWhitespace()` needed here? The JSON string doesn't contain any whitespace.
I'll format it so it's easier to read and leverages the helper.
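A sketch of that follow-up, reusing the `xContentResult` and `modelId` variables from the surrounding test; the exact expected fields depend on the test, so only `model_id` is shown here. The point of `XContentHelper.stripWhitespace()` is that the expected JSON can be written with indentation for readability and still compare equal to the compact output of `toXContent`.

```java
assertThat(xContentResult, is(XContentHelper.stripWhitespace(Strings.format("""
    {
        "model_id": "%s"
    }
    """, modelId))));
```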
```java
assertFalse(serviceSettings.rateLimitSettings().isEnabled());
}

public void testFromMap_RemovesRateLimitingField() {
```
This test is missing an assertion that the field is removed from the map.
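A sketch of the missing check, assuming the `map` and `fromMap` call set up earlier in `testFromMap_RemovesRateLimitingField`; it also uses the `anEmptyMap()` matcher suggested above:

```java
// After parsing, the rate_limit entry should have been consumed, leaving the
// source map empty.
assertThat(map, anEmptyMap());
```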
```java
var failureListener = getModelListenerForException(
    ElasticsearchStatusException.class,
    "Configuration contains settings [{rate_limit={requests_per_minute=100}}] unknown to the [elastic] service"
```
Is there any way to have this message be more specific? It's not really accurate to say that `rate_limit` or `requests_per_minute` are unknown; they're just disabled in specific cases.
Good idea 👍
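One hypothetical wording for that follow-up (not taken from the PR): raise a validation error that names the field as disallowed rather than unknown.

```java
// Hypothetical error construction; the message text and its placement are
// illustrative, not the PR's actual change.
ValidationException validationException = new ValidationException();
validationException.addValidationError(
    "[rate_limit] cannot be set for the [elastic] service because rate limiting is not configurable for Elastic Inference Service endpoints"
);
throw validationException;
```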
LGTM
BASE=ae7bfd61c966081bb68dda052f4fb7bf81abbb2c HEAD=dfd9154dcdf6d90333625e1542220dce5faf4572 Branch=main
This PR disables the rate limiting settings within the inference API for EIS for all task types.
This PR does not change the default inference endpoints that were persisted to the `.inference` index. Instead, the changes make all default and new EIS inference endpoints have rate limiting disabled.

The changes no longer parse the `rate_limit` field. If a PUT request includes the `rate_limit` field we'll throw a validation exception indicating that it shouldn't be included. The field will be ignored if it comes from the `.inference` index.

Another option would be to return and store an explicit setting indicating that rate limiting is disabled, if we think that'd be clearer to the user. I don't think many users are creating EIS endpoints because it isn't documented, so maybe this isn't an issue.
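A rough sketch of the described behavior (not the PR's actual implementation), assuming the `ConfigurationParseContext` values already used elsewhere in the diff: reject `rate_limit` on PUT requests, and silently drop it when it comes from documents persisted in the `.inference` index.

```java
// Illustrative only: the field name comes from the PR description, but the
// structure and wording of this check are assumptions.
if (map.containsKey("rate_limit")) {
    if (context == ConfigurationParseContext.REQUEST) {
        // A new PUT request must not include the field at all.
        ValidationException validationException = new ValidationException();
        validationException.addValidationError(
            "[rate_limit] should not be included when creating an [elastic] service endpoint"
        );
        throw validationException;
    }
    // Configurations read back from the .inference index may still carry the
    // old field; ignore it.
    map.remove("rate_limit");
}
```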
Example
The result will no longer include the `rate_limit` field since rate limiting is no longer supported.

Response