
Conversation

@jonathan-buttner (Contributor) commented Aug 29, 2025

This PR disables the rate limiting settings in the inference API for the Elastic Inference Service (EIS) across all task types.

This PR does not change the default inference endpoints that were already persisted to the .inference index. Instead, the changes disable rate limiting for all default and newly created EIS inference endpoints.

The rate_limit field is no longer parsed. If a PUT request includes the rate_limit field, we throw a validation exception indicating that it shouldn't be included.

The field is ignored when it comes from the .inference index.
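
For illustration only, a rough sketch of what the persisted-settings path could look like (this is not the PR's exact code: the method body is hypothetical, and only the constructor call and the RateLimitSettings.DISABLED_INSTANCE behavior are taken from the tests discussed below; a rate_limit field arriving in a PUT request is rejected separately with a validation exception):

public static ElasticInferenceServiceSparseEmbeddingsServiceSettings fromMap(Map<String, Object> map) {
    // A persisted document may still carry the old field; drop it so it is ignored rather than parsed.
    map.remove("rate_limit");
    var modelId = (String) map.remove("model_id");
    // Rate limiting is always disabled for EIS endpoints, so no RateLimitSettings is read here.
    return new ElasticInferenceServiceSparseEmbeddingsServiceSettings(modelId, null);
}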

Another option would be to return and store:

"rate_limit": {
    "enabled": false
}

We could do that if we think it would be clearer to the user.

I don't think many users are creating EIS endpoints, since the service isn't documented, so this may not be an issue.

Example

PUT 
{
    "service": "elastic",
    "service_settings": {
        "model_id": "elser-2",
        "rate_limit": {
            "requests_per_minute": 200
        }
    }
}

The result will no longer include the rate_limit field since rate limiting is no longer supported.

Response

{
    "inference_id": "test",
    "task_type": "sparse_embedding",
    "service": "elastic",
    "service_settings": {
        "model_id": "elser-2"
    },
    "chunking_settings": {
        "strategy": "sentence",
        "max_chunk_size": 250,
        "sentence_overlap": 1
    }
}

@jonathan-buttner added the >non-issue, :ml (Machine learning), Team:ML (Meta label for the ML team), and v9.2.0 labels on Aug 29, 2025
@jonathan-buttner marked this pull request as ready for review on August 29, 2025 20:38
@elasticsearchmachine (Collaborator) commented:

Pinging @elastic/ml-core (Team:ML)

@DonalEvans (Contributor) left a comment:

Nothing major, just some clean-up suggestions. I do like the idea of returning something to the user to indicate that rate limiting is disabled rather than just not returning any rate limit settings at all.


private static final RateLimitSettings DEFAULT_RATE_LIMIT_SETTINGS = new RateLimitSettings(720L);

public static ElasticInferenceServiceCompletionServiceSettings fromMap(Map<String, Object> map, ConfigurationParseContext context) {

Contributor:

With these changes, the context argument is no longer used, so it can be removed. This also applies to the other *ServiceSettings classes.

Contributor Author (jonathan-buttner):

Yeah great point. I'm going to remove them from the *Settings classes but leave them in the models just in case we need the context for a settings class in the future. That way we don't have to plumb it through again.

);
var serviceSettings = ElasticInferenceServiceSparseEmbeddingsServiceSettings.fromMap(map, ConfigurationParseContext.REQUEST);

assertThat(map, is(Map.of()));

Contributor:

Nitpick, but the anEmptyMap() matcher is probably a better choice here. There are a few other places in this PR that make a similar assertion which could also be changed.
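
For reference, the suggested form is just (with a static import of org.hamcrest.Matchers.anEmptyMap):

assertThat(map, anEmptyMap());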

assertThat(map, is(Map.of()));
assertThat(serviceSettings, is(new ElasticInferenceServiceSparseEmbeddingsServiceSettings(modelId, null)));
assertThat(serviceSettings.rateLimitSettings(), sameInstance(RateLimitSettings.DISABLED_INSTANCE));
assertThat(serviceSettings.rateLimitSettings().isEnabled(), is(false));

Contributor:

Nitpick, but this assertion is redundant, since we already confirmed that the rate limit settings returned are RateLimitSettings.DISABLED_INSTANCE. If we want to verify that RateLimitSettings.DISABLED_INSTANCE.isEnabled() returns false, that test would probably be better placed in RateLimitSettingsTests, since it's testing the implementation of the RateLimitSettings class.

There are a few other tests which make similar assertions, namely the ones added in *ServiceSettingsTests classes.

Contributor Author (jonathan-buttner):

Yeah good point, and I believe I already have a test for it in RateLimitSettingsTests.
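
For example, the RateLimitSettingsTests case could look something like this (illustrative; the existing test may be named and written differently):

public void testDisabledInstance_IsNotEnabled() {
    assertFalse(RateLimitSettings.DISABLED_INSTANCE.isEnabled());
}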


assertThat(xContentResult, is(Strings.format("""
{"model_id":"%s","rate_limit":{"requests_per_minute":1000}}""", modelId)));
assertThat(xContentResult, is(XContentHelper.stripWhitespace(Strings.format("""

Contributor:

Is the stripWhitespace() needed here? The JSON string doesn't contain any whitespace.

Contributor Author (jonathan-buttner):

I'll format it so it's easier to read and leverages the helper.
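
For example, something along these lines (illustrative; only XContentHelper.stripWhitespace, Strings.format, and the model_id field come from the diff above, and the enclosing test needs to declare the helper's checked IOException):

assertThat(xContentResult, is(XContentHelper.stripWhitespace(Strings.format("""
    {
        "model_id": "%s"
    }
    """, modelId))));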

assertFalse(serviceSettings.rateLimitSettings().isEnabled());
}

public void testFromMap_RemovesRateLimitingField() {

Contributor:

This test is missing an assertion that the field is removed from the map.
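
For example, something like the following could be added (using Hamcrest's not/hasKey matchers; a sketch, not the PR's code):

assertThat(map, not(hasKey("rate_limit")));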


var failureListener = getModelListenerForException(
ElasticsearchStatusException.class,
"Configuration contains settings [{rate_limit={requests_per_minute=100}}] unknown to the [elastic] service"

Contributor:

Is there any way to make this message more specific? It's not really accurate to say that rate_limit or requests_per_minute are unknown; they're just disabled in specific cases.

Contributor Author (jonathan-buttner):

Good idea 👍
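
As a purely illustrative sketch of a more specific message (hypothetical wording; the message actually adopted in the PR may differ):

var validationException = new ValidationException();
validationException.addValidationError(
    "[rate_limit] is not permitted for the [elastic] service because rate limiting is always disabled for EIS endpoints"
);
throw validationException;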

@davidkyle (Member) left a comment:

LGTM

@jonathan-buttner enabled auto-merge (squash) on September 3, 2025 15:16
@jonathan-buttner merged commit 3718184 into elastic:main on Sep 3, 2025
33 checks passed
@jonathan-buttner deleted the ml-remove-eis-rl branch on September 3, 2025 20:56
phananh1010 added commits to phananh1010/elasticsearch that referenced this pull request (Sep 11 – Oct 16, 2025).