[ML] Disable EIS rate limiting within the inference API #133845
Conversation
Pinging @elastic/ml-core (Team:ML)
Nothing major, just some clean-up suggestions. I do like the idea of returning something to the user to indicate that rate limiting is disabled rather than just not returning any rate limit settings at all.
```java
private static final RateLimitSettings DEFAULT_RATE_LIMIT_SETTINGS = new RateLimitSettings(720L);

public static ElasticInferenceServiceCompletionServiceSettings fromMap(Map<String, Object> map, ConfigurationParseContext context) {
```
With these changes, the `context` argument is no longer used, so it can be removed. This also applies to the other `*ServiceSettings` classes.
Yeah great point. I'm going to remove them from the `*Settings` classes but leave them in the models just in case we need the context for a settings class in the future. That way we don't have to plumb it through again.
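A toy sketch of that split, using hypothetical stand-in class names rather than the real inference plugin types: the settings parser drops the unused parameter, while the model-level factory keeps accepting the context so it can be plumbed back into a settings class later without touching every caller again.

```java
import java.util.Map;

// Hypothetical stand-in for a *ServiceSettings class.
class ExampleServiceSettings {
    // The ConfigurationParseContext parameter is gone because nothing in the
    // settings parsing uses it any more.
    static ExampleServiceSettings fromMap(Map<String, Object> map) {
        return new ExampleServiceSettings();
    }
}

// Hypothetical stand-in for a model class.
class ExampleModel {
    enum ParseContext { REQUEST, PERSISTENT }

    private final ExampleServiceSettings settings;

    ExampleModel(ExampleServiceSettings settings) {
        this.settings = settings;
    }

    // The model keeps the context argument so a future settings class can use
    // it without re-plumbing the whole call chain.
    static ExampleModel fromMap(Map<String, Object> map, ParseContext context) {
        return new ExampleModel(ExampleServiceSettings.fromMap(map));
    }
}
```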
```java
);
var serviceSettings = ElasticInferenceServiceSparseEmbeddingsServiceSettings.fromMap(map, ConfigurationParseContext.REQUEST);

assertThat(map, is(Map.of()));
```
Nitpick, but the `anEmptyMap()` matcher is probably a better choice here. There are a few other places in this PR that make a similar assertion which could also be changed.
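A minimal, self-contained illustration of the suggested matcher swap (the class name here is made up; in the PR the assertion lives inside the existing `*ServiceSettingsTests` classes):

```java
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.anEmptyMap;
import static org.hamcrest.Matchers.is;

import java.util.HashMap;
import java.util.Map;

public class EmptyMapAssertionExample {
    public static void main(String[] args) {
        Map<String, Object> map = new HashMap<>();

        // Current style in the PR: compares against an empty immutable map.
        assertThat(map, is(Map.of()));

        // Suggested alternative: reads as "is an empty map" and gives a clearer
        // failure message when the map unexpectedly still has entries.
        assertThat(map, anEmptyMap());
    }
}
```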
```java
assertThat(map, is(Map.of()));
assertThat(serviceSettings, is(new ElasticInferenceServiceSparseEmbeddingsServiceSettings(modelId, null)));
assertThat(serviceSettings.rateLimitSettings(), sameInstance(RateLimitSettings.DISABLED_INSTANCE));
assertThat(serviceSettings.rateLimitSettings().isEnabled(), is(false));
```
Nitpick, but this assertion is redundant, since we already confirmed that the rate limit settings returned are `RateLimitSettings.DISABLED_INSTANCE`. If we want to verify that `RateLimitSettings.DISABLED_INSTANCE.isEnabled()` returns `false`, that test would probably be better placed in `RateLimitSettingsTests`, since it's testing the implementation of the `RateLimitSettings` class.

There are a few other tests which make similar assertions, namely the ones added in `*ServiceSettingsTests` classes.
Yeah good point, and I believe I already have a test for it in `RateLimitSettingsTests`.
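A sketch of what that test could look like in `RateLimitSettingsTests`, assuming the `DISABLED_INSTANCE` constant and `isEnabled()` accessor introduced by this PR:

```java
// Inside RateLimitSettingsTests: pin down the behavior of the shared disabled
// instance once, instead of re-asserting it in every *ServiceSettingsTests class.
public void testDisabledInstance_IsNotEnabled() {
    assertFalse(RateLimitSettings.DISABLED_INSTANCE.isEnabled());
}
```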
```java
assertThat(xContentResult, is(Strings.format("""
    {"model_id":"%s","rate_limit":{"requests_per_minute":1000}}""", modelId)));
assertThat(xContentResult, is(XContentHelper.stripWhitespace(Strings.format("""
```
Is the `stripWhitespace()` needed here? The JSON string doesn't contain any whitespace.
I'll format it so it's easier to read and leverages the helper.
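A sketch of that follow-up, reusing the `xContentResult` and `modelId` variables from the surrounding test; the exact expected fields depend on the test, so only `model_id` is shown here. The point of `XContentHelper.stripWhitespace()` is that the expected JSON can be written with indentation for readability and still compare equal to the compact output of `toXContent`.

```java
assertThat(xContentResult, is(XContentHelper.stripWhitespace(Strings.format("""
    {
        "model_id": "%s"
    }
    """, modelId))));
```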
```java
assertFalse(serviceSettings.rateLimitSettings().isEnabled());
}

public void testFromMap_RemovesRateLimitingField() {
```
This test is missing an assertion that the field is removed from the map.
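A sketch of the missing check, assuming the `map` and `fromMap` call set up earlier in `testFromMap_RemovesRateLimitingField`; it also uses the `anEmptyMap()` matcher suggested above:

```java
// After parsing, the rate_limit entry should have been consumed, leaving the
// source map empty.
assertThat(map, anEmptyMap());
```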
```java
var failureListener = getModelListenerForException(
    ElasticsearchStatusException.class,
    "Configuration contains settings [{rate_limit={requests_per_minute=100}}] unknown to the [elastic] service"
```
Is there any way to have this message be more specific? It's not really accurate to say that `rate_limit` or `requests_per_minute` are unknown; they're just disabled in specific cases.
Good idea 👍
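One hypothetical wording for that follow-up (not taken from the PR): raise a validation error that names the field as disallowed rather than unknown.

```java
// Hypothetical error construction; the message text and its placement are
// illustrative, not the PR's actual change.
ValidationException validationException = new ValidationException();
validationException.addValidationError(
    "[rate_limit] cannot be set for the [elastic] service because rate limiting is not configurable for Elastic Inference Service endpoints"
);
throw validationException;
```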
LGTM
BASE=ae7bfd61c966081bb68dda052f4fb7bf81abbb2c HEAD=dfd9154dcdf6d90333625e1542220dce5faf4572 Branch=main
This PR disables the rate limiting settings within the inference API for EIS for all task types.
This PR does not change the default inference endpoints that were persisted to the `.inference` index. Instead, the changes make all default and new EIS inference endpoints have rate limiting disabled.

The changes no longer parse the `rate_limit` field. If a PUT request includes the `rate_limit` field we'll throw a validation exception indicating that it shouldn't be included. The field will be ignored if it comes from the `.inference` index.

Another option would be to return and store an explicit setting indicating that rate limiting is disabled, if we think that'd be clearer to the user. I don't think many users are creating EIS endpoints because it isn't documented, so maybe this isn't an issue.
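A rough sketch of the described behavior (not the PR's actual implementation), assuming the `ConfigurationParseContext` values already used elsewhere in the diff: reject `rate_limit` on PUT requests, and silently drop it when it comes from documents persisted in the `.inference` index.

```java
// Illustrative only: the field name comes from the PR description, but the
// structure and wording of this check are assumptions.
if (map.containsKey("rate_limit")) {
    if (context == ConfigurationParseContext.REQUEST) {
        // A new PUT request must not include the field at all.
        ValidationException validationException = new ValidationException();
        validationException.addValidationError(
            "[rate_limit] should not be included when creating an [elastic] service endpoint"
        );
        throw validationException;
    }
    // Configurations read back from the .inference index may still carry the
    // old field; ignore it.
    map.remove("rate_limit");
}
```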
Example
The result will no longer include the `rate_limit` field since rate limiting is no longer supported.

Response