[ML] Wait for allocation on scale up #114719
Conversation
Pinging @elastic/ml-core (Team:ML)

Hi @davidkyle, I've created a changelog YAML for you.
Resolved review threads on:
...RestTest/java/org/elasticsearch/xpack/ml/integration/AdaptiveAllocationsScaleFromZeroIT.java
...s/src/javaRestTest/java/org/elasticsearch/xpack/ml/integration/PyTorchModelRestTestCase.java
...in/ml/src/main/java/org/elasticsearch/xpack/ml/action/TransportInternalInferModelAction.java
...plugin/ml/src/main/java/org/elasticsearch/xpack/ml/inference/InferenceWaitForAllocation.java
...c/main/java/org/elasticsearch/xpack/core/ml/inference/assignment/TrainedModelAssignment.java
jan-elastic left a comment:

Generally LGTM. Some minor comments.

Please have another look at the logging. It looks like there's some leftover debug logging (logged at info).
Force-pushed from 5f9dddc to a959890
  CHUNKING_SETTINGS_ENABLED("es.inference_chunking_settings_feature_flag_enabled=true", Version.fromString("8.16.0"), null),
- INFERENCE_DEFAULT_ELSER("es.inference_default_elser_feature_flag_enabled=true", Version.fromString("8.16.0"), null);
+ INFERENCE_DEFAULT_ELSER("es.inference_default_elser_feature_flag_enabled=true", Version.fromString("8.16.0"), null),
+ ML_SCALE_FROM_ZERO("es.ml_scale_from_zero_feature_flag_enabled=true", Version.fromString("8.16.0"), null);
I'd prefix with INFERENCE instead of ML.
Very subtle naming btw: we used to have scale_to_zero_feature_flag, and now scale_from_zero_feature_flag :)
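For context, flags of this kind are backed by a JVM system property, as the string in the diff suggests. A minimal, hypothetical sketch of how such a property-backed flag could gate the new behavior; the class and method names here are illustrative, not the PR's actual code:

// Hypothetical sketch: a feature flag backed by the JVM system property
// shown in the diff above. Class and method names are illustrative only.
final class ScaleFromZeroFeatureFlag {

    // Read es.ml_scale_from_zero_feature_flag_enabled once, at class load.
    private static final boolean ENABLED = Boolean.parseBoolean(
        System.getProperty("es.ml_scale_from_zero_feature_flag_enabled", "false"));

    private ScaleFromZeroFeatureFlag() {}

    static boolean isEnabled() {
        return ENABLED;
    }
}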
jan-elastic left a comment:

LGTM
Changes TransportInternalInferModelAction to queue inference requests when a model deployment is scaling up from 0 allocations. Incoming inference requests are stored in a queue and then sent to the model once it is deployed on a node.

Adds a setting to control the adaptive allocations scale-to-zero period, for the purpose of running tests.
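To illustrate the queue-then-forward idea described above, here is a minimal, hypothetical sketch: requests are buffered while the deployment has zero allocations and drained once an allocation starts. The types and method names (PendingRequest, onAllocationStarted) are invented for illustration and are not the API of the PR's InferenceWaitForAllocation class.

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical, simplified sketch of queuing inference requests while a
// deployment scales up from zero allocations. Not the PR's actual code.
class WaitForAllocationSketch {

    // A stored request: the target deployment and the action that forwards
    // the request to a node once the model is allocated there.
    record PendingRequest(String deploymentId, Runnable forwardToNode) {}

    private final Queue<PendingRequest> pending = new ConcurrentLinkedQueue<>();

    // Called when a request arrives and the deployment has 0 allocations.
    void queue(PendingRequest request) {
        pending.add(request);
    }

    // Called when cluster state shows the deployment now has a started
    // allocation on some node: drain and forward the matching requests.
    void onAllocationStarted(String deploymentId) {
        pending.removeIf(request -> {
            if (request.deploymentId().equals(deploymentId)) {
                request.forwardToNode().run();
                return true;
            }
            return false;
        });
    }
}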