Conversation

jan-elastic (Contributor)

No description provided.

@elasticsearchmachine (Collaborator)

Pinging @elastic/ml-core (Team:ML)

@jan-elastic force-pushed the adaptive-allocs-scale-to-zero branch from 7f35e5d to c8f5429 on September 24, 2024 14:10
@jan-elastic force-pushed the adaptive-allocs-scale-to-zero branch from fcb401e to 838ea74 on September 25, 2024 12:16
@davidkyle (Member) left a comment

looks good, just wondering how to test this

@jan-elastic force-pushed the adaptive-allocs-scale-to-zero branch from 2ea297e to 4d8e3d1 on September 26, 2024 08:51
@davidkyle (Member) left a comment

One comment, otherwise LGTM

@jan-elastic added the auto-merge-without-approval label (Automatically merge pull request when CI checks pass; NB: doesn't wait for reviews!) on Sep 26, 2024
darnautov added a commit to elastic/kibana that referenced this pull request Sep 26, 2024
## Summary

#### Notes for reviewers

The elastic/elasticsearch#113455 PR has to be merged
first to support `min_number_of_allocations: 0`. Until then it is not
possible to start a `Low` vCPUs usage deployment from the UI.
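
A minimal sketch of the kind of request this enables, assuming the Elasticsearch start trained model deployment endpoint accepts an `adaptive_allocations` object in the request body; the host, model id, and API key are placeholder assumptions, not values from this PR:

```typescript
// Hedged sketch: start a deployment that may scale down to zero allocations.
// ES_URL, MODEL_ID and ES_API_KEY are placeholder assumptions.
const ES_URL = "https://localhost:9200";
const MODEL_ID = ".elser_model_2";

async function startScaleToZeroDeployment(): Promise<void> {
  const response = await fetch(
    `${ES_URL}/_ml/trained_models/${MODEL_ID}/deployment/_start?threads_per_allocation=1`,
    {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `ApiKey ${process.env.ES_API_KEY}`,
      },
      body: JSON.stringify({
        adaptive_allocations: {
          enabled: true,
          // The point of elastic/elasticsearch#113455: a lower bound of 0
          // lets the deployment scale to zero allocations when idle.
          min_number_of_allocations: 0,
          max_number_of_allocations: 2,
        },
      }),
    },
  );
  if (!response.ok) {
    throw new Error(`Failed to start deployment: ${response.status}`);
  }
  console.log(await response.json());
}

startScaleToZeroDeployment().catch(console.error);
```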

Resolves #189975

- Updates the Start/Update model deployment dialog, replacing the
allocation and threading params with use cases, and the advanced
settings with vCPUs/VCUs controls.

The vCPUs (or VCUs for serverless environments) are derived from the
number of allocations times threads per allocation.

_Optimised for ingest_ sets 1 thread per allocation.
_Optimised for search_ sets 2 threads per allocation at the low vCPUs
level, and the maximum available number of threads for medium and high.
This value is limited by `max_single_ml_node_processors`.
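
As an illustration only (not the actual Kibana implementation), the derivation above can be sketched as follows; the type and function names are assumptions:

```typescript
// Illustrative sketch of the vCPU derivation described above; names and
// structure are assumptions, only the rules come from the text.
type UseCase = "ingest" | "search";
type VCpuLevel = "low" | "medium" | "high";

function threadsPerAllocation(
  useCase: UseCase,
  level: VCpuLevel,
  maxSingleMlNodeProcessors: number,
): number {
  if (useCase === "ingest") return 1; // optimised for ingest: 1 thread
  if (level === "low") return 2;      // optimised for search, low level
  // medium/high: the maximum a single ML node can provide
  return maxSingleMlNodeProcessors;
}

// vCPUs (or VCUs on serverless) = allocations × threads per allocation
function vCpus(numberOfAllocations: number, threads: number): number {
  return numberOfAllocations * threads;
}

// Example: a search deployment at the low level on a 16-processor ML node
console.log(vCpus(1, threadsPerAllocation("search", "low", 16))); // 2
```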

The vCPUs control acts differently depending on "Adaptive resources":

- When On, it enables `adaptive_allocations` and sets the range of
`min_number_of_allocations` and `max_number_of_allocations` based on the
use case (threads per allocation) and cluster configuration (see the
sketch after this list).
  - For cloud deployments with ML autoscaling enabled, the vCPUs ranges
    are 1-2, 2-32, and 33-max.
  - For on-prem deployments and cloud deployments with autoscaling
    disabled, the vCPUs ranges are based on the hardware limitations.
    E.g., with `total_ml_processors` = 32 and
    `max_single_ml_node_processors` = 16, the ranges are 1-2, 3-16, and
    17-32.
- When Off, it sets a static `number_of_allocations`. The number of
allocations is the upper bound of the ranges mentioned above.
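
An assumption-laden sketch of the range selection, using only the boundary numbers from the text; the function and field names are illustrative:

```typescript
// Illustrative sketch of the vCPU ranges described above; the autoscaling
// boundaries and the on-prem example values come from the text, the rest
// is assumption.
interface VCpuRange { min: number; max: number }

function vCpuRanges(cluster: {
  mlAutoscalingEnabled: boolean;
  totalMlProcessors: number;          // e.g. 32
  maxSingleMlNodeProcessors: number;  // e.g. 16
}): Record<"low" | "medium" | "high", VCpuRange> {
  if (cluster.mlAutoscalingEnabled) {
    // Cloud with ML autoscaling: the "high" upper bound is open-ended ("max").
    return {
      low: { min: 1, max: 2 },
      medium: { min: 2, max: 32 },
      high: { min: 33, max: Number.MAX_SAFE_INTEGER },
    };
  }
  // On-prem, or cloud without autoscaling: bounded by the hardware.
  return {
    low: { min: 1, max: 2 },
    medium: { min: 3, max: cluster.maxSingleMlNodeProcessors },
    high: {
      min: cluster.maxSingleMlNodeProcessors + 1,
      max: cluster.totalMlProcessors,
    },
  };
}

// With adaptive resources On, the chosen range maps to
// min/max_number_of_allocations (vCPUs divided by threads per allocation);
// with it Off, the static number_of_allocations is the range's upper bound.
```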

For serverless deployments, adaptive resources are enabled at all times
and the control is hidden.

<img width="795" alt="image"
src="https://github.com/user-attachments/assets/20b2528f-b631-49f9-82f8-fef6175873fd">

- The Start deployment modal checks whether a deployment optimized for a
certain use case already exists and defaults to another one. E.g., if a
deployment optimized for ingestion is found, the modal sets search as
the default use case upon opening.

- For cloud deployments, also displays a message about vCPU values in
the Cloud Console and a link to edit the deployment:
<img width="791" alt="image"
src="https://github.com/user-attachments/assets/2f98ebca-579e-43c1-ab78-e0dd38ce4786">

- For Serverless, the "Adaptive Resources" control is hidden, as
adaptive allocations are enabled at all times.
<img width="659" alt="image"
src="https://github.com/user-attachments/assets/8133ebf9-6b2b-4fea-95f1-7351cfcf85b6">

- The Update action was updated accordingly. As it is not possible to
update threads per allocation, the optimize control is disabled.
Advanced settings are expanded by default and allow the user to adjust
the vCPUs level and enable/disable adaptive resources (a sketch of the
corresponding update request follows the screenshot below).
<img width="785" alt="image"
src="https://github.com/user-attachments/assets/13c3f0bc-4436-4528-8641-d33eb5384ea2">


- Indicates whether adaptive allocations are enabled in the expanded row:
<img width="1322" alt="image"
src="https://github.com/user-attachments/assets/894916df-4c77-4e75-b175-229131b8ecc8">



### Checklist


- [x] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [ ]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [ ] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed
- [x] Any UI touched in this PR is usable by keyboard only (learn more
about [keyboard accessibility](https://webaim.org/techniques/keyboard/))
- [x] Any UI touched in this PR does not create any new axe failures
(run axe in browser:
[FF](https://addons.mozilla.org/en-US/firefox/addon/axe-devtools/),
[Chrome](https://chrome.google.com/webstore/detail/axe-web-accessibility-tes/lhdoppojpmngadmnindnejefpokejbdd?hl=en-US))
- [x] This renders correctly on smaller devices using a responsive
layout. (You can test this [in your
browser](https://www.browserstack.com/guide/responsive-testing-on-local-server))
- [ ] This was checked for [cross-browser
compatibility](https://www.elastic.co/support/matrix#matrix_browsers)

---------

Co-authored-by: Elastic Machine <[email protected]>
Co-authored-by: István Zoltán Szabó <[email protected]>
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Sep 26, 2024
(cherry picked from commit 718444f)
@elasticsearchmachine merged commit f0339ed into main on Sep 26, 2024 (16 checks passed)
@elasticsearchmachine deleted the adaptive-allocs-scale-to-zero branch on September 26, 2024 14:18
@elasticsearchmachine (Collaborator)

💔 Backport failed

| Branch | Result |
| --- | --- |
| 8.x | Commit could not be cherrypicked due to conflicts |

You can use [sqren/backport](https://github.com/sqren/backport) to manually backport by running `backport --upstream elastic/elasticsearch --pr 113455`.

kibanamachine added a commit to elastic/kibana that referenced this pull request Sep 26, 2024
…194143)

# Backport

This will backport the following commits from `main` to `8.x`:
- [[ML] Redesign start/update model deployment dialog
(#190243)](#190243)

<!--- Backport version: 9.4.3 -->

### Questions?
Please refer to the [Backport tool documentation](https://github.com/sqren/backport).

<!--BACKPORT [{"author":{"name":"Dima
Arnautov","email":"[email protected]"},"sourceCommit":{"committedDate":"2024-09-26T13:26:31Z","message":"[ML]
Redesign start/update model deployment dialog (#190243)\n\n##
Summary\r\n\r\n#### Notes for
reviewers\r\n\r\nhttps://github.com/elastic/elasticsearch/pull/113455 PR
has to be merged\r\nfirst to support `min_number_of_allocation: 0`. At
the moment it's not\r\npossible to start a `Low` vCPUs usage deployment
from the UI.\r\n\r\nResolves #189975\r\n\r\n- Updates the Start/Update
model deployment dialogue, replacing\r\nallocation and threading params
with use cases and advanced settings\r\nwith vCPUs/VCUs
controls.\r\n\r\nThe vCPUs (or VCUs for serverless env) are derived from
the number of\r\nallocations times threads per
allocation.\r\n\r\n_Optimised for ingest_ sets 1 thread per
allocation.\r\n_Optimised for search_ for low vCPUs level sets 2 threads
per\r\nallocation, and the maximum available number of threads for
medium and\r\nhigh. This value is limited to the
`max_single_ml_node_processors`.\r\n\r\nvCPUs control acts differently
depending on \"Adaptive resources\". \r\n\r\n- When On, it enables
`adaptive_allocations` and sets the range\r\n`min_number_of_allocations`
and `max_number_of_allocations` based on the\r\nuse-case (threads per
allocation) and cluster configuration.\r\n- For cloud deployments with
enabled ML autoscaling, vCPUs ranges are 1\r\n-2 for, 2-32, 33-max\r\n-
For on-prem deployments and cloud deployments with
disabled\r\nautoscaling, vCPUs ranges are based on the hardware
limitations. E.g.\r\nwith `total_ml_processors` = 32, and
`max_single_ml_node_processors` =\r\n16 ranges are 1-2, 3-16,
17-32.\r\n- When Off, it sets a static `number_of_allocations`. The
number of\r\nallocations is an upper bound of the ranges mentioned
above.\r\n\r\nFor serverless deployments adaptive resources are enabled
at all times,\r\nand control is hidden.\r\n\r\n<img width=\"795\"
alt=\"image\"\r\nsrc=\"https://github.com/user-attachments/assets/20b2528f-b631-49f9-82f8-fef6175873fd\">\r\n\r\n-
Start deployment modal checks if there is a deployment optimized for
a\r\ncertain use case, and sets another use case by default. E.g. If
an\r\noptimized deployment for a certain use case (e.g., ingestion) is
found,\r\nthe modal will set a different use case (e.g., search) as the
default\r\nupon opening.\r\n\r\n- For the cloud deployments, also
display a message about vCPU values in\r\nthe Cloud Console and a link
to edit the deployment:\r\n<img width=\"791\"
alt=\"image\"\r\nsrc=\"https://github.com/user-attachments/assets/2f98ebca-579e-43c1-ab78-e0dd38ce4786\">\r\n\r\n-
For Serverless, the \"Adaptive Resources\" control is hidden,
as\r\nadaptive allocations are enabled at all times.\r\n<img
width=\"659\"
alt=\"image\"\r\nsrc=\"https://github.com/user-attachments/assets/8133ebf9-6b2b-4fea-95f1-7351cfcf85b6\">\r\n\r\n-
Update action was updated accordingly. As it is not possible to
update\r\nthreads per allocation, the optimize control is disabled.
Advanced\r\nsettings are expanded by default and allows the user to
adjust vCPUs\r\nlevel and enable/disable adaptive resources.\r\n<img
width=\"785\"
alt=\"image\"\r\nsrc=\"https://github.com/user-attachments/assets/13c3f0bc-4436-4528-8641-d33eb5384ea2\">\r\n\r\n\r\n-
Indicate if adaptive allocations are enabled in the expanded row
\r\n<img width=\"1322\"
alt=\"image\"\r\nsrc=\"https://github.com/user-attachments/assets/894916df-4c77-4e75-b175-229131b8ecc8\">\r\n\r\n\r\n\r\n###
Checklist\r\n\r\n\r\n- [x] Any text added follows [EUI's
writing\r\nguidelines](https://elastic.github.io/eui/#/guidelines/writing),
uses\r\nsentence case text and includes
[i18n\r\nsupport](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)\r\n-
[
]\r\n[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)\r\nwas
added for features that require explanation or tutorials\r\n- [x] [Unit
or
functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere
updated or added to match the most common scenarios\r\n- [ ] [Flaky
Test\r\nRunner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1)
was\r\nused on any tests changed\r\n- [x] Any UI touched in this PR is
usable by keyboard only (learn more\r\nabout [keyboard
accessibility](https://webaim.org/techniques/keyboard/))\r\n- [x] Any UI
touched in this PR does not create any new axe failures\r\n(run axe in
browser:\r\n[FF](https://addons.mozilla.org/en-US/firefox/addon/axe-devtools/),\r\n[Chrome](https://chrome.google.com/webstore/detail/axe-web-accessibility-tes/lhdoppojpmngadmnindnejefpokejbdd?hl=en-US))\r\n-
[x] This renders correctly on smaller devices using a
responsive\r\nlayout. (You can test this [in
your\r\nbrowser](https://www.browserstack.com/guide/responsive-testing-on-local-server))\r\n-
[ ] This was checked for
[cross-browser\r\ncompatibility](https://www.elastic.co/support/matrix#matrix_browsers)\r\n\r\n---------\r\n\r\nCo-authored-by:
Elastic Machine
<[email protected]>\r\nCo-authored-by: István
Zoltán Szabó
<[email protected]>","sha":"718444fec14ae2d22894cca1349a7e577443dd99","branchLabelMapping":{"^v9.0.0$":"main","^v8.16.0$":"8.x","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:enhancement",":ml","v9.0.0","Feature:3rd
Party Models","Team:ML","v8.16.0","backport:version"],"title":"[ML]
Redesign start/update model deployment dialog
","number":190243,"url":"https://github.com/elastic/kibana/pull/190243","mergeCommit":{"message":"[ML]
Redesign start/update model deployment dialog (#190243)\n\n##
Summary\r\n\r\n#### Notes for
reviewers\r\n\r\nhttps://github.com/elastic/elasticsearch/pull/113455 PR
has to be merged\r\nfirst to support `min_number_of_allocation: 0`. At
the moment it's not\r\npossible to start a `Low` vCPUs usage deployment
from the UI.\r\n\r\nResolves #189975\r\n\r\n- Updates the Start/Update
model deployment dialogue, replacing\r\nallocation and threading params
with use cases and advanced settings\r\nwith vCPUs/VCUs
controls.\r\n\r\nThe vCPUs (or VCUs for serverless env) are derived from
the number of\r\nallocations times threads per
allocation.\r\n\r\n_Optimised for ingest_ sets 1 thread per
allocation.\r\n_Optimised for search_ for low vCPUs level sets 2 threads
per\r\nallocation, and the maximum available number of threads for
medium and\r\nhigh. This value is limited to the
`max_single_ml_node_processors`.\r\n\r\nvCPUs control acts differently
depending on \"Adaptive resources\". \r\n\r\n- When On, it enables
`adaptive_allocations` and sets the range\r\n`min_number_of_allocations`
and `max_number_of_allocations` based on the\r\nuse-case (threads per
allocation) and cluster configuration.\r\n- For cloud deployments with
enabled ML autoscaling, vCPUs ranges are 1\r\n-2 for, 2-32, 33-max\r\n-
For on-prem deployments and cloud deployments with
disabled\r\nautoscaling, vCPUs ranges are based on the hardware
limitations. E.g.\r\nwith `total_ml_processors` = 32, and
`max_single_ml_node_processors` =\r\n16 ranges are 1-2, 3-16,
17-32.\r\n- When Off, it sets a static `number_of_allocations`. The
number of\r\nallocations is an upper bound of the ranges mentioned
above.\r\n\r\nFor serverless deployments adaptive resources are enabled
at all times,\r\nand control is hidden.\r\n\r\n<img width=\"795\"
alt=\"image\"\r\nsrc=\"https://github.com/user-attachments/assets/20b2528f-b631-49f9-82f8-fef6175873fd\">\r\n\r\n-
Start deployment modal checks if there is a deployment optimized for
a\r\ncertain use case, and sets another use case by default. E.g. If
an\r\noptimized deployment for a certain use case (e.g., ingestion) is
found,\r\nthe modal will set a different use case (e.g., search) as the
default\r\nupon opening.\r\n\r\n- For the cloud deployments, also
display a message about vCPU values in\r\nthe Cloud Console and a link
to edit the deployment:\r\n<img width=\"791\"
alt=\"image\"\r\nsrc=\"https://github.com/user-attachments/assets/2f98ebca-579e-43c1-ab78-e0dd38ce4786\">\r\n\r\n-
For Serverless, the \"Adaptive Resources\" control is hidden,
as\r\nadaptive allocations are enabled at all times.\r\n<img
width=\"659\"
alt=\"image\"\r\nsrc=\"https://github.com/user-attachments/assets/8133ebf9-6b2b-4fea-95f1-7351cfcf85b6\">\r\n\r\n-
Update action was updated accordingly. As it is not possible to
update\r\nthreads per allocation, the optimize control is disabled.
Advanced\r\nsettings are expanded by default and allows the user to
adjust vCPUs\r\nlevel and enable/disable adaptive resources.\r\n<img
width=\"785\"
alt=\"image\"\r\nsrc=\"https://github.com/user-attachments/assets/13c3f0bc-4436-4528-8641-d33eb5384ea2\">\r\n\r\n\r\n-
Indicate if adaptive allocations are enabled in the expanded row
\r\n<img width=\"1322\"
alt=\"image\"\r\nsrc=\"https://github.com/user-attachments/assets/894916df-4c77-4e75-b175-229131b8ecc8\">\r\n\r\n\r\n\r\n###
Checklist\r\n\r\n\r\n- [x] Any text added follows [EUI's
writing\r\nguidelines](https://elastic.github.io/eui/#/guidelines/writing),
uses\r\nsentence case text and includes
[i18n\r\nsupport](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)\r\n-
[
]\r\n[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)\r\nwas
added for features that require explanation or tutorials\r\n- [x] [Unit
or
functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere
updated or added to match the most common scenarios\r\n- [ ] [Flaky
Test\r\nRunner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1)
was\r\nused on any tests changed\r\n- [x] Any UI touched in this PR is
usable by keyboard only (learn more\r\nabout [keyboard
accessibility](https://webaim.org/techniques/keyboard/))\r\n- [x] Any UI
touched in this PR does not create any new axe failures\r\n(run axe in
browser:\r\n[FF](https://addons.mozilla.org/en-US/firefox/addon/axe-devtools/),\r\n[Chrome](https://chrome.google.com/webstore/detail/axe-web-accessibility-tes/lhdoppojpmngadmnindnejefpokejbdd?hl=en-US))\r\n-
[x] This renders correctly on smaller devices using a
responsive\r\nlayout. (You can test this [in
your\r\nbrowser](https://www.browserstack.com/guide/responsive-testing-on-local-server))\r\n-
[ ] This was checked for
[cross-browser\r\ncompatibility](https://www.elastic.co/support/matrix#matrix_browsers)\r\n\r\n---------\r\n\r\nCo-authored-by:
Elastic Machine
<[email protected]>\r\nCo-authored-by: István
Zoltán Szabó
<[email protected]>","sha":"718444fec14ae2d22894cca1349a7e577443dd99"}},"sourceBranch":"main","suggestedTargetBranches":["8.x"],"targetPullRequestStates":[{"branch":"main","label":"v9.0.0","branchLabelMappingKey":"^v9.0.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/190243","number":190243,"mergeCommit":{"message":"[ML]
Redesign start/update model deployment dialog (#190243)\n\n##
Summary\r\n\r\n#### Notes for
reviewers\r\n\r\nhttps://github.com/elastic/elasticsearch/pull/113455 PR
has to be merged\r\nfirst to support `min_number_of_allocation: 0`. At
the moment it's not\r\npossible to start a `Low` vCPUs usage deployment
from the UI.\r\n\r\nResolves #189975\r\n\r\n- Updates the Start/Update
model deployment dialogue, replacing\r\nallocation and threading params
with use cases and advanced settings\r\nwith vCPUs/VCUs
controls.\r\n\r\nThe vCPUs (or VCUs for serverless env) are derived from
the number of\r\nallocations times threads per
allocation.\r\n\r\n_Optimised for ingest_ sets 1 thread per
allocation.\r\n_Optimised for search_ for low vCPUs level sets 2 threads
per\r\nallocation, and the maximum available number of threads for
medium and\r\nhigh. This value is limited to the
`max_single_ml_node_processors`.\r\n\r\nvCPUs control acts differently
depending on \"Adaptive resources\". \r\n\r\n- When On, it enables
`adaptive_allocations` and sets the range\r\n`min_number_of_allocations`
and `max_number_of_allocations` based on the\r\nuse-case (threads per
allocation) and cluster configuration.\r\n- For cloud deployments with
enabled ML autoscaling, vCPUs ranges are 1\r\n-2 for, 2-32, 33-max\r\n-
For on-prem deployments and cloud deployments with
disabled\r\nautoscaling, vCPUs ranges are based on the hardware
limitations. E.g.\r\nwith `total_ml_processors` = 32, and
`max_single_ml_node_processors` =\r\n16 ranges are 1-2, 3-16,
17-32.\r\n- When Off, it sets a static `number_of_allocations`. The
number of\r\nallocations is an upper bound of the ranges mentioned
above.\r\n\r\nFor serverless deployments adaptive resources are enabled
at all times,\r\nand control is hidden.\r\n\r\n<img width=\"795\"
alt=\"image\"\r\nsrc=\"https://github.com/user-attachments/assets/20b2528f-b631-49f9-82f8-fef6175873fd\">\r\n\r\n-
Start deployment modal checks if there is a deployment optimized for
a\r\ncertain use case, and sets another use case by default. E.g. If
an\r\noptimized deployment for a certain use case (e.g., ingestion) is
found,\r\nthe modal will set a different use case (e.g., search) as the
default\r\nupon opening.\r\n\r\n- For the cloud deployments, also
display a message about vCPU values in\r\nthe Cloud Console and a link
to edit the deployment:\r\n<img width=\"791\"
alt=\"image\"\r\nsrc=\"https://github.com/user-attachments/assets/2f98ebca-579e-43c1-ab78-e0dd38ce4786\">\r\n\r\n-
For Serverless, the \"Adaptive Resources\" control is hidden,
as\r\nadaptive allocations are enabled at all times.\r\n<img
width=\"659\"
alt=\"image\"\r\nsrc=\"https://github.com/user-attachments/assets/8133ebf9-6b2b-4fea-95f1-7351cfcf85b6\">\r\n\r\n-
Update action was updated accordingly. As it is not possible to
update\r\nthreads per allocation, the optimize control is disabled.
Advanced\r\nsettings are expanded by default and allows the user to
adjust vCPUs\r\nlevel and enable/disable adaptive resources.\r\n<img
width=\"785\"
alt=\"image\"\r\nsrc=\"https://github.com/user-attachments/assets/13c3f0bc-4436-4528-8641-d33eb5384ea2\">\r\n\r\n\r\n-
Indicate if adaptive allocations are enabled in the expanded row
\r\n<img width=\"1322\"
alt=\"image\"\r\nsrc=\"https://github.com/user-attachments/assets/894916df-4c77-4e75-b175-229131b8ecc8\">\r\n\r\n\r\n\r\n###
Checklist\r\n\r\n\r\n- [x] Any text added follows [EUI's
writing\r\nguidelines](https://elastic.github.io/eui/#/guidelines/writing),
uses\r\nsentence case text and includes
[i18n\r\nsupport](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)\r\n-
[
]\r\n[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)\r\nwas
added for features that require explanation or tutorials\r\n- [x] [Unit
or
functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere
updated or added to match the most common scenarios\r\n- [ ] [Flaky
Test\r\nRunner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1)
was\r\nused on any tests changed\r\n- [x] Any UI touched in this PR is
usable by keyboard only (learn more\r\nabout [keyboard
accessibility](https://webaim.org/techniques/keyboard/))\r\n- [x] Any UI
touched in this PR does not create any new axe failures\r\n(run axe in
browser:\r\n[FF](https://addons.mozilla.org/en-US/firefox/addon/axe-devtools/),\r\n[Chrome](https://chrome.google.com/webstore/detail/axe-web-accessibility-tes/lhdoppojpmngadmnindnejefpokejbdd?hl=en-US))\r\n-
[x] This renders correctly on smaller devices using a
responsive\r\nlayout. (You can test this [in
your\r\nbrowser](https://www.browserstack.com/guide/responsive-testing-on-local-server))\r\n-
[ ] This was checked for
[cross-browser\r\ncompatibility](https://www.elastic.co/support/matrix#matrix_browsers)\r\n\r\n---------\r\n\r\nCo-authored-by:
Elastic Machine
<[email protected]>\r\nCo-authored-by: István
Zoltán Szabó
<[email protected]>","sha":"718444fec14ae2d22894cca1349a7e577443dd99"}},{"branch":"8.x","label":"v8.16.0","branchLabelMappingKey":"^v8.16.0$","isSourceBranch":false,"state":"NOT_CREATED"}]}]
BACKPORT-->

Co-authored-by: Dima Arnautov <[email protected]>
elasticsearchmachine pushed a commit that referenced this pull request Sep 27, 2024
…113664)

* Adaptive allocations: scale to zero allocations (#113455)

* Fix AdaptiveAllocationsScalerTests for release

Labels

- `auto-merge-without-approval` (Automatically merge pull request when CI checks pass; NB: doesn't wait for reviews!)
- `backport pending`
- `:ml` (Machine learning)
- `>non-issue`
- `Team:ML` (Meta label for the ML team)
- `v8.16.0`
- `v9.0.0`
