Contributor

@kosabogi kosabogi commented Aug 11, 2025

This PR adds information about cooldown periods for trained model autoscaling in serverless projects.

Changes

Related issue: https://github.com/elastic/docs-content-internal/issues/177

@kosabogi kosabogi requested a review from ppf2 August 11, 2025 12:15
@kosabogi kosabogi requested a review from a team as a code owner August 11, 2025 12:15
@kosabogi kosabogi added the documentation Improvements or additions to documentation label Aug 11, 2025

github-actions bot commented Aug 11, 2025

Contributor

@kilfoyle kilfoyle left a comment

LGTM! 🦖
Very nice!

@ppf2 ppf2 requested a review from prwhelan August 14, 2025 15:05
Contributor

ppf2 commented Aug 14, 2025

@prwhelan Can you review for technical accuracy? Thx!

* When using the inference API for {{es}} or ELSER, [enable `adaptive_allocations`](../../autoscaling/trained-model-autoscaling.md#enabling-autoscaling-through-apis-adaptive-allocations).

::::{note}
In {{serverless-short}}, trained model deployments scale down to zero only after 24 hours without any inference requests. After scaling up, they remain active for 5 minutes before they can scale down again. During these cooldown periods, you will continue to be billed for the active resources.
Member

This is true outside of serverless as well. All environments will now wait 24 hours before scaling to zero: elastic/elasticsearch#128914

Outside of serverless, you can change this with the `xpack.ml.trained_models.adaptive_allocations.scale_to_zero_time` setting, down to a minimum of one minute.
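
For illustration only, here is a minimal sketch of how the two cooldown rules discussed in this thread could interact. It is not the Elasticsearch implementation. The function and constant names are hypothetical, the 24-hour and 5-minute values are the ones quoted above, and treating the exact boundary as "elapsed" is an assumption.

```python
from datetime import datetime, timedelta

# Values quoted in this thread; the constant names are hypothetical.
SCALE_TO_ZERO_AFTER = timedelta(hours=24)  # idle time required before scaling to zero
SCALE_UP_COOLDOWN = timedelta(minutes=5)   # minimum active time after a scale-up


def can_scale_down(now: datetime, last_scale_up: datetime) -> bool:
    """True once the post-scale-up cooldown has elapsed."""
    return now - last_scale_up >= SCALE_UP_COOLDOWN


def can_scale_to_zero(now: datetime, last_request: datetime, last_scale_up: datetime) -> bool:
    """True once the deployment has been idle long enough and is past the scale-up cooldown."""
    idle_long_enough = now - last_request >= SCALE_TO_ZERO_AFTER
    return idle_long_enough and can_scale_down(now, last_scale_up)


if __name__ == "__main__":
    now = datetime(2025, 8, 14, 12, 0)
    # Last request 23 hours ago: still inside the 24-hour window.
    print(can_scale_to_zero(now, now - timedelta(hours=23), now - timedelta(hours=23)))
    # Last request 25 hours ago: both cooldowns have elapsed.
    print(can_scale_to_zero(now, now - timedelta(hours=25), now - timedelta(hours=25)))
```

In this sketch the deployment stays billable until `can_scale_to_zero` returns `True`, which mirrors the note that resources remain active (and billed) during both cooldown periods.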

Contributor Author

Hi @prwhelan, thanks a lot for your feedback! I've modified my PR based on it, along with a few other smaller changes:

  • Trained model autoscaling: I moved the cooldown period information into its own heading. This makes it easier to highlight and also allows other pages to link directly to this specific section.

  • Autoscaling: I felt that going into the details of cooldown periods here would be out of scope and make the page a bit overwhelming. Instead, I added a more concise sentence that links to the new Cooldown periods section on the Trained model autoscaling page.

  • Elasticsearch billing dimensions: Realizing that this page is only applicable to Serverless, I updated the description for the Machine learning trained model autoscaling bullet point to reflect the new autoscaling behavior in Serverless.

Please let me know if you think these changes are appropriate or if you’d like me to adjust anything.
Thanks again!

Member

@prwhelan prwhelan left a comment

Hey, sorry for the last-minute change. We are lowering the default value from 24 hours to 4 hours, and we are adding a maximum value of 72 hours:
elastic/elasticsearch#133355

Please ignore :)

We will eventually reduce this to 4 hours, but I will update the documentation when that time comes.

@kosabogi kosabogi merged commit 1eb144c into main Aug 25, 2025
7 checks passed
@kosabogi kosabogi deleted the cooldown-periods branch August 25, 2025 12:13