
Conversation

naemono
Contributor

@naemono naemono commented Apr 4, 2025

From the existing ECK docs, it sounds as though CPU and RAM can be independently scaled with a "Decider" from Elasticsearch, but no such decider exists. This change makes it clear that CPU/RAM are scaled relative to the storage min/max settings.

@naemono naemono requested review from a team and eedugon April 4, 2025 19:37


```diff
- ECK can leverage the [autoscaling API](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-autoscaling) introduced in Elasticsearch 7.11 to adjust automatically the number of Pods and the allocated resources in a tier. Currently, autoscaling is supported for Elasticsearch [data tiers](/manage-data/lifecycle/data-tiers.md) and machine learning nodes.
+ ECK can leverage the [autoscaling API](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-autoscaling) introduced in Elasticsearch 7.11 to adjust automatically the number of Pods and the allocated resources in a tier. Currently, autoscaling is supported for Elasticsearch [data tiers](/manage-data/lifecycle/data-tiers.md) and machine learning nodes. ECK scales Elasticsearch data and machine learning tiers exclusively by scaling storage. CPU and Memory are scaled *relative* to the storage resource min/max settings, and not independently.
```
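The behavior this wording describes can be sketched with a minimal `ElasticsearchAutoscaler` manifest. This is an illustration, not taken from the PR: the resource name, cluster name, policy name, and bounds are all hypothetical. Only node count and storage bounds are declared for a non-frozen data tier, leaving CPU and memory to be sized relative to storage:

```yaml
# Sketch of an ElasticsearchAutoscaler for a (non-frozen) data tier.
# Only storage bounds are declared; per the docs change above, CPU and
# memory for non-frozen data tiers are derived proportionally from the
# storage min/max rather than scaled independently.
# All names and values here are illustrative.
apiVersion: autoscaling.k8s.elastic.co/v1alpha1
kind: ElasticsearchAutoscaler
metadata:
  name: autoscaler-sample          # hypothetical name
spec:
  elasticsearchRef:
    name: elasticsearch-sample     # hypothetical cluster name
  policies:
    - name: data-ingest            # hypothetical policy name
      roles: ["data", "ingest"]
      resources:
        nodeCount:
          min: 3
          max: 8
        storage:
          min: 1Gi
          max: 2Gi
```

Applied with `kubectl apply -f`, a manifest along these lines would let the operator size CPU and memory from the storage bounds instead of requiring independent min/max values per resource.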
Contributor


> ECK scales Elasticsearch data and machine learning tiers exclusively by scaling storage

I don't think this is true, at least for the ML and frozen tiers for which ES returns memory requirements: https://www.elastic.co/guide/en/elasticsearch/reference/current/autoscaling-deciders.html

Contributor Author


Thank you @barkbay. I have read through the docs, and I have updated appropriately.

Contributor


> ECK scales Elasticsearch data tiers exclusively by scaling storage.

Maybe I'm missing something but I think this is still not true. The frozen tier is scaled based on both storage and memory requirements:

> **Frozen shards decider**
> Estimates required memory capacity based on the number of partially mounted shards. Available for policies governing frozen data nodes.
>
> **Frozen storage decider**
> Estimates required storage capacity as a percentage of the total data set of partially mounted indices. Available for policies governing frozen data nodes.

Contributor Author


@barkbay I had noted the Frozen tier at the end of this paragraph, but I've updated this again to try to clarify. If you have a suggestion for making this clearer (maybe a table would help?), I'm open to suggestions.

@naemono naemono requested a review from barkbay April 7, 2025 18:40


```diff
- ECK can leverage the [autoscaling API](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-autoscaling) introduced in Elasticsearch 7.11 to adjust automatically the number of Pods and the allocated resources in a tier. Currently, autoscaling is supported for Elasticsearch [data tiers](/manage-data/lifecycle/data-tiers.md) and machine learning nodes.
+ ECK can leverage the [autoscaling API](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-autoscaling) introduced in Elasticsearch 7.11 to adjust automatically the number of Pods and the allocated resources in a tier. Currently, autoscaling is supported for Elasticsearch [data tiers](/manage-data/lifecycle/data-tiers.md) and machine learning nodes. ECK scales Elasticsearch data tiers (excluding frozen tiers) exclusively by scaling storage. CPU and Memory are scaled *relative* to the storage resource min/max settings, and not independently in data tiers (again excluding frozen tiers). ECK can scale memory and CPU on ML tiers if specified in the `ElasticsearchAutoscaler.spec`. On Frozen tiers ECK can scale memory if specified in the `ElasticsearchAutoscaler.cpu`, but will scale CPU in relation to the storage.
```
Contributor


I wonder if we should not consider the resource types returned for each tier an implementation detail. It feels like we are duplicating the Elasticsearch documentation, which already explains what types of resources are estimated: https://www.elastic.co/guide/en/elasticsearch/reference/current/autoscaling-deciders.html

Instead we could explain how missing resources are calculated by the operator:

Suggested change

```diff
- ECK can leverage the [autoscaling API](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-autoscaling) introduced in Elasticsearch 7.11 to adjust automatically the number of Pods and the allocated resources in a tier. Currently, autoscaling is supported for Elasticsearch [data tiers](/manage-data/lifecycle/data-tiers.md) and machine learning nodes. ECK scales Elasticsearch data tiers (excluding frozen tiers) exclusively by scaling storage. CPU and Memory are scaled *relative* to the storage resource min/max settings, and not independently in data tiers (again excluding frozen tiers). ECK can scale memory and CPU on ML tiers if specified in the `ElasticsearchAutoscaler.spec`. On Frozen tiers ECK can scale memory if specified in the `ElasticsearchAutoscaler.cpu`, but will scale CPU in relation to the storage.
+ ECK can leverage the [autoscaling API](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-autoscaling) introduced in Elasticsearch 7.11 to adjust automatically the number of Pods and the allocated resources in a tier. Currently, autoscaling is supported for Elasticsearch [data tiers](/manage-data/lifecycle/data-tiers.md) and machine learning nodes. Required resources for each tier are estimated by [Elasticsearch deciders](https://www.elastic.co/guide/en/elasticsearch/reference/current/autoscaling-deciders.html). Deciders may return required CPU, memory or storage capacity. If a resource type is missing in the decider's output, it is inferred relative to the others. For example, if a decider does not return a memory requirement, then memory is calculated proportionally to the required amount of storage returned by the decider. The same goes for CPU, which is inferred from memory if it is absent from the decider's result.
```

Contributor


I think we can do what @barkbay suggests. But I also think we should call out in very simple language what is actually supported or not supported today. I know this would duplicate some of the content from the Elasticsearch docs on deciders, but the decider docs can be a bit confusing to read: do all of them apply? Which ones do not?

> ECK can scale memory and CPU on ML tiers if specified in the `ElasticsearchAutoscaler.spec`. On Frozen tiers ECK can scale memory if specified in the `ElasticsearchAutoscaler.cpu`

I am struggling to parse this wording. What are we trying to say here? Why can ECK scale memory when you specify what? What is `ElasticsearchAutoscaler.cpu`?

Contributor Author


This was a typo; it was intended to be `ElasticsearchAutoscaler.spec`. ECK can scale memory in frozen tiers according to what's returned by the ES deciders if specified; otherwise it will scale it in relation to storage.

This isn't the most straightforward thing to understand from a customer standpoint, as each tier has its own set of supported options. Would a table showing the available options for each tier be clearer than the words we're suggesting, @barkbay @pebrc?

Contributor


Sorry for the lag in answering; I'm not sure myself what would be the best option. I tend to think that https://www.elastic.co/guide/en/elasticsearch/reference/current/autoscaling-deciders.html should be improved, as I think most readers are interested in which resources are estimated for each tier, not really in a list of the available deciders, which should be an implementation detail.

We can use a table, maybe something along the lines of:

| Tier | Storage | Memory | CPU |
| --- | --- | --- | --- |
| Data Nodes (except Frozen) | Yes | Calculated proportionally to the required amount of storage | Calculated proportionally to the required amount of memory |
| Frozen Nodes | Yes | Yes | Calculated proportionally to the required amount of memory |
| Machine Learning | No | Yes | Calculated proportionally to the required amount of memory |
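To connect the Machine Learning row above to the autoscaler resource under discussion, here is a hedged policy fragment (the policy name and bounds are illustrative, not from this PR): memory bounds are declared because the ML deciders return memory requirements, while CPU, being omitted, would be derived proportionally from memory:

```yaml
# Illustrative ElasticsearchAutoscaler policy fragment for ML nodes.
# Memory is declared explicitly; CPU is omitted and therefore
# derived proportionally from memory. Values are hypothetical.
- name: ml
  roles: ["ml"]
  resources:
    nodeCount:
      min: 0
      max: 3
    memory:
      min: 2Gi
      max: 8Gi
```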

As a side note, I just realized that https://www.elastic.co/docs/deploy-manage/autoscaling/autoscaling-in-ece-and-ech does not mention the frozen tier case, so maybe you were right in the beginning and it's okay not to be that specific 🤷

Contributor Author


I personally think the table format is much clearer to understand than reading a wall of text. I'll update this and we can review further. ty!

Contributor Author


Yeah, I think this is much clearer:

*(screenshot of the rendered table)*

@naemono naemono requested review from barkbay and pebrc April 15, 2025 13:10
Contributor

@barkbay barkbay left a comment


LGTM

Contributor

@pebrc pebrc left a comment


LGTM

Co-authored-by: Peter Brachwitz <[email protected]>
@naemono naemono enabled auto-merge (squash) April 17, 2025 13:49
Contributor

@eedugon eedugon left a comment


LGTM

@naemono naemono merged commit c9b0458 into elastic:main Apr 21, 2025
3 of 4 checks passed
@naemono naemono deleted the eck-update-autoscaling-docs branch April 22, 2025 14:24