Dynamic shard sizing based on search nodes operating system memory #24372

Merged
patrickmann merged 6 commits into master from dynamic_shard_sizing on Dec 1, 2025
Conversation

AntonEbel (Contributor) commented Nov 27, 2025

Description

This PR implements dynamic shard sizing for index rotation. TIME_SIZE_OPTIMIZING_ROTATION_MIN_SHARD_SIZE and TIME_SIZE_OPTIMIZING_ROTATION_MAX_SHARD_SIZE no longer have default values. If either of these settings is unset, dynamic shard sizing is enabled. The feature reads the operating system memory of the OpenSearch/Elasticsearch nodes that have the data role and calculates the maximum index shard size from the node with the least memory.
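
For readers who want the rule above in code form, here is a minimal sketch and not the code from this PR. The class, record, and method names are hypothetical, and the step that turns the selected memory value into a shard size is deliberately left out because the description does not state it.

```java
import java.util.List;
import java.util.OptionalLong;

// Hypothetical names throughout; this is an illustration, not the PR's implementation.
final class DynamicShardSizingSketch {

    // Hypothetical view of one search node: its roles and OS memory in bytes.
    record SearchNode(String name, List<String> roles, long osMemoryBytes) {}

    // Dynamic sizing applies when either configured bound is left unset (null).
    static boolean isDynamicSizingEnabled(Long minShardSizeBytes, Long maxShardSizeBytes) {
        return minShardSizeBytes == null || maxShardSizeBytes == null;
    }

    // Returns the OS memory of the data-role node with the least memory, if any exists.
    static OptionalLong smallestDataNodeMemory(List<SearchNode> nodes) {
        return nodes.stream()
                .filter(node -> node.roles().contains("data"))
                .mapToLong(SearchNode::osMemoryBytes)
                .min();
    }

    public static void main(String[] args) {
        List<SearchNode> nodes = List.of(
                new SearchNode("data-1", List.of("data", "ingest"), 64L << 30),
                new SearchNode("data-2", List.of("data"), 32L << 30),
                new SearchNode("manager-1", List.of("cluster_manager"), 8L << 30));

        System.out.println("dynamic sizing: " + isDynamicSizingEnabled(null, null));
        System.out.println("smallest data node memory: " + smallestDataNodeMemory(nodes));
    }
}
```

Nodes without the data role are ignored, so a small dedicated manager node does not pull the result down.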

Motivation and Context

closes #23947 (Pitch)

How Has This Been Tested?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Refactoring (non-breaking change)
  • Breaking change (fix or feature that would cause existing functionality to change)

/prd Graylog2/graylog-plugin-enterprise#12643

@AntonEbel AntonEbel marked this pull request as ready for review November 27, 2025 16:57
@AntonEbel AntonEbel requested a review from a team November 27, 2025 17:28
patrickmann (Contributor) commented

Codex noticed this:
ShardsMetricsSupplier.java:43-50 calls getTimeSizeOptimizingRotationMin/MaxShardSize().getQuantity() unconditionally. These settings default to null for the new dynamic sizing mode, so the telemetry job will throw an NPE before emitting metrics (and ShardsMetricsSupplierTest still assumes non-null sizes). Needs null guards/fallbacks.
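
One possible shape for the null guard requested above, as a sketch only. The `Size` record is a hypothetical stand-in for the configured size type (only `getQuantity()` comes from the comment), and the metric key names are assumptions rather than the keys ShardsMetricsSupplier actually emits.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the actual ShardsMetricsSupplier code.
final class ShardMetricsSketch {

    // Stand-in for the configured size type; only getQuantity() is taken from the review comment.
    record Size(long quantity) {
        long getQuantity() {
            return quantity;
        }
    }

    // Builds the metrics map and omits any bound that is unset (dynamic sizing mode),
    // so a null setting never reaches getQuantity().
    static Map<String, Object> shardMetrics(Size minShardSize, Size maxShardSize) {
        Map<String, Object> metrics = new LinkedHashMap<>();
        // Assumed key names, chosen for illustration only.
        metrics.put("dynamic_shard_sizing", minShardSize == null || maxShardSize == null);
        if (minShardSize != null) {
            metrics.put("min_shard_size", minShardSize.getQuantity());
        }
        if (maxShardSize != null) {
            metrics.put("max_shard_size", maxShardSize.getQuantity());
        }
        return metrics;
    }
}
```

Omitting the key is one option; emitting an explicit fallback value would work too, as long as the test is updated to cover the null case.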

@patrickmann patrickmann self-requested a review November 28, 2025 15:06
# Conflicts:
#	graylog2-server/src/main/java/org/graylog2/datatiering/rotation/DataTierRotation.java
#	graylog2-server/src/main/java/org/graylog2/indexer/rotation/strategies/TimeBasedSizeOptimizingStrategy.java
#	graylog2-server/src/test/java/org/graylog2/indexer/rotation/strategies/TimeBasedSizeOptimizingRotationAndRetentionTest.java
@patrickmann patrickmann merged commit 93f9850 into master Dec 1, 2025
25 checks passed
@patrickmann patrickmann deleted the dynamic_shard_sizing branch December 1, 2025 13:33
boosty (Contributor) commented Jan 13, 2026

@AntonEbel @patrickmann Is the memory calculation container aware? E.g. when running OpenSearch as a pod in Kubernetes and setting memory limits?

AntonEbel (Contributor, Author) commented

@boosty We currently use the value nodes.os.mem.total_in_bytes from the /_nodes/stats/os API call. As far as I know, it is not container aware. OpenSearch has an open issue to support cgroup v2. Once that issue is resolved, we can make the calculation container aware.
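
To make the lookup concrete, here is a rough sketch of reading that value from an already-parsed /_nodes/stats/os response held as nested maps. The class and method names are hypothetical, the sketch assumes each node entry carries a roles list next to its os block, and it is not the Graylog client code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; assumes the JSON body was already parsed into nested maps.
final class NodeStatsMemorySketch {

    // Collects os.mem.total_in_bytes per node id for nodes whose roles include "data".
    @SuppressWarnings("unchecked")
    static Map<String, Long> dataNodeMemory(Map<String, Object> response) {
        Map<String, Long> result = new HashMap<>();
        Map<String, Object> nodes = (Map<String, Object>) response.getOrDefault("nodes", Map.of());
        for (Map.Entry<String, Object> entry : nodes.entrySet()) {
            Map<String, Object> node = (Map<String, Object>) entry.getValue();
            List<String> roles = (List<String>) node.getOrDefault("roles", List.of());
            if (!roles.contains("data")) {
                continue; // only data nodes are considered
            }
            Map<String, Object> os = (Map<String, Object>) node.getOrDefault("os", Map.of());
            Map<String, Object> mem = (Map<String, Object>) os.getOrDefault("mem", Map.of());
            Object total = mem.get("total_in_bytes");
            if (total instanceof Number number) {
                result.put(entry.getKey(), number.longValue());
            }
        }
        return result;
    }
}
```

Because the value is whatever the node reports as OS memory, a container memory limit would not show up here, which is the concern raised in this thread.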

boosty (Contributor) commented Jan 13, 2026

@AntonEbel Thanks for the quick feedback! That is a very important piece of information. I think it's not uncommon to deploy DataNode (or raw OpenSearch) as containers these days. The new official Graylog Helm chart will probably accelerate this.

@tellistone pinging you for awareness.

Development

Successfully merging this pull request may close these issues.

Set a dynamic shard size according to Opensearch node ram