Commit f284cc1

Convert disk watermarks to RelativeByteSizeValues (#88719)
* Convert disk watermarks to RelativeByteSizeValues

  Similar to the existing watermark setting for the frozen tier. Pre-requisite for PR 88639 that plans to introduce max headroom settings for the disk watermarks, similar to the frozen tier max headroom setting.
* Add changelog
* Revert 20gb to 20GB
* Make formatNoTrailingZerosPercent non static
* ByteSizeValue.MINUS_ONE
* Remove getMinimumTotalSizeForBelowWatermark
* Remove comment
* Fix minor stuff
* Make parsing of RelativeByteSizeValue faster

  Mimics older definitelyNotPercentage function
* Remove Locale from Strings.format
* More MINUS_ONE
1 parent 9e0cd2f commit f284cc1

20 files changed, 681 additions and 740 deletions

CONTRIBUTING.md

Lines changed: 1 addition & 1 deletion
@@ -604,7 +604,7 @@ threshold has been breached:
 
 logger.warn(
     "flood stage disk watermark [{}] exceeded on {}, all indices on this node will be marked read-only",
-    diskThresholdSettings.describeFloodStageThreshold(),
+    diskThresholdSettings.describeFloodStageThreshold(total, false),
     usage
 );

docs/changelog/88719.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+pr: 88719
+summary: Convert disk watermarks to RelativeByteSizeValues
+area: Infra/Settings
+type: enhancement
+issues: []

docs/reference/modules/cluster/disk_allocator.asciidoc

Lines changed: 5 additions & 5 deletions
@@ -72,14 +72,14 @@ Defaults to `true`. Set to `false` to disable the disk allocation decider.
 // tag::cluster-routing-watermark-low-tag[]
 `cluster.routing.allocation.disk.watermark.low` {ess-icon}::
 (<<dynamic-cluster-setting,Dynamic>>)
-Controls the low watermark for disk usage. It defaults to `85%`, meaning that {es} will not allocate shards to nodes that have more than 85% disk used. It can also be set to an absolute byte value (like `500mb`) to prevent {es} from allocating shards if less than the specified amount of space is available. This setting has no effect on the primary shards of newly-created indices but will prevent their replicas from being allocated.
+Controls the low watermark for disk usage. It defaults to `85%`, meaning that {es} will not allocate shards to nodes that have more than 85% disk used. It can alternatively be set to a ratio value, e.g., `0.85`. It can also be set to an absolute byte value (like `500mb`) to prevent {es} from allocating shards if less than the specified amount of space is available. This setting has no effect on the primary shards of newly-created indices but will prevent their replicas from being allocated.
 // end::cluster-routing-watermark-low-tag[]
 
 [[cluster-routing-watermark-high]]
 // tag::cluster-routing-watermark-high-tag[]
 `cluster.routing.allocation.disk.watermark.high` {ess-icon}::
 (<<dynamic-cluster-setting,Dynamic>>)
-Controls the high watermark. It defaults to `90%`, meaning that {es} will attempt to relocate shards away from a node whose disk usage is above 90%. It can also be set to an absolute byte value (similarly to the low watermark) to relocate shards away from a node if it has less than the specified amount of free space. This setting affects the allocation of all shards, whether previously allocated or not.
+Controls the high watermark. It defaults to `90%`, meaning that {es} will attempt to relocate shards away from a node whose disk usage is above 90%. It can alternatively be set to a ratio value, e.g., `0.9`. It can also be set to an absolute byte value (similarly to the low watermark) to relocate shards away from a node if it has less than the specified amount of free space. This setting affects the allocation of all shards, whether previously allocated or not.
 // end::cluster-routing-watermark-high-tag[]
 
 `cluster.routing.allocation.disk.watermark.enable_for_single_data_node`::
@@ -95,10 +95,10 @@ is now `true`. The setting will be removed in a future release.
 +
 --
 (<<dynamic-cluster-setting,Dynamic>>)
-Controls the flood stage watermark, which defaults to 95%. {es} enforces a read-only index block (`index.blocks.read_only_allow_delete`) on every index that has one or more shards allocated on the node, and that has at least one disk exceeding the flood stage. This setting is a last resort to prevent nodes from running out of disk space. The index block is automatically released when the disk utilization falls below the high watermark.
+Controls the flood stage watermark, which defaults to 95%. {es} enforces a read-only index block (`index.blocks.read_only_allow_delete`) on every index that has one or more shards allocated on the node, and that has at least one disk exceeding the flood stage. This setting is a last resort to prevent nodes from running out of disk space. The index block is automatically released when the disk utilization falls below the high watermark. Similarly to the low and high watermark values, it can alternatively be set to a ratio value, e.g., `0.95`, or an absolute byte value.
 
-NOTE: You cannot mix the usage of percentage values and byte values within
-these settings. Either all values are set to percentage values, or all are set to byte values. This enforcement is so that {es} can validate that the settings are internally consistent, ensuring that the low disk threshold is less than the high disk threshold, and the high disk threshold is less than the flood stage threshold.
+NOTE: You cannot mix the usage of percentage/ratio values and byte values within
+the watermark settings. Either all values are set to percentage/ratio values, or all are set to byte values. This enforcement is so that {es} can validate that the settings are internally consistent, ensuring that the low disk threshold is less than the high disk threshold, and the high disk threshold is less than the flood stage threshold.
 
 An example of resetting the read-only index block on the `my-index-000001` index:
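
Reviewer note: the documentation change above describes three accepted forms for a watermark (a percentage such as `85%`, a ratio such as `0.85`, or an absolute byte value such as `500mb`). The following is a minimal sketch of how each form could be turned into a "minimum free bytes" threshold for a disk of a given size. The class `WatermarkSketch` and its methods are hypothetical and are not the `RelativeByteSizeValue` API from this commit; the rounding rule and the binary unit multipliers are assumptions made for illustration.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Locale;

/** Hypothetical illustration only; not the Elasticsearch RelativeByteSizeValue class. */
final class WatermarkSketch {
    // Longer suffixes are listed before the bare "b" so that "mb" is not read as "b".
    private static final String[] UNITS = { "kb", "mb", "gb", "tb", "b" };
    private static final long[] MULTIPLIERS = { 1L << 10, 1L << 20, 1L << 30, 1L << 40, 1L };

    private WatermarkSketch() {}

    /** Returns the free-bytes threshold below which the watermark counts as exceeded. */
    static long minFreeBytes(String watermark, long totalBytes) {
        String value = watermark.trim().toLowerCase(Locale.ROOT);
        if (value.endsWith("%")) {
            // "85%" means at most 85% of the disk may be used.
            BigDecimal percent = new BigDecimal(value.substring(0, value.length() - 1));
            return freeFromUsedRatio(percent.movePointLeft(2), totalBytes);
        }
        for (int i = 0; i < UNITS.length; i++) {
            if (value.endsWith(UNITS[i])) {
                // An absolute value is already a free-space requirement.
                // This sketch only accepts whole numbers such as "500mb".
                String number = value.substring(0, value.length() - UNITS[i].length());
                return Math.multiplyExact(Long.parseLong(number), MULTIPLIERS[i]);
            }
        }
        // No suffix: treat the value as a ratio such as "0.85".
        return freeFromUsedRatio(new BigDecimal(value), totalBytes);
    }

    private static long freeFromUsedRatio(BigDecimal usedRatio, long totalBytes) {
        if (usedRatio.signum() < 0 || usedRatio.compareTo(BigDecimal.ONE) > 0) {
            throw new IllegalArgumentException("ratio must be between 0 and 1: " + usedRatio);
        }
        // Assumption: the maximum used byte count is rounded down to a whole byte.
        BigDecimal maxUsedBytes = usedRatio.multiply(BigDecimal.valueOf(totalBytes)).setScale(0, RoundingMode.DOWN);
        return totalBytes - maxUsedBytes.longValueExact();
    }
}
```

Under these assumptions, `85%` and `0.85` give the same threshold for any disk size, while `500mb` is independent of the disk size.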

server/src/internalClusterTest/java/org/elasticsearch/health/HealthMetadataServiceIT.java

Lines changed: 4 additions & 1 deletion
@@ -138,7 +138,10 @@ private Settings createWatermarkSettings(String highWatermark) {
                 DiskThresholdSettings.CLUSTER_ROUTING_ALLOCATION_DISK_FLOOD_STAGE_WATERMARK_SETTING.getKey(),
                 percentageMode ? "95%" : "1b"
             )
-            .put(DiskThresholdSettings.CLUSTER_ROUTING_ALLOCATION_DISK_FLOOD_STAGE_FROZEN_SETTING.getKey(), percentageMode ? "95%" : "5b")
+            .put(
+                DiskThresholdSettings.CLUSTER_ROUTING_ALLOCATION_DISK_FLOOD_STAGE_FROZEN_WATERMARK_SETTING.getKey(),
+                percentageMode ? "95%" : "5b"
+            )
             .put(DiskThresholdSettings.CLUSTER_ROUTING_ALLOCATION_DISK_FLOOD_STAGE_FROZEN_MAX_HEADROOM_SETTING.getKey(), "5b")
             .build();
     }
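
Reviewer note: the test helper above switches every watermark between percentage mode and byte mode together via `percentageMode`, which matches the documented rule that percentage/ratio values and byte values cannot be mixed. A minimal sketch of such a mode check follows. `WatermarkModeSketch` is a hypothetical name, the suffix test is a simplification, and the ordering check between low, high, and flood stage is deliberately left out; none of this is the validator used by `DiskThresholdSettings`.

```java
import java.util.Locale;

/** Hypothetical illustration only; not the DiskThresholdSettings validator. */
final class WatermarkModeSketch {
    private WatermarkModeSketch() {}

    /** Assumption: a value ending in "b" (b, kb, mb, ...) is absolute; anything else is relative. */
    static boolean isAbsolute(String watermark) {
        return watermark.trim().toLowerCase(Locale.ROOT).endsWith("b");
    }

    /** Throws if the three watermarks do not all use the same mode. */
    static void requireSameMode(String low, String high, String floodStage) {
        boolean absolute = isAbsolute(low);
        if (isAbsolute(high) != absolute || isAbsolute(floodStage) != absolute) {
            throw new IllegalArgumentException(
                "unable to mix percentage/ratio values with byte values: low ["
                    + low + "], high [" + high + "], flood stage [" + floodStage + "]"
            );
        }
    }
}
```

Percentages and ratios count as the same mode in this sketch, so `85%`, `0.9`, `95%` passes while `85%`, `500mb`, `95%` is rejected.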

server/src/main/java/org/elasticsearch/cluster/routing/allocation/DiskThresholdMonitor.java

Lines changed: 54 additions & 66 deletions
@@ -166,14 +166,13 @@ public void onNewInfo(ClusterInfo info) {
             final String node = entry.getKey();
             final DiskUsage usage = entry.getValue();
             final RoutingNode routingNode = routingNodes.node(node);
+            final ByteSizeValue total = ByteSizeValue.ofBytes(usage.getTotalBytes());
 
             if (isDedicatedFrozenNode(routingNode)) {
-                ByteSizeValue total = ByteSizeValue.ofBytes(usage.getTotalBytes());
-                long frozenFloodStageThreshold = diskThresholdSettings.getFreeBytesThresholdFrozenFloodStage(total).getBytes();
-                if (usage.getFreeBytes() < frozenFloodStageThreshold) {
+                if (usage.getFreeBytes() < diskThresholdSettings.getFreeBytesThresholdFrozenFloodStage(total).getBytes()) {
                     logger.warn(
                         "flood stage disk watermark [{}] exceeded on {}",
-                        diskThresholdSettings.describeFrozenFloodStageThreshold(total),
+                        diskThresholdSettings.describeFrozenFloodStageThreshold(total, false),
                         usage
                     );
                 }
@@ -182,9 +181,7 @@ public void onNewInfo(ClusterInfo info) {
                 continue;
             }
 
-            if (usage.getFreeBytes() < diskThresholdSettings.getFreeBytesThresholdFloodStage().getBytes()
-                || usage.getFreeDiskAsPercentage() < diskThresholdSettings.getFreeDiskThresholdFloodStage()) {
-
+            if (usage.getFreeBytes() < diskThresholdSettings.getFreeBytesThresholdFloodStage(total).getBytes()) {
                 nodesOverLowThreshold.add(node);
                 nodesOverHighThreshold.add(node);
                 nodesOverHighThresholdAndRelocating.remove(node);
@@ -199,16 +196,14 @@ public void onNewInfo(ClusterInfo info) {
 
                 logger.warn(
                     "flood stage disk watermark [{}] exceeded on {}, all indices on this node will be marked read-only",
-                    diskThresholdSettings.describeFloodStageThreshold(),
+                    diskThresholdSettings.describeFloodStageThreshold(total, false),
                     usage
                 );
 
                 continue;
             }
 
-            if (usage.getFreeBytes() < diskThresholdSettings.getFreeBytesThresholdHigh().getBytes()
-                || usage.getFreeDiskAsPercentage() < diskThresholdSettings.getFreeDiskThresholdHigh()) {
-
+            if (usage.getFreeBytes() < diskThresholdSettings.getFreeBytesThresholdHighStage(total).getBytes()) {
                 if (routingNode != null) { // might be temporarily null if the ClusterInfoService and the ClusterService are out of step
                     for (ShardRouting routing : routingNode) {
                         String indexName = routing.index().getName();
@@ -226,9 +221,7 @@ public void onNewInfo(ClusterInfo info) {
                 Math.max(0L, usage.getFreeBytes() - reservedSpace)
             );
 
-            if (usageWithReservedSpace.getFreeBytes() < diskThresholdSettings.getFreeBytesThresholdHigh().getBytes()
-                || usageWithReservedSpace.getFreeDiskAsPercentage() < diskThresholdSettings.getFreeDiskThresholdHigh()) {
-
+            if (usageWithReservedSpace.getFreeBytes() < diskThresholdSettings.getFreeBytesThresholdHighStage(total).getBytes()) {
                 nodesOverLowThreshold.add(node);
                 nodesOverHighThreshold.add(node);

@@ -245,61 +238,57 @@ public void onNewInfo(ClusterInfo info) {
                     );
                 }
 
-            } else if (usageWithReservedSpace.getFreeBytes() < diskThresholdSettings.getFreeBytesThresholdLow().getBytes()
-                || usageWithReservedSpace.getFreeDiskAsPercentage() < diskThresholdSettings.getFreeDiskThresholdLow()) {
+            } else if (usageWithReservedSpace.getFreeBytes() < diskThresholdSettings.getFreeBytesThresholdLowStage(total).getBytes()) {
+                nodesOverHighThresholdAndRelocating.remove(node);
+
+                final boolean wasUnderLowThreshold = nodesOverLowThreshold.add(node);
+                final boolean wasOverHighThreshold = nodesOverHighThreshold.remove(node);
+                assert (wasUnderLowThreshold && wasOverHighThreshold) == false;
+
+                if (wasUnderLowThreshold) {
+                    logger.info(
+                        "low disk watermark [{}] exceeded on {}, replicas will not be assigned to this node",
+                        diskThresholdSettings.describeLowThreshold(total, false),
+                        usage
+                    );
+                } else if (wasOverHighThreshold) {
+                    logger.info(
+                        "high disk watermark [{}] no longer exceeded on {}, but low disk watermark [{}] is still exceeded",
+                        diskThresholdSettings.describeHighThreshold(total, false),
+                        usage,
+                        diskThresholdSettings.describeLowThreshold(total, false)
+                    );
+                }
 
-                    nodesOverHighThresholdAndRelocating.remove(node);
+            } else {
+                nodesOverHighThresholdAndRelocating.remove(node);
 
-                    final boolean wasUnderLowThreshold = nodesOverLowThreshold.add(node);
-                    final boolean wasOverHighThreshold = nodesOverHighThreshold.remove(node);
-                    assert (wasUnderLowThreshold && wasOverHighThreshold) == false;
+                if (nodesOverLowThreshold.contains(node)) {
+                    // The node has previously been over the low watermark, but is no longer, so it may be possible to allocate more
+                    // shards if we reroute now.
+                    if (lastRunTimeMillis.get() <= currentTimeMillis - diskThresholdSettings.getRerouteInterval().millis()) {
+                        reroute = true;
+                        explanation = "one or more nodes has gone under the high or low watermark";
+                        nodesOverLowThreshold.remove(node);
+                        nodesOverHighThreshold.remove(node);
 
-                    if (wasUnderLowThreshold) {
                         logger.info(
-                            "low disk watermark [{}] exceeded on {}, replicas will not be assigned to this node",
-                            diskThresholdSettings.describeLowThreshold(),
+                            "low disk watermark [{}] no longer exceeded on {}",
+                            diskThresholdSettings.describeLowThreshold(total, false),
                             usage
                         );
-                    } else if (wasOverHighThreshold) {
-                        logger.info(
-                            "high disk watermark [{}] no longer exceeded on {}, but low disk watermark [{}] is still exceeded",
-                            diskThresholdSettings.describeHighThreshold(),
-                            usage,
-                            diskThresholdSettings.describeLowThreshold()
-                        );
-                    }
-
-            } else {
 
-                    nodesOverHighThresholdAndRelocating.remove(node);
-
-                    if (nodesOverLowThreshold.contains(node)) {
-                        // The node has previously been over the low watermark, but is no longer, so it may be possible to allocate more
-                        // shards
-                        // if we reroute now.
-                        if (lastRunTimeMillis.get() <= currentTimeMillis - diskThresholdSettings.getRerouteInterval().millis()) {
-                            reroute = true;
-                            explanation = "one or more nodes has gone under the high or low watermark";
-                            nodesOverLowThreshold.remove(node);
-                            nodesOverHighThreshold.remove(node);
-
-                            logger.info(
-                                "low disk watermark [{}] no longer exceeded on {}",
-                                diskThresholdSettings.describeLowThreshold(),
-                                usage
-                            );
-
-                        } else {
-                            logger.debug(
-                                "{} has gone below a disk threshold, but an automatic reroute has occurred "
-                                    + "in the last [{}], skipping reroute",
-                                node,
-                                diskThresholdSettings.getRerouteInterval()
-                            );
-                        }
+                    } else {
+                        logger.debug(
+                            "{} has gone below a disk threshold, but an automatic reroute has occurred "
+                                + "in the last [{}], skipping reroute",
+                            node,
+                            diskThresholdSettings.getRerouteInterval()
+                        );
                     }
-
                 }
+
+            }
         }
 
         final ActionListener<Void> listener = new GroupedActionListener<>(ActionListener.wrap(this::checkFinished), 3);
@@ -325,16 +314,15 @@ public void onNewInfo(ClusterInfo info) {
                 usageIncludingRelocations = diskUsage;
                 relocatingShardsSize = 0L;
             }
+            final ByteSizeValue total = ByteSizeValue.ofBytes(usageIncludingRelocations.getTotalBytes());
 
-            if (usageIncludingRelocations.getFreeBytes() < diskThresholdSettings.getFreeBytesThresholdHigh().getBytes()
-                || usageIncludingRelocations.getFreeDiskAsPercentage() < diskThresholdSettings.getFreeDiskThresholdHigh()) {
-
+            if (usageIncludingRelocations.getFreeBytes() < diskThresholdSettings.getFreeBytesThresholdHighStage(total).getBytes()) {
                 nodesOverHighThresholdAndRelocating.remove(diskUsage.getNodeId());
                 logger.warn(
                     "high disk watermark [{}] exceeded on {}, shards will be relocated away from this node; "
                         + "currently relocating away shards totalling [{}] bytes; the node is expected to continue to exceed "
                         + "the high disk watermark when these relocations are complete",
-                    diskThresholdSettings.describeHighThreshold(),
+                    diskThresholdSettings.describeHighThreshold(total, false),
                     diskUsage,
                     -relocatingShardsSize
                 );
@@ -343,15 +331,15 @@ public void onNewInfo(ClusterInfo info) {
                     "high disk watermark [{}] exceeded on {}, shards will be relocated away from this node; "
                         + "currently relocating away shards totalling [{}] bytes; the node is expected to be below the high "
                         + "disk watermark when these relocations are complete",
-                    diskThresholdSettings.describeHighThreshold(),
+                    diskThresholdSettings.describeHighThreshold(total, false),
                     diskUsage,
                     -relocatingShardsSize
                 );
             } else {
                 logger.debug(
                     "high disk watermark [{}] exceeded on {}, shards will be relocated away from this node; "
                         + "currently relocating away shards totalling [{}] bytes",
-                    diskThresholdSettings.describeHighThreshold(),
+                    diskThresholdSettings.describeHighThreshold(total, false),
                     diskUsage,
                     -relocatingShardsSize
                 );
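
Reviewer note: with this change the monitor no longer checks both a free-bytes threshold and a free-percentage threshold. It compares free bytes against a single threshold that is derived from the disk's total size. A minimal sketch of the resulting flood stage / high / low cascade is shown below. `WatermarkLevelSketch` and `Level` are hypothetical names, the three thresholds are passed in as precomputed free-byte values, and the sketch ignores the reserved-space and relocation adjustments that `DiskThresholdMonitor` applies before the high and low checks.

```java
/** Hypothetical illustration only; not the DiskThresholdMonitor implementation. */
final class WatermarkLevelSketch {
    enum Level { BELOW_LOW, OVER_LOW, OVER_HIGH, OVER_FLOOD_STAGE }

    private WatermarkLevelSketch() {}

    /**
     * Classifies a node by comparing its free bytes with the minimum free bytes
     * required by each watermark, checking the most severe watermark first.
     */
    static Level classify(long freeBytes, long lowMinFree, long highMinFree, long floodStageMinFree) {
        if (freeBytes < floodStageMinFree) {
            return Level.OVER_FLOOD_STAGE;
        }
        if (freeBytes < highMinFree) {
            return Level.OVER_HIGH;
        }
        if (freeBytes < lowMinFree) {
            return Level.OVER_LOW;
        }
        return Level.BELOW_LOW;
    }
}
```

For a 1000-byte disk with the default 85%, 90%, and 95% watermarks, the minimum free bytes would be 150, 100, and 50 respectively, so a node with 120 free bytes is over the low watermark only.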
