
Commit d595387

Merge branch 'dev/vladelekic/documentationForBalancingPerNodeType' of https://github.com/vladelekic/azure-docs-pr into balancingPerNodeType
pulling in contributor's branch to recreate PR
2 parents e19e8d1 + 47de1ba commit d595387

3 files changed, +141 -0 lines changed


articles/service-fabric/service-fabric-cluster-resource-manager-balancing.md

Lines changed: 141 additions & 0 deletions
@@ -198,6 +198,145 @@ The Cluster Resource Manager automatically figures out what services are related
![Diagram that shows that Cluster Resource Manager determines what services are related.][Image5]
</center>

## Balancing of a cluster per node type

As described in the earlier sections, the main controls for triggering rebalancing are [activity thresholds](service-fabric-cluster-resource-manager-balancing.md#activity-thresholds), [balancing thresholds](service-fabric-cluster-resource-manager-balancing.md#balancing-thresholds), and [timers](service-fabric-cluster-resource-manager-balancing.md#configuring-cluster-resource-manager-timers). The Service Fabric Cluster Resource Manager provides more granular control over rebalancing by letting you specify these parameters per node type and by moving replicas only on imbalanced node types. The main benefit of balancing per node type is that node types which need stricter balancing rules can get them without degrading performance on other node types. The feature has two main parts:

- Imbalance is detected per node type. Instead of a single global imbalance calculation, imbalance is calculated separately for each node type. If all node types are balanced, the Cluster Resource Manager (CRM) doesn't trigger a balancing phase. If at least one node type is imbalanced, a balancing phase is needed.
- The balancing phase moves replicas only on node types that are imbalanced; other node types aren't affected by it.

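The two parts above can be illustrated with a minimal Python sketch. It isn't the Cluster Resource Manager's implementation: `plan_balancing` is a hypothetical helper that assumes imbalance has already been detected for each node type, and that each replica is described by a hypothetical ID paired with the node type it's placed on.

```python
from typing import Dict, List, Tuple


# Hypothetical helper for illustration only; not a Service Fabric API.
def plan_balancing(
    imbalanced: Dict[str, bool],
    replicas: List[Tuple[str, str]],
) -> Tuple[bool, List[str]]:
    """Return (balancing phase needed, IDs of replicas eligible to move).

    imbalanced: node type name -> whether that node type is imbalanced.
    replicas: (replica ID, node type the replica is placed on) pairs.
    """
    # A balancing phase is needed if at least one node type is imbalanced.
    phase_needed = any(imbalanced.values())
    # Only replicas on imbalanced node types may move; the rest stay in place.
    movable = [
        replica_id
        for replica_id, node_type in replicas
        if imbalanced.get(node_type, False)
    ]
    return phase_needed, movable


needed, movable = plan_balancing(
    {"A": True, "B": False},
    [("r1", "A"), ("r2", "B"), ("r3", "A")],
)
print(needed, movable)  # True ['r1', 'r3']
```

Replica `r2` stays where it is even though a balancing phase runs, because its node type is balanced.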

### How balancing per node type affects a cluster

When a cluster is balanced per node type, the Service Fabric Cluster Resource Manager calculates the imbalance state of each node type. If at least one node type is imbalanced, the balancing phase is triggered. The balancing phase doesn't move replicas on an imbalanced node type while balancing is temporarily paused on it, for example when the minimum balancing interval hasn't passed since the previous balancing phase on that node type. Imbalance detection reuses the mechanisms already available for classical cluster balancing, but with more granular and flexible configuration. Balancing per node type uses the following mechanisms to detect imbalance:

- **Metric balancing thresholds** per node type play a role similar to the globally defined balancing threshold used in classical balancing. The ratio of the maximum to the minimum metric load is calculated for each node type. If that ratio is higher than the node type's balancing threshold, the node type violates its balancing threshold. For details on configuring metric balancing thresholds per node type, see [Balancing thresholds per node type](service-fabric-cluster-resource-manager-balancing.md#balancing-thresholds-per-node-type).
- **Metric activity thresholds** per node type play a role similar to the globally defined activity threshold used in classical balancing. The maximum metric load is calculated for each node type. If the maximum load of a node type is higher than its activity threshold, the node type violates its activity threshold. As in classical balancing, a node type is considered imbalanced only when it violates both the balancing threshold and the activity threshold of the same metric, as the examples later in this article show. For details on configuring metric activity thresholds per node type, see [Activity thresholds per node type](service-fabric-cluster-resource-manager-balancing.md#activity-thresholds-per-node-type).
- **Minimum balancing interval** per node type plays a role similar to the globally defined minimum balancing interval. For each node type, the Cluster Resource Manager keeps the timestamp of the last balancing phase. Two consecutive balancing phases can't run on a node type within its minimum balancing interval. For details on configuring the minimum balancing interval per node type, see [Minimum balancing interval per node type](service-fabric-cluster-resource-manager-balancing.md#minimum-balancing-interval-per-node-type).

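The two threshold checks can be sketched in Python as follows. This is an illustrative model, not the Cluster Resource Manager's code: the function names and input shapes are hypothetical, it follows the classical-balancing rule that a metric must exceed both its activity threshold and its balancing threshold, and it assumes that a minimum load of zero counts as an unbounded max/min ratio. The loads in the usage lines are made-up values.

```python
from typing import Dict, List

# Hypothetical data shapes, for illustration only:
#   loads[node_type][metric]     -> list of per-node loads in that node type
#   balancing[node_type][metric] -> balancing threshold (max/min load ratio)
#   activity[node_type][metric]  -> activity threshold (maximum load)


def metric_violates(loads: List[float], balancing_threshold: float,
                    activity_threshold: float) -> bool:
    """True if a metric exceeds both its activity and balancing thresholds."""
    max_load, min_load = max(loads), min(loads)
    # Activity check: the most loaded node must be above the activity threshold.
    if max_load <= activity_threshold:
        return False
    # Assumption: a zero minimum load counts as an unbounded max/min ratio.
    if min_load == 0:
        return True
    # Balancing check: the max/min ratio must be higher than the threshold.
    return max_load / min_load > balancing_threshold


def imbalanced_node_types(loads: Dict[str, Dict[str, List[float]]],
                          balancing: Dict[str, Dict[str, float]],
                          activity: Dict[str, Dict[str, float]]) -> List[str]:
    """Return node types in which at least one metric violates both thresholds."""
    return [
        node_type
        for node_type, metrics in loads.items()
        if any(
            metric_violates(node_loads,
                            balancing[node_type][metric],
                            activity[node_type][metric])
            for metric, node_loads in metrics.items()
        )
    ]


print(imbalanced_node_types(
    loads={"X": {"Metric1": [10, 40]}, "Y": {"Metric1": [10, 12]}},
    balancing={"X": {"Metric1": 2.0}, "Y": {"Metric1": 2.0}},
    activity={"X": {"Metric1": 20}, "Y": {"Metric1": 20}},
))  # ['X']
```

Node type `Y` has a small max/min ratio and stays below its activity threshold, so only `X` is reported.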

### Describing balancing per node type

To enable balancing per node type, set the SeparateBalancingStrategyPerNodeType parameter to true in the cluster manifest. The subclustering feature must be enabled as well. The following example shows a cluster manifest **PlacementAndLoadBalancing** section that enables the feature:

``` xml
<Section Name="PlacementAndLoadBalancing">
    <Parameter Name="SeparateBalancingStrategyPerNodeType" Value="true" />
    <Parameter Name="SubclusteringEnabled" Value="true" />
    <Parameter Name="SubclusteringReportingPolicy" Value="1" />
</Section>
```

The equivalent settings in ClusterConfig.json for standalone deployments, or in Template.json for Azure-hosted clusters, are:

``` JSON
"fabricSettings": [
  {
    "name": "PlacementAndLoadBalancing",
    "parameters": [
      {
        "name": "SeparateBalancingStrategyPerNodeType",
        "value": "true"
      },
      {
        "name": "SubclusteringEnabled",
        "value": "true"
      },
      {
        "name": "SubclusteringReportingPolicy",
        "value": "1"
      }
    ]
  }
]
```

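Generating this fragment programmatically is one way to avoid hand-editing mistakes such as trailing commas, which JSON doesn't allow. The following Python sketch builds the same fabricSettings entry with the standard `json` module; `placement_and_load_balancing_section` is a hypothetical helper, and it assumes every value is written as a string, as in the example above.

```python
import json
from typing import Dict, List


def placement_and_load_balancing_section(params: Dict[str, str]) -> Dict[str, object]:
    """Build a PlacementAndLoadBalancing entry for fabricSettings (hypothetical helper)."""
    return {
        "name": "PlacementAndLoadBalancing",
        # Each setting becomes a {"name": ..., "value": ...} object.
        "parameters": [{"name": name, "value": value} for name, value in params.items()],
    }


fabric_settings: List[Dict[str, object]] = [
    placement_and_load_balancing_section({
        "SeparateBalancingStrategyPerNodeType": "true",
        "SubclusteringEnabled": "true",
        "SubclusteringReportingPolicy": "1",
    })
]

# json.dumps never emits trailing commas, so the output is always valid JSON.
print(json.dumps({"fabricSettings": fabric_settings}, indent=2))
```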

As described in [the previous section](service-fabric-cluster-resource-manager-balancing.md#how-balancing-per-node-type-affects-a-cluster), thresholds and intervals can be specified per node type. For details about configuring each parameter, see the following sections:

- [Metric balancing thresholds per node type](service-fabric-cluster-resource-manager-balancing.md#balancing-thresholds-per-node-type)
- [Metric activity thresholds per node type](service-fabric-cluster-resource-manager-balancing.md#activity-thresholds-per-node-type)
- [Minimum balancing interval per node type](service-fabric-cluster-resource-manager-balancing.md#minimum-balancing-interval-per-node-type)


#### Balancing thresholds per node type

Metric balancing thresholds can be defined per node type to make the balancing configuration more granular. Balancing thresholds are floating-point values, because each one is a threshold for the ratio of the maximum to the minimum load within a particular node type. Balancing thresholds are defined in the **PlacementAndLoadBalancingOverrides** section of each node type:

``` xml
<NodeTypes>
    <NodeType Name="NodeType1">
        <PlacementAndLoadBalancingOverrides>
            <MetricBalancingThresholdsPerNodeType>
                <BalancingThreshold Name="Metric1" Value="2.5" />
                <BalancingThreshold Name="Metric2" Value="4" />
                <BalancingThreshold Name="Metric3" Value="3.25" />
            </MetricBalancingThresholdsPerNodeType>
        </PlacementAndLoadBalancingOverrides>
    </NodeType>
</NodeTypes>
```

If a metric's balancing threshold isn't defined for a node type, the node type inherits the metric's balancing threshold defined globally in the **PlacementAndLoadBalancing** section. If the balancing threshold for a metric is defined neither for the node type nor globally in the **PlacementAndLoadBalancing** section, the threshold has the default value of *one*.

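The lookup order described above can be sketched with Python's standard `xml.etree.ElementTree` module. This illustrates the rule under assumptions; it isn't how Service Fabric reads its manifest. It parses the NodeTypes fragment on its own (a full cluster manifest declares an XML namespace, which would require namespace-aware lookups), and the global thresholds are passed in as a plain dictionary with a hypothetical value instead of being read from the **PlacementAndLoadBalancing** section.

```python
import xml.etree.ElementTree as ET
from typing import Dict

DEFAULT_BALANCING_THRESHOLD = 1.0  # default described in the text

NODE_TYPES_XML = """
<NodeTypes>
  <NodeType Name="NodeType1">
    <PlacementAndLoadBalancingOverrides>
      <MetricBalancingThresholdsPerNodeType>
        <BalancingThreshold Name="Metric1" Value="2.5" />
        <BalancingThreshold Name="Metric2" Value="4" />
      </MetricBalancingThresholdsPerNodeType>
    </PlacementAndLoadBalancingOverrides>
  </NodeType>
  <NodeType Name="NodeType2" />
</NodeTypes>
"""

OVERRIDE_PATH = ("PlacementAndLoadBalancingOverrides/"
                 "MetricBalancingThresholdsPerNodeType/BalancingThreshold")


def read_overrides(xml_text: str) -> Dict[str, Dict[str, float]]:
    """Map node type name -> {metric name: balancing threshold}."""
    root = ET.fromstring(xml_text.strip())
    return {
        node_type.get("Name"): {
            t.get("Name"): float(t.get("Value"))
            for t in node_type.findall(OVERRIDE_PATH)
        }
        for node_type in root.findall("NodeType")
    }


def effective_balancing_threshold(metric: str, node_type: str,
                                  overrides: Dict[str, Dict[str, float]],
                                  global_thresholds: Dict[str, float]) -> float:
    """Node type override first, then the global value, then the default."""
    per_node_type = overrides.get(node_type, {})
    if metric in per_node_type:
        return per_node_type[metric]
    return global_thresholds.get(metric, DEFAULT_BALANCING_THRESHOLD)


overrides = read_overrides(NODE_TYPES_XML)
global_thresholds = {"Metric3": 2.0}  # hypothetical global setting
print(effective_balancing_threshold("Metric1", "NodeType1", overrides, global_thresholds))  # 2.5
print(effective_balancing_threshold("Metric3", "NodeType1", overrides, global_thresholds))  # 2.0
print(effective_balancing_threshold("Metric1", "NodeType2", overrides, global_thresholds))  # 1.0
```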

#### Activity thresholds per node type

Metric activity thresholds can be defined per node type to make the balancing configuration more granular. Activity thresholds are integer values, because each one is a threshold for the maximum load within a particular node type. Activity thresholds are defined in the **PlacementAndLoadBalancingOverrides** section of each node type:

``` xml
<NodeTypes>
    <NodeType Name="NodeType1">
        <PlacementAndLoadBalancingOverrides>
            <MetricActivityThresholdsPerNodeType>
                <ActivityThreshold Name="Metric1" Value="500" />
                <ActivityThreshold Name="Metric2" Value="40" />
                <ActivityThreshold Name="Metric3" Value="1000" />
            </MetricActivityThresholdsPerNodeType>
        </PlacementAndLoadBalancingOverrides>
    </NodeType>
</NodeTypes>
```

If a metric's activity threshold isn't defined for a node type, the node type inherits the metric's activity threshold defined globally in the **PlacementAndLoadBalancing** section. If the activity threshold for a metric is defined neither for the node type nor globally in the **PlacementAndLoadBalancing** section, the threshold has the default value of *zero*.

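The same fallback applies to activity thresholds, with a default of zero. The minimal Python sketch below uses hypothetical helper names and plain dictionaries in place of the manifest sections; the global value is made up for illustration. It also shows a consequence of the default under the "higher than" comparison described earlier: when no activity threshold is configured anywhere, any positive maximum load passes the activity check, so the balancing threshold alone decides whether the metric is imbalanced.

```python
from typing import Dict

DEFAULT_ACTIVITY_THRESHOLD = 0  # default described in the text


def effective_activity_threshold(metric: str,
                                 per_node_type: Dict[str, int],
                                 global_thresholds: Dict[str, int]) -> int:
    """Node type override first, then the global value, then the default of zero."""
    if metric in per_node_type:
        return per_node_type[metric]
    return global_thresholds.get(metric, DEFAULT_ACTIVITY_THRESHOLD)


def violates_activity_threshold(max_load: int, threshold: int) -> bool:
    # The maximum load of the node type must be higher than the threshold.
    return max_load > threshold


node_type1 = {"Metric1": 500, "Metric2": 40, "Metric3": 1000}  # NodeType1 values above
global_values = {"Metric4": 250}  # hypothetical global setting

print(effective_activity_threshold("Metric1", node_type1, global_values))  # 500
print(effective_activity_threshold("Metric4", node_type1, global_values))  # 250
print(effective_activity_threshold("Metric5", node_type1, global_values))  # 0
# With the default of zero, any positive maximum load passes the activity check.
print(violates_activity_threshold(1, effective_activity_threshold("Metric5", node_type1, global_values)))  # True
```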

#### Minimum balancing interval per node type

The minimum balancing interval can be defined per node type to make the balancing configuration more granular. The minimum balancing interval is an integer value that represents the minimum amount of time that must pass between two consecutive balancing rounds on the same node type. It's defined in the **PlacementAndLoadBalancingOverrides** section of each node type:

``` xml
<NodeTypes>
    <NodeType Name="NodeType1">
        <PlacementAndLoadBalancingOverrides>
            <MinLoadBalancingIntervalPerNodeType>100</MinLoadBalancingIntervalPerNodeType>
        </PlacementAndLoadBalancingOverrides>
    </NodeType>
</NodeTypes>
```

If the minimum balancing interval isn't defined for a node type, the node type inherits the minimum balancing interval defined globally in the **PlacementAndLoadBalancing** section. If the interval is defined neither for the node type nor globally in the **PlacementAndLoadBalancing** section, it has the default value of *zero*, which means that no pause is required between consecutive balancing rounds.

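The interplay between the inherited interval and the per-node-type timestamp of the last balancing phase can be sketched as follows. The helper names and timestamps are hypothetical; timestamps are plain numbers in the same time unit as the interval (an assumption, because this section doesn't state the unit), and a gap exactly equal to the interval is treated as long enough.

```python
from typing import Dict, Optional

DEFAULT_MIN_BALANCING_INTERVAL = 0  # zero: no pause between balancing rounds


def effective_min_interval(node_type: str,
                           per_node_type: Dict[str, int],
                           global_interval: Optional[int]) -> int:
    """Node type override first, then the global value, then the default of zero."""
    if node_type in per_node_type:
        return per_node_type[node_type]
    if global_interval is not None:
        return global_interval
    return DEFAULT_MIN_BALANCING_INTERVAL


def balancing_allowed(node_type: str, now: float,
                      last_balancing: Dict[str, float], interval: int) -> bool:
    """True if the node type may be balanced at time `now`."""
    last = last_balancing.get(node_type)
    if last is None or interval == 0:
        return True  # never balanced before, or no pause required
    # Assumption: exactly `interval` time units since the last phase is enough.
    return now - last >= interval


intervals = {"NodeType1": 100}  # as in the NodeType1 example above
last_balancing = {"NodeType1": 1000.0, "NodeType2": 1000.0}  # hypothetical timestamps

for node_type in ("NodeType1", "NodeType2"):
    interval = effective_min_interval(node_type, intervals, global_interval=None)
    print(node_type, interval, balancing_allowed(node_type, 1050.0, last_balancing, interval))
# NodeType1 100 False
# NodeType2 0 True
```

NodeType1 must wait until 100 time units have passed since its last balancing phase, while NodeType2, which falls back to the default of zero, can be balanced again immediately.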

### Examples

#### Example 1

Consider a cluster that contains two node types, **A** and **B**. All services report the same metric and are split between these node types, so each node type has different load statistics. In this example, node type **A** has a maximum load of 300 and a minimum load of 100, and node type **B** has a maximum load of 700 and a minimum load of 500:

<center>

![Balancing Threshold Example][Image6]
</center>

The customer noticed that the workloads of the two node types have different balancing needs, and set different balancing and activity thresholds per node type. Node type **A** has a balancing threshold of *2.5* and an activity threshold of *50*. For node type **B**, the customer set the balancing threshold to *1.4* and the activity threshold to *400*.

When the Cluster Resource Manager checks this cluster for imbalance, both node types violate their activity thresholds: node type **A** has a maximum load of *300*, which is higher than its activity threshold of *50*, and node type **B** has a maximum load of *700*, which is higher than its activity threshold of *400*. Node type **A** also violates its balancing threshold, because its ratio of maximum to minimum load is *3* (300/100), which is higher than its balancing threshold of *2.5*. In contrast, node type **B** doesn't violate its balancing threshold, because its ratio of maximum to minimum load is *1.4* (700/500), which isn't higher than its balancing threshold of *1.4*. Balancing is required only in node type **A**, and only replicas placed on node type **A** are eligible for movement during the balancing phase.

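The arithmetic in this example can be reproduced with a short Python check. It models each node type only by the maximum and minimum loads given above, treats a node type as imbalanced when it violates both thresholds, and uses *1.4* as node type **B**'s balancing threshold, the value the analysis compares against. The `check` helper is hypothetical.

```python
# Example 1 inputs: only maximum and minimum loads matter for these checks.
node_types = {
    "A": {"max": 300, "min": 100, "balancing": 2.5, "activity": 50},
    "B": {"max": 700, "min": 500, "balancing": 1.4, "activity": 400},
}


def check(nt):
    """Return (max/min ratio, activity violated, balancing violated)."""
    ratio = nt["max"] / nt["min"]
    return ratio, nt["max"] > nt["activity"], ratio > nt["balancing"]


for name, nt in node_types.items():
    ratio, activity_violated, balancing_violated = check(nt)
    print(f"{name}: ratio={ratio:.1f}, activity violated={activity_violated}, "
          f"balancing violated={balancing_violated}, "
          f"imbalanced={activity_violated and balancing_violated}")
# A: ratio=3.0, activity violated=True, balancing violated=True, imbalanced=True
# B: ratio=1.4, activity violated=True, balancing violated=False, imbalanced=False
```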

#### Example 2

Consider a cluster that contains three node types, **A**, **B**, and **C**. All services report the same metric and are split between these node types, so each node type has different load statistics. In this example, node type **A** has a maximum load of 600 and a minimum load of 100, node type **B** has a maximum load of 900 and a minimum load of 100, and node type **C** has a maximum load of 600 and a minimum load of 300:

<center>

![Balancing Threshold Example][Image7]
</center>

The customer noticed that the workloads of these node types have different balancing needs, and set different balancing and activity thresholds per node type. Node type **A** has a balancing threshold of *5* and an activity threshold of *700*. For node type **B**, the customer set the balancing threshold to *10* and the activity threshold to *200*. For node type **C**, the customer set the balancing threshold to *2* and the activity threshold to *300*.

Node type **A** has a maximum load of *600*, which is lower than its activity threshold of *700*, so node type **A** isn't balanced even though its ratio of maximum to minimum load (*6*) is higher than its balancing threshold of *5*. Node type **B** has a maximum load of *900*, which is higher than its activity threshold of *200*, so it violates its activity threshold. Node type **C** has a maximum load of *600*, which is higher than its activity threshold of *300*, so it also violates its activity threshold. Node type **B** doesn't violate its balancing threshold, because its ratio of maximum to minimum load is *9*, which is lower than its balancing threshold of *10*. Node type **C** violates its balancing threshold, because its ratio of maximum to minimum load is *2*, and its balancing threshold is *2*. Balancing is required only in node type **C**, and only replicas placed on node type **C** are eligible for movement during the balancing phase.

## Next steps
* Metrics are how the Service Fabric Cluster Resource Manager manages consumption and capacity in the cluster. To learn more about metrics and how to configure them, check out [this article](service-fabric-cluster-resource-manager-metrics.md)
* Movement Cost is one way of signaling to the Cluster Resource Manager that certain services are more expensive to move than others. For more about movement cost, refer to [this article](service-fabric-cluster-resource-manager-movement-cost.md)
@@ -209,3 +348,5 @@ The Cluster Resource Manager automatically figures out what services are related
[Image3]:./media/service-fabric-cluster-resource-manager-balancing/cluster-resource-manager-activity-thresholds.png
[Image4]:./media/service-fabric-cluster-resource-manager-balancing/cluster-resource-manager-balancing-services-together1.png
[Image5]:./media/service-fabric-cluster-resource-manager-balancing/cluster-resource-manager-balancing-services-together2.png
[Image6]:./media/service-fabric-cluster-resource-manager-balancing/cluster-resource-manager-balancing-per-node-type-example1.png
[Image7]:./media/service-fabric-cluster-resource-manager-balancing/cluster-resource-manager-balancing-per-node-type-example2.png
