>[!NOTE]
>The benefit of spanning the primary node type across availability zones is only realized with three zones, not two.

A Service Fabric cluster distributed across Availability Zones (AZ) ensures high availability of the cluster state.

The recommended topology for a managed cluster requires the following resources:

* The cluster SKU must be Standard.
* The primary node type should have at least nine nodes (3 in each AZ) for best resiliency, but supports a minimum of six (2 in each AZ).
* Secondary node types should have at least six nodes for best resiliency, but support a minimum of three.

>[!NOTE]
>Only deployments across three Availability Zones are supported.
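
To make the topology concrete, the following is a minimal sketch of the two ARM resources involved: a Standard-SKU managed cluster marked zone resilient and a primary node type sized at nine nodes. The resource names, API version, and VM size are placeholders, and required settings such as admin credentials and VM image details are omitted; this is not the full sample template.

```json
[
  {
    "apiVersion": "2022-02-01-preview",
    "type": "Microsoft.ServiceFabric/managedclusters",
    "name": "[parameters('clusterName')]",
    "location": "[resourceGroup().location]",
    "sku": { "name": "Standard" },
    "properties": {
      "dnsName": "[parameters('clusterName')]",
      "zonalResiliency": true
      // Sketch only: other required cluster settings (admin credentials, client certificates, and so on) are omitted.
    }
  },
  {
    "apiVersion": "2022-02-01-preview",
    "type": "Microsoft.ServiceFabric/managedclusters/nodetypes",
    "name": "[concat(parameters('clusterName'), '/NT1')]",
    "properties": {
      "isPrimary": true,
      "vmInstanceCount": 9,
      "vmSize": "Standard_D2s_v3"
      // VM image and disk settings omitted; a secondary node type would use at least six nodes.
    }
  }
]
```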

![Sample node list depicting FD/UD formats in a virtual machine scale set spanning zones.][sfmc-multi-az-nodes]

**Distribution of Service replicas across zones**:
When a service is deployed on node types that span zones, the replicas are placed so that they end up in separate zones. This separation is guaranteed because the fault domains of the nodes in these node types are configured with zone information (for example, FD = fd:/zone1/1). For example, with five replicas or instances of a service, the distribution is 2-2-1, and the runtime tries to ensure equal distribution across AZs.

**User Service Replica Configuration**:
Stateful user services deployed on cross-availability-zone node types should be configured with a target replica count of 9 and a minimum replica count of 5. This configuration keeps the service working even when one zone goes down, since six replicas remain up in the other two zones. An application upgrade in such a scenario also goes through.
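
If the service itself is deployed through ARM, that replica configuration maps to the service resource roughly as sketched below. This is a hedged illustration: the service and type names are placeholders, the partition scheme is assumed to be Singleton, and any other settings your service needs are omitted.

```json
{
  "apiVersion": "2022-02-01-preview",
  "type": "Microsoft.ServiceFabric/managedclusters/applications/services",
  "name": "[concat(parameters('clusterName'), '/', parameters('applicationName'), '/MyStatefulService')]",
  "location": "[resourceGroup().location]",
  "properties": {
    "serviceKind": "Stateful",
    "serviceTypeName": "MyStatefulServiceType",
    "partitionDescription": { "partitionScheme": "Singleton" },
    "hasPersistedState": true,
    "targetReplicaSetSize": 9,
    "minReplicaSetSize": 5
  }
}
```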

**Zone down scenario**:
When a zone goes down, all the nodes in that zone appear as down, and the service replicas on those nodes are down as well. Because there are replicas in the other zones, the service stays responsive, with primary replicas failing over to the zones that are still functioning. The services appear in a warning state because the target replica count isn't met, while the remaining replica count is still above the defined minimum. The Service Fabric load balancer brings up replicas in the working zones to match the configured target replica count, at which point the services appear healthy again. When the zone that was down comes back up, the load balancer spreads all the service replicas evenly across all the zones.

## Networking Configuration

For more information, see [Configure network settings for Service Fabric managed clusters](./how-to-managed-cluster-networking.md).

>[!NOTE]
>Migration to a zone resilient configuration can cause a brief loss of external connectivity through the load balancer, but doesn't affect cluster health. This loss occurs when a new Public IP needs to be created to make the networking resilient to zone failures. Plan the migration accordingly.

1) Start by determining whether a new IP is required and which resources need to be migrated to become zone resilient. To get the current Availability Zone resiliency state for the resources of the managed cluster, use the following API call:

```http
POST https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.ServiceFabric/managedClusters/{clusterName}/getazresiliencystatus?api-version=2022-02-01-preview
```

If the Public IP resource is not zone resilient, migration of the cluster causes a brief loss of external connectivity. This connection loss occurs because the migration sets up a new Public IP and updates the cluster fully qualified domain name (FQDN) to the new IP. If the Public IP resource is zone resilient, migration modifies neither the Public IP resource nor the FQDN, and there is no external connectivity impact.

2) Initiate conversion of the underlying storage account created for the managed cluster from locally redundant storage (LRS) to zone-redundant storage (ZRS) using [customer-initiated conversion](../storage/common/redundancy-migration.md#customer-initiated-conversion). The resource group of the storage account that needs to be migrated is of the form "SFC_ClusterId" (for example, SFC_9240df2f-71ab-4733-a641-53a8464d992d) under the same subscription as the managed cluster resource.

3) Add the `zones` property to existing node types

This step configures the managed Virtual Machine Scale Set associated with the node type as zone-resilient, ensuring that any new VMs added to it are deployed across availability zones (zonal VMs). If the specified node type is primary, the resource provider also migrates the Public IP and updates the cluster FQDN DNS, if needed, to become zone resilient. Use the `getazresiliencystatus` API to understand the implications of this step.

* Use apiVersion 2022-02-01-preview or higher.
* Add the `zones` parameter set to `["1", "2", "3"]` to existing node types:

```json
{
  ...
}
```
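
As a rough sketch of the shape this takes, a node type resource with the `zones` parameter might look like the following. The node type name and sizing are placeholders, the remaining properties are trimmed, and this assumes `zones` sits under the node type's `properties`.

```json
{
  "apiVersion": "2022-02-01-preview",
  "type": "Microsoft.ServiceFabric/managedclusters/nodetypes",
  "name": "[concat(parameters('clusterName'), '/NT2')]",
  "properties": {
    "isPrimary": false,
    "vmInstanceCount": 6,
    "vmSize": "Standard_D2s_v3",
    "zones": ["1", "2", "3"]
    // Existing node type settings (VM image, disks, ports) stay as they are; only zones is new.
  }
}
```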

4) Scale node types to add **Zonal** nodes and remove **Regional** nodes

At this stage, the virtual machine scale set is marked as zone-resilient. When scaling up, newly added nodes are zonal, and when scaling down, regional nodes are removed. This approach gives you the flexibility to scale in any order that aligns with your capacity requirements by adjusting the `vmInstanceCount` property on the node types.

For example, if the initial `vmInstanceCount` is set to 6 (indicating 6 regional nodes), you can perform 2 deployments (see the sketch after this list):

- First deployment: Increase the `vmInstanceCount` to 12 to add 6 **Zonal** nodes.
- Second deployment: Decrease the `vmInstanceCount` to 6 to remove all **Regional** nodes.
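
A sketch of the first of those deployments follows. It assumes you redeploy the same node type definition with only `vmInstanceCount` changed; the name and other values are placeholders carried over from the earlier sketch.

```json
{
  "apiVersion": "2022-02-01-preview",
  "type": "Microsoft.ServiceFabric/managedclusters/nodetypes",
  "name": "[concat(parameters('clusterName'), '/NT2')]",
  "properties": {
    // Hypothetical: all other existing node type properties are kept unchanged.
    "vmInstanceCount": 12,
    "zones": ["1", "2", "3"]
  }
}
```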

Throughout the process, you can call the `getazresiliencystatus` API to retrieve the progress, as illustrated below. The process is considered complete once each node type has a minimum of 6 zonal nodes and 0 regional nodes.

```json
{
  "baseResourceStatus": [
    {
      "resourceName": "sfmccluster1"
      ...
    }
  ],
  "isClusterZoneResilient": false
}
```

>[!NOTE]
> The scaling process for the primary node type requires additional time, as each addition or removal of a node initiates a Service Fabric cluster upgrade.

5) Mark the cluster resilient to zone failures

This step helps with future deployments, since it ensures that all future node type deployments span availability zones and the cluster thus remains tolerant to AZ failures. Set `zonalResiliency: true` in the cluster ARM template and run a deployment to mark the cluster as zone resilient and ensure all new node type deployments span availability zones. This update is only allowed if all node types have at least 6 zonal nodes and 0 regional nodes.

```json
{
  ...
}
```

Once complete, you can also see the updated status in the portal under Overview -> Properties, similar to `Zonal resiliency True`.

6) Validate that all the resources are zone resilient

To validate the Availability Zone resiliency state for the resources of the managed cluster, use the following API call:

```http
POST https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.ServiceFabric/managedClusters/{clusterName}/getazresiliencystatus?api-version=2022-02-01-preview
```

```json
{
  ...
  "isClusterZoneResilient": true
}
```

If you run into any problems, reach out to support for assistance.

## Enable FastZonalUpdate on Service Fabric managed clusters (preview)

Service Fabric managed clusters support faster cluster and application upgrades by reducing the maximum number of upgrade domains (UDs) per availability zone. The default configuration can have at most 15 UDs in a multiple-AZ node type, and this large number of UDs reduces upgrade velocity. The new configuration reduces the maximum UDs, which results in faster updates while keeping the safety of the upgrades intact.

Make the update via ARM template by setting the `zonalUpdateMode` property to "fast" and then modifying a node type attribute, such as adding a node to and then removing it from each node type (see required steps 2 and 3). The Service Fabric managed cluster resource apiVersion should be 2022-10-01-preview or later.

1. Modify the ARM template with the new `zonalUpdateMode` property, as sketched below.
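
As a rough sketch of that change, only the relevant fragment of the managed cluster resource is shown here. The cluster name is a placeholder, the rest of the cluster's existing properties are omitted, and the property value is quoted as this article gives it.

```json
{
  "apiVersion": "2022-10-01-preview",
  "type": "Microsoft.ServiceFabric/managedclusters",
  "name": "[parameters('clusterName')]",
  "properties": {
    "zonalUpdateMode": "fast"
    // Keep all other existing cluster properties as they are.
  }
}
```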