Commit eaac784

Merge pull request #107582 from erikadoyle/release-sf-manageddisk-migration
How-to: Service Fabric VMSS upgrade (without cluster downtime)
2 parents 1f771cf + 4125bc1 commit eaac784

5 files changed: +373 -2 lines changed
articles/service-fabric/toc.yml (4 additions, 2 deletions)

```diff
@@ -368,7 +368,7 @@
       href: service-fabric-get-started.md
     - name: Linux
       href: service-fabric-get-started-linux.md
-    - name: Mac OS
+    - name: macOS
       href: service-fabric-get-started-mac.md
     - name: Set up the Service Fabric CLI
       href: service-fabric-cli.md
@@ -676,7 +676,9 @@
     - name: Manage cluster certificates
       href: service-fabric-cluster-security-update-certs-azure.md
     - name: Remote connect to cluster node VM
-      href: service-fabric-cluster-remote-connect-to-azure-cluster-node.md
+      href: service-fabric-cluster-remote-connect-to-azure-cluster-node.md
+    - name: Upgrade cluster nodes to use managed disks
+      href: upgrade-managed-disks.md
     - name: Standalone clusters
       items:
       - name: Create
```
New file: 369 additions & 0 deletions

---
title: Upgrade cluster nodes to use Azure managed disks
description: Here's how to upgrade an existing Service Fabric cluster to use Azure managed disks with little or no downtime of your cluster.
ms.topic: how-to
ms.date: 3/01/2020
---

# Upgrade cluster nodes to use Azure managed disks

[Azure managed disks](../virtual-machines/windows/managed-disks-overview.md) are the recommended disk storage offering for use with Azure virtual machines for persistent storage of data. You can improve the resiliency of your Service Fabric workloads by upgrading the virtual machine scale sets that underlie your node types to use managed disks. This article shows how to upgrade an existing Service Fabric cluster to use managed disks with little or no cluster downtime.

The general strategy for upgrading a Service Fabric cluster node type to use managed disks is to:

1. Deploy an otherwise duplicate virtual machine scale set of that node type, but with the [managedDisk](https://docs.microsoft.com/azure/templates/microsoft.compute/2019-07-01/virtualmachinescalesets/virtualmachines#ManagedDiskParameters) object added to the `osDisk` section of the virtual machine scale set deployment template. The new scale set should bind to the same load balancer / IP as the original, so that your customers don't experience a service outage during the migration.

2. Once both the original and upgraded scale sets are running side by side, disable the original node instances one at a time so that the system services (or replicas of stateful services) migrate to the new scale set.

3. Verify the cluster and new nodes are healthy, then remove the original scale set and node state for the deleted nodes.

This article walks you through the steps of upgrading the primary node type of an example cluster to use managed disks, while avoiding any cluster downtime (see the note below). The initial state of the example test cluster consists of one node type of [Silver durability](service-fabric-cluster-capacity.md#the-durability-characteristics-of-the-cluster), backed by a single scale set with five nodes.

> [!CAUTION]
> You will experience an outage with this procedure only if you have dependencies on the cluster DNS (such as when accessing [Service Fabric Explorer](service-fabric-visualizing-your-cluster.md)). The architectural [best practice for front-end services](https://docs.microsoft.com/azure/architecture/microservices/design/gateway) is to have some kind of [load balancer](https://docs.microsoft.com/azure/architecture/guide/technology-choices/load-balancing-overview) in front of your node types to make node swapping possible without an outage.

Here are the [templates and cmdlets](https://github.com/erikadoyle/service-fabric-scripts-and-templates/tree/managed-disks/templates/nodetype-upgrade-no-outage) for Azure Resource Manager that we'll use to complete the upgrade scenario. The template changes are explained in [Deploy an upgraded scale set for the primary node type](#deploy-an-upgraded-scale-set-for-the-primary-node-type) below.
## Set up the test cluster

Let's set up the initial Service Fabric test cluster. First, [download](https://github.com/erikadoyle/service-fabric-scripts-and-templates/tree/managed-disks/templates/nodetype-upgrade-no-outage) the Azure Resource Manager sample templates that we'll use to complete this scenario.

Next, sign in to your Azure account.

```powershell
# Sign in to your Azure account
Login-AzAccount -SubscriptionId "<subscription ID>"
```

The following commands will guide you through generating a new self-signed certificate and deploying the test cluster. If you already have a certificate you'd like to use, skip to [Use an existing certificate to deploy the cluster](#use-an-existing-certificate-to-deploy-the-cluster).
### Generate a self-signed certificate and deploy the cluster

First, assign the variables you'll need for Service Fabric cluster deployment. Adjust the values for `resourceGroupName`, `certSubjectName`, `parameterFilePath`, and `templateFilePath` for your specific account and environment:

```powershell
# Assign deployment variables
$resourceGroupName = "sftestupgradegroup"
$certOutputFolder = "c:\certificates"
$certPassword = "Password!1" | ConvertTo-SecureString -AsPlainText -Force
$certSubjectName = "sftestupgrade.southcentralus.cloudapp.azure.com"
$templateFilePath = "C:\Initial-1NodeType-UnmanagedDisks.json"
$parameterFilePath = "C:\Initial-1NodeType-UnmanagedDisks.parameters.json"
```

> [!NOTE]
> Ensure that the `certOutputFolder` location exists on your local machine before running the command to deploy a new Service Fabric cluster.
Next, open the [*Initial-1NodeType-UnmanagedDisks.parameters.json*](https://github.com/erikadoyle/service-fabric-scripts-and-templates/blob/managed-disks/templates/nodetype-upgrade-no-outage/Initial-1NodeType-UnmanagedDisks.parameters.json) file, adjust the values for `clusterName` and `dnsName` to correspond to the values you set in PowerShell, and save your changes.

Then deploy the Service Fabric test cluster:

```powershell
# Deploy the initial test cluster
New-AzServiceFabricCluster `
    -ResourceGroupName $resourceGroupName `
    -CertificateOutputFolder $certOutputFolder `
    -CertificatePassword $certPassword `
    -CertificateSubjectName $certSubjectName `
    -TemplateFile $templateFilePath `
    -ParameterFile $parameterFilePath
```

Once the deployment is complete, locate the *.pfx* file (`$certPfx`) on your local machine and import it to your certificate store:

```powershell
cd c:\certificates
$certPfx = ".\sftestupgradegroup20200312121003.pfx"

Import-PfxCertificate `
    -FilePath $certPfx `
    -CertStoreLocation Cert:\CurrentUser\My `
    -Password (ConvertTo-SecureString "Password!1" -AsPlainText -Force)
```

The operation will return the certificate thumbprint, which you'll use to [connect to the new cluster](#connect-to-the-new-cluster-and-check-health-status) and check its health status. (Skip the following section, which describes an alternate approach to cluster deployment.)
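If you misplace the thumbprint, you can read it back from the imported certificate in your store. This is a convenience sketch, not part of the original scripts; adjust the subject filter to match your own `certSubjectName`:

```powershell
# List imported certificates whose subject matches the cluster certificate (hypothetical filter)
Get-ChildItem Cert:\CurrentUser\My |
    Where-Object { $_.Subject -like "*sftestupgrade*" } |
    Select-Object Subject, Thumbprint
```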
### Use an existing certificate to deploy the cluster

You can also use an existing Azure Key Vault certificate to deploy the test cluster. To do this, you'll need [references to your Key Vault](#obtain-your-key-vault-references) and your certificate thumbprint.

```powershell
# Key Vault variables
$certUrlValue = "https://sftestupgradegroup.vault.azure.net/secrets/sftestupgradegroup20200309235308/dac0e7b7f9d4414984ccaa72bfb2ea39"
$sourceVaultValue = "/subscriptions/########-####-####-####-############/resourceGroups/sftestupgradegroup/providers/Microsoft.KeyVault/vaults/sftestupgradegroup"
$thumb = "BB796AA33BD9767E7DA27FE5182CF8FDEE714A70"
```

Open the [*Initial-1NodeType-UnmanagedDisks.parameters.json*](https://github.com/erikadoyle/service-fabric-scripts-and-templates/blob/managed-disks/templates/nodetype-upgrade-no-outage/Initial-1NodeType-UnmanagedDisks.parameters.json) file and change the values for `clusterName` and `dnsName` to something unique.

Next, designate a resource group name for the cluster and set the `templateFilePath` and `parameterFilePath` locations of your *Initial-1NodeType-UnmanagedDisks* files:

> [!NOTE]
> The designated resource group must already exist and be located in the same region as your Key Vault.
```powershell
# Assign the resource group and the initial (unmanaged disks) template locations
$resourceGroupName = "sftestupgradegroup"
$templateFilePath = "C:\Initial-1NodeType-UnmanagedDisks.json"
$parameterFilePath = "C:\Initial-1NodeType-UnmanagedDisks.parameters.json"
```

Finally, run the following command to deploy the initial test cluster:

```powershell
New-AzResourceGroupDeployment `
    -ResourceGroupName $resourceGroupName `
    -TemplateFile $templateFilePath `
    -TemplateParameterFile $parameterFilePath `
    -CertificateThumbprint $thumb `
    -CertificateUrlValue $certUrlValue `
    -SourceVaultValue $sourceVaultValue `
    -Verbose
```
### Connect to the new cluster and check health status

Connect to the cluster and ensure that all five of its nodes are healthy (replacing the `clusterName` and `thumb` variables with the values for your cluster):

```powershell
# Connect to the cluster
$clusterName = "sftestupgrade.southcentralus.cloudapp.azure.com:19000"
$thumb = "BB796AA33BD9767E7DA27FE5182CF8FDEE714A70"

Connect-ServiceFabricCluster `
    -ConnectionEndpoint $clusterName `
    -KeepAliveIntervalInSec 10 `
    -X509Credential `
    -ServerCertThumbprint $thumb `
    -FindType FindByThumbprint `
    -FindValue $thumb `
    -StoreLocation CurrentUser `
    -StoreName My

# Check cluster health
Get-ServiceFabricClusterHealth
```

With that, we're ready to begin the upgrade procedure.
## Deploy an upgraded scale set for the primary node type

To upgrade, or *vertically scale*, a node type, we'll deploy a copy of that node type's virtual machine scale set that is otherwise identical to the original (including references to the same `nodeTypeRef`, `subnet`, and `loadBalancerBackendAddressPools`), except that it includes the desired upgrade/changes and its own inbound NAT address pool. Because we're upgrading a primary node type, the new scale set will be marked as primary (`isPrimary: true`), just like the original scale set. (For non-primary node type upgrades, simply omit this.)

For convenience, the required changes have already been made for you in the *Upgrade-1NodeType-2ScaleSets-ManagedDisks* [template](https://github.com/erikadoyle/service-fabric-scripts-and-templates/blob/managed-disks/templates/nodetype-upgrade-no-outage/Upgrade-1NodeType-2ScaleSets-ManagedDisks.json) and [parameters](https://github.com/erikadoyle/service-fabric-scripts-and-templates/blob/managed-disks/templates/nodetype-upgrade-no-outage/Upgrade-1NodeType-2ScaleSets-ManagedDisks.parameters.json) files.

The following sections explain the template changes in detail. If you prefer, you can skip the explanation and continue on to [the next step of the upgrade procedure](#obtain-your-key-vault-references).
### Update the cluster template with the upgraded scale set

Here are the section-by-section modifications of the original cluster deployment template for adding an upgraded scale set for the primary node type.

#### Parameters

Add parameters for the instance name, count, and size of the new scale set. Note that `vmNodeType1Name` is unique to the new scale set, while the count and size values are identical to the original scale set.

**Template file**

```json
"vmNodeType1Name": {
    "type": "string",
    "defaultValue": "NTvm2",
    "maxLength": 9
},
"nt1InstanceCount": {
    "type": "int",
    "defaultValue": 5,
    "metadata": {
        "description": "Instance count for node type"
    }
},
"vmNodeType1Size": {
    "type": "string",
    "defaultValue": "Standard_D2_v2"
},
```

**Parameters file**

```json
"vmNodeType1Name": {
    "value": "NTvm2"
},
"nt1InstanceCount": {
    "value": 5
},
"vmNodeType1Size": {
    "value": "Standard_D2_v2"
}
```
#### Variables

In the deployment template `variables` section, add an entry for the inbound NAT address pool of the new scale set.

**Template file**

```json
"lbNatPoolID1": "[concat(variables('lbID0'),'/inboundNatPools/LoadBalancerBEAddressNatPool1')]",
```
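The `lbNatPoolID1` variable assumes a matching `inboundNatPools` entry named `LoadBalancerBEAddressNatPool1` on the load balancer resource. As a rough sketch (the port values and the `lbIPConfig0` variable are illustrative assumptions, not taken from the original template), such an entry might look like:

```json
{
    "name": "LoadBalancerBEAddressNatPool1",
    "properties": {
        "backendPort": 3389,
        "frontendIPConfiguration": {
            "id": "[variables('lbIPConfig0')]"
        },
        "frontendPortRangeStart": 50000,
        "frontendPortRangeEnd": 50999,
        "protocol": "tcp"
    }
}
```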
#### Resources

In the deployment template `resources` section, add the new virtual machine scale set, keeping these things in mind:

* The new scale set references the same node type as the original:

    ```json
    "nodeTypeRef": "[parameters('vmNodeType0Name')]",
    ```

* The new scale set references the same load balancer backend address pool and subnet (but uses a different load balancer inbound NAT pool):

    ```json
    "loadBalancerBackendAddressPools": [
        {
            "id": "[variables('lbPoolID0')]"
        }
    ],
    "loadBalancerInboundNatPools": [
        {
            "id": "[variables('lbNatPoolID1')]"
        }
    ],
    "subnet": {
        "id": "[variables('subnet0Ref')]"
    }
    ```

* Like the original scale set, the new scale set is marked as the primary node type. (When upgrading non-primary node types, omit this change.)

    ```json
    "isPrimary": true,
    ```

* Unlike the original scale set, the new scale set is upgraded to use managed disks:

    ```json
    "managedDisk": {
        "storageAccountType": "[parameters('storageAccountType')]"
    }
    ```
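For context, the `managedDisk` object is added inside the scale set's `osDisk` definition. A minimal sketch of how the surrounding `osDisk` section might look (the `caching` and `createOption` values here are illustrative assumptions):

```json
"osDisk": {
    "caching": "ReadOnly",
    "createOption": "FromImage",
    "managedDisk": {
        "storageAccountType": "[parameters('storageAccountType')]"
    }
}
```

Note that an unmanaged-disk scale set instead specifies a `vhdContainers` list of storage account URIs in `osDisk`; remove that property when adding `managedDisk`, because the two are mutually exclusive.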
Once you've implemented all the changes in your template and parameters files, proceed to the next section to acquire your Key Vault references and deploy the updates to your cluster.

### Obtain your Key Vault references

To deploy the updated configuration, you'll first need to obtain several references to the cluster certificate stored in your Key Vault. The easiest way to find these values is through the Azure portal. You'll need:

* **The Key Vault URL of your cluster certificate.** From your Key Vault in the Azure portal, select **Certificates** > *Your desired certificate* > **Secret Identifier**:

    ```powershell
    $certUrlValue = "https://sftestupgradegroup.vault.azure.net/secrets/sftestupgradegroup20200309235308/dac0e7b7f9d4414984ccaa72bfb2ea39"
    ```

* **The thumbprint of your cluster certificate.** (You probably already have this if you [connected to the initial cluster](#connect-to-the-new-cluster-and-check-health-status) to check its health status.) From the same certificate blade (**Certificates** > *Your desired certificate*) in the Azure portal, copy **X.509 SHA-1 Thumbprint (in hex)**:

    ```powershell
    $thumb = "BB796AA33BD9767E7DA27FE5182CF8FDEE714A70"
    ```

* **The Resource ID of your Key Vault.** From your Key Vault in the Azure portal, select **Properties** > **Resource ID**:

    ```powershell
    $sourceVaultValue = "/subscriptions/########-####-####-####-############/resourceGroups/sftestupgradegroup/providers/Microsoft.KeyVault/vaults/sftestupgradegroup"
    ```
### Deploy the updated template

Adjust the `parameterFilePath` and `templateFilePath` as needed, and then run the following command:

```powershell
# Deploy the new scale set (upgraded to use managed disks) into the primary node type.
$templateFilePath = "C:\Upgrade-1NodeType-2ScaleSets-ManagedDisks.json"
$parameterFilePath = "C:\Upgrade-1NodeType-2ScaleSets-ManagedDisks.parameters.json"

New-AzResourceGroupDeployment `
    -ResourceGroupName $resourceGroupName `
    -TemplateFile $templateFilePath `
    -TemplateParameterFile $parameterFilePath `
    -CertificateThumbprint $thumb `
    -CertificateUrlValue $certUrlValue `
    -SourceVaultValue $sourceVaultValue `
    -Verbose
```

When the deployment completes, check the cluster health again and ensure all ten nodes (five on the original scale set and five on the new one) are healthy:

```powershell
Get-ServiceFabricClusterHealth
```
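To see the individual nodes behind that health summary, you can also list them directly. This is an optional sketch using standard output properties of `Get-ServiceFabricNode`:

```powershell
# List every node with its current status and health state
Get-ServiceFabricNode | Select-Object NodeName, NodeStatus, HealthState
```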
## Migrate seed nodes to the new scale set

We're now ready to start disabling the nodes of the original scale set. As these nodes become disabled, the system services and seed nodes migrate to the VMs of the new scale set, because it is also marked as the primary node type.

```powershell
# Disable the nodes in the original scale set.
$nodeNames = @("_NTvm1_0","_NTvm1_1","_NTvm1_2","_NTvm1_3","_NTvm1_4")

Write-Host "Disabling nodes..."
foreach($name in $nodeNames){
    Disable-ServiceFabricNode -NodeName $name -Intent RemoveNode -Force
}
```
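You can also watch the operation from the same PowerShell session with a simple polling loop. This is an optional sketch, not part of the original scripts (the 60-second interval is an arbitrary choice):

```powershell
# Poll until every node in the original scale set reports Disabled
foreach($name in $nodeNames){
    while ((Get-ServiceFabricNode -NodeName $name).NodeStatus -ne "Disabled") {
        Write-Host "Waiting for node $name to finish disabling..."
        Start-Sleep -Seconds 60
    }
}
```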
Use Service Fabric Explorer to monitor the migration of seed nodes to the new scale set and the progression of nodes in the original scale set from *Disabling* to *Disabled* status.

![Service Fabric Explorer showing status of disabled nodes](./media/upgrade-managed-disks/service-fabric-explorer-node-status.png)

> [!NOTE]
> It may take some time to complete the disabling operation across all the nodes of the original scale set. To guarantee data consistency, only one seed node can change at a time. Each seed node change requires a cluster update; thus replacing a seed node requires two cluster upgrades (one each for node addition and removal). Upgrading the five seed nodes in this sample scenario will result in ten cluster upgrades.
## Remove the original scale set

Once the disabling operation is complete, remove the scale set:

```powershell
# Remove the original scale set
$scaleSetName = "NTvm1"

Remove-AzVmss `
    -ResourceGroupName $resourceGroupName `
    -VMScaleSetName $scaleSetName `
    -Force

Write-Host "Removed scale set $scaleSetName"
```
In Service Fabric Explorer, the removed nodes (and thus the *Cluster Health State*) will now appear in *Error* state.

![Service Fabric Explorer showing disabled nodes in error state](./media/upgrade-managed-disks/service-fabric-explorer-disabled-nodes-error-state.png)

Remove the obsolete nodes from the Service Fabric cluster to restore the *Cluster Health State* to *OK*:

```powershell
# Remove node states for the deleted scale set
foreach($name in $nodeNames){
    Remove-ServiceFabricNodeState -NodeName $name -TimeoutSec 300 -Force
    Write-Host "Removed node state for node $name"
}
```

![Service Fabric Explorer with down nodes in error state removed](./media/upgrade-managed-disks/service-fabric-explorer-healthy-cluster.png)
## Next steps

In this walkthrough, you learned how to upgrade the virtual machine scale sets of a Service Fabric cluster to use managed disks while avoiding service outages during the process. For more information on related topics, check out the following resources.

Learn how to:

* [Scale up a Service Fabric cluster primary node type](service-fabric-scale-up-node-type.md)

* [Convert a scale set template to use managed disks](../virtual-machine-scale-sets/virtual-machine-scale-sets-convert-template-to-md.md)

* [Remove a Service Fabric node type](service-fabric-how-to-remove-node-type.md)

See also:

* [Sample: Upgrade cluster nodes to use Azure managed disks](https://github.com/erikadoyle/service-fabric-scripts-and-templates/tree/managed-disks/templates/nodetype-upgrade-no-outage)

* [Vertical scaling considerations](service-fabric-best-practices-capacity-scaling.md#vertical-scaling-considerations)
