---
title: Upgrade cluster nodes to use Azure managed disks
description: Here's how to upgrade an existing Service Fabric cluster to use Azure managed disks with little or no downtime of your cluster.
ms.topic: how-to
ms.date: 3/01/2020
---
# Upgrade cluster nodes to use Azure managed disks

[Azure managed disks](../virtual-machines/windows/managed-disks-overview.md) are the recommended disk storage offering for use with Azure virtual machines for persistent storage of data. You can improve the resiliency of your Service Fabric workloads by upgrading the virtual machine scale sets that underlie your node types to use managed disks. Here's how to upgrade an existing Service Fabric cluster to use Azure managed disks with little or no downtime of your cluster.

The general strategy for upgrading a Service Fabric cluster node type to use managed disks is to:

1. Deploy an otherwise duplicate virtual machine scale set of that node type, but with the [managedDisk](https://docs.microsoft.com/azure/templates/microsoft.compute/2019-07-01/virtualmachinescalesets/virtualmachines#ManagedDiskParameters) object added to the `osDisk` section of the virtual machine scale set deployment template. The new scale set should bind to the same load balancer / IP as the original, so that your customers don't experience a service outage during the migration.

2. Once both the original and upgraded scale sets are running side by side, disable the original node instances one at a time so that the system services (or replicas of stateful services) migrate to the new scale set.

3. Verify that the cluster and new nodes are healthy, then remove the original scale set and node state for the deleted nodes.

This article walks you through the steps of upgrading the primary node type of an example cluster to use managed disks, while avoiding any cluster downtime (see the note below). The initial state of the example test cluster consists of one node type of [Silver durability](service-fabric-cluster-capacity.md#the-durability-characteristics-of-the-cluster), backed by a single scale set with five nodes.

> [!CAUTION]
> You will experience an outage with this procedure only if you have dependencies on the cluster DNS (such as when accessing [Service Fabric Explorer](service-fabric-visualizing-your-cluster.md)). Architectural [best practice for front-end services](https://docs.microsoft.com/azure/architecture/microservices/design/gateway) is to have some kind of [load balancer](https://docs.microsoft.com/azure/architecture/guide/technology-choices/load-balancing-overview) in front of your node types to make node swapping possible without an outage.

Here are the [templates and cmdlets](https://github.com/erikadoyle/service-fabric-scripts-and-templates/tree/managed-disks/templates/nodetype-upgrade-no-outage) for Azure Resource Manager that we'll use to complete the upgrade scenario. The template changes are explained in [Deploy an upgraded scale set for the primary node type](#deploy-an-upgraded-scale-set-for-the-primary-node-type) below.

## Set up the test cluster

Let's set up the initial Service Fabric test cluster. First, [download](https://github.com/erikadoyle/service-fabric-scripts-and-templates/tree/managed-disks/templates/nodetype-upgrade-no-outage) the Azure Resource Manager sample templates that we'll use to complete this scenario.

Next, sign in to your Azure account.

```powershell
# Sign in to your Azure account
Login-AzAccount -SubscriptionId "<subscription ID>"
```

The following commands will guide you through generating a new self-signed certificate and deploying the test cluster. If you already have a certificate you'd like to use, skip to [Use an existing certificate to deploy the cluster](#use-an-existing-certificate-to-deploy-the-cluster).

### Generate a self-signed certificate and deploy the cluster

First, assign the variables you'll need for Service Fabric cluster deployment. Adjust the values for `resourceGroupName`, `certSubjectName`, `parameterFilePath`, and `templateFilePath` for your specific account and environment:

```powershell
# Assign deployment variables
$resourceGroupName = "sftestupgradegroup"
$certOutputFolder = "c:\certificates"
$certPassword = "Password!1" | ConvertTo-SecureString -AsPlainText -Force
$certSubjectName = "sftestupgrade.southcentralus.cloudapp.azure.com"
$templateFilePath = "C:\Initial-1NodeType-UnmanagedDisks.json"
$parameterFilePath = "C:\Initial-1NodeType-UnmanagedDisks.parameters.json"
```

> [!NOTE]
> Ensure that the `certOutputFolder` location exists on your local machine before running the command to deploy a new Service Fabric cluster.

Next, open the [*Initial-1NodeType-UnmanagedDisks.parameters.json*](https://github.com/erikadoyle/service-fabric-scripts-and-templates/blob/managed-disks/templates/nodetype-upgrade-no-outage/Initial-1NodeType-UnmanagedDisks.parameters.json) file, adjust the values for `clusterName` and `dnsName` to correspond to the values you set in PowerShell, and save your changes.
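
For reference, the relevant entries in the parameters file look something like the following sketch (the names shown are examples based on this walkthrough's variables; `dnsName` must be unique within the region, and your file may nest these values differently):

```json
"clusterName": {
    "value": "sftestupgrade"
},
"dnsName": {
    "value": "sftestupgrade"
}
```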

Then deploy the Service Fabric test cluster:

```powershell
# Deploy the initial test cluster
New-AzServiceFabricCluster `
    -ResourceGroupName $resourceGroupName `
    -CertificateOutputFolder $certOutputFolder `
    -CertificatePassword $certPassword `
    -CertificateSubjectName $certSubjectName `
    -TemplateFile $templateFilePath `
    -ParameterFile $parameterFilePath
```

Once the deployment is complete, locate the *.pfx* file (`$certPfx`) on your local machine and import it to your certificate store:

```powershell
cd c:\certificates
$certPfx = ".\sftestupgradegroup20200312121003.pfx"

Import-PfxCertificate `
    -FilePath $certPfx `
    -CertStoreLocation Cert:\CurrentUser\My `
    -Password (ConvertTo-SecureString "Password!1" -AsPlainText -Force)
```

The operation will return the certificate thumbprint, which you'll use to [connect to the new cluster](#connect-to-the-new-cluster-and-check-health-status) and check its health status. (Skip the following section, which describes an alternate approach to cluster deployment.)
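
If you need to retrieve the thumbprint again later, one way is to query your local certificate store (a sketch; the subject name filter is the example used in this walkthrough):

```powershell
# List the imported cluster certificate and its thumbprint
# from the current user's personal certificate store.
Get-ChildItem -Path Cert:\CurrentUser\My |
    Where-Object { $_.Subject -like "*sftestupgrade*" } |
    Select-Object -Property Subject, Thumbprint
```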

### Use an existing certificate to deploy the cluster

You can also use an existing Azure Key Vault certificate to deploy the test cluster. To do this, you'll need to [obtain references to your Key Vault](#obtain-your-key-vault-references) and certificate thumbprint.

```powershell
# Key Vault variables
$certUrlValue = "https://sftestupgradegroup.vault.azure.net/secrets/sftestupgradegroup20200309235308/dac0e7b7f9d4414984ccaa72bfb2ea39"
$sourceVaultValue = "/subscriptions/########-####-####-####-############/resourceGroups/sftestupgradegroup/providers/Microsoft.KeyVault/vaults/sftestupgradegroup"
$thumb = "BB796AA33BD9767E7DA27FE5182CF8FDEE714A70"
```
Open the [*Initial-1NodeType-UnmanagedDisks.parameters.json*](https://github.com/erikadoyle/service-fabric-scripts-and-templates/blob/managed-disks/templates/nodetype-upgrade-no-outage/Initial-1NodeType-UnmanagedDisks.parameters.json) file and change the values for `clusterName` and `dnsName` to something unique.

Next, designate a resource group name for the cluster and set the `templateFilePath` and `parameterFilePath` locations of your *Initial-1NodeType-UnmanagedDisks* files:

> [!NOTE]
> The designated resource group must already exist and be located in the same region as your Key Vault.

```powershell
# Assign deployment variables for the initial test cluster
$resourceGroupName = "sftestupgradegroup"
$templateFilePath = "C:\Initial-1NodeType-UnmanagedDisks.json"
$parameterFilePath = "C:\Initial-1NodeType-UnmanagedDisks.parameters.json"
```

Finally, run the following command to deploy the initial test cluster:

```powershell
New-AzResourceGroupDeployment `
    -ResourceGroupName $resourceGroupName `
    -TemplateFile $templateFilePath `
    -TemplateParameterFile $parameterFilePath `
    -CertificateThumbprint $thumb `
    -CertificateUrlValue $certUrlValue `
    -SourceVaultValue $sourceVaultValue `
    -Verbose
```

### Connect to the new cluster and check health status

Connect to the cluster and ensure that all five of its nodes are healthy (replacing the `clusterName` and `thumb` variables with the values for your cluster):

```powershell
# Connect to the cluster
$clusterName = "sftestupgrade.southcentralus.cloudapp.azure.com:19000"
$thumb = "BB796AA33BD9767E7DA27FE5182CF8FDEE714A70"

Connect-ServiceFabricCluster `
    -ConnectionEndpoint $clusterName `
    -KeepAliveIntervalInSec 10 `
    -X509Credential `
    -ServerCertThumbprint $thumb `
    -FindType FindByThumbprint `
    -FindValue $thumb `
    -StoreLocation CurrentUser `
    -StoreName My

# Check cluster health
Get-ServiceFabricClusterHealth
```

With that, we're ready to begin the upgrade procedure.

## Deploy an upgraded scale set for the primary node type

To upgrade, or *vertically scale*, a node type, we'll deploy a copy of that node type's virtual machine scale set. The new scale set is otherwise identical to the original (it references the same `nodeTypeRef`, `subnet`, and `loadBalancerBackendAddressPools`), except that it includes the desired upgrade/changes and its own separate inbound NAT address pool. Because we're upgrading a primary node type, the new scale set is marked as primary (`isPrimary: true`), just like the original scale set. (For non-primary node type upgrades, simply omit this.)

For convenience, the required changes have already been made for you in the *Upgrade-1NodeType-2ScaleSets-ManagedDisks* [template](https://github.com/erikadoyle/service-fabric-scripts-and-templates/blob/managed-disks/templates/nodetype-upgrade-no-outage/Upgrade-1NodeType-2ScaleSets-ManagedDisks.json) and [parameters](https://github.com/erikadoyle/service-fabric-scripts-and-templates/blob/managed-disks/templates/nodetype-upgrade-no-outage/Upgrade-1NodeType-2ScaleSets-ManagedDisks.parameters.json) files.

The following sections explain the template changes in detail. If you prefer, you can skip the explanation and continue to [the next step of the upgrade procedure](#obtain-your-key-vault-references).

### Update the cluster template with the upgraded scale set

Here are the section-by-section modifications of the original cluster deployment template for adding an upgraded scale set for the primary node type.

#### Parameters

Add parameters for the instance name, count, and size of the new scale set. Note that `vmNodeType1Name` is unique to the new scale set, while the count and size values are identical to the original scale set.

**Template file**

```json
"vmNodeType1Name": {
    "type": "string",
    "defaultValue": "NTvm2",
    "maxLength": 9
},
"nt1InstanceCount": {
    "type": "int",
    "defaultValue": 5,
    "metadata": {
        "description": "Instance count for node type"
    }
},
"vmNodeType1Size": {
    "type": "string",
    "defaultValue": "Standard_D2_v2"
},
```

**Parameters file**

```json
"vmNodeType1Name": {
    "value": "NTvm2"
},
"nt1InstanceCount": {
    "value": 5
},
"vmNodeType1Size": {
    "value": "Standard_D2_v2"
}
```

#### Variables

In the deployment template `variables` section, add an entry for the inbound NAT address pool of the new scale set.

**Template file**

```json
"lbNatPoolID1": "[concat(variables('lbID0'),'/inboundNatPools/LoadBalancerBEAddressNatPool1')]",
```

#### Resources

In the deployment template *resources* section, add the new virtual machine scale set, keeping these things in mind:

* The new scale set references the same node type as the original:

  ```json
  "nodeTypeRef": "[parameters('vmNodeType0Name')]",
  ```

* The new scale set references the same load balancer backend address pool and subnet (but uses a different load balancer inbound NAT pool):

  ```json
  "loadBalancerBackendAddressPools": [
      {
          "id": "[variables('lbPoolID0')]"
      }
  ],
  "loadBalancerInboundNatPools": [
      {
          "id": "[variables('lbNatPoolID1')]"
      }
  ],
  "subnet": {
      "id": "[variables('subnet0Ref')]"
  }
  ```

* Like the original scale set, the new scale set is marked as the primary node type. (When upgrading non-primary node types, omit this change.)

  ```json
  "isPrimary": true,
  ```

* Unlike the original scale set, the new scale set is upgraded to use managed disks:

  ```json
  "managedDisk": {
      "storageAccountType": "[parameters('storageAccountType')]"
  }
  ```

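For context, the `managedDisk` object sits inside the scale set's `osDisk` definition. A minimal sketch follows (property names come from the virtual machine scale set template schema; the `caching` and `createOption` values shown are illustrative, and any `vhdContainers` entries from the unmanaged-disk configuration must be removed when switching to managed disks):

```json
"osDisk": {
    "caching": "ReadOnly",
    "createOption": "FromImage",
    "managedDisk": {
        "storageAccountType": "[parameters('storageAccountType')]"
    }
}
```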
Once you've implemented all the changes in your template and parameters files, proceed to the next section to acquire your Key Vault references and deploy the updates to your cluster.

### Obtain your Key Vault references

To deploy the updated configuration, you'll first need to obtain several references to your cluster certificate stored in your Key Vault. The easiest way to find these values is through the Azure portal. You'll need:

* **The Key Vault URL of your cluster certificate.** From your Key Vault in the Azure portal, select **Certificates** > *Your desired certificate* > **Secret Identifier**:

  ```powershell
  $certUrlValue = "https://sftestupgradegroup.vault.azure.net/secrets/sftestupgradegroup20200309235308/dac0e7b7f9d4414984ccaa72bfb2ea39"
  ```

* **The thumbprint of your cluster certificate.** (You probably already have this if you [connected to the initial cluster](#connect-to-the-new-cluster-and-check-health-status) to check its health status.) From the same certificate blade (**Certificates** > *Your desired certificate*) in the Azure portal, copy **X.509 SHA-1 Thumbprint (in hex)**:

  ```powershell
  $thumb = "BB796AA33BD9767E7DA27FE5182CF8FDEE714A70"
  ```

* **The Resource ID of your Key Vault.** From your Key Vault in the Azure portal, select **Properties** > **Resource ID**:

  ```powershell
  $sourceVaultValue = "/subscriptions/########-####-####-####-############/resourceGroups/sftestupgradegroup/providers/Microsoft.KeyVault/vaults/sftestupgradegroup"
  ```

### Deploy the updated template

Adjust the `parameterFilePath` and `templateFilePath` as needed, and then run the following command:

```powershell
# Deploy the new scale set (upgraded to use managed disks) into the primary node type.
$templateFilePath = "C:\Upgrade-1NodeType-2ScaleSets-ManagedDisks.json"
$parameterFilePath = "C:\Upgrade-1NodeType-2ScaleSets-ManagedDisks.parameters.json"

New-AzResourceGroupDeployment `
    -ResourceGroupName $resourceGroupName `
    -TemplateFile $templateFilePath `
    -TemplateParameterFile $parameterFilePath `
    -CertificateThumbprint $thumb `
    -CertificateUrlValue $certUrlValue `
    -SourceVaultValue $sourceVaultValue `
    -Verbose
```

When the deployment completes, check the cluster health again and ensure that all ten nodes (five on the original scale set and five on the new one) are healthy:

```powershell
Get-ServiceFabricClusterHealth
```
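
As an optional sanity check, you can also count the nodes and surface any that aren't healthy (a sketch using Service Fabric PowerShell cmdlets; it assumes you're still connected to the cluster):

```powershell
# Count the cluster's nodes and list any that aren't reporting Ok health.
$nodes = Get-ServiceFabricNode
Write-Host "Node count: $($nodes.Count)"
$nodes | Where-Object { $_.HealthState -ne "Ok" } | Select-Object NodeName, HealthState
```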

## Migrate seed nodes to the new scale set

We're now ready to start disabling the nodes of the original scale set. As these nodes become disabled, the system services and seed nodes migrate to the VMs of the new scale set, because it is also marked as the primary node type.

```powershell
# Disable the nodes in the original scale set.
$nodeNames = @("_NTvm1_0","_NTvm1_1","_NTvm1_2","_NTvm1_3","_NTvm1_4")

Write-Host "Disabling nodes..."
foreach($name in $nodeNames){
    Disable-ServiceFabricNode -NodeName $name -Intent RemoveNode -Force
}
```

Use Service Fabric Explorer to monitor the migration of seed nodes to the new scale set and the progression of nodes in the original scale set from *Disabling* to *Disabled* status.
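
If you'd rather watch from PowerShell, a rough polling sketch follows (the node names are the originals from this walkthrough, and the sleep interval is arbitrary):

```powershell
# Poll until every node of the original scale set reports Disabled status.
$nodeNames = @("_NTvm1_0","_NTvm1_1","_NTvm1_2","_NTvm1_3","_NTvm1_4")
do {
    Start-Sleep -Seconds 60
    $pending = @(Get-ServiceFabricNode |
        Where-Object { $nodeNames -contains $_.NodeName -and $_.NodeStatus -ne "Disabled" })
    Write-Host "Nodes still disabling: $($pending.Count)"
} while ($pending.Count -gt 0)
```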

> [!NOTE]
> It may take some time to complete the disabling operation across all the nodes of the original scale set. To guarantee data consistency, only one seed node can change at a time. Each seed node change requires a cluster update; thus replacing a seed node requires two cluster upgrades (one each for node addition and removal). Upgrading the five seed nodes in this sample scenario will result in ten cluster upgrades.

## Remove the original scale set

Once the disabling operation is complete, remove the scale set.

```powershell
# Remove the original scale set
$scaleSetName = "NTvm1"

Remove-AzVmss `
    -ResourceGroupName $resourceGroupName `
    -VMScaleSetName $scaleSetName `
    -Force

Write-Host "Removed scale set $scaleSetName"
```

In Service Fabric Explorer, the removed nodes (and thus the *Cluster Health State*) will now appear in the *Error* state.

Remove the obsolete nodes from the Service Fabric cluster to restore the *Cluster Health State* to *OK*:

```powershell
# Remove node states for the deleted scale set
foreach($name in $nodeNames){
    Remove-ServiceFabricNodeState -NodeName $name -TimeoutSec 300 -Force
    Write-Host "Removed node state for node $name"
}
```

## Next steps

In this walkthrough, you learned how to upgrade the virtual machine scale sets of a Service Fabric cluster to use managed disks while avoiding service outages during the process. For more information on related topics, check out the following resources.

Learn how to:

* [Scale up a Service Fabric cluster primary node type](service-fabric-scale-up-node-type.md)

* [Convert a scale set template to use managed disks](../virtual-machine-scale-sets/virtual-machine-scale-sets-convert-template-to-md.md)

* [Remove a Service Fabric node type](service-fabric-how-to-remove-node-type.md)

See also:

* [Sample: Upgrade cluster nodes to use Azure managed disks](https://github.com/erikadoyle/service-fabric-scripts-and-templates/tree/managed-disks/templates/nodetype-upgrade-no-outage)

* [Vertical scaling considerations](service-fabric-best-practices-capacity-scaling.md#vertical-scaling-considerations)