Commit 49a15b2

Merge pull request #41562 from rwike77/scaleclusterup
Scale cluster up
2 parents 0fa3757 + 73a299a commit 49a15b2

2 files changed: +163 -2 lines changed

Lines changed: 159 additions & 0 deletions
@@ -0,0 +1,159 @@
---
title: Upgrade an Azure Service Fabric cluster VMs SKU or OS | Microsoft Docs
description: Learn how to upgrade the virtual machines in a Service Fabric cluster's primary nodetype.
services: service-fabric
documentationcenter: .net
author: rwike77
manager: timlt
editor: ''

ms.assetid: 5441e7e0-d842-4398-b060-8c9d34b07c48
ms.service: service-fabric
ms.devlang: dotnet
ms.topic: article
ms.tgt_pltfrm: NA
ms.workload: NA
ms.date: 05/21/2018
ms.author: ryanwi

---
# Upgrade the primary node type VMs of a Service Fabric cluster
This article describes how to upgrade the primary node type virtual machines of a Service Fabric cluster running in Azure. A Service Fabric cluster is a network-connected set of virtual or physical machines into which your microservices are deployed and managed. A machine or VM that's part of a cluster is called a node. Virtual machine scale sets are an Azure compute resource that you use to deploy and manage a collection of virtual machines as a set. Every node type that is defined in an Azure cluster is [set up as a separate scale set](service-fabric-cluster-nodetypes.md). Each node type can then be managed separately. After creating a Service Fabric cluster, you can scale a cluster node type vertically (change the resources of the nodes) or upgrade the operating system of the node type VMs. You can scale the cluster at any time, even when workloads are running on the cluster. As the cluster scales, your applications automatically scale as well.

> [!WARNING]
> We recommend that you do not change the VM SKU of a scale set/node type unless it is running at [Silver durability or greater](service-fabric-cluster-capacity.md#the-durability-characteristics-of-the-cluster). Changing the VM SKU size is a data-destructive, in-place infrastructure operation. Without some ability to delay or monitor this change, the operation can cause data loss for stateful services or other unforeseen operational issues, even for stateless workloads.
>

## Upgrade the size and operating system of the primary node type VMs
Here is the process for updating the VM size and operating system of the primary node type VMs. After the upgrade, the primary node type VMs are size Standard D4_V2 and running Windows Server 2016 Datacenter with Containers.

> [!WARNING]
> Before attempting this procedure on a production cluster, we recommend that you study the sample templates and verify the process against a test cluster. The cluster is also unavailable for a time during the procedure.

1. Deploy the initial cluster with two node types and two scale sets (one scale set per node type) using these sample [template](https://github.com/Azure/service-fabric-scripts-and-templates/blob/master/templates/nodetype-upgrade/Deploy-2NodeTypes-2ScaleSets.json) and [parameters](https://github.com/Azure/service-fabric-scripts-and-templates/blob/master/templates/nodetype-upgrade/Deploy-2NodeTypes-2ScaleSets.parameters.json) files. Both scale sets are size Standard D2_V2 and running Windows Server 2012 R2 Datacenter. Wait for the cluster to complete the baseline upgrade.
2. Optional: deploy a stateful sample application to the cluster.
3. After deciding to upgrade the primary node type VMs, add a new scale set to the primary node type using these sample [template](https://github.com/Azure/service-fabric-scripts-and-templates/blob/master/templates/nodetype-upgrade/Deploy-2NodeTypes-3ScaleSets.json) and [parameters](https://github.com/Azure/service-fabric-scripts-and-templates/blob/master/templates/nodetype-upgrade/Deploy-2NodeTypes-3ScaleSets.parameters.json) files so the primary node type now has two scale sets. System services and user applications are able to migrate between VMs in the two different scale sets. The new scale set VMs are size Standard D4_V2 and run Windows Server 2016 Datacenter with Containers. A new load balancer and public IP address are also added with the new scale set.
    To find the new scale set in the template, search for the "Microsoft.Compute/virtualMachineScaleSets" resource named by the *vmNodeType2Name* parameter. The new scale set is added to the primary node type using the properties->virtualMachineProfile->extensionProfile->extensions->properties->settings->nodeTypeRef setting.
4. Check the cluster health and verify that all the nodes are healthy.
5. Disable the nodes in the old scale set of the primary node type with the intent to remove the node. You can disable all the nodes at once; the operations are queued. Wait until all nodes are disabled, which may take some time. As the older nodes in the node type are disabled, the system services and seed nodes migrate to the VMs of the new scale set in the primary node type.
6. Remove the older scale set from the primary node type.
7. Remove the load balancer associated with the old scale set. The cluster is unavailable while the new public IP address and load balancer are configured for the new scale set.
8. Store the DNS settings of the public IP address associated with the old primary node type scale set in a variable and remove that public IP address.
9. Replace the DNS settings of the public IP address associated with the new primary node type scale set with the DNS settings of the deleted public IP address. The cluster is now reachable again.
10. Remove the node state of the nodes from the cluster. If the durability level of the old scale set was Silver or Gold, this step is done by the system automatically.
11. If you deployed the stateful application in a previous step, verify that the application is functional.
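
Step 3 relies on the `nodeTypeRef` setting to attach the new scale set to the primary node type. As a rough illustration, the following sketch lists each scale set in a template together with its `nodeTypeRef`. The inline JSON is a hypothetical, heavily trimmed stand-in for a real template, and the `vmNodeType0Name` and `vmNodeType1Name` parameter names, the extension name, and the `Get-ScaleSetNodeTypeRef` function are assumptions made for this example only.

```powershell
# Hypothetical, trimmed-down template for illustration only. A real template
# contains many more resources and properties than shown here.
$templateJson = @'
{
  "resources": [
    { "type": "Microsoft.Network/loadBalancers", "name": "LB-example" },
    { "type": "Microsoft.Compute/virtualMachineScaleSets",
      "name": "[parameters('vmNodeType0Name')]",
      "properties": { "virtualMachineProfile": { "extensionProfile": { "extensions": [
        { "name": "ExampleServiceFabricExtension",
          "properties": { "settings": { "nodeTypeRef": "[parameters('vmNodeType0Name')]" } } } ] } } } },
    { "type": "Microsoft.Compute/virtualMachineScaleSets",
      "name": "[parameters('vmNodeType1Name')]",
      "properties": { "virtualMachineProfile": { "extensionProfile": { "extensions": [
        { "name": "ExampleServiceFabricExtension",
          "properties": { "settings": { "nodeTypeRef": "[parameters('vmNodeType1Name')]" } } } ] } } } },
    { "type": "Microsoft.Compute/virtualMachineScaleSets",
      "name": "[parameters('vmNodeType2Name')]",
      "properties": { "virtualMachineProfile": { "extensionProfile": { "extensions": [
        { "name": "ExampleServiceFabricExtension",
          "properties": { "settings": { "nodeTypeRef": "[parameters('vmNodeType0Name')]" } } } ] } } } }
  ]
}
'@

# Hypothetical helper: returns one object per scale set extension that
# carries a nodeTypeRef setting.
function Get-ScaleSetNodeTypeRef {
    param([Parameter(Mandatory)][string]$TemplateJson)

    $template = $TemplateJson | ConvertFrom-Json
    foreach ($resource in $template.resources) {
        # Skip everything that is not a virtual machine scale set.
        if ($resource.type -ne 'Microsoft.Compute/virtualMachineScaleSets') { continue }

        $extensions = $resource.properties.virtualMachineProfile.extensionProfile.extensions
        foreach ($extension in $extensions) {
            $ref = $extension.properties.settings.nodeTypeRef
            if ($ref) {
                [pscustomobject]@{ ScaleSet = $resource.name; NodeTypeRef = $ref }
            }
        }
    }
}

Get-ScaleSetNodeTypeRef -TemplateJson $templateJson | Format-Table -AutoSize
```

In this made-up example, two scale sets share the same `nodeTypeRef`, which is the situation step 3 describes. To inspect a real file, you could pass `(Get-Content -Raw "C:\temp\cluster\Deploy-2NodeTypes-3ScaleSets.json")` as the `-TemplateJson` argument.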

```powershell
# Variables.
$groupname = "sfupgradetestgroup"
$clusterloc="southcentralus"
$subscriptionID="<your subscription ID>"

# Sign in to your Azure account and select your subscription.
Login-AzureRmAccount -SubscriptionId $subscriptionID

# Create a new resource group for your deployment and give it a name and a location.
New-AzureRmResourceGroup -Name $groupname -Location $clusterloc

# Deploy the two node type cluster.
New-AzureRmResourceGroupDeployment -ResourceGroupName $groupname -TemplateParameterFile "C:\temp\cluster\Deploy-2NodeTypes-2ScaleSets.parameters.json" `
    -TemplateFile "C:\temp\cluster\Deploy-2NodeTypes-2ScaleSets.json" -Verbose

# Connect to the cluster and check the cluster health.
$ClusterName= "sfupgradetest.southcentralus.cloudapp.azure.com:19000"
$thumb="F361720F4BD5449F6F083DDE99DC51A86985B25B"

Connect-ServiceFabricCluster -ConnectionEndpoint $ClusterName -KeepAliveIntervalInSec 10 `
    -X509Credential `
    -ServerCertThumbprint $thumb  `
    -FindType FindByThumbprint `
    -FindValue $thumb `
    -StoreLocation CurrentUser `
    -StoreName My

Get-ServiceFabricClusterHealth

# Deploy a new scale set into the primary node type. Create a new load balancer and public IP address for the new scale set.
New-AzureRmResourceGroupDeployment -ResourceGroupName $groupname -TemplateParameterFile "C:\temp\cluster\Deploy-2NodeTypes-3ScaleSets.parameters.json" `
    -TemplateFile "C:\temp\cluster\Deploy-2NodeTypes-3ScaleSets.json" -Verbose

# Check the cluster health again. All 15 nodes should be healthy.
Get-ServiceFabricClusterHealth

# Disable the nodes in the original scale set.
$nodeNames = @("_NTvm1_0","_NTvm1_1","_NTvm1_2","_NTvm1_3","_NTvm1_4")

Write-Host "Disabling nodes..."
foreach($name in $nodeNames){
    Disable-ServiceFabricNode -NodeName $name -Intent RemoveNode -Force
}

Write-Host "Checking node status..."
foreach($name in $nodeNames){

    $state = Get-ServiceFabricNode -NodeName $name

    # Poll every 5 seconds, up to 50 times per node.
    $loopTimeout = 50

    do{
        Start-Sleep 5
        $loopTimeout -= 1
        $state = Get-ServiceFabricNode -NodeName $name
        Write-Host "$name state: " $state.NodeDeactivationInfo.Status
    } while (($state.NodeDeactivationInfo.Status -ne "Completed") -and ($loopTimeout -ne 0))

    # Stop if the node did not reach the Disabled status.
    if ($state.NodeStatus -ne [System.Fabric.Query.NodeStatus]::Disabled)
    {
        Write-Error "$name node deactivation failed with state $($state.NodeStatus)"
        exit
    }
}

# Remove the scale set
$scaleSetName="NTvm1"
Remove-AzureRmVmss -ResourceGroupName $groupname -VMScaleSetName $scaleSetName -Force
Write-Host "Removed scale set $scaleSetName"

$lbname="LB-sfupgradetest-NTvm1"
$oldPublicIpName="PublicIP-LB-FE-0"
$newPublicIpName="PublicIP-LB-FE-2"

# Store DNS settings of public IP address related to old Primary NodeType into variable
$oldprimaryPublicIP = Get-AzureRmPublicIpAddress -Name $oldPublicIpName  -ResourceGroupName $groupname

$primaryDNSName = $oldprimaryPublicIP.DnsSettings.DomainNameLabel

$primaryDNSFqdn = $oldprimaryPublicIP.DnsSettings.Fqdn

# Remove Load Balancer related to old Primary NodeType. This will cause a brief period of downtime for the cluster
Remove-AzureRmLoadBalancer -Name $lbname -ResourceGroupName $groupname -Force

# Remove the old public IP
Remove-AzureRmPublicIpAddress -Name $oldPublicIpName -ResourceGroupName $groupname -Force

# Replace DNS settings of Public IP address related to new Primary Node Type with DNS settings of Public IP address related to old Primary Node Type
$PublicIP = Get-AzureRmPublicIpAddress -Name $newPublicIpName  -ResourceGroupName $groupname
$PublicIP.DnsSettings.DomainNameLabel = $primaryDNSName
$PublicIP.DnsSettings.Fqdn = $primaryDNSFqdn
Set-AzureRmPublicIpAddress -PublicIpAddress $PublicIP

# Check the cluster health
Get-ServiceFabricClusterHealth

# Remove node state for the deleted nodes.
foreach($name in $nodeNames){
    # Remove the node from the cluster
    Remove-ServiceFabricNodeState -NodeName $name -TimeoutSec 300 -Force
    Write-Host "Removed node state for node $name"
}
```
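
The script polls each node's deactivation status with a fixed number of attempts and a fixed delay. If you need the same wait-with-timeout pattern in more than one place, one option is to factor it into a small helper. The following is a minimal sketch, not part of the sample script; the `Wait-Until` function and its parameter names are hypothetical, and the defaults simply mirror the 50 attempts and 5-second delay used above.

```powershell
# Hypothetical helper: evaluates a condition repeatedly until it returns
# $true or the attempts run out. Returns $true on success, $false on timeout.
function Wait-Until {
    param(
        [Parameter(Mandatory)][scriptblock]$Condition,
        [int]$MaxAttempts = 50,   # mirrors $loopTimeout in the script above
        [int]$DelaySeconds = 5    # mirrors Start-Sleep 5 in the script above
    )

    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        if (& $Condition) { return $true }

        # Do not sleep after the final attempt.
        if ($attempt -lt $MaxAttempts) { Start-Sleep -Seconds $DelaySeconds }
    }
    return $false
}

# Possible usage against a connected cluster (not executed here):
#
#   $done = Wait-Until -Condition {
#       (Get-ServiceFabricNode -NodeName "_NTvm1_0").NodeDeactivationInfo.Status -eq "Completed"
#   }
#   if (-not $done) { Write-Error "Node _NTvm1_0 was not deactivated in time" }
```

Unlike the `do`/`while` loop in the script, this sketch checks the condition before the first sleep, so a node that is already deactivated returns immediately. The same helper could gate later steps on cluster health, for example by testing the `AggregatedHealthState` property of the object returned by `Get-ServiceFabricClusterHealth`, assuming that a value of `Ok` is the criterion you want.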

## Next steps
* Learn about [application scalability](service-fabric-concepts-scalability.md).
* [Scale an Azure cluster in or out](service-fabric-tutorial-scale-cluster.md).
* [Scale an Azure cluster programmatically](service-fabric-cluster-programmatic-scaling.md) using the fluent Azure compute SDK.
* [Scale a standalone cluster in or out](service-fabric-cluster-windows-server-add-remove-nodes.md).

articles/service-fabric/toc.yml

Lines changed: 4 additions & 2 deletions
@@ -525,10 +525,12 @@
       href: service-fabric-create-cluster-using-cert-cn.md
     - name: Scale
       items:
-      - name: Manually
+      - name: Manually scale out
         href: service-fabric-cluster-scale-up-down.md
-      - name: Programmatically
+      - name: Programmatically scale out
         href: service-fabric-cluster-programmatic-scaling.md
+      - name: Scale up the primary node type VMs
+        href: service-fabric-cluster-upgrade-primary-nodetype-vm.md
     - name: Upgrade
       href: service-fabric-cluster-upgrade.md
     - name: Set access control
