Merge pull request #77922 from flyPacific/SeedNodeStatusHealthReportUpdate

PRMerger9 · web-flow · commit 18b2b7fa332c · 2019-06-10T07:17:20.000+08:00
Add Seed Node Status health report information
diff --git a/articles/service-fabric/service-fabric-understand-and-troubleshoot-with-system-health-reports.md b/articles/service-fabric/service-fabric-understand-and-troubleshoot-with-system-health-reports.md
@@ -68,6 +68,29 @@ When one of the previous conditions happens, **System.FM** or **System.FMM** fla
 * **Property**: Rebuild.
 * **Next steps**: Investigate the network connection between the nodes, as well as the state of any specific nodes that are listed on the description of the health report.
 
+### Seed Node Status
+**System.FM** reports a cluster-level warning if some seed nodes are unhealthy. Seed nodes are the nodes that maintain the availability of the underlying cluster. They help keep the cluster up by establishing leases with other nodes and by serving as tiebreakers during certain kinds of network failures. If a majority of the seed nodes in the cluster are down and are not brought back, the cluster automatically shuts down.
+
+A seed node is unhealthy if its node status is Down, Removed, or Unknown.
+The warning report for seed node status lists all the unhealthy seed nodes with detailed information.
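+
+You can check the rule above from PowerShell before acting on the report. The following is a minimal sketch, not part of the Service Fabric module: `Get-UnhealthySeedNode` is a hypothetical helper name, and it assumes the node objects expose the `IsSeedNode` and `NodeStatus` properties that `Get-ServiceFabricNode` returns.
+
+```powershell
+# Minimal sketch: return the seed nodes whose status is Down, Removed, or Unknown.
+# Get-UnhealthySeedNode is a hypothetical helper, not a Service Fabric cmdlet.
+function Get-UnhealthySeedNode {
+    [CmdletBinding()]
+    param(
+        # Node objects, for example the output of Get-ServiceFabricNode.
+        [Parameter(Mandatory, ValueFromPipeline)]
+        [object[]] $Node
+    )
+    begin {
+        # Node statuses that make a seed node unhealthy.
+        $unhealthyStatuses = @('Down', 'Removed', 'Unknown')
+    }
+    process {
+        foreach ($n in $Node) {
+            # Cast the status to a string so the check works both with the
+            # NodeStatus enum and with plain test objects.
+            if ($n.IsSeedNode -and ($unhealthyStatuses -contains [string]$n.NodeStatus)) {
+                $n
+            }
+        }
+    }
+}
+```
+
+After connecting with `Connect-ServiceFabricCluster`, you could run `Get-ServiceFabricNode | Get-UnhealthySeedNode | Select-Object NodeName, NodeType, NodeStatus` to see which seed nodes match the rule above.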
+
+* **SourceID**: System.FM
+* **Property**: SeedNodeStatus
+* **Next steps**: If this warning appears in the cluster, follow the instructions below to fix it.
+
+For clusters running Service Fabric version 6.5 or higher:
+
+For Service Fabric clusters on Azure, after a seed node goes down, Service Fabric tries to change it to a non-seed node automatically. For this to happen, make sure the number of non-seed nodes in the primary node type is greater than or equal to the number of Down seed nodes. If necessary, add more nodes to the primary node type to achieve this.
+Depending on the cluster status, it may take some time to fix the issue. Once this is done, the warning report is automatically cleared.
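+
+To check the capacity condition for Azure clusters before adding nodes, the following minimal sketch compares the two counts. `Test-SeedNodeReplacementCapacity` and its `-PrimaryNodeType` parameter are hypothetical. The sketch follows the text literally: it counts every non-seed node of the primary node type regardless of status, so you may want to restrict it to nodes that are Up.
+
+```powershell
+# Minimal sketch: compare the number of Down seed nodes with the number of
+# non-seed nodes in the primary node type.
+# Test-SeedNodeReplacementCapacity is a hypothetical helper, not a Service Fabric cmdlet.
+function Test-SeedNodeReplacementCapacity {
+    [CmdletBinding()]
+    param(
+        # Node objects, for example the output of Get-ServiceFabricNode.
+        [Parameter(Mandatory)] [object[]] $Node,
+        # Name of the cluster's primary node type (assumption: supplied by the caller).
+        [Parameter(Mandatory)] [string] $PrimaryNodeType
+    )
+    # Seed nodes that are currently Down.
+    $downSeedCount = @($Node | Where-Object {
+        $_.IsSeedNode -and [string]$_.NodeStatus -eq 'Down'
+    }).Count
+    # Non-seed nodes in the primary node type (any status).
+    $nonSeedPrimaryCount = @($Node | Where-Object {
+        -not $_.IsSeedNode -and $_.NodeType -eq $PrimaryNodeType
+    }).Count
+    [pscustomobject]@{
+        DownSeedNodes         = $downSeedCount
+        NonSeedPrimaryNodes   = $nonSeedPrimaryCount
+        HasEnoughNonSeedNodes = $nonSeedPrimaryCount -ge $downSeedCount
+    }
+}
+```
+
+For example, run `Test-SeedNodeReplacementCapacity -Node (Get-ServiceFabricNode) -PrimaryNodeType '<primary-node-type-name>'`, replacing the placeholder with your cluster's primary node type. If `HasEnoughNonSeedNodes` is `$false`, add nodes to the primary node type as described above.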
+
+For Service Fabric standalone clusters, all the seed nodes need to become healthy to clear the warning report. The action depends on why a seed node is unhealthy: if the seed node is Down, bring it back up; if it is Removed or Unknown, it [needs to be removed from the cluster](https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-cluster-windows-server-add-remove-nodes).
+The warning report is automatically cleared when all seed nodes become healthy.
+
+For clusters running a Service Fabric version older than 6.5:
+
+In this case, the warning report must be cleared manually. **Make sure all the seed nodes are healthy before clearing the report**: if a seed node is Down, bring it back up; if a seed node is Removed or Unknown, remove it from the cluster.
+After all the seed nodes become healthy, use the following PowerShell command to [clear the warning report](https://docs.microsoft.com/en-us/powershell/module/servicefabric/send-servicefabricclusterhealthreport):
+
+```powershell
+Send-ServiceFabricClusterHealthReport -SourceId "System.FM" -HealthProperty "SeedNodeStatus" -HealthState OK
+```
+
 ## Node system health reports
 System.FM, which represents the Failover Manager service, is the authority that manages information about cluster nodes. Each node should have one report from System.FM showing its state. The node entities are removed when the node state is removed. For more information, see [RemoveNodeStateAsync](https://docs.microsoft.com/dotnet/api/system.fabric.fabricclient.clustermanagementclient.removenodestateasync).