
Commit 33ae4a0

Update service-fabric-production-readiness-checklist.md
Fix doc issues.
1 parent b1e18f7 commit 33ae4a0


1 file changed: +30 -36 lines changed


articles/service-fabric/service-fabric-production-readiness-checklist.md

Lines changed: 30 additions & 36 deletions
@@ -13,7 +13,7 @@ ms.devlang: dotNet
 ms.topic: conceptual
 ms.tgt_pltfrm: NA
 ms.workload: NA
-ms.date: 7/10/2018
+ms.date: 6/05/2019
 ms.author: aljo
 ---

@@ -23,48 +23,42 @@ Is your application and cluster ready to take production traffic? Running and te


 ## Pre-requisites for production
-1. [Azure Service Fabric Security best practices](https://docs.microsoft.com/azure/security/azure-service-fabric-security-best-practices) are:
-1. Use X.509 certificates
-1. Configure security policies
-1. Configure SSL for Azure Service Fabric
-1. Use network isolation and security with Azure Service Fabric
-1. Set up Azure Key Vault for security
-1. Microsoft.Network/loadBalancersAssign users to roles
-1. Implement the Reliable Actors security configuration if using the Actors programming model
-1. For clusters with more than 20 cores or 10 nodes, create a dedicated primary node type for system services. Add [placement constraints](service-fabric-cluster-resource-manager-advanced-placement-rules-placement-policies.md) to reserve the primary node type for system services.
-1. Use a D2v2 or higher SKU for the primary node type. It is recommended to pick a SKU with at least 50 GB hard disk capacity.
-1. Production clusters must be [secure](service-fabric-cluster-security.md). For an example of setting up a secure cluster, see this [cluster template](https://github.com/Azure-Samples/service-fabric-cluster-templates/tree/master/7-VM-Windows-3-NodeTypes-Secure-NSG). Use common names for certificates and avoid using self signed certs.
-1. Add [resource constraints on containers and services](service-fabric-resource-governance.md), so that they don't consume more than 75% of node resources.
-1. Understand and set the [durability level](service-fabric-cluster-capacity.md#the-durability-characteristics-of-the-cluster). Silver or higher durability level is recommended for node types running stateful workloads. The primary node type should have a durability level set to Silver or higher.
-1. Understand and pick the [reliability level](service-fabric-cluster-capacity.md#the-reliability-characteristics-of-the-cluster) of the node type. Silver or higher reliability is recommended.
-1. Load and scale test your workloads to identify [capacity requirements](service-fabric-cluster-capacity.md) for your cluster.
-1. Your services and applications are monitored and application logs are being generated and stored, with alerting. For example, see [Add logging to your Service Fabric application](service-fabric-how-to-diagnostics-log.md) and [Monitor containers with Azure Monitor logs](service-fabric-diagnostics-oms-containers.md).
-1. The cluster is monitored with alerting (for example, with [Azure Monitor logs](service-fabric-diagnostics-event-analysis-oms.md)).
-1. The underlying virtual machine scale set infrastructure is monitored with alerting (for example, with [Azure Monitor logs](service-fabric-diagnostics-oms-agent.md).
-1. The cluster has [primary and secondary certificates](service-fabric-cluster-security-update-certs-azure.md) always (so you don't get locked out).
-1. Maintain separate clusters for development, staging, and production.
-1. [Application upgrades](service-fabric-application-upgrade.md) and [cluster upgrades](service-fabric-tutorial-upgrade-cluster.md) are tested in development and staging clusters first.
-1. Turn off automatic upgrades in production clusters, and turn it on for development and staging clusters (rollback as needed).
-1. Establish a Recovery Point Objective (RPO) for your service, and set up a [disaster recovery process](service-fabric-disaster-recovery.md) and test it out.
-1. Plan for [scaling](service-fabric-cluster-scaling.md) your cluster manually or programmatically.
-1. Plan for [patching](service-fabric-patch-orchestration-application.md) your cluster nodes.
-1. Establish a CI/CD pipeline so that your latest changes are being continually tested. For example, using [Azure DevOps](service-fabric-tutorial-deploy-app-with-cicd-vsts.md) or [Jenkins](service-fabric-cicd-your-linux-applications-with-jenkins.md)
-1. Test your development & staging clusters under load with the [Fault Analysis Service](service-fabric-testability-overview.md) and induce controlled [chaos](service-fabric-controlled-chaos.md).
-1. Plan for [scaling](service-fabric-concepts-scalability.md) your applications.
+1. Azure Service Fabric best practices: [Application Design](./service-fabric-best-practices-applications.md), [Security](./service-fabric-best-practices-security.md), [Networking](./service-fabric-best-practices-networking.m), [Capacity planning and scaling](./service-fabric-best-practices-capacity-scaling.md), [Infrastructure as Code](./service-fabric-best-practices-infrastructure-as-code.md), and [Monitoring and Diagnostics](./service-fabric-best-practices-monitoring.md).
+1. Implement the Reliable Actors security configuration if using the Actors programming model
+1. For clusters with more than 20 cores or 10 nodes, create a dedicated primary node type for system services. Add [placement constraints](service-fabric-cluster-resource-manager-advanced-placement-rules-placement-policies.md) to reserve the primary node type for system services.
+1. Use a D2v2 or higher SKU for the primary node type. It is recommended to pick a SKU with at least 50 GB hard disk capacity.
+1. Production clusters must be [secure](service-fabric-cluster-security.md). For an example of setting up a secure cluster, see this [cluster template](https://github.com/Azure-Samples/service-fabric-cluster-templates/tree/master/7-VM-Windows-3-NodeTypes-Secure-NSG). Use common names for certificates and avoid using self signed certs.
+1. Add [resource constraints on containers and services](service-fabric-resource-governance.md), so that they don't consume more than 75% of node resources.
+1. Understand and set the [durability level](service-fabric-cluster-capacity.md#the-durability-characteristics-of-the-cluster). Silver or higher durability level is recommended for node types running stateful workloads. The primary node type should have a durability level set to Silver or higher.
+1. Understand and pick the [reliability level](service-fabric-cluster-capacity.md#the-reliability-characteristics-of-the-cluster) of the node type. Silver or higher reliability is recommended.
+1. Load and scale test your workloads to identify [capacity requirements](service-fabric-cluster-capacity.md) for your cluster.
+1. Your services and applications are monitored and application logs are being generated and stored, with alerting. For example, see [Add logging to your Service Fabric application](service-fabric-how-to-diagnostics-log.md) and [Monitor containers with Azure Monitor logs](service-fabric-diagnostics-oms-containers.md).
+1. The cluster is monitored with alerting (for example, with [Azure Monitor logs](service-fabric-diagnostics-event-analysis-oms.md)).
+1. The underlying virtual machine scale set infrastructure is monitored with alerting (for example, with [Azure Monitor logs](service-fabric-diagnostics-oms-agent.md).
+1. The cluster has [primary and secondary certificates](service-fabric-cluster-security-update-certs-azure.md) always (so you don't get locked out).
+1. Maintain separate clusters for development, staging, and production.
+1. [Application upgrades](service-fabric-application-upgrade.md) and [cluster upgrades](service-fabric-tutorial-upgrade-cluster.md) are tested in development and staging clusters first.
+1. Turn off automatic upgrades in production clusters, and turn it on for development and staging clusters (rollback as needed).
+1. Establish a Recovery Point Objective (RPO) for your service, and set up a [disaster recovery process](service-fabric-disaster-recovery.md) and test it out.
+1. Plan for [scaling](service-fabric-cluster-scaling.md) your cluster manually or programmatically.
+1. Plan for [patching](service-fabric-patch-orchestration-application.md) your cluster nodes.
+1. Establish a CI/CD pipeline so that your latest changes are being continually tested. For example, using [Azure DevOps](service-fabric-tutorial-deploy-app-with-cicd-vsts.md) or [Jenkins](service-fabric-cicd-your-linux-applications-with-jenkins.md)
+1. Test your development & staging clusters under load with the [Fault Analysis Service](service-fabric-testability-overview.md) and induce controlled [chaos](service-fabric-controlled-chaos.md).
+1. Plan for [scaling](service-fabric-concepts-scalability.md) your applications.


 If you're using the Service Fabric Reliable Services or Reliable Actors programming model, the following items need to be checked off:
-1. Upgrade applications during local development to check that your service code is honoring the cancellation token in the `RunAsync` method and closing custom communication listeners.
-1. Avoid [common pitfalls](service-fabric-work-with-reliable-collections.md) when using Reliable Collections.
-1. Monitor the .NET CLR memory performance counters when running load tests and check for high rates of Garbage Collection or runaway heap growth.
-1. Maintain offline backup of [Reliable Services and Reliable Actors](service-fabric-reliable-services-backup-restore.md) and test the restoration process.
-1. Your Primary NodeType Virtual Machine instance count should ideally be equal to the minimum for your Clusters Reliability tier; conditions when appropriate to exceed the Tier minimum includes: temporarily when vertically scaling your Primary NodeTypes Virtual Machine Scale Set SKU.
+1. Upgrade applications during local development to check that your service code is honoring the cancellation token in the `RunAsync` method and closing custom communication listeners.
+1. Avoid [common pitfalls](service-fabric-work-with-reliable-collections.md) when using Reliable Collections.
+1. Monitor the .NET CLR memory performance counters when running load tests and check for high rates of Garbage Collection or runaway heap growth.
+1. Maintain offline backup of [Reliable Services and Reliable Actors](service-fabric-reliable-services-backup-restore.md) and test the restoration process.
+1. Your Primary NodeType Virtual Machine instance count should ideally be equal to the minimum for your Clusters Reliability tier; conditions when appropriate to exceed the Tier minimum includes: temporarily when vertically scaling your Primary NodeTypes Virtual Machine Scale Set SKU.

 ## Optional best practices

 While the above lists are pre-requisites to go into production, the following items should also be considered:
-1. Plug into the [Service Fabric health model](service-fabric-health-introduction.md) for extending the built-in health evaluation and reporting.
-1. Deploy a custom watchdog that is monitoring your application and reports [load](service-fabric-cluster-resource-manager-metrics.md) for [resource balancing](service-fabric-cluster-resource-manager-balancing.md).
+1. Plug into the [Service Fabric health model](service-fabric-health-introduction.md) for extending the built-in health evaluation and reporting.
+1. Deploy a custom watchdog that is monitoring your application and reports [load](service-fabric-cluster-resource-manager-metrics.md) for [resource balancing](service-fabric-cluster-resource-manager-balancing.md).


 ## Next steps
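Several primary node type items in the checklist in this commit can be checked mechanically before a deployment. These are:

- a dedicated primary node type for clusters with more than 20 cores or more than 10 nodes
- at least 50 GB of disk on the primary node type
- Silver or higher durability for the primary node type and for stateful node types

The sketch below is a hypothetical review helper, not a Service Fabric or Azure SDK API. The `NodeType` fields and the `review_node_types` function are assumptions made for illustration. The D2v2 SKU rule is left out because SKU ordering is not modeled here.

```python
from dataclasses import dataclass
from typing import List

# Service Fabric durability tiers, lowest to highest.
DURABILITY_TIERS = ["Bronze", "Silver", "Gold"]


@dataclass
class NodeType:
    # Hypothetical description of a node type; not an Azure SDK type.
    name: str
    is_primary: bool
    vm_count: int
    cores_per_vm: int
    disk_gb: int
    durability: str
    runs_stateful: bool
    hosts_app_services: bool


def review_node_types(node_types: List[NodeType]) -> List[str]:
    """Return findings for the primary node type and durability checklist items."""
    findings = []
    total_cores = sum(nt.vm_count * nt.cores_per_vm for nt in node_types)
    total_nodes = sum(nt.vm_count for nt in node_types)
    # Checklist threshold: more than 20 cores or more than 10 nodes.
    needs_dedicated_primary = total_cores > 20 or total_nodes > 10

    for nt in node_types:
        silver_or_higher = (
            DURABILITY_TIERS.index(nt.durability) >= DURABILITY_TIERS.index("Silver")
        )
        if (nt.is_primary or nt.runs_stateful) and not silver_or_higher:
            findings.append(f"{nt.name}: durability {nt.durability} is below Silver")
        if nt.is_primary and nt.disk_gb < 50:
            findings.append(f"{nt.name}: primary disk {nt.disk_gb} GB is under 50 GB")
        if nt.is_primary and needs_dedicated_primary and nt.hosts_app_services:
            findings.append(
                f"{nt.name}: primary node type should be reserved for system services"
            )
    return findings


if __name__ == "__main__":
    system = NodeType("system", True, 5, 2, 128, "Silver", False, False)
    apps = NodeType("apps", False, 6, 4, 256, "Silver", True, True)
    print(review_node_types([system, apps]))  # []
```

In practice the same checks would read their inputs from the cluster's deployment template. The dataclass keeps the rules separate from any particular template format.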

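The resource governance item in this checklist asks that containers and services not consume more than 75% of node resources. The minimal sketch below checks that budget for a set of services that could be placed on the same node. It makes three assumptions:

- It takes the worst case, in which every listed service lands on one node.
- The `Resources` type and `check_governance_budget` function are hypothetical.
- The actual limits are declared in each application's manifest, which this sketch does not parse.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Resources:
    # Hypothetical pairing of CPU cores and memory; not a Service Fabric type.
    cpu_cores: float
    memory_mb: int


def check_governance_budget(
    node: Resources, limits: Dict[str, Resources], budget: float = 0.75
) -> List[str]:
    """Flag when summed service limits on one node exceed the budget fraction."""
    total_cpu = sum(r.cpu_cores for r in limits.values())
    total_mem = sum(r.memory_mb for r in limits.values())
    findings = []
    if total_cpu > node.cpu_cores * budget:
        findings.append(
            f"CPU limits {total_cpu} cores exceed {budget:.0%} of {node.cpu_cores} cores"
        )
    if total_mem > node.memory_mb * budget:
        findings.append(
            f"memory limits {total_mem} MB exceed {budget:.0%} of {node.memory_mb} MB"
        )
    return findings


if __name__ == "__main__":
    node = Resources(cpu_cores=8, memory_mb=32768)
    limits = {"web": Resources(2, 8192), "worker": Resources(4, 16384)}
    print(check_governance_budget(node, limits))  # []
```

Checking CPU and memory separately mirrors the idea that either resource alone can starve system services on the node.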
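Two checklist items in this commit work together:

- Pick Silver or higher reliability.
- Keep the primary node type's VM count at the minimum for the reliability tier, exceeding it only temporarily while vertically scaling the primary scale set SKU.

The sketch below encodes both. The per-tier minimums (Bronze 3, Silver 5, Gold 7, Platinum 9) follow the Service Fabric capacity planning guidance and are worth re-checking against the current docs. The function name and the `vertical_scaling_in_progress` flag are hypothetical.

```python
from typing import List

# Minimum primary node type VM counts per reliability tier, per the
# Service Fabric capacity planning guidance (re-check against current docs).
MIN_PRIMARY_VMS = {"Bronze": 3, "Silver": 5, "Gold": 7, "Platinum": 9}
TIERS = ["Bronze", "Silver", "Gold", "Platinum"]


def check_primary_count(
    reliability: str, vm_count: int, vertical_scaling_in_progress: bool = False
) -> List[str]:
    """Return findings for the reliability tier and primary VM count."""
    if reliability not in MIN_PRIMARY_VMS:
        raise ValueError(f"unknown reliability tier: {reliability}")
    findings = []
    if TIERS.index(reliability) < TIERS.index("Silver"):
        findings.append(f"reliability {reliability} is below the recommended Silver")
    minimum = MIN_PRIMARY_VMS[reliability]
    if vm_count < minimum:
        findings.append(
            f"{vm_count} primary VMs is below the {reliability} minimum of {minimum}"
        )
    elif vm_count > minimum and not vertical_scaling_in_progress:
        # Exceeding the minimum is only expected while changing the primary SKU.
        findings.append(
            f"{vm_count} primary VMs exceeds the {reliability} minimum of {minimum}"
        )
    return findings


if __name__ == "__main__":
    print(check_primary_count("Silver", 5))  # []
```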

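The checklist asks you to establish a Recovery Point Objective (RPO) and keep offline backups of Reliable Services and Reliable Actors. One quick sanity check is whether a periodic backup schedule can meet that RPO at all.

The sketch below assumes that each backup captures state at the moment it starts. Under that assumption, a failure just before the next backup finishes loses everything since the previous backup started: one interval plus one backup duration. This model and the function names are illustrative assumptions, not part of the Service Fabric backup service.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta, backup_duration: timedelta) -> timedelta:
    """Worst-case loss if each backup snapshots state when it starts.

    A failure just before backup k+1 completes leaves backup k, which
    holds state from (interval + duration) earlier, as the latest copy.
    """
    return backup_interval + backup_duration


def meets_rpo(
    rpo: timedelta,
    backup_interval: timedelta,
    backup_duration: timedelta = timedelta(0),
) -> bool:
    """True if the schedule's worst-case loss fits inside the RPO."""
    return worst_case_data_loss(backup_interval, backup_duration) <= rpo


if __name__ == "__main__":
    print(meets_rpo(timedelta(hours=1), timedelta(minutes=30), timedelta(minutes=10)))
```

A passing check only shows the schedule is plausible. The checklist still calls for actually testing the restore process.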
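The Reliable Services item in this checklist asks you to confirm that service code honors the cancellation token passed to `RunAsync`. Service Fabric passes a .NET `CancellationToken` there. The sketch below swaps in Python's `asyncio.Event` as a stand-in for that token, so it illustrates the cooperative-cancellation pattern rather than the Service Fabric API.

`simulate_upgrade` plays the role of the runtime closing a replica during an upgrade: it requests cancellation and checks that the loop stops within a grace period. The names `run_async` and `simulate_upgrade` and the timing values are hypothetical. Closing custom communication listeners is not covered.

```python
import asyncio
from typing import List


async def run_async(cancellation: asyncio.Event, completed: List[int]) -> None:
    """Long-running work loop that exits promptly once cancellation is requested."""
    iteration = 0
    while not cancellation.is_set():
        completed.append(iteration)  # stands in for one unit of real work
        iteration += 1
        try:
            # Wait for the next tick, but wake immediately if cancellation arrives.
            await asyncio.wait_for(cancellation.wait(), timeout=0.01)
        except asyncio.TimeoutError:
            pass


async def simulate_upgrade(run_for: float = 0.05, grace: float = 1.0) -> bool:
    """Start run_async, request cancellation, and report whether it stopped in time."""
    cancellation = asyncio.Event()
    completed: List[int] = []
    task = asyncio.create_task(run_async(cancellation, completed))
    await asyncio.sleep(run_for)
    cancellation.set()  # analogous to the runtime signalling the token on close
    try:
        await asyncio.wait_for(task, timeout=grace)
    except asyncio.TimeoutError:
        return False  # the loop ignored cancellation
    return len(completed) > 0


if __name__ == "__main__":
    print(asyncio.run(simulate_upgrade()))
```

A loop that never checks `cancellation` would hit the grace timeout instead. That stall is the behavior the checklist recommends catching during local upgrade testing.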