Merge pull request #92312 from v-arpraj/patch-1

Jak-MS · web-flow · commit 8a40081c686e · 2022-05-24T11:18:48.000-05:00
Updating service fabric cluster settings for 9.0RTO
diff --git a/articles/service-fabric/service-fabric-cluster-fabric-settings.md b/articles/service-fabric/service-fabric-cluster-fabric-settings.md
@@ -60,8 +60,18 @@ The following is a list of Fabric settings that you can customize, organized by
 | **Parameter** | **Allowed Values** | **Upgrade Policy** | **Guidance or Short Description** |
 | --- | --- | --- | --- |
 |DeployedState |wstring, default is L"Disabled" |Static |2-stage removal of CSS. |
+|EnableSecretMonitoring|bool, default is FALSE |Static |Must be enabled to use Managed KeyVaultReferences. The default may become true in the future. For more information, see [KeyVaultReference support for Azure-deployed Service Fabric Applications](https://docs.microsoft.com/azure/service-fabric/service-fabric-keyvault-references). |
+|SecretMonitoringInterval|TimeSpan, default is Common::TimeSpan::FromMinutes(15) |Static |The interval at which Service Fabric polls Key Vault for changes when using Managed KeyVaultReferences. This interval is best effort, so changes in Key Vault may be reflected in the cluster earlier or later than the interval. For more information, see [KeyVaultReference support for Azure-deployed Service Fabric Applications](https://docs.microsoft.com/azure/service-fabric/service-fabric-keyvault-references). |
 |UpdateEncryptionCertificateTimeout |TimeSpan, default is Common::TimeSpan::MaxValue |Static |Specify timespan in seconds. The default has changed to TimeSpan::MaxValue; but overrides are still respected. May be deprecated in the future. |
+
 
+## CentralSecretService/Replication
+
+| **Parameter** | **Allowed Values** | **Upgrade Policy** | **Guidance or Short Description** |
+| --- | --- | --- | --- |
+|ReplicationBatchSendInterval|TimeSpan, default is Common::TimeSpan::FromSeconds(15)|Static|Specify timespan in seconds. Determines the amount of time that the replicator waits after receiving an operation before force sending a batch.|
+|ReplicationBatchSize|uint, default is 1|Static|Specifies the number of operations to be sent between primary and secondary replicas. If zero, the primary sends one record per operation to the secondary. Otherwise, the primary replica aggregates log records until the config value is reached. This reduces network traffic.|
+
 ## ClusterManager
 
 | **Parameter** | **Allowed Values** | **Upgrade Policy** | **Guidance or Short Description** |
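Managed KeyVaultReferences require `EnableSecretMonitoring` in the `CentralSecretService` section, and the service itself must be deployed (`DeployedState` defaults to `Disabled`). The following is a minimal sketch that builds such an override, assuming an ARM template for a `Microsoft.ServiceFabric/clusters` resource whose `fabricSettings` array uses the `{name, parameters: [{name, value}]}` shape with string values. The helper `central_secret_service_settings` is hypothetical, and `"enabled"` is assumed to be the accepted non-default value for `DeployedState`.

```python
import json


def central_secret_service_settings(enable_secret_monitoring: bool = True) -> dict:
    """Build a hypothetical fabricSettings entry for the CentralSecretService section.

    All values are strings, matching the name/value pairs used in the
    fabricSettings array of an ARM cluster template (assumed shape).
    """
    return {
        "name": "CentralSecretService",
        "parameters": [
            # DeployedState defaults to "Disabled"; assumption: "enabled" turns it on.
            {"name": "DeployedState", "value": "enabled"},
            # EnableSecretMonitoring defaults to FALSE and must be enabled
            # to use Managed KeyVaultReferences.
            {
                "name": "EnableSecretMonitoring",
                "value": "true" if enable_secret_monitoring else "false",
            },
        ],
    }


if __name__ == "__main__":
    fabric_settings = [central_secret_service_settings()]
    print(json.dumps({"fabricSettings": fabric_settings}, indent=2))
```

Both settings are listed with a Static upgrade policy in the table above, so they are applied through a cluster configuration upgrade rather than changed on the fly.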
@@ -93,6 +103,13 @@ The following is a list of Fabric settings that you can customize, organized by
 |UpgradeStatusPollInterval |Time in seconds, default is 60 |Dynamic|The frequency of polling for application upgrade status. This value determines the rate of update for any GetApplicationUpgradeProgress call |
 |CompleteClientRequest | Bool, default is false |Dynamic| Complete client request when accepted by CM. |
 
+## ClusterManager/Replication
+
+| **Parameter** | **Allowed Values** | **Upgrade Policy** | **Guidance or Short Description** |
+| --- | --- | --- | --- |
+|ReplicationBatchSendInterval|TimeSpan, default is Common::TimeSpan::FromSeconds(15)|Static|Specify timespan in seconds. Determines the amount of time that the replicator waits after receiving an operation before force sending a batch.|
+|ReplicationBatchSize|uint, default is 1|Static|Specifies the number of operations to be sent between primary and secondary replicas. If zero, the primary sends one record per operation to the secondary. Otherwise, the primary replica aggregates log records until the config value is reached. This reduces network traffic.|
+
 ## Common
 
 | **Parameter** | **Allowed Values** | **Upgrade Policy** | **Guidance or Short Description** |
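The `ReplicationBatchSize` and `ReplicationBatchSendInterval` rows in the `*/Replication` sections describe two triggers for sending: enough operations have accumulated, or the send interval has elapsed. The sketch below models that rule. It is a simplified, hypothetical model (the class and method names are invented, and time is passed in explicitly), not the Service Fabric replicator implementation; it also assumes the interval is measured from the first pending operation and treats a batch size of 0 or 1 as "send every operation immediately".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BatchingModel:
    """Hypothetical model of ReplicationBatchSize / ReplicationBatchSendInterval."""

    batch_size: int = 1              # ReplicationBatchSize (default 1)
    send_interval_s: float = 15.0    # ReplicationBatchSendInterval (default 15 s)
    pending: List[str] = field(default_factory=list)
    first_pending_at: Optional[float] = None
    sent_batches: List[List[str]] = field(default_factory=list)

    def _flush(self) -> None:
        # Send whatever is pending as one batch and reset the timer.
        if self.pending:
            self.sent_batches.append(self.pending)
        self.pending = []
        self.first_pending_at = None

    def on_operation(self, op: str, now: float) -> None:
        # A batch size of 0 or 1 means one record is sent per operation.
        if self.batch_size <= 1:
            self.sent_batches.append([op])
            return
        if not self.pending:
            self.first_pending_at = now
        self.pending.append(op)
        if len(self.pending) >= self.batch_size:
            self._flush()

    def on_timer(self, now: float) -> None:
        # Force-send a partial batch once the interval has elapsed
        # since the first pending operation arrived (assumption).
        if self.first_pending_at is not None and now - self.first_pending_at >= self.send_interval_s:
            self._flush()


if __name__ == "__main__":
    model = BatchingModel(batch_size=3, send_interval_s=15.0)
    model.on_operation("op1", now=0.0)
    model.on_operation("op2", now=1.0)
    model.on_timer(now=10.0)   # interval not yet elapsed: nothing sent
    model.on_timer(now=15.0)   # partial batch force-sent
    for i in range(3):
        model.on_operation(f"op{i + 3}", now=20.0 + i)  # fills a full batch
    print(model.sent_batches)  # prints [['op1', 'op2'], ['op3', 'op4', 'op5']]
```

Larger batch sizes trade latency for fewer network messages: under this model, a lone operation can wait up to the full send interval before it is sent.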
@@ -147,6 +164,10 @@ The following is a list of Fabric settings that you can customize, organized by
 |IsEnabled|bool, default is FALSE|Static|Enables/Disables DnsService. DnsService is disabled by default and this config needs to be set to enable it. |
 |PartitionPrefix|string, default is "--"|Static|Controls the partition prefix string value in DNS queries for partitioned services. The value : <ul><li>Should be RFC-compliant as it will be part of a DNS query.</li><li>Should not contain a dot, '.', as dot interferes with DNS suffix behavior.</li><li>Should not be longer than 5 characters.</li><li>Cannot be an empty string.</li><li>If the PartitionPrefix setting is overridden, then PartitionSuffix must be overridden, and vice-versa.</li></ul>For more information, see [Service Fabric DNS Service.](service-fabric-dnsservice.md).|
 |PartitionSuffix|string, default is ""|Static|Controls the partition suffix string value in DNS queries for partitioned services.The value : <ul><li>Should be RFC-compliant as it will be part of a DNS query.</li><li>Should not contain a dot, '.', as dot interferes with DNS suffix behavior.</li><li>Should not be longer than 5 characters.</li><li>If the PartitionPrefix setting is overridden, then PartitionSuffix must be overridden, and vice-versa.</li></ul>For more information, see [Service Fabric DNS Service.](service-fabric-dnsservice.md). |
+|RecursiveQueryParallelMaxAttempts|Int, default is 0|Static|The number of times parallel queries will be attempted. Parallel queries are executed after the max attempts for serial queries have been exhausted.|
+|RecursiveQueryParallelTimeout|TimeSpan, default is Common::TimeSpan::FromSeconds(5)|Static|The timeout value in seconds for each attempted parallel query.|
+|RecursiveQuerySerialMaxAttempts|Int, default is 2|Static|The maximum number of serial queries that will be attempted. If this number is higher than the number of forwarding DNS servers, querying stops once every server has been attempted exactly once.|
+|RecursiveQuerySerialTimeout|TimeSpan, default is Common::TimeSpan::FromSeconds(5)|Static|The timeout value in seconds for each attempted serial query.|
 |TransientErrorMaxRetryCount|Int, default is 3|Static|Controls the number of times SF DNS will retry when a transient error occurs while calling SF APIs (e.g. when retrieving names and endpoints).|
 |TransientErrorRetryIntervalInMillis|Int, default is 0|Static|Sets the delay in milliseconds between retries for when SF DNS calls SF APIs.|
 
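The four `RecursiveQuery*` settings describe a two-phase fallback for recursive queries: serial attempts first, then parallel attempts once the serial ones are exhausted. The sketch below only computes the order of attempts from those settings. It assumes that each serial attempt targets the next forwarding server in order and that each parallel attempt queries every forwarder at once; both are assumptions, and `plan_recursive_queries` is a hypothetical helper, not a Service Fabric API.

```python
from typing import List, Tuple

Step = Tuple[str, Tuple[str, ...], float]  # (mode, servers queried, timeout in seconds)


def plan_recursive_queries(
    forwarders: List[str],
    serial_max_attempts: int = 2,      # RecursiveQuerySerialMaxAttempts
    parallel_max_attempts: int = 0,    # RecursiveQueryParallelMaxAttempts
    serial_timeout_s: float = 5.0,     # RecursiveQuerySerialTimeout
    parallel_timeout_s: float = 5.0,   # RecursiveQueryParallelTimeout
) -> List[Step]:
    """Return the attempts in the order they would be tried, under the stated assumptions."""
    steps: List[Step] = []
    # Serial phase: each forwarder is tried at most once, so the number of
    # serial attempts is capped by the number of forwarders.
    for server in forwarders[:serial_max_attempts]:
        steps.append(("serial", (server,), serial_timeout_s))
    # Parallel phase: only reached after the serial attempts are exhausted.
    # Assumption: every parallel attempt queries all forwarders concurrently.
    for _ in range(parallel_max_attempts):
        steps.append(("parallel", tuple(forwarders), parallel_timeout_s))
    return steps


if __name__ == "__main__":
    for step in plan_recursive_queries(["10.0.0.4", "10.0.0.5", "10.0.0.6"],
                                       parallel_max_attempts=1):
        print(step)
```

With the defaults (2 serial attempts, 0 parallel attempts, 5-second timeouts), a query that never gets an answer would, under this model, make two serial attempts and give up after roughly 10 seconds.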
@@ -213,6 +234,13 @@ The following is a list of Fabric settings that you can customize, organized by
 |UserRoleClientX509FindValueSecondary |string, default is "" |Dynamic|Search filter value used to locate certificate for default user role FabricClient. |
 |UserRoleClientX509StoreName |string, default is "My" |Dynamic|Name of the X.509 certificate store that contains certificate for default user role FabricClient. |
 
+## Failover/Replication
+
+| **Parameter** | **Allowed Values** | **Upgrade Policy** | **Guidance or Short Description** |
+| --- | --- | --- | --- |
+|ReplicationBatchSendInterval|TimeSpan, default is Common::TimeSpan::FromSeconds(15)|Static|Specify timespan in seconds. Determines the amount of time that the replicator waits after receiving an operation before force sending a batch.|
+|ReplicationBatchSize|uint, default is 1|Static|Specifies the number of operations to be sent between primary and secondary replicas. If zero, the primary sends one record per operation to the secondary. Otherwise, the primary replica aggregates log records until the config value is reached. This reduces network traffic.|
+
 ## FailoverManager
 
 | **Parameter** | **Allowed Values** | **Upgrade Policy** | **Guidance or Short Description** |
@@ -225,7 +253,9 @@ The following is a list of Fabric settings that you can customize, organized by
 |ExpectedNodeDeactivationDuration|TimeSpan, default is Common::TimeSpan::FromSeconds(60.0 \* 30)|Dynamic|Specify timespan in seconds. This is the expected duration for a node to complete deactivation in. |
 |ExpectedNodeFabricUpgradeDuration|TimeSpan, default is Common::TimeSpan::FromSeconds(60.0 \* 30)|Dynamic|Specify timespan in seconds. This is the expected duration for a node to be upgraded during Windows Fabric upgrade. |
 |ExpectedReplicaUpgradeDuration|TimeSpan, default is Common::TimeSpan::FromSeconds(60.0 \* 30)|Dynamic|Specify timespan in seconds. This is the expected duration for all the replicas to be upgraded on a node during application upgrade. |
+|IgnoreReplicaRestartWaitDurationWhenBelowMinReplicaSetSize|bool, default is FALSE|Dynamic|If IgnoreReplicaRestartWaitDurationWhenBelowMinReplicaSetSize is set to:<br>- false: Windows Fabric waits for the fixed time specified in ReplicaRestartWaitDuration for a replica to come back up.<br>- true: Windows Fabric waits for the fixed time specified in ReplicaRestartWaitDuration for a replica to come back up only if the partition is at or above Min Replica Set Size. If the partition is below Min Replica Set Size, a new replica is created right away.|
 |IsSingletonReplicaMoveAllowedDuringUpgrade|bool, default is TRUE|Dynamic|If set to true; replicas with a target replica set size of 1 will be permitted to move during upgrade. |
+|MaxInstanceCloseDelayDurationInSeconds|uint, default is 1800|Dynamic|The maximum value of InstanceCloseDelay that can be configured for FabricUpgrade, ApplicationUpgrade, and NodeDeactivations. |
 |MinReplicaSetSize|int, default is 3|Not Allowed|This is the minimum replica set size for the FM. If the number of active FM replicas drops below this value; the FM will reject changes to the cluster until at least the min number of replicas is recovered |
 |PlacementConstraints|string, default is ""|Not Allowed|Any placement constraints for the failover manager replicas |
 |PlacementTimeLimit|TimeSpan, default is Common::TimeSpan::FromSeconds(600)|Dynamic|Specify timespan in seconds. The time limit for reaching target replica count; after which a warning health report will be initiated |
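The two branches described for `IgnoreReplicaRestartWaitDurationWhenBelowMinReplicaSetSize` can be written as a small decision function. This is a hypothetical Python sketch: the function name and parameters are invented, and "below Min Replica Set Size" is assumed to mean that the number of up replicas in the partition is less than `MinReplicaSetSize`.

```python
def create_replacement_now(
    ignore_wait_when_below_min: bool,  # IgnoreReplicaRestartWaitDurationWhenBelowMinReplicaSetSize
    up_replicas: int,
    min_replica_set_size: int,
    seconds_down: float,
    replica_restart_wait_s: float,     # ReplicaRestartWaitDuration
) -> bool:
    """Return True if a new replica should be built for a replica that went down."""
    below_min = up_replicas < min_replica_set_size
    if ignore_wait_when_below_min and below_min:
        # Partition is below MinReplicaSetSize: do not wait for the old replica.
        return True
    # Otherwise wait the fixed ReplicaRestartWaitDuration for it to come back up.
    return seconds_down >= replica_restart_wait_s


if __name__ == "__main__":
    # Below MinReplicaSetSize with the setting enabled: replace immediately.
    print(create_replacement_now(True, up_replicas=2, min_replica_set_size=3,
                                 seconds_down=0.0, replica_restart_wait_s=60.0))
    # Same partition with the setting disabled: keep waiting.
    print(create_replacement_now(False, up_replicas=2, min_replica_set_size=3,
                                 seconds_down=0.0, replica_restart_wait_s=60.0))
```

The 60-second wait in the usage lines is an arbitrary example value, not the default of ReplicaRestartWaitDuration.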
@@ -306,6 +336,13 @@ The following is a list of Fabric settings that you can customize, organized by
 |SecondaryFileCopyRetryDelayMilliseconds|uint, default is 500|Dynamic|The file copy retry delay (in milliseconds).|
 |UseChunkContentInTransportMessage|bool, default is TRUE|Dynamic|The flag for using the new version of the upload protocol introduced in v6.4. This protocol version uses service fabric transport to upload files to image store which provides better performance than SMB protocol used in previous versions. |
 
+## FileStoreService/Replication
+
+| **Parameter** | **Allowed Values** | **Upgrade Policy** | **Guidance or Short Description** |
+| --- | --- | --- | --- |
+|ReplicationBatchSendInterval|TimeSpan, default is Common::TimeSpan::FromSeconds(15)|Static|Specify timespan in seconds. Determines the amount of time that the replicator waits after receiving an operation before force sending a batch.|
+|ReplicationBatchSize|uint, default is 1|Static|Specifies the number of operations to be sent between primary and secondary replicas. If zero, the primary sends one record per operation to the secondary. Otherwise, the primary replica aggregates log records until the config value is reached. This reduces network traffic.|
+
 ## HealthManager
 
 | **Parameter** | **Allowed Values** | **Upgrade Policy** | **Guidance or Short Description** |
@@ -460,6 +497,13 @@ The following is a list of Fabric settings that you can customize, organized by
 | --- | --- | --- | --- |
 |PropertyGroup|KeyDoubleValueMap, default is None|Dynamic|Determines the part of the load that sticks with replica when swapped It takes value between 0 (load doesn't stick with replica) and 1 (load sticks with replica - default) |
 
+## Naming/Replication
+
+| **Parameter** | **Allowed Values** | **Upgrade Policy** | **Guidance or Short Description** |
+| --- | --- | --- | --- |
+|ReplicationBatchSendInterval|TimeSpan, default is Common::TimeSpan::FromSeconds(15)|Static|Specify timespan in seconds. Determines the amount of time that the replicator waits after receiving an operation before force sending a batch.|
+|ReplicationBatchSize|uint, default is 1|Static|Specifies the number of operations to be sent between primary and secondary replicas. If zero, the primary sends one record per operation to the secondary. Otherwise, the primary replica aggregates log records until the config value is reached. This reduces network traffic.|
+
 ## NamingService
 
 | **Parameter** | **Allowed Values** | **Upgrade Policy** | **Guidance or Short Description** |
@@ -608,7 +652,16 @@ The following is a list of Fabric settings that you can customize, organized by
 |ServiceApiHealthDuration | Time in seconds, default is 30 minutes |Dynamic| Specify timespan in seconds. ServiceApiHealthDuration defines how long do we wait for a service API to run before we report it unhealthy. |
 |ServiceReconfigurationApiHealthDuration | Time in seconds, default is 30 |Dynamic| Specify timespan in seconds. ServiceReconfigurationApiHealthDuration defines how long do we wait for a service API to run before we report unhealthy. This applies to API calls that impact availability.|
 
+## RepairManager/Replication
+| **Parameter** | **Allowed Values** | **Upgrade Policy**| **Guidance or Short Description** |
+| --- | --- | --- | --- |
+|ReplicationBatchSendInterval|TimeSpan, default is Common::TimeSpan::FromSeconds(15)|Static|Specify timespan in seconds. Determines the amount of time that the replicator waits after receiving an operation before force sending a batch.|
+|ReplicationBatchSize|uint, default is 1|Static|Specifies the number of operations to be sent between primary and secondary replicas. If zero, the primary sends one record per operation to the secondary. Otherwise, the primary replica aggregates log records until the config value is reached. This reduces network traffic.|
+
 ## Replication
+<i> **Warning Note**: Changing Replication/TransactionalReplicator settings at the cluster level changes the settings for all stateful services, including system services. This is generally not recommended. To configure services at the application level instead, see [Configure Azure Service Fabric Reliable Services](https://docs.microsoft.com/azure/service-fabric/service-fabric-reliable-services-configuration).</i>
+
+
 | **Parameter** | **Allowed Values** | **Upgrade Policy**| **Guidance or Short Description** |
 | --- | --- | --- | --- |
 |BatchAcknowledgementInterval|TimeSpan, default is Common::TimeSpan::FromMilliseconds(15)|Static|Specify timespan in seconds. Determines the amount of time that the replicator waits after receiving an operation before sending back an acknowledgement. Other operations received during this time period will have their acknowledgements sent back in a single message-> reducing network traffic but potentially reducing the throughput of the replicator.|
@@ -621,6 +674,8 @@ The following is a list of Fabric settings that you can customize, organized by
 |QueueHealthMonitoringInterval|TimeSpan, default is Common::TimeSpan::FromSeconds(30)|Static|Specify timespan in seconds. This value determines the time period used by the Replicator to monitor any warning/error health events in the replication operation queues. A value of '0' disables health monitoring |
 |QueueHealthWarningAtUsagePercent|uint, default is 80|Static|This value determines the replication queue usage(in percentage) after which we report warning about high queue usage. We do so after a grace interval of QueueHealthMonitoringInterval. If the queue usage falls below this percentage in the grace interval|
 |ReplicatorAddress|string, default is "localhost:0"|Static|The endpoint in form of a string -'IP:Port' which is used by the Windows Fabric Replicator to establish connections with other replicas in order to send/receive operations.|
+|ReplicationBatchSendInterval|TimeSpan, default is Common::TimeSpan::FromSeconds(15)|Static|Specify timespan in seconds. Determines the amount of time that the replicator waits after receiving an operation before force sending a batch.|
+|ReplicationBatchSize|uint, default is 1|Static|Specifies the number of operations to be sent between primary and secondary replicas. If zero, the primary sends one record per operation to the secondary. Otherwise, the primary replica aggregates log records until the config value is reached. This reduces network traffic.|
 |ReplicatorListenAddress|string, default is "localhost:0"|Static|The endpoint in form of a string -'IP:Port' which is used by the Windows Fabric Replicator to receive operations from other replicas.|
 |ReplicatorPublishAddress|string, default is "localhost:0"|Static|The endpoint in form of a string -'IP:Port' which is used by the Windows Fabric Replicator to send operations to other replicas.|
 |RetryInterval|TimeSpan, default is Common::TimeSpan::FromSeconds(5)|Static|Specify timespan in seconds. When an operation is lost or rejected this timer determines how often the replicator will retry sending the operation.|
@@ -877,6 +932,7 @@ The following is a list of Fabric settings that you can customize, organized by
 |Level |Int, default is 4 | Dynamic |Trace etw level can take values 1, 2, 3, 4. To be supported you must keep the trace level at 4 |
 
 ## TransactionalReplicator
+<i> **Warning Note**: Changing Replication/TransactionalReplicator settings at the cluster level changes the settings for all stateful services, including system services. This is generally not recommended. To configure services at the application level instead, see [Configure Azure Service Fabric Reliable Services](https://docs.microsoft.com/azure/service-fabric/service-fabric-reliable-services-configuration).</i>
 
 | **Parameter** | **Allowed Values** | **Upgrade Policy** | **Guidance or Short Description** |
 | --- | --- | --- | --- |
@@ -888,6 +944,7 @@ The following is a list of Fabric settings that you can customize, organized by
 |MaxSecondaryReplicationQueueMemorySize |Uint, default is 0 | Static |This is the maximum value of the secondary replication queue in bytes. |
 |MaxSecondaryReplicationQueueSize |Uint, default is 16384 | Static |This is the maximum number of operations that could exist in the secondary replication queue. Note that it must be a power of 2. |
 |ReplicatorAddress |string, default is "localhost:0" | Static | The endpoint in form of a string -'IP:Port' which is used by the Windows Fabric Replicator to establish connections with other replicas in order to send/receive operations. |
+|ReplicationBatchSendInterval|TimeSpan, default is Common::TimeSpan::FromMilliseconds(15) | Static | Specify timespan in seconds. Determines the amount of time that the replicator waits after receiving an operation before force sending a batch.|
 |ShouldAbortCopyForTruncation |bool, default is FALSE | Static | Allow pending log truncation to go through during copy. With this enabled the copy stage of builds can be cancelled if the log is full and they are block truncation. |
 
 ## Transport