diff --git a/.openpublishing.redirection.orleans.json b/.openpublishing.redirection.orleans.json
index 976f95f94b66e..7bd96a87a188f 100644
--- a/.openpublishing.redirection.orleans.json
+++ b/.openpublishing.redirection.orleans.json
@@ -39,6 +39,10 @@
{
"source_path_from_root": "/docs/orleans/host/configuration-guide/activation-garbage-collection.md",
"redirect_url": "/dotnet/orleans/host/configuration-guide/activation-collection"
+ },
+ {
+ "source_path_from_root": "/docs/orleans/implementation/cluster-configuration.md",
+ "redirect_url": "/dotnet/orleans/implementation/cluster-management"
}
]
}
diff --git a/docs/ai/how-to/snippets/semantic-kernel/semantic-kernel.csproj b/docs/ai/how-to/snippets/semantic-kernel/semantic-kernel.csproj
index bbb9b10898c5d..b95fb6cefb1e9 100644
--- a/docs/ai/how-to/snippets/semantic-kernel/semantic-kernel.csproj
+++ b/docs/ai/how-to/snippets/semantic-kernel/semantic-kernel.csproj
@@ -15,7 +15,7 @@
-
+
diff --git a/docs/ai/tutorials/snippets/llm-eval/llm-eval.csproj b/docs/ai/tutorials/snippets/llm-eval/llm-eval.csproj
index a939699dd073f..8c10d18a55899 100644
--- a/docs/ai/tutorials/snippets/llm-eval/llm-eval.csproj
+++ b/docs/ai/tutorials/snippets/llm-eval/llm-eval.csproj
@@ -12,7 +12,7 @@
-
+
diff --git a/docs/architecture/maui/navigation.md b/docs/architecture/maui/navigation.md
index 96a11c5fa696b..36b0c8d2531c6 100644
--- a/docs/architecture/maui/navigation.md
+++ b/docs/architecture/maui/navigation.md
@@ -59,7 +59,7 @@ This interface specifies that an implementing class must provide the following m
The `MauiNavigationService` class, which implements the `INavigationService` interface, is registered as a singleton with the dependency injection container in the `MauiProgram.CreateMauiApp()` method, as demonstrated in the following code example:
```csharp
-mauiAppBuilder.Services.AddSingleton<INavigationService, MauiNavigationService>();;
+mauiAppBuilder.Services.AddSingleton<INavigationService, MauiNavigationService>();
```
The `INavigationService` interface can then be resolved by adding it to the constructor of our views and view-models, as demonstrated in the following code example:
diff --git a/docs/core/compatibility/core-libraries/5.0/hardware-instrinsics-issupported-checks.md b/docs/core/compatibility/core-libraries/5.0/hardware-instrinsics-issupported-checks.md
index 1b814899cc4e6..c896ff3be623d 100644
--- a/docs/core/compatibility/core-libraries/5.0/hardware-instrinsics-issupported-checks.md
+++ b/docs/core/compatibility/core-libraries/5.0/hardware-instrinsics-issupported-checks.md
@@ -8,7 +8,7 @@ ms.date: 11/01/2020
Checking `.X64.IsSupported`, where `` refers to the classes in the namespace, may now produce a different result to previous versions of .NET.
> [!TIP]
-> *ISA* stands for industry standard architecture.
+> *ISA* stands for Instruction Set Architecture.
## Version introduced
diff --git a/docs/core/compatibility/core-libraries/5.0/sse-comparegreaterthan-intrinsics.md b/docs/core/compatibility/core-libraries/5.0/sse-comparegreaterthan-intrinsics.md
index 8b246e5991973..fecce13265ddd 100644
--- a/docs/core/compatibility/core-libraries/5.0/sse-comparegreaterthan-intrinsics.md
+++ b/docs/core/compatibility/core-libraries/5.0/sse-comparegreaterthan-intrinsics.md
@@ -22,7 +22,7 @@ Previously, `NaN` inputs to the listed
Starting in .NET 5, these methods correctly handle `NaN` inputs and return the same results as the corresponding methods in the class.
-The Streaming SIMD Extensions (SSE) and Streaming SIMD Extensions 2 (SSE2) industry standard architectures (ISAs) don't provide direct hardware support for these comparison methods, so they're implemented in software. Previously, the methods were improperly implemented, and they incorrectly handled `NaN` inputs. For code ported from native, the incorrect behavior may introduce bugs. For a 256-bit code path, the methods can also produce different results to the equivalent methods in the class.
+The Streaming SIMD Extensions (SSE) and Streaming SIMD Extensions 2 (SSE2) Instruction Set Architectures (ISAs) don't provide direct hardware support for these comparison methods, so they're implemented in software. Previously, the methods were improperly implemented, and they incorrectly handled `NaN` inputs. For code ported from native, the incorrect behavior may introduce bugs. For a 256-bit code path, the methods can also produce different results to the equivalent methods in the class.
As an example of how the methods were previously incorrect, you can implement `CompareNotGreaterThan(x,y)` as `CompareLessThanOrEqual(x,y)` for regular integers. However, for `NaN` inputs, that logic computes the wrong result. Instead, using `CompareNotLessThan(y,x)` compares the numbers correctly *and* takes `NaN` inputs into consideration.
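+
+You can see the `NaN` asymmetry with ordinary scalar comparisons. The following is a hedged sketch (not the intrinsic implementation) contrasting the buggy and correct rewrites:
+
+```csharp
+double x = double.NaN, y = 1.0;
+
+// Desired semantics: "x is not greater than y" is true for NaN,
+// because no ordered comparison involving NaN holds.
+bool notGreaterThan = !(x > y);      // true
+
+// Buggy rewrite: x <= y is false for NaN, so this wrongly reports false.
+bool viaLessThanOrEqual = x <= y;    // false (incorrect)
+
+// Correct rewrite: NOT(y < x) is also true for NaN.
+bool viaNotLessThan = !(y < x);      // true (matches the desired result)
+```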
diff --git a/docs/core/testing/selective-unit-tests.md b/docs/core/testing/selective-unit-tests.md
index 5b8bcaa7fb592..aa0d6436e3942 100644
--- a/docs/core/testing/selective-unit-tests.md
+++ b/docs/core/testing/selective-unit-tests.md
@@ -56,6 +56,12 @@ For `FullyQualifiedName` values that include a comma for generic type parameters
dotnet test --filter "FullyQualifiedName=MyNamespace.MyTestsClass.MyTestMethod"
```
+For `Name` or `DisplayName` values that contain special characters, use URL encoding for those characters. For example, to run a test whose display name is `MyTestMethod ("text")`, encode the quotation marks as `%22` and escape the parentheses:
+
+```dotnetcli
+dotnet test --filter "Name=MyTestMethod \(%22text%22\)"
+```
+
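+Multiple conditions can be combined with the `|` (or) and `&` (and) operators, and `~` performs a substring ("contains") match. For example, the following command runs any test whose fully qualified name contains either class name:
+
+```dotnetcli
+dotnet test --filter "FullyQualifiedName~MyTestsClass|FullyQualifiedName~OtherTestsClass"
+```
+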
:::zone pivot="mstest"
## MSTest examples
diff --git a/docs/core/testing/unit-testing-platform-vs-vstest.md b/docs/core/testing/unit-testing-platform-vs-vstest.md
index bd3c3a9df524c..877df46614800 100644
--- a/docs/core/testing/unit-testing-platform-vs-vstest.md
+++ b/docs/core/testing/unit-testing-platform-vs-vstest.md
@@ -47,13 +47,7 @@ VSTest also uses a JSON based communication protocol, but it's not JSON-RPC base
### Disabling the new protocol
-To disable the use of the new protocol in Test Explorer, you can edit the csproj and remove the `TestingPlatformServer` capability.
-
-```xml
-
-
-
-```
+To disable the use of the new protocol in Test Explorer, edit your project file and set the corresponding MSBuild property to `true`.
## Executables
diff --git a/docs/core/tools/dotnet-install-script.md b/docs/core/tools/dotnet-install-script.md
index 03488c9e51392..c273169317a0d 100644
--- a/docs/core/tools/dotnet-install-script.md
+++ b/docs/core/tools/dotnet-install-script.md
@@ -1,7 +1,7 @@
---
title: dotnet-install scripts
description: Learn about the dotnet-install scripts to install the .NET SDK and the shared runtime.
-ms.date: 12/26/2024
+ms.date: 01/15/2025
---
# dotnet-install scripts reference
@@ -20,7 +20,7 @@ Windows:
dotnet-install.ps1 [-Architecture ] [-AzureFeed]
[-Channel ] [-DryRun] [-FeedCredential]
[-InstallDir ] [-JSonFile ]
- [-NoCdn] [-NoPath] [-ProxyAddress] [-ProxyBypassList ]
+ [-NoPath] [-ProxyAddress] [-ProxyBypassList ]
[-ProxyUseDefaultCredentials] [-Quality ] [-Runtime ]
[-SkipNonVersionedFiles] [-UncachedFeed] [-KeepZip] [-ZipPath ] [-Verbose]
[-Version ]
@@ -34,7 +34,7 @@ Linux/macOS:
dotnet-install.sh [--architecture ] [--azure-feed]
[--channel ] [--dry-run] [--feed-credential]
[--install-dir ] [--jsonfile ]
- [--no-cdn] [--no-path] [--quality ]
+ [--no-path] [--quality ]
[--runtime ] [--runtime-id ]
[--skip-non-versioned-files] [--uncached-feed] [--keep-zip] [--zip-path ] [--verbose]
[--version ]
@@ -102,7 +102,7 @@ The install scripts do not update the registry on Windows. They just download th
- **`-AzureFeed|--azure-feed`**
- For internal use only. Allows using a different storage to download SDK archives from. This parameter is only used if --no-cdn is false. The default is `https://builds.dotnet.microsoft.com/dotnet`.
+ For internal use only. Allows using a different storage to download SDK archives from. The default is `https://builds.dotnet.microsoft.com/dotnet`.
- **`-Channel|--channel `**
@@ -137,10 +137,6 @@ The install scripts do not update the registry on Windows. They just download th
Specifies a path to a [global.json](global-json.md) file that will be used to determine the SDK version. The *global.json* file must have a value for `sdk:version`.
-- **`-NoCdn|--no-cdn`**
-
- Disables downloading from the [Azure Content Delivery Network (CDN)](/azure/cdn/cdn-overview) and uses the uncached feed directly.
-
- **`-NoPath|--no-path`**
If set, the installation folder isn't exported to the path for the current session. By default, the script modifies the PATH, which makes the .NET CLI available immediately after install.
@@ -205,7 +201,7 @@ The install scripts do not update the registry on Windows. They just download th
- **`-UncachedFeed|--uncached-feed`**
- For internal use only. Allows using a different storage to download SDK archives from. This parameter is only used if --no-cdn is true.
+ For internal use only. Allows using a different storage to download SDK archives from. This parameter overrides `-AzureFeed|--azure-feed`.
- **`-KeepZip|--keep-zip`**
diff --git a/docs/orleans/host/configuration-guide/local-development-configuration.md b/docs/orleans/host/configuration-guide/local-development-configuration.md
index fcce519d0ac42..ed7474ddbad63 100644
--- a/docs/orleans/host/configuration-guide/local-development-configuration.md
+++ b/docs/orleans/host/configuration-guide/local-development-configuration.md
@@ -26,7 +26,7 @@ using Microsoft.Extensions.Hosting;
await Host.CreateDefaultBuilder(args)
.UseOrleans(siloBuilder =>
{
- siloBuilder.UseLocalhostClustering();;
+ siloBuilder.UseLocalhostClustering();
})
.RunConsoleAsync();
```
diff --git a/docs/orleans/host/configuration-guide/startup-tasks.md b/docs/orleans/host/configuration-guide/startup-tasks.md
index 6c86ffcb4b186..d553ac1cf520b 100644
--- a/docs/orleans/host/configuration-guide/startup-tasks.md
+++ b/docs/orleans/host/configuration-guide/startup-tasks.md
@@ -1,44 +1,158 @@
---
-title: Startup tasks
-description: Learn how to configure and manage startup tasks in .NET Orleans.
-ms.date: 07/03/2024
+title: Background services and startup tasks
+description: Learn how to configure and manage background services and startup tasks in .NET Orleans.
+ms.date: 11/19/2024
---
-# Startup tasks
+# Background services and startup tasks
-In many cases, some task needs to be performed automatically as soon as a silo becomes available. Startup tasks provide this functionality.
+When building Orleans applications, you often need to perform background operations or initialize components when the application starts.
-Some use cases include, but are not limited to:
+Startup tasks can be used to perform initialization work when a silo starts, before or after it begins accepting requests. Common use cases include:
-* Starting background timers to perform periodic housekeeping tasks
-* Pre-loading some cache grains with data downloaded from external backing storage
+* Initializing grain state or preloading data
+* Setting up external service connections
+* Performing database migrations
+* Validating configuration
+* Warming up caches
-Any exceptions that are thrown from a startup task during startup will be reported in the silo log and will stop the silo.
+## Using BackgroundService (recommended)
-This fail-fast approach is the standard way that Orleans handles silo start-up issues, and is intended to allow any problems with silo configuration and/or bootstrap logic to be easily detected during testing phases rather than being silently ignored and causing unexpected problems later in the silo lifecycle.
+The recommended approach is to use .NET's `BackgroundService` or `IHostedService`. For more information, see [Background tasks with hosted services in ASP.NET Core](/aspnet/core/fundamentals/host/hosted-services).
-## Configure startup tasks
+Here's an example that pings a grain every 5 seconds:
-Startup tasks can be configured using the either by registering a delegate to be invoked during startup or by registering an implementation of .
+```csharp
+public class GrainPingService : BackgroundService
+{
+    private readonly IGrainFactory _grainFactory;
+    private readonly ILogger<GrainPingService> _logger;
+
+    public GrainPingService(
+        IGrainFactory grainFactory,
+        ILogger<GrainPingService> logger)
+    {
+        _grainFactory = grainFactory;
+        _logger = logger;
+    }
+
+    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
+    {
+        try
+        {
+            while (!stoppingToken.IsCancellationRequested)
+            {
+                try
+                {
+                    _logger.LogInformation("Pinging grain...");
+                    // IPingGrain is an example grain interface with a Ping() method.
+                    var grain = _grainFactory.GetGrain<IPingGrain>("ping-target");
+                    await grain.Ping();
+                }
+                catch (Exception ex) when (ex is not OperationCanceledException)
+                {
+                    // Log the error but continue running.
+                    _logger.LogError(ex, "Failed to ping grain. Will retry in 5 seconds.");
+                }
+
+                await Task.Delay(TimeSpan.FromSeconds(5), stoppingToken);
+            }
+        }
+        catch (OperationCanceledException) when (stoppingToken.IsCancellationRequested)
+        {
+            // Ignore cancellation during shutdown.
+        }
+        finally
+        {
+            _logger.LogInformation("Grain ping service is shutting down.");
+        }
+    }
+}
+```
+
+Registration order is significant, since services added to the host builder are started one-by-one, in the order they are registered. You can register the background service as follows:
```csharp
-siloHostBuilder.AddStartupTask(
- async (IServiceProvider services, CancellationToken cancellation) =>
- {
- // Use the service provider to get the grain factory.
- var grainFactory = services.GetRequiredService();
-
- // Get a reference to a grain and call a method on it.
- var grain = grainFactory.GetGrain(0);
- await grain.Initialize();
+var builder = WebApplication.CreateBuilder(args);
+
+// Configure Orleans first
+builder.UseOrleans(siloBuilder =>
+{
+ // Orleans configuration...
});
+
+// Register the background service after calling 'UseOrleans' to make it start once Orleans has started.
+builder.Services.AddHostedService<GrainPingService>();
+
+var app = builder.Build();
+```
+
+The background service will start automatically when the application starts and will gracefully shut down when the application stops.
+
+## Using IHostedService
+
+For simpler scenarios where you don't need continuous background operation, you can implement `IHostedService` directly:
+
+```csharp
+public class GrainInitializerService : IHostedService
+{
+    private readonly IGrainFactory _grainFactory;
+    private readonly ILogger<GrainInitializerService> _logger;
+
+    public GrainInitializerService(
+        IGrainFactory grainFactory,
+        ILogger<GrainInitializerService> logger)
+    {
+        _grainFactory = grainFactory;
+        _logger = logger;
+    }
+
+    public async Task StartAsync(CancellationToken cancellationToken)
+    {
+        _logger.LogInformation("Initializing grains...");
+        // IInitializerGrain is an example grain interface with an Initialize() method.
+        var grain = _grainFactory.GetGrain<IInitializerGrain>("initializer");
+        await grain.Initialize();
+    }
+
+    public Task StopAsync(CancellationToken cancellationToken)
+    {
+        return Task.CompletedTask;
+    }
+}
+```
+
+Register it the same way:
+
+```csharp
+builder.Services.AddHostedService<GrainInitializerService>();
+```
+
+## Orleans startup tasks
+
+> [!NOTE]
+> While startup tasks are still supported, we recommend using `BackgroundService` or `IHostedService` instead, as they are the standard .NET hosting mechanisms for running background tasks.
+
+> [!WARNING]
+> Any exceptions thrown from a startup task will be reported in the silo log and will stop the silo. This fail-fast approach helps detect configuration and bootstrap issues during testing rather than having them cause unexpected problems later, but it can also mean that transient failures in a startup task will cause unavailability of the host.
+
+If you need to use the built-in startup task system, you can configure startup tasks as follows:
+
+### Register a delegate
+
+A delegate can be registered as a startup task using the `AddStartupTask` extension method on the silo builder.
+
+```csharp
+siloBuilder.AddStartupTask(
+    async (IServiceProvider services, CancellationToken cancellation) =>
+    {
+        var grainFactory = services.GetRequiredService<IGrainFactory>();
+        // IMyGrain is an example grain interface with an Initialize() method.
+        var grain = grainFactory.GetGrain<IMyGrain>("startup-task-grain");
+        await grain.Initialize();
+    });
```
### Register an `IStartupTask` implementation
-First, we must define an implementation of `IStartupTask`:
+The `IStartupTask` interface can be implemented and registered as a startup task using the `AddStartupTask` extension method on the silo builder.
```csharp
public class CallGrainStartupTask : IStartupTask
@@ -50,14 +164,14 @@ public class CallGrainStartupTask : IStartupTask
public async Task Execute(CancellationToken cancellationToken)
{
- var grain = _grainFactory.GetGrain<IMyGrain>(0);
+ var grain = _grainFactory.GetGrain<IMyGrain>("startup-task-grain");
await grain.Initialize();
}
}
```
-Then that implementation must be registered with the `ISiloHostBuilder`:
+Register the startup task as follows:
```csharp
-siloHostBuilder.AddStartupTask<CallGrainStartupTask>();
+siloBuilder.AddStartupTask<CallGrainStartupTask>();
```
diff --git a/docs/orleans/implementation/cluster-management.md b/docs/orleans/implementation/cluster-management.md
index 4747013448137..1aa276fa380c1 100644
--- a/docs/orleans/implementation/cluster-management.md
+++ b/docs/orleans/implementation/cluster-management.md
@@ -1,32 +1,34 @@
---
title: Cluster management in Orleans
description: Learn about cluster management in .NET Orleans.
-ms.date: 07/03/2024
+ms.date: 11/23/2024
---
# Cluster management in Orleans
-Orleans provides cluster management via a built-in membership protocol, which we sometimes refer to as **Silo Membership**. The goal of this protocol is for all silos (Orleans servers) to agree on the set of currently alive silos, detect failed silos, and allow new silos to join the cluster.
+Orleans provides cluster management via a built-in membership protocol, which we sometimes refer to as **cluster membership**. The goal of this protocol is for all silos (Orleans servers) to agree on the set of currently alive silos, detect failed silos, and allow new silos to join the cluster.
-The protocol relies on an external service to provide an abstraction of . `IMembershipTable` is a flat No-SQL-like durable table that we use for two purposes. First, it is used as a rendezvous point for silos to find each other and Orleans clients to find silos. Second, it is used to store the current membership view (list of alive silos) and helps coordinate the agreement on the membership view. We currently have 6 implementations of the `IMembershipTable`: based on [Azure Table Storage](/azure/storage/storage-dotnet-how-to-use-tables), SQL server, [Apache ZooKeeper](https://ZooKeeper.apache.org/), [Consul IO](https://www.consul.io), [AWS DynamoDB](https://aws.amazon.com/dynamodb/), and in-memory emulation for development.
+The protocol relies on an external service to provide an abstraction of `IMembershipTable`. `IMembershipTable` is a flat durable table that we use for two purposes. First, it is used as a rendezvous point for silos to find each other and Orleans clients to find silos. Second, it is used to store the current membership view (list of alive silos) and helps coordinate the agreement on the membership view.
-In addition to the each silo participates in a fully distributed peer-to-peer membership protocol that detects failed silos and reaches an agreement on a set of alive silos. We start by describing the internal implementation of Orleans's membership protocol below and later on describe the implementation of the `IMembershipTable`.
+There are many implementations of `IMembershipTable`: based on [Azure Table Storage](/azure/storage/storage-dotnet-how-to-use-tables), [Azure Cosmos DB](https://azure.microsoft.com/services/cosmos-db), ADO.NET (PostgreSQL, MySQL/MariaDB, SQL Server, Oracle), [Apache ZooKeeper](https://ZooKeeper.apache.org/), [Consul IO](https://www.consul.io), [AWS DynamoDB](https://aws.amazon.com/dynamodb/), [MongoDB](https://www.mongodb.com/), [Redis](https://redis.io), [Apache Cassandra](https://cassandra.apache.org), and an in-memory implementation for development.
-### The Basic Membership Protocol
+In addition to the `IMembershipTable`, each silo participates in a fully distributed peer-to-peer membership protocol that detects failed silos and reaches an agreement on a set of alive silos. We describe the internal implementation of Orleans's membership protocol below.
-1. Upon startup every silo adds an entry for itself into a well-known, shared table, using an implementation of . A combination of silo identity (`ip:port:epoch`) and service deployment id is used as unique keys in the table. Epoch is just time in ticks when this silo started, and as such `ip:port:epoch` is guaranteed to be unique in a given Orleans deployment.
+### The membership protocol
-1. Silos monitor each other directly, via application pings ("are you alive" `heartbeats`). Pings are sent as direct messages from silo to silo, over the same TCP sockets that silos communicate. That way, pings fully correlate with actual networking problems and server health. Every silo pings a configurable set of other other silos. A silo picks whom to ping by calculating consistent hashes on other silos' identity, forming a virtual ring of all identities, and picking X successor silos on the ring (this is a well-known distributed technique called [consistent hashing](https://en.wikipedia.org/wiki/Consistent_hashing) and is widely used in many distributed hash tables, like [Chord DHT](https://en.wikipedia.org/wiki/Chord_(peer-to-peer))).
+1. Upon startup every silo adds an entry for itself into a well-known, shared table, using an implementation of `IMembershipTable`. A combination of silo identity (`ip:port:epoch`) and service deployment id (cluster id) is used as the unique key in the table. Epoch is just the time in ticks when this silo started, and as such `ip:port:epoch` is guaranteed to be unique in a given Orleans deployment.
-1. If a silo S does not get Y ping replies from a monitored servers P, it suspects it by writing its timestamped suspicion into P's row in the `IMembershipTable`.
+1. Silos monitor each other directly, via application probes ("are you alive" `heartbeats`). Probes are sent as direct messages from silo to silo, over the same TCP sockets that silos use to communicate. That way, probes fully correlate with actual networking problems and server health. Every silo probes a configurable set of other silos. A silo picks whom to probe by calculating consistent hashes on other silos' identity, forming a virtual ring of all identities, and picking X successor silos on the ring (this is a well-known distributed technique called [consistent hashing](https://en.wikipedia.org/wiki/Consistent_hashing) and is widely used in many distributed hash tables, like [Chord DHT](https://en.wikipedia.org/wiki/Chord_(peer-to-peer))).
-1. If P has more than Z suspicions within K seconds, then S writes that P is dead into P's row, and broadcasts a request for all silos to re-read the membership table (which they'll do anyway periodically).
+1. If a silo S does not get Y probe replies from a monitored server P, it suspects it by writing its timestamped suspicion into P's row in the `IMembershipTable`.
+
+1. If P has more than Z suspicions within K seconds, then S writes that P is dead into P's row and sends a snapshot of the current membership table to all other silos. Silos refresh the table periodically, so the snapshot is an optimization to reduce the time taken for all silos to learn about the new membership view.
1. In more details:
- 1. Suspicion is written to the `IMembershipTable`, in a special column in the row corresponding to P. When S suspects P it writes: "at time TTT S suspected P".
+ 1. Suspicion is written to the `IMembershipTable`, in a special column in the row corresponding to P. When S suspects P it writes: "at time TTT S suspected P".
- 1. One suspicion is not enough to declare P as dead. You need Z suspicions from different silos in a configurable time window T, typically 3 minutes, to declare P as dead. The suspicion is written using optimistic concurrency control provided by the `IMembershipTable`.
+ 1. One suspicion is not enough to declare P as dead. You need Z suspicions from different silos in a configurable time window T, typically 3 minutes, to declare P as dead. The suspicion is written using optimistic concurrency control provided by the `IMembershipTable`.
1. The suspecting silo S reads P's row.
@@ -36,148 +38,141 @@ In addition to the each silo participates in a f
1. In either case the write-back uses the version number or ETag that was read, so the updates to this row are serialized. In case the write has failed due to version/ETag mismatch, S retries (read again, and try to write, unless P was already marked dead).
- 1. At a high level this sequence of "read, local modify, write back" is a transaction. However, we are not using storage transactions to do that. "Transaction" code executes locally on a server and we use optimistic concurrency provided by the `IMembershipTable` to ensure isolation and atomicity.
+ 1. At a high level this sequence of "read, local modify, write back" is a transaction. However, we are not necessarily using storage transactions to do that. "Transaction" code executes locally on a server and we use optimistic concurrency provided by the `IMembershipTable` to ensure isolation and atomicity.
1. Every silo periodically reads the entire membership table for its deployment. That way silos learn about new silos joining and about other silos being declared dead.
-1. **Configuration**: we provide a default configuration, which was hand-tuned during our production usage in Azure. Currently, the default is: every silo is monitored by three other silos, two suspicions are enough to declare a silo dead, suspicions only from the last three minutes (otherwise they are outdated). Pings are sent every ten seconds and you'd need to miss three pings to suspect a silo.
+1. **Snapshot broadcast**: To reduce the frequency of periodical table reads, every time a silo writes to the table (suspicion, new join, etc.) it sends a snapshot of the current table state to all other silos. Since the membership table is consistent and monotonically versioned, each update produces a uniquely versioned snapshot that can be safely shared. This enables immediate propagation of membership changes without waiting for the periodic read cycle. The periodic read is still maintained as a fallback mechanism in case snapshot distribution fails.
-1. **Enforcing Perfect Failure detection** – it is theoretically possible that a silo will be declared dead if it lost communication with other silos, while the silo process itself is still running. To solve this problem once the silo is declared dead in the table it is considered dead by everyone, even if it is not dead (just partitioned temporarily or heartbeat messages got lost). Everyone stops communicating with it and once it learns that it is dead (by reading its new status from the table) it commits suicide and shuts down its process. As a result, there must be an infrastructure in place to restart the silo as a new process (a new epoch number is generated upon start). When it's hosted in Azure, that happens automatically. When it isn't, another infrastructure is required. For example, a Windows Service configured to auto restart on failure or a Kubernetes deployment.
+1. **Ordered membership views**: The membership protocol ensures that all membership configurations are globally totally ordered. This ordering provides two key benefits:
-1. **Optimization to reduce the frequency of periodical table reads and speed up all silos learning about new joining silos and dead silos**. Every time any silo writes anything successfully to the table (suspicion, new join, and so on) it also broadcasts to all other silos – "go and reread the table now". The silo does NOT tell others what it wrote in the table (since this information could already be outdated/wrong), it just tells them to re-read the table. That way we learn very quickly about membership changes without the need to wait for the full periodic read cycle. We still need the periodic read, in case the "re-read the table" message gets lost.
+ 1. **Guaranteed connectivity**: When a new silo joins the cluster, it must validate two-way connectivity to every other active silo. If any existing silo does not respond (potentially indicating a network connectivity problem), the new silo is not allowed to join. This ensures full connectivity between all silos in the cluster at startup time. See the note about IAmAlive below for an exception in the case of disaster recovery.
-### Properties of the basic membership protocol
+ 2. **Consistent directory updates**: Higher level protocols, such as the distributed grain directory, rely on all silos having a consistent, monotonic view of membership. This enables smarter resolution of duplicate grain activations. For more details, see the [grain directory](grain-directory.md) documentation.
-1. **Can handle any number of failures**:
+ **Implementation details**:
- Our algorithm can handle any number of failures (that is, f<=n), including full cluster restart. This is in contrast with "traditional" [Paxos](https://en.wikipedia.org/wiki/Paxos_(computer_science)) based solutions, which require a quorum, which is usually a majority. We have seen in production situations when more than half of the silos were down. Our system stayed functional, while Paxos-based membership would not be able to make progress.
+ 1. The `IMembershipTable` requires atomic updates to guarantee a global total order of changes:
+ - Implementations must update both the table entries (list of silos) and version number atomically
+ - This can be achieved using database transactions (as in SQL Server) or atomic compare-and-swap operations using ETags (as in Azure Table Storage)
+ - The specific mechanism depends on the capabilities of the underlying storage system
-1. **Traffic to the table is very light**:
+ 2. A special membership-version row in the table tracks changes:
+ - Every write to the table (suspicions, death declarations, joins) increments this version number
+ - All writes are serialized through this row using atomic updates
+ - The monotonically increasing version ensures a total ordering of all membership changes
- The actual pings go directly between servers and not to the table. This would generate a lot of traffic plus would be less accurate from the failure detection perspective - if a silo could not reach the table, it would miss writing its I am alive heartbeat, and others would kill him.
+ 3. When silo S updates the status of silo P:
+ - S first reads the latest table state
+ - In a single atomic operation, it updates both P's row and increments the version number
+ - If the atomic update fails (e.g., due to concurrent modifications), the operation is retried with exponential backoff
-1. **Tunable accuracy versus completeness**:
+ **Scalability considerations**:
- While you cannot achieve both perfect and accurate failure detection, one usually wants an ability to tradeoff accuracy (don't want to declare a silo that is alive as dead) with completeness (want to declare dead a silo that is indeed dead as soon as possible). The configurable votes to declare dead and missed pings allow trading those two. For more information, see [Yale University: Computer Science Failure Detectors](https://www.cs.yale.edu/homes/aspnes/pinewiki/FailureDetectors.html).
+ Serializing all writes through the version row can impact scalability due to increased contention. The protocol has been proven in production with up to 200 silos, but may face challenges beyond a thousand silos. For very large deployments, other parts of Orleans (messaging, grain directory, hosting) remain scalable even if membership updates become a bottleneck.
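+
+ As a rough sketch of the "read, locally modify, write back with a version check" cycle described above (illustrative only; the types and table API below are placeholders, not Orleans' actual `IMembershipTable`):
+
+```csharp
+using System;
+using System.Collections.Generic;
+
+// Hypothetical versioned snapshot of the membership table.
+public sealed record Snapshot(Dictionary<string, string> Status, int Version);
+
+// Minimal in-memory stand-in for an ETag-capable store.
+public sealed class InMemoryTable
+{
+    private Snapshot _current = new(new Dictionary<string, string>(), 0);
+
+    public (Snapshot Snapshot, int ETag) Read() => (_current, _current.Version);
+
+    // Atomic compare-and-swap keyed on the version that was read.
+    public bool TryWrite(Snapshot updated, int expectedETag)
+    {
+        lock (this)
+        {
+            if (_current.Version != expectedETag) return false;
+            _current = updated;
+            return true;
+        }
+    }
+}
+
+public static class MembershipUpdater
+{
+    public static void DeclareDead(InMemoryTable table, string silo)
+    {
+        while (true)
+        {
+            var (snapshot, etag) = table.Read();
+            var status = new Dictionary<string, string>(snapshot.Status) { [silo] = "Dead" };
+            // The row update and the version bump are written together,
+            // so every change is totally ordered by the version number.
+            if (table.TryWrite(new Snapshot(status, snapshot.Version + 1), etag)) return;
+            // ETag mismatch: another silo wrote first; re-read and retry.
+        }
+    }
+}
+```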
-1. **Scale**:
+1. **Default configuration**: The default configuration was hand-tuned during production usage in Azure. By default, every silo is monitored by three other silos, two suspicions are enough to declare a silo dead, and only suspicions from the last three minutes are counted (older suspicions are outdated). Probes are sent every ten seconds, and a silo must miss three probes before it is suspected.
- The basic protocol can handle thousands and probably even tens of thousands of servers. This is in contrast with traditional Paxos-based solutions, such as group communication protocols, which are known not to scale beyond tens.
+1. **Self-monitoring**: The fault detector incorporates ideas from HashiCorp's _Lifeguard_ research ([paper](https://arxiv.org/abs/1707.00788), [talk](https://www.youtube.com/watch?v=u-a7rVJ6jZY), [blog](https://www.hashicorp.com/blog/making-gossip-more-robust-with-lifeguard)) to improve cluster stability during catastrophic events where a large portion of the cluster experiences partial failure. The `LocalSiloHealthMonitor` component scores each silo's health using multiple heuristics:
-1. **Diagnostics**:
+ * Active status in membership table
+ * No suspicions from other silos
+ * Recent successful probe responses
+ * Recent probe requests received
+ * Thread pool responsiveness (work items executing within 1 second)
+ * Timer accuracy (firing within 3 seconds of schedule)
- The table is also very convenient for diagnostics and troubleshooting. The system administrators can instantaneously find in the table the current list of alive silos, as well as see the history of all killed silos and suspicions. This is especially useful when diagnosing problems.
+ A silo's health score affects its probe timeouts: unhealthy silos (scoring 1-8) have increased timeouts compared to healthy silos (score 0). This has two benefits:
+ * Gives more time for probes to succeed when the network or system is under stress
+ * Makes it more likely that unhealthy silos will be voted dead before they can incorrectly vote out healthy silos
-1. **Why do we need reliable persistent storage for implementation of the `IMembershipTable`**:
+ This is particularly valuable during scenarios like thread pool starvation, where slow nodes might otherwise incorrectly suspect healthy nodes simply because they cannot process responses quickly enough.
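The health-scoring idea can be sketched roughly as follows. This is illustrative Python only; the heuristic weighting and the function names are assumptions for the sketch, not the actual `LocalSiloHealthMonitor` code:

```python
def health_score(active_in_table, suspected, recent_probe_success,
                 recent_probe_requests, threadpool_delay_s, timer_delay_s):
    """Return 0 for a fully healthy silo; each failed heuristic adds a point.
    Sketch only -- not the real Orleans implementation."""
    score = 0
    if not active_in_table:
        score += 1                    # not marked Active in the table
    if suspected:
        score += 1                    # another silo has voted suspicion
    if not recent_probe_success:
        score += 1                    # our outgoing probes are failing
    if not recent_probe_requests:
        score += 1                    # nobody is probing us (isolated?)
    if threadpool_delay_s > 1.0:      # work items not executing within 1 s
        score += 1
    if timer_delay_s > 3.0:           # timers firing more than 3 s late
        score += 1
    return score

def probe_timeout(base_timeout_s, score, extra_per_point_s=1.0):
    # Unhealthy silos (score > 0) get longer timeouts before a probe
    # is considered missed; extra_per_point_s is an assumed constant.
    return base_timeout_s + score * extra_per_point_s
```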
- We use persistent storage (Azure table, SQL Server, AWS DynamoDB, Apache ZooKeeper, or Consul IO KV) for the `IMembershipTable` for two purposes. First, it is used as a rendezvous point for silos to find each other and Orleans clients to find silos. Second, we use reliable storage to help us coordinate the agreement on the membership view. While we perform failure detection directly in a peer-to-peer fashion between the silos, we store the membership view in reliable storage and use the concurrency control mechanism provided by this storage to reach an agreement of who is alive and who is dead. That way, in a sense, our protocol outsources the hard problem of distributed consensus to the cloud. In that we fully utilize the power of the underlying cloud platform, using it truly as Platform as a Service (PaaS).
+1. **Indirect probing**: Another [Lifeguard](https://arxiv.org/abs/1707.00788)-inspired feature that improves failure detection accuracy by reducing the chance that an unhealthy or partitioned silo will incorrectly declare a healthy silo dead. When a monitoring silo has two probe attempts remaining for a target silo before casting a vote to declare it dead, it employs indirect probing:
-1. **What happens if the table is not accessible for some time**:
+ * The monitoring silo randomly selects another silo as an intermediary and asks it to probe the target
+ * The intermediary attempts to contact the target silo
+ * If the target fails to respond within the timeout period, the intermediary sends a negative acknowledgement
+ * If the monitoring silo received a negative acknowledgement from the intermediary and the intermediary declares itself healthy (through self-monitoring, described above), the monitoring silo casts a vote to declare the target dead
+ * With the default configuration of two required votes, a negative acknowledgement from an indirect probe counts as both votes, allowing faster declaration of dead silos when the failure is confirmed by multiple perspectives
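The voting outcome of an indirect probe can be sketched as follows (illustrative Python with hypothetical names, not the actual Orleans code):

```python
def indirect_probe_votes(target_responded, intermediary_healthy,
                         required_votes=2):
    """How many death votes an indirect probe contributes (sketch only)."""
    if target_responded:
        return 0          # target is reachable through the intermediary
    if not intermediary_healthy:
        return 0          # a nack from an unhealthy silo is not trusted
    # A negative acknowledgement from a self-reportedly healthy intermediary
    # confirms the failure from two perspectives, so it counts as all of the
    # required votes, letting the cluster declare the silo dead faster.
    return required_votes
```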
- When the storage service is down, unavailable, or there are communication problems with it, the Orleans protocol does NOT declare silos as dead by mistake. Operational silos will keep working without any problems. However, Orleans won't be able to declare a silo dead (if it detects some silo is dead via missed pings, it won't be able to write this fact to the table) and also won't be able to allow new silos to join. So completeness will suffer, but accuracy will not—partitioning from the table will never cause Orleans to declare silo as dead by mistake. Also, in case of a partial network partition (if some silos can access the table and some not), it could happen that Orleans will declare a dead silo as dead, but it will take some time until all other silos learn about it. So detection could be delayed, but Orleans will never wrongly kill a silo due to table unavailability.
+1. **Enforcing perfect failure detection**: Once a silo is declared dead in the table, it is considered dead by everyone, even if it is not actually dead (just partitioned temporarily, or its heartbeat messages got lost). Everyone stops communicating with it, and once it learns that it is dead (by reading its new status from the table) it commits suicide and shuts down its process. As a result, there must be infrastructure in place to restart the silo as a new process (a new epoch number is generated upon start). When it's hosted in Azure, that happens automatically. When it isn't, other infrastructure is required, such as a Windows Service configured to auto-restart on failure or a Kubernetes deployment.
-1. **Direct IAmAlive writes into the table for diagnostics only**:
-
- In addition to heartbeats that are sent between the silos, each silo also periodically updates an "I Am Alive" column in his row in the table. This "I Am Alive" column is only used **for manual troubleshooting and diagnostics** and is not used by the membership protocol itself. It is usually written at a much lower frequency (once every 5 minutes) and serves as a very useful tool for system administrators to check the liveness of the cluster or easily find out when the silo was last alive.
+1. **What happens if the table is not accessible for some time**:
-### Extension to order membership views
+ When the storage service is down, unavailable, or there are communication problems with it, the Orleans protocol does NOT declare silos as dead by mistake. Operational silos will keep working without any problems. However, Orleans won't be able to declare a silo dead (if it detects some silo is dead via missed probes, it won't be able to write this fact to the table) and also won't be able to allow new silos to join. So completeness will suffer, but accuracy will not — partitioning from the table will never cause Orleans to declare silo as dead by mistake. Also, in case of a partial network partition (if some silos can access the table and some not), it could happen that Orleans will declare a dead silo as dead, but it will take some time until all other silos learn about it. So detection could be delayed, but Orleans will never wrongly kill a silo due to table unavailability.
-The basic membership protocol described above was later extended to support ordered membership views. We will briefly describe the reasons for this extension and how it is implemented. The extension does not change anything in the above design, just adds property that all membership configurations are globally totally ordered.
+1. **IAmAlive writes for diagnostics and disaster recovery**:
-**Why it is useful to order membership views?**
+ In addition to heartbeats that are sent between the silos, each silo periodically updates an "I Am Alive" timestamp in its row in the table. This serves two purposes:
+ 1. For diagnostics, it provides system administrators with a simple way to check cluster liveness and determine when a silo was last active. The timestamp is typically updated every 5 minutes.
+ 2. For disaster recovery, if a silo has not updated its timestamp for several periods (configured via `NumMissedTableIAmAliveLimit`), new silos will ignore it during startup connectivity checks, allowing the cluster to recover from scenarios where silos crashed without proper cleanup.
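The disaster-recovery check can be sketched as follows. This is an illustrative Python model; `is_defunct` and the exact threshold arithmetic are assumptions for the sketch, with only the `NumMissedTableIAmAliveLimit` idea taken from the text:

```python
from datetime import datetime, timedelta

def is_defunct(last_i_am_alive, now,
               publish_period=timedelta(minutes=5), missed_limit=3):
    # A silo whose "I Am Alive" timestamp is older than missed_limit
    # publish periods is ignored during startup connectivity checks,
    # so a cluster can recover from silos that crashed without cleanup.
    return now - last_i_am_alive > missed_limit * publish_period
```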
-* This allows serializing the joining of new silos to the cluster. That way, when a new silo joins the cluster it can validate two-way connectivity to every other silo that has already started. If some of them already joined silos do not answer it (potentially indicating a network connectivity problem with the new silo), the new silo is not allowed to join. This ensures that at least when a silo starts, there is full connectivity between all silos in the cluster (this is implemented).
+### Membership table
-* Higher level protocols in the silo, such as distributed grain directory, can utilize the fact that membership views are ordered and use this information to perform smarter duplicate activations resolution. In particular, when the directory finds out that 2 activations were created when membership was in flux, it may decide to deactivate the older activation that was created based on the now-outdated membership information.
+As already mentioned, the `IMembershipTable` is used as a rendezvous point for silos to find each other and for Orleans clients to find silos, and it also helps coordinate agreement on the membership view. The main Orleans repository contains implementations for many systems, such as Azure Table Storage, Azure Cosmos DB, PostgreSQL, MySQL/MariaDB, SQL Server, Apache ZooKeeper, Consul IO, Apache Cassandra, MongoDB, Redis, AWS DynamoDB, and an in-memory implementation for development.
-**Extended membership protocol:**
+1. [Azure Table Storage](/azure/storage/storage-dotnet-how-to-use-tables) - in this implementation we use Azure deployment ID as partition key and the silo identity (`ip:port:epoch`) as row key. Together they guarantee a unique key per silo. For concurrency control, we use optimistic concurrency control based on [Azure Table ETags](/rest/api/storageservices/Update-Entity2). Every time we read from the table we store the ETag for every read row and use that ETag when we try to write back. ETags are automatically assigned and checked by the Azure Table service on every write. For multi-row transactions, we utilize the support for [batch transactions provided by Azure table](/rest/api/storageservices/Performing-Entity-Group-Transactions), which guarantees serializable transactions over rows with the same partition key.
-1. For the implementation of this feature we utilize the support for transactions over multiple rows that are provided by the `MembershipTable`.
-1. We add a membership-version row to the table that tracks table changes.
-1. When silo S wants to write suspicion or death declaration for silo P:
+1. SQL Server - in this implementation, the configured deployment ID is used to distinguish between deployments and which silos belong to which deployments. The silo identity is defined as a combination of `deploymentID, ip, port, epoch` in appropriate tables and columns. The relational backend uses optimistic concurrency control and transactions, similar to the procedure of using ETags on the Azure Table implementation. The relational implementation expects the database engine to generate the ETag used. In the case of SQL Server, on SQL Server 2000 the generated ETag is one acquired from a call to `NEWID()`. On SQL Server 2005 and later [ROWVERSION](/sql/t-sql/data-types/rowversion-transact-sql) is used. Orleans reads and writes relational ETags as opaque `VARBINARY(16)` tags and stores them in memory as [base64](https://en.wikipedia.org/wiki/Base64) encoded strings. Orleans supports multi-row inserts using `UNION ALL` (for Oracle including DUAL), which is currently used to insert statistics data. The exact implementation and rationale for SQL Server can be seen at [CreateOrleansTables_SqlServer.sql](https://github.com/dotnet/orleans/blob/ba30bbb2155168fc4b9f190727220583b9a7ae4c/src/OrleansSQLUtils/CreateOrleansTables_SqlServer.sql).
- 1. S reads the latest table content. If P is already dead, do nothing. Otherwise,
- 1. In the same transaction, write the changes to P's row as well as increment the version number and write it back to the table.
- 1. Both writes are conditioned with ETags.
- 1. If the transaction aborts due to ETag mismatch on either P's row or the version row, attempt again.
+1. [Apache ZooKeeper](https://zookeeper.apache.org/) - in this implementation we use the configured deployment ID as a root node and the silo identity (`ip:port@epoch`) as its child node. Together they guarantee a unique path per silo. For concurrency control, we use optimistic concurrency control based on the [node version](https://zookeeper.apache.org/doc/r3.4.6/zookeeperOver.html#Nodes+and+ephemeral+nodes). Every time we read from the deployment root node we store the version for every read child silo node and use that version when we try to write back. Each time a node's data changes, the version number increases atomically by the ZooKeeper service. For multi-row transactions, we utilize the [multi method](https://zookeeper.apache.org/doc/r3.4.6/api/org/apache/zookeeper/ZooKeeper.html#multi(java.lang.Iterable)), which guarantees serializable transactions over silo nodes with the same parent deployment ID node.
-1. All writes to the table modify and increment the version row. That way all writes to the table is serialized (via serializing the updates to the version row) and since silos only increment the version number, the writes are also totally ordered in increasing order.
+1. [Consul IO](https://www.consul.io) - we used [Consul's Key/Value store](https://www.consul.io/intro/getting-started/kv.html) to implement the membership table. Refer to [Consul-Deployment](../deployment/consul-deployment.md) for more details.
-**Scalability of the extended membership protocol:**
+1. [AWS DynamoDB](https://aws.amazon.com/dynamodb/) - In this implementation, we use the cluster Deployment ID as the partition key and the silo identity (`ip-port-generation`) as the range key, making the record unique. Optimistic concurrency is implemented with the `ETag` attribute using conditional writes on DynamoDB. The implementation logic is quite similar to Azure Table Storage.
-In the extended version of the protocol, all writes are serialized via one row. This can potentially hurt the scalability of the cluster management protocol since it increases the risk of conflicts between concurrent table writes. To partially mitigate this problem silos retry all their writes to the table by using exponential backoff. We have observed the extended protocols to work smoothly in a production environment in Azure with up to 200 silos. However, we do think the protocol might have problems scaling beyond a thousand silos. In such large setups, the updates to the version row may be easily disabled, essentially maintaining the rest of the cluster management protocol and giving up on the total ordering property. Please also note that we refer here to the scalability of the cluster management protocol, not the rest of Orleans. We believe that other parts of the Orleans runtime (messaging, distributed directory, grain hosting, client to gateway connectivity) are scalable way beyond hundreds of silos.
+1. [Apache Cassandra](https://cassandra.apache.org/_/index.html) - In this implementation, we use the composite of Service ID and Cluster ID as the partition key and the silo identity (`ip:port:epoch`) as the row key. Together they guarantee a unique row per silo. For concurrency control, we use optimistic concurrency control based on a static column version using a lightweight transaction. This version column is shared by all rows in the partition/cluster, so it provides a consistently incrementing version number for each cluster's membership table. There are no multi-row transactions in this implementation.
-### Membership table
+1. In-memory emulation for development setup. We use a special system grain for that implementation. This grain lives on a designated primary silo, which is used only for a **development setup**. In any real production usage, a primary silo **is not required**.
-As already mentioned, `IMembershipTable` is used as a rendezvous point for silos to find each other and Orleans clients to find silos and also helps coordinate the agreement on the membership view. We currently have six implementations of the `IMembershipTable`: based on Azure Table, SQL server, Apache ZooKeeper, Consul IO, AWS DynamoDB, and in-memory emulation for development.
+### Design rationale
-1. [Azure Table Storage](/azure/storage/storage-dotnet-how-to-use-tables) - in this implementation we use Azure deployment ID as partition key and the silo identity (`ip:port:epoch`) as row key. Together they guarantee a unique key per silo. For concurrency control, we use optimistic concurrency control based on [Azure Table ETags](/rest/api/storageservices/Update-Entity2). Every time we read from the table we store the ETag for every read row and use that ETag when we try to write back. ETags are automatically assigned and checked by the Azure Table service on every write. For multi-row transactions, we utilize the support for [batch transactions provided by Azure table](/rest/api/storageservices/Performing-Entity-Group-Transactions), which guarantees serializable transactions over rows with the same partition key.
+A natural question that might be asked is why not rely completely on [Apache ZooKeeper](https://ZooKeeper.apache.org/) or [etcd](https://etcd.io/) for the cluster membership implementation, potentially by using ZooKeeper's out-of-the-box support for group membership with ephemeral nodes? Why did we bother implementing our membership protocol? There were primarily three reasons:
-1. SQL Server - in this implementation, the configured deployment ID is used to distinguish between deployments and which silos belong to which deployments. The silo identity is defined as a combination of `deploymentID, ip, port, epoch` in appropriate tables and columns. The relational backend uses optimistic concurrency control and transactions, similar to the procedure of using ETags on Azure Table implementation. The relational implementation expects the database engine to generate the ETag used. In the case of SQL Server, on SQL Server 2000 the generated ETag is one acquired from a call to `NEWID()`. On SQL Server 2005 and later [ROWVERSION](/sql/t-sql/data-types/rowversion-transact-sql) is used. Orleans reads and writes relational ETags as opaque `VARBINARY(16)` tags and stores them in memory as [base64]( https://en.wikipedia.org/wiki/Base64) encoded strings. Orleans supports multi-row inserts using `UNION ALL` (for Oracle including DUAL), which is currently used to insert statistics data. The exact implementation and rationale for SQL Server can be seen at [CreateOrleansTables_SqlServer.sql](https://github.com/dotnet/orleans/blob/ba30bbb2155168fc4b9f190727220583b9a7ae4c/src/OrleansSQLUtils/CreateOrleansTables_SqlServer.sql).
+1. **Deployment/Hosting in the cloud**:
-1. [Apache ZooKeeper](https://ZooKeeper.apache.org/) - in this implementation we use the configured deployment ID as a root node and the silo identity (`ip:port@epoch`) as its child node. Together they guarantee a unique path per silo. For concurrency control, we use optimistic concurrency control based on the [node version](https://zookeeper.apache.org/doc/r3.4.6/zookeeperOver.html#Nodes+and+ephemeral+nodes). Every time we read from the deployment root node we store the version for every read child silo node and use that version when we try to write back. Each time a node's data changes, the version number increases atomically by the ZooKeeper service. For multi-row transactions, we utilize the [multi method](https://zookeeper.apache.org/doc/r3.4.6/api/org/apache/zookeeper/ZooKeeper.html#multi(java.lang.Iterable)), which guarantees serializable transactions over silo nodes with the same parent deployment ID node.
+ ZooKeeper is not a hosted service. This means that in the cloud environment, Orleans customers would have to deploy, run, and manage their own instance of a ZK cluster. This is just yet another unnecessary burden that we did not want to force on our customers. By using Azure Table we rely on a hosted, managed service which makes our customers' lives much simpler. _Basically, in the Cloud, use Cloud as a Platform, not as an Infrastructure._ On the other hand, when running on-premises and managing your servers, relying on ZK as an implementation of the `IMembershipTable` is a viable option.
-1. [Consul IO](https://www.consul.io) - we used [Consul's Key/Value store](https://www.consul.io/intro/getting-started/kv.html) to implement the membership table. Refer to [Consul-Deployment](../deployment/consul-deployment.md) for more details.
+1. **Direct failure detection**:
-1. [AWS DynamoDB](https://aws.amazon.com/dynamodb/) - In this implementation, we use the cluster Deployment ID as the Partition Key and Silo Identity (`ip-port-generation`) as the RangeKey making the record unity. The optimistic concurrency is made by the `ETag` attribute by making conditional writes on DynamoDB. The implementation logic is quite similar to Azure Table Storage.
+ When using ZK's group membership with ephemeral nodes, the failure detection is performed between the Orleans servers (ZK clients) and ZK servers. This may not necessarily correlate with the actual network problems between Orleans servers. _Our desire was that the failure detection would accurately reflect the intra-cluster state of the communication._ Specifically, in our design, if an Orleans silo cannot communicate with the `IMembershipTable`, it is not considered dead and can keep working. In contrast, had we used ZK group membership with ephemeral nodes, a disconnection from a ZK server might cause an Orleans silo (ZK client) to be declared dead while it is alive and fully functional.
-1. In-memory emulation for development setup. We use a special system grain, called , for that implementation. This grain lives on a designated primary silo, which is only used for a **development setup**. In any real production usage primary silo **is not required**.
+1. **Portability and flexibility**:
-### Configuration
+ As part of Orleans's philosophy, we do not want to force a strong dependence on any particular technology, but rather have a flexible design where different components can be easily switched with different implementations. This is exactly the purpose that the `IMembershipTable` abstraction serves.
-The membership protocol is configured via the `Liveness` element in the `Globals` section in _OrleansConfiguration.xml_ file. The default values were tuned in years of production usage in Azure and we believe they represent good default settings. There is no need in general to change them.
+### Properties of the membership protocol
-Sample config element:
+1. **Can handle any number of failures**:
-```xml
-
-```
+ Our algorithm can handle any number of failures (that is, f ≤ n), including full cluster restart. This is in contrast with "traditional" [Paxos](https://en.wikipedia.org/wiki/Paxos_(computer_science))-based solutions, which require a quorum, usually a majority. We have seen production situations in which more than half of the silos were down. Our system stayed functional, while Paxos-based membership would not be able to make progress.
-There are 4 types of liveness implemented. The type of the liveness protocol is configured via the `SystemStoreType` attribute of the `SystemStore` element in the `Globals` section in _OrleansConfiguration.xml_ file.
+1. **Traffic to the table is very light**:
-1. `MembershipTableGrain`: membership table is stored in a grain on the primary silo. This is a **development setup only**.
-1. `AzureTable`: membership table is stored in Azure table.
-1. `SqlServer`: membership table is stored in a relational database.
-1. `ZooKeeper`: membership table is stored in a ZooKeeper [ensemble](https://zookeeper.apache.org/doc/r3.4.6/zookeeperAdmin.html#sc_zkMulitServerSetup).
-1. `Consul`: configured as Custom system store with `MembershipTableAssembly = "OrleansConsulUtils"`. Refer to [Consul-Deployment](../deployment/consul-deployment.md) for more details.
-1. `DynamoDB`: configured as a Custom system store with `MembershipTableAssembly = "OrleansAWSUtils"`.
+ The actual probes go directly between servers rather than through the table. Routing them through the table would generate a lot of traffic and would be less accurate from the failure-detection perspective: if a silo could not reach the table, it would miss writing its "I am alive" heartbeat, and others would kill it.
-For all liveness types the common configuration variables are defined in `Globals.Liveness` element:
+1. **Tunable accuracy versus completeness**:
-1. `ProbeTimeout`: The number of seconds to probe other silos for their liveness or for the silo to send "I am alive" heartbeat messages about itself. Default is 10 seconds.
-1. `TableRefreshTimeout`: The number of seconds to fetch updates from the membership table. Default is 60 seconds.
-1. `DeathVoteExpirationTimeout`: Expiration time in seconds for death vote in the membership table. Default is 120 seconds
-1. `NumMissedProbesLimit`: The number of missed "I am alive" heartbeat messages from a silo or number of un-replied probes that lead to suspecting this silo as dead. Default is 3.
-1. `NumProbedSilos`: The number of silos each silo probes for liveness. Default is 3.
-1. `NumVotesForDeathDeclaration`: The number of non-expired votes that are needed to declare some silo as dead (should be at most NumMissedProbesLimit). Default is 2.
-1. `UseLivenessGossip`: Whether to use the gossip optimization to speed up spreading liveness information. Default is true.
-1. `IAmAliveTablePublishTimeout`: The number of seconds to periodically write in the membership table that this silo is alive. Used only for diagnostics. Default is 5 minutes.
-1. `NumMissedTableIAmAliveLimit`: The number of missed "I am alive" updates in the table from a silo that causes a warning to be logged. Does not impact the liveness protocol. Default is 2.
-1. `MaxJoinAttemptTime`: The number of seconds to attempt to join a cluster of silos before giving up. Default is 5 minutes.
-1. `ExpectedClusterSize`: The expected size of a cluster. Need not be very accurate, can be an overestimate. Used to tune the exponential backoff algorithm of retries to write to Azure table. Default is 20.
+ While you cannot achieve failure detection that is both perfectly accurate and complete, one usually wants the ability to trade off accuracy (not declaring a silo that is alive as dead) against completeness (declaring a silo that is indeed dead as dead as soon as possible). The configurable number of votes to declare a silo dead and the number of missed probes allow trading off those two. For more information, see [Yale University: Computer Science Failure Detectors](https://www.cs.yale.edu/homes/aspnes/pinewiki/FailureDetectors.html).
-### Design rationale
+1. **Scale**:
-A natural question that might be asked is why not rely completely on [Apache ZooKeeper](https://ZooKeeper.apache.org/) for the cluster membership implementation, potentially by using its out-of-the-box support for group membership with ephemeral nodes? Why did we bother implementing our membership protocol? There were primarily three reasons:
+ The protocol can handle thousands and probably even tens of thousands of servers. This is in contrast with traditional Paxos-based solutions, such as group communication protocols, which are known not to scale beyond tens.
-1. **Deployment/Hosting in the cloud**:
+1. **Diagnostics**:
- Zookeeper is not a hosted service (at least at the time of this writing July 2015 and definitely when we first implemented this protocol in the summer of 2011 there was no version of Zookeeper running as a hosted service by any major cloud provider). It means that in the Cloud environment Orleans customers would have to deploy/run/manage their instance of a ZK cluster. This is just yet another unnecessary burden, that we did not want to force on our customers. By using Azure Table we rely on a hosted, managed service which makes our customer's lives much simpler. _Basically, in the Cloud, use Cloud as a Platform, not as an Infrastructure._ On the other hand, when running on-premises and managing your servers, relying on ZK as an implementation of the `IMembershipTable` is a viable option.
+ The table is also very convenient for diagnostics and troubleshooting. The system administrators can instantaneously find in the table the current list of alive silos, as well as see the history of all killed silos and suspicions. This is especially useful when diagnosing problems.
-1. **Direct failure detection**:
+1. **Why do we need reliable persistent storage for the implementation of the `IMembershipTable`**:
- When using ZK's group membership with ephemeral nodes the failure detection is performed between the Orleans servers (ZK clients) and ZK servers. This may not necessarily correlate with the actual network problems between Orleans servers. _Our desire was that the failure detection would accurately reflect the intra-cluster state of the communication._ Specifically, in our design, if an Orleans silo cannot communicate with the `IMembershipTable` it is not considered dead and can keep working. As opposed to that, have we used ZK group membership with ephemeral nodes a disconnection from a ZK server may cause an Orleans silo (ZK client) to be declared dead, while it may be alive and fully functional.
+ We use persistent storage for the `IMembershipTable` for two purposes. First, it is used as a rendezvous point for silos to find each other and Orleans clients to find silos. Second, we use reliable storage to help us coordinate the agreement on the membership view. While we perform failure detection directly in a peer-to-peer fashion between the silos, we store the membership view in reliable storage and use the concurrency control mechanism provided by this storage to reach agreement on who is alive and who is dead. That way, in a sense, our protocol outsources the hard problem of distributed consensus to the cloud. In that we fully utilize the power of the underlying cloud platform, using it truly as Platform as a Service (PaaS).
-1. **Portability and flexibility**:
+1. **Direct IAmAlive writes into the table for diagnostics only**:
- As part of Orleans's philosophy, we do not want to force a strong dependence on any particular technology, but rather have a flexible design where different components can be easily switched with different implementations. This is exactly the purpose that `IMembershipTable` abstraction serves.
+ In addition to heartbeats that are sent between the silos, each silo also periodically updates an "I Am Alive" column in its row in the table. This "I Am Alive" column is only used **for manual troubleshooting and diagnostics** and is not used by the membership protocol itself. It is usually written at a much lower frequency (once every 5 minutes) and serves as a very useful tool for system administrators to check the liveness of the cluster or to easily find out when a silo was last alive.
### Acknowledgements
-We would to acknowledge the contribution of Alex Kogan to the design and implementation of the first version of this protocol. This work was done as part of a summer internship in Microsoft Research in the Summer of 2011.
-The implementation of ZooKeeper based `IMembershipTable` was done by [Shay Hazor](https://github.com/shayhatsor), the implementation of SQL `IMembershipTable` was done by [Veikko Eeva](https://github.com/veikkoeeva), the implementation of AWS DynamoDB `IMembershipTable` was done by [Gutemberg Ribeiro](https://github.com/galvesribeiro/) and the implementation of Consul based `IMembershipTable` was done by [Paul North](https://github.com/PaulNorth).
+We would like to acknowledge the contribution of Alex Kogan to the design and implementation of the first version of this protocol. This work was done as part of a summer internship in Microsoft Research in the Summer of 2011.
+The implementation of the ZooKeeper-based `IMembershipTable` was done by [Shay Hazor](https://github.com/shayhatsor), the SQL `IMembershipTable` by [Veikko Eeva](https://github.com/veikkoeeva), the AWS DynamoDB `IMembershipTable` by [Gutemberg Ribeiro](https://github.com/galvesribeiro/), and the Consul-based `IMembershipTable` by [Paul North](https://github.com/PaulNorth); finally, the Apache Cassandra implementation was adapted from `OrleansCassandraUtils` by [Arshia001](https://github.com/Arshia001).
diff --git a/docs/orleans/implementation/grain-directory.md b/docs/orleans/implementation/grain-directory.md
new file mode 100644
index 0000000000000..36613770d25bb
--- /dev/null
+++ b/docs/orleans/implementation/grain-directory.md
@@ -0,0 +1,78 @@
+---
+title: Grain Directory Implementation
+description: Explore the implementation of the grain directory in .NET Orleans.
+ms.date: 11/22/2024
+---
+
+# Grain directory implementation
+
+## Overview and architecture
+
+The grain directory in Orleans is a key-value store where the key is a grain identifier and the value is a registration entry that points to the active silo which (potentially) hosts the grain.
+
+While Orleans provides a default in-memory distributed directory implementation (described in this article), the grain directory system is designed to be pluggable. Developers can supply their own directory by implementing the `IGrainDirectory` interface and registering it with the silo's service collection. This allows custom directory implementations that use different storage backends or consistency models to better suit specific application requirements. Since the introduction of the new strongly consistent directory, there is little need for external directory implementations, but the API remains for backward compatibility and flexibility. The grain directory can be configured on a per-grain-type basis.
+
+To optimize performance, directory lookups are cached locally within each silo. This means that potentially-remote directory reads are only necessary when the local cache entry is either missing or invalid. This caching mechanism reduces the network overhead and latency associated with grain location lookups.
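The cache-then-remote lookup pattern described above can be sketched as follows. This is an illustrative sketch in Python, not Orleans source code; all names (`CachedDirectoryClient`, `remote_lookup`) are hypothetical.

```python
# Hypothetical sketch of a per-silo directory cache: serve lookups locally
# when possible, fall back to the (potentially remote) owning partition on
# a miss, and invalidate entries that turn out to be stale.

class CachedDirectoryClient:
    def __init__(self, remote_lookup):
        self._remote_lookup = remote_lookup  # callable: grain_id -> silo address
        self._cache = {}

    def lookup(self, grain_id):
        # Fast path: serve from the local cache when the entry is present.
        if grain_id in self._cache:
            return self._cache[grain_id]
        # Slow path: consult the owning directory partition, then cache.
        silo = self._remote_lookup(grain_id)
        self._cache[grain_id] = silo
        return silo

    def invalidate(self, grain_id):
        # Called when a cached entry proves invalid (for example, the
        # target silo no longer hosts the grain).
        self._cache.pop(grain_id, None)
```

In this sketch, repeated lookups for the same grain hit the local cache, so the remote read happens only on a miss or after invalidation, which is the network-overhead reduction the paragraph above describes.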
+
+Originally, Orleans implemented an eventually consistent directory structured as a distributed hash table. This was superseded in Orleans v9.0 by a strongly consistent directory, based on the two-phase [Virtually Synchronous methodology](https://www.microsoft.com/en-us/research/publication/virtually-synchronous-methodology-for-dynamic-service-replication/) and also structured as a distributed hash table, but with improved load balancing through the use of virtual nodes. This article describes the latter, newer grain directory implementation.
+
+## Distributed grain directory
+
+The distributed grain directory in Orleans offers strong consistency, even load balancing, high performance, and fault tolerance. The implementation follows a two-phase design based on the [Virtual Synchrony methodology](https://www.microsoft.com/en-us/research/publication/virtually-synchronous-methodology-for-dynamic-service-replication/) with similarities to [Vertical Paxos](https://www.microsoft.com/en-us/research/publication/vertical-paxos-and-primary-backup-replication/).
+
+Directory partitions have two modes of operation:
+
+1. Normal operation: partitions process requests locally without coordination with other hosts.
+1. View change: hosts coordinate with each other to transfer ownership of directory ranges.
+
+The directory leverages Orleans's strongly consistent cluster membership system, in which configurations called "views" have monotonically increasing version numbers. As silos join and leave the cluster, successive views are created, resulting in changes to range ownership.
+
+All directory operations include view coordination:
+
+- Requests carry the caller's view number.
+- Responses include the partition's view number.
+- View number mismatches trigger synchronization.
+- Requests are automatically retried on view changes.
+
+This ensures that all requests are processed by the correct owner of the directory partition.
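The view-coordination steps above can be sketched as follows. This is an illustrative Python sketch, not Orleans source code; `Partition`, `ViewMismatch`, and `register_with_retry` are hypothetical names.

```python
# Hypothetical sketch of view-number coordination: a request carries the
# caller's view number, a mismatch triggers synchronization, and the
# request is retried against the (newer) view.

class ViewMismatch(Exception):
    pass

class Partition:
    def __init__(self, view_number):
        self.view_number = view_number
        self.entries = {}

    def register(self, caller_view, grain_id, silo):
        # Reject requests whose view disagrees with ours; the caller must
        # synchronize to the newer view and retry.
        if caller_view != self.view_number:
            raise ViewMismatch(self.view_number)
        self.entries[grain_id] = silo
        return self.view_number

def register_with_retry(partition, local_view, grain_id, silo, max_retries=3):
    for _ in range(max_retries):
        try:
            return partition.register(local_view, grain_id, silo)
        except ViewMismatch as e:
            # Synchronize to the partition's view, then retry the request.
            local_view = max(local_view, e.args[0])
    raise RuntimeError("view kept changing; giving up")
```

The retry loop models the guarantee stated above: a request is only accepted once the caller and the partition agree on the current view, so it is always processed by the correct range owner.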
+
+### Partitioning strategy
+
+The directory is partitioned using a consistent hash ring, with ranges assigned to the active silos in the cluster. A grain identifier is hashed to find the silo that owns the section of the ring corresponding to that hash.
+
+Each active silo owns a pre-configured number of ranges, defaulting to 30 ranges per silo. This is similar to the scheme used by [Amazon Dynamo](https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf) and [Apache Cassandra](https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/architecture/archDataDistributeVnodesUsing.html), where multiple "virtual nodes" (ranges) are created for each node (host).
+
+The size of a partition is determined by the distance between its hash and the hash of the next partition. It's possible for a range to be split among multiple silos during a view change, which adds complexity to the view change procedure since each partition must potentially coordinate with multiple other partitions.
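The virtual-node scheme above can be sketched as a small consistent hash ring. This is an illustrative Python sketch, not Orleans source code; `HashRing` and its members are hypothetical names, though the default of 30 ranges per silo matches the text above.

```python
import bisect
import hashlib

# Hypothetical sketch of a consistent hash ring with virtual nodes: each
# silo owns several points on the ring, and a grain id is routed to the
# owner of the first ring point at or after its hash (wrapping around).

def _hash(value: str) -> int:
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, ranges_per_silo=30):  # 30 ranges per silo by default
        self._ranges_per_silo = ranges_per_silo
        self._points = []   # sorted ring point hashes
        self._owner = {}    # ring point hash -> owning silo

    def add_silo(self, silo: str):
        # Create one ring point ("virtual node") per range for this silo.
        for i in range(self._ranges_per_silo):
            point = _hash(f"{silo}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = silo

    def owner_of(self, grain_id: str) -> str:
        h = _hash(grain_id)
        # First ring point at or after the grain's hash, wrapping to the
        # start of the ring when the hash exceeds every point.
        idx = bisect.bisect_left(self._points, h) % len(self._points)
        return self._owner[self._points[idx]]
```

Because each silo contributes many scattered points rather than one, the ranges it owns are spread around the ring, which is the load-balancing benefit of virtual nodes noted above.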
+
+### View change procedure
+
+Directory partitions (implemented in `GrainDirectoryPartition`) use versioned range locks to prevent invalid access to ranges during view changes. Range locks are created during a view change and released when it completes. These locks are analogous to the 'wedges' used in the Virtual Synchrony methodology.
+
+When a view change occurs, a partition can either grow or shrink:
+
+- If a new silo has joined the cluster, then existing partitions may shrink to make room.
+- If a silo has left the cluster, then remaining partitions may grow to take over the orphaned ranges.
+
+Directory registrations must be transferred from the old owner to the new owner before requests can be served.
+The transfer process follows these steps:
+
+1. The previous owner seals the range and creates a snapshot of its directory entries.
+1. The new owner requests and applies the snapshot.
+1. The new owner begins servicing requests for the range.
+1. The previous owner is notified and deletes the snapshot.
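The four transfer steps above can be sketched as follows. This is an illustrative Python sketch, not Orleans source code; `OldOwner`, `NewOwner`, and their members are hypothetical names, and the snapshot here copies the whole entry map for simplicity rather than just the transferred range.

```python
# Hypothetical sketch of the range-handoff steps: the previous owner seals
# the range and snapshots it, the new owner applies the snapshot and begins
# serving, and the previous owner then deletes the snapshot.

class OldOwner:
    def __init__(self, entries):
        self.entries = entries
        self.sealed = set()
        self.snapshots = {}

    def seal_and_snapshot(self, range_id):
        # Step 1: stop serving the range and capture its entries.
        self.sealed.add(range_id)
        self.snapshots[range_id] = dict(self.entries)
        return self.snapshots[range_id]

    def release_snapshot(self, range_id):
        # Step 4: safe to delete once the new owner is serving the range.
        del self.snapshots[range_id]

class NewOwner:
    def __init__(self):
        self.entries = {}
        self.serving = set()

    def take_over(self, old, range_id):
        snapshot = old.seal_and_snapshot(range_id)  # steps 1-2
        self.entries.update(snapshot)
        self.serving.add(range_id)                  # step 3
        old.release_snapshot(range_id)              # step 4
```

Sealing before snapshotting is what prevents the old owner from accepting registrations that the snapshot would miss, mirroring the 'wedge' behavior of the range locks described earlier.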
+
+### Recovery process
+
+When a host crashes without properly handing off its directory partitions, the subsequent partition owners must perform recovery. This involves:
+
+1. Querying all active silos in the cluster for their grain registrations.
+1. Rebuilding the directory state for affected ranges.
+1. Ensuring no duplicate grain activations occur.
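The recovery steps above can be sketched as follows. This is an illustrative Python sketch, not Orleans source code; `recover_range` and its parameters are hypothetical names.

```python
# Hypothetical sketch of recovery: the new owner of an orphaned range
# rebuilds its state by asking every active silo which grains in that
# range it currently hosts, flagging duplicate activations for removal.

def recover_range(active_silos, in_range):
    """active_silos: mapping of silo name -> set of hosted grain ids.
    in_range: predicate selecting grains that belong to the recovered range."""
    rebuilt = {}
    duplicates = []
    for silo, grains in active_silos.items():
        for grain in grains:
            if not in_range(grain):
                continue
            if grain in rebuilt:
                # Two silos claim the same grain: one activation must be
                # deactivated to preserve single-activation semantics.
                duplicates.append((grain, silo))
            else:
                rebuilt[grain] = silo
    return rebuilt, duplicates
```

Flagging duplicates during the rebuild is how the sketch models step 3 above: after recovery, at most one activation per grain survives in the directory.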
+
+Recovery is also necessary when cluster membership changes happen rapidly. While cluster membership guarantees monotonicity, it's possible for silos to miss intermediate membership views. In such cases:
+
+- Snapshot transfers are abandoned.
+- Recovery is performed instead of normal partition-to-partition handover.
+- The system maintains consistency despite missing intermediate states.
+
+A future improvement to cluster membership may reduce or eliminate these scenarios by ensuring all views are seen by all silos.
diff --git a/docs/orleans/toc.yml b/docs/orleans/toc.yml
index bd72065ba0298..50f31026af6b6 100644
--- a/docs/orleans/toc.yml
+++ b/docs/orleans/toc.yml
@@ -221,6 +221,8 @@ items:
items:
- name: Overview
href: implementation/index.md
+ - name: Grain directory
+ href: implementation/grain-directory.md
- name: Orleans lifecycle
href: implementation/orleans-lifecycle.md
- name: Messaging delivery guarantees