Skip to content

Align all membership providers' CleanupDefunctSiloEntries to remove non-Active stale entries#9972

Draft
Copilot wants to merge 6 commits intomainfrom
copilot/implement-cleanup-defunct-silo
Draft

Align all membership providers' CleanupDefunctSiloEntries to remove non-Active stale entries#9972
Copilot wants to merge 6 commits intomainfrom
copilot/implement-cleanup-defunct-silo

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Mar 25, 2026

  • Fix InMemoryMembershipTable.CleanupDefunctSiloEntries to remove all non-Active old entries (not just Dead entries)
  • Fix InProcessMembershipTable.CleanupDefunctSiloEntries to use the same != SiloStatus.Active predicate for consistency
  • Add unit tests for the development clustering CleanupDefunctSiloEntries in InMemoryMembershipTableTests
  • Fix CosmosMembershipTable.CleanupDefunctSiloEntries — push Status != Active filter server-side (indexed), then apply date check in C# using Math.Max(IAmAliveTime.Ticks, StartTime.Ticks) < beforeDate.Ticks to preserve correct semantics when IAmAliveTime is uninitialized
  • Implement ZooKeeperBasedMembershipTable.CleanupDefunctSiloEntries — was NotImplementedException; deletes non-Active entries using Math.Max(IAmAliveTime.Ticks, StartTime.Ticks) < beforeDate.Ticks date comparison (ZooKeeper has no query capability so full materialization is unavoidable)
Original prompt

This section details on the original issue you should resolve

<issue_title>Implement CleanupDefunctSiloEntries for development clustering</issue_title>
<issue_description>After about 10 mins of inactivity my code is always trying to execute the following code and throws the error

public Task CleanupDefunctSiloEntries(DateTimeOffset beforeDate)
        {
                throw new NotImplementedException();
        }

in SystemTargetBasedMembershipTable.cs

What is the reason of this?

My startup code

builder.Host.UseOrleans(silo =>
{
    silo
        .UseLocalhostClustering()
        .ConfigureLogging(options => options.SetMinimumLevel(LogLevel.Information))
        .AddIncomingGrainCallFilter<LoggingInterceptor>()
        .AddAdoNetGrainStorage(StateStorageConstants.StorageName, options =>
        {
            options.Invariant = "Npgsql";
            options.ConnectionString = builder.Configuration.GetConnectionString("postgres");
        });
});

</issue_description>

<agent_instructions>do a good job</agent_instructions>

Comments on the Issue (you are @copilot in this section)

@ReubenBond That error is benign. The "localhost clustering" is only intended for dev/test, so generally doesn't live long-enough to warrant cleaning up defunct silos.

Regardless, we should implement it.</comment_new>
<comment_new>@
We've moved this issue to the Backlog. This means that it is not going to be worked on for the coming release. We review items in the backlog at the end of each milestone/release and depending on the team's priority we may reconsider this issue for the following milestone. </comment_new>


🔒 GitHub Advanced Security automatically protects Copilot coding agent pull requests. You can protect all pull requests by enabling Advanced Security for your repositories. Learn more about Advanced Security.

Microsoft Reviewers: Open in CodeFlow

@ReubenBond
Copy link
Copy Markdown
Member

@copilot note that we have an implementation of an in-memory membership table cleanup here: src\Orleans.TestingHost\InProcess\InProcessMembershipTable.cs. You can take inspiration from that.

Copilot AI changed the title [WIP] Implement CleanupDefunctSiloEntries for development clustering Fix CleanupDefunctSiloEntries for development clustering to remove all non-Active stale entries Mar 25, 2026
Copilot AI requested a review from rkargMsft March 25, 2026 18:13
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Mar 25, 2026

@copilot note that we have an implementation of an in-memory membership table cleanup here: src\Orleans.TestingHost\InProcess\InProcessMembershipTable.cs. You can take inspiration from that.

Fixed InProcessMembershipTable.CleanupDefunctSiloEntries in commit 3d056d3 to apply the same != SiloStatus.Active predicate, keeping it consistent with InMemoryMembershipTable and all other provider implementations.

Copilot AI requested a review from ReubenBond March 25, 2026 18:20
@rkargMsft
Copy link
Copy Markdown
Contributor

@copilot the issue #8256 appears to have already been addressed but we should do alignment of all membership providers to remove non-Active silos in CleanupDefunctSiloEntries. Update this PR to be that alignment across all providers.

Copy link
Copy Markdown
Contributor Author

Copilot AI commented Mar 25, 2026

@copilot the issue #8256 appears to have already been addressed but we should do alignment of all membership providers to remove non-Active si...

Done in commit 493f3b1. Updated all providers to remove non-Active stale entries:

  • ZooKeeper — implemented CleanupDefunctSiloEntries (was NotImplementedException): reads all entries, deletes any with status != Active older than the cutoff
  • Cosmos — changed from ReadSilos(SiloStatus.Dead) to ReadSilos() with (SiloStatus)s.Status != SiloStatus.Active filter; also aligned the timestamp check to use Math.Max(IAmAliveTime, StartTime) like all other providers

The other providers (Redis, Cassandra, AzureStorage, Consul, DynamoDB, AdoNet) were already using the correct non-Active predicate.

Copilot AI changed the title Fix CleanupDefunctSiloEntries for development clustering to remove all non-Active stale entries Align all membership providers' CleanupDefunctSiloEntries to remove non-Active stale entries Mar 25, 2026
@rkargMsft
Copy link
Copy Markdown
Contributor

@copilot do the new Cosmos and Zookeeper implementations materialize the whole silo list and then filter down to not active silos? Is there an elegant way to do filtering on the database side without needing to make any changes to the database schema/layout/indexes/etc.?

Copy link
Copy Markdown
Contributor Author

Copilot AI commented Mar 25, 2026

@copilot do the new Cosmos and Zookeeper implementations materialize the whole silo list and then filter down to not active silos? Is there an elegant way to do filtering on the data...

Cosmos (commit e03465f): Yes, the previous version called ReadSilos() with no args (fetches all silos server-side), then filtered in memory. It's now updated to build the LINQ query with all predicates pushed to the server — Status != Active uses the existing index, and IAmAliveTime < beforeDate && StartTime < beforeDate (equivalent to Math.Max(IAmAliveTime, StartTime) < beforeDate) are also server-side predicates. StartTime/IAmAliveTime are excluded from the Cosmos index, but the Status filter reduces the result set without any schema changes.

ZooKeeper: ZooKeeper is a hierarchical KV store with no query capability — you list child node names, then fetch each node's data individually. There's no way to push predicates to the server. Full materialization is inherent to its architecture, so the current implementation is already optimal.

@rkargMsft
Copy link
Copy Markdown
Contributor

@copilot I don't think that strictly IAmAliveTime < beforeDate && StartTime < beforeDate in Cosmos is equivalent to Math.Max(IAmAliveTime, StartTime) < beforeDate) in the C# code. If IAmAliveTime doesn't have a value assigned then it will not get returned by the query, but functionally what we want is to look for an older IAmAliveTime older than beforeDate but if that's not defined (C# would map that to DateTime.MinValue) then check that StartTime is older than beforeDate.

If the above is correct then let's go with doing just the Status filtering in Cosmos (since that's indexed already) and then keep the date filtering all in C# to preserve the desired behavior.

Copy link
Copy Markdown
Contributor Author

Copilot AI commented Mar 25, 2026

@copilot I don't think that strictly IAmAliveTime < beforeDate && StartTime < beforeDate in Cosmos is equivalent to Math.Max(IAmAliveTime, StartTime) < beforeDate) in the C# code...

Fixed in commit 2d0523f. The Cosmos query now only pushes Status != Active to the server (which uses the existing index), then applies the Math.Max(IAmAliveTime, StartTime) < beforeDate date check in C# to preserve the correct behavior when IAmAliveTime is uninitialized.

foreach (var (entry, _) in rows)
{
if (entry.Status != SiloStatus.Active
&& new DateTime(Math.Max(entry.IAmAliveTime.Ticks, entry.StartTime.Ticks), DateTimeKind.Utc) < beforeDate)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot instead of creating a new datetime here, compare the max ticks to the beforeDate.Ticks.

Make the same update to the Cosmos implementation.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in commit 0d6d77c. Both ZooKeeper and Cosmos now use Math.Max(...Ticks) < beforeDate.Ticks directly.

Copilot AI and others added 2 commits March 25, 2026 14:19
Fix InMemoryMembershipTable.CleanupDefunctSiloEntries to remove all
non-Active old entries instead of only Dead entries, consistent with
other provider implementations (Cassandra, Azure, etc.).

Add unit tests for the fixed behavior in InMemoryMembershipTableTests.

Co-authored-by: rkargMsft <164392675+rkargMsft@users.noreply.github.com>
Agent-Logs-Url: https://github.com/dotnet/orleans/sessions/28c01159-31b4-4141-841b-f5f1aec83732
Apply the same non-Active predicate to InProcessMembershipTable
used by the in-process test cluster, for consistency.

Co-authored-by: ReubenBond <203839+ReubenBond@users.noreply.github.com>
Agent-Logs-Url: https://github.com/dotnet/orleans/sessions/8e740b40-c62d-4e85-8aac-9c8a56f39bc7
Copilot AI and others added 4 commits March 25, 2026 14:19
…ve stale entries

- CosmosMembershipTable: remove all non-Active stale entries (was only Dead),
  using proper Math.Max(IAmAliveTime, StartTime) timestamp check
- ZooKeeperBasedMembershipTable: implement CleanupDefunctSiloEntries (was
  NotImplementedException) to delete non-Active stale entries

Co-authored-by: rkargMsft <164392675+rkargMsft@users.noreply.github.com>
Agent-Logs-Url: https://github.com/dotnet/orleans/sessions/cfa9474b-c8f1-4707-b13e-8ac1054679e7
… query

Instead of fetching all silos and filtering in memory, build a server-side LINQ
query that filters by status != Active and both date fields. The Status field is
indexed; IAmAliveTime/StartTime are not indexed but the status filter reduces the
result set without schema changes.

Co-authored-by: rkargMsft <164392675+rkargMsft@users.noreply.github.com>
Agent-Logs-Url: https://github.com/dotnet/orleans/sessions/55b130f3-7c14-4eac-bc1d-a808e43da0cb
…, keep date check in C#

The Status field is indexed in Cosmos so filtering non-Active entries server-side
is efficient. The date check uses Math.Max(IAmAliveTime, StartTime) semantics in
C# to correctly handle cases where IAmAliveTime may be uninitialized (DateTime.MinValue).

Co-authored-by: rkargMsft <164392675+rkargMsft@users.noreply.github.com>
Agent-Logs-Url: https://github.com/dotnet/orleans/sessions/06efe6ea-ad61-43fd-ba3e-9e4f08a8752f
…ing new DateTime

Co-authored-by: rkargMsft <164392675+rkargMsft@users.noreply.github.com>
Agent-Logs-Url: https://github.com/dotnet/orleans/sessions/0d84e85c-6909-4b6e-8d16-0f2e6782c645
@rkargMsft rkargMsft force-pushed the copilot/implement-cleanup-defunct-silo branch from 0d6d77c to cb62440 Compare March 25, 2026 21:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement CleanupDefunctSiloEntries for development clustering

3 participants