Fix grains not recycling after cancelling DelayDeactivation#9945
Fix grains not recycling after cancelling DelayDeactivation#9945
Conversation
|
@copilot continue working on this |
|
@copilot continue working on this |
Co-authored-by: ReubenBond <203839+ReubenBond@users.noreply.github.com>
Root cause: Timeout.InfiniteTimeSpan equals TimeSpan.FromMilliseconds(-1), so calling DelayDeactivation(TimeSpan.FromMilliseconds(-1)) was matched by the Timeout.InfiniteTimeSpan check and set KeepAliveUntil to DateTime.MaxValue instead of cancelling the keep-alive. Fix: Move the negative timespan check before the TimeSpan.MaxValue check and remove the Timeout.InfiniteTimeSpan comparison. When cancelling an active keep-alive, reschedule the collection ticket to CollectionAgeLimit so the grain can be collected normally. Also adds ActivationCollector to InternalGrainRuntime for access from ActivationData, test grain interface/implementation, and integration test. Co-authored-by: ReubenBond <203839+ReubenBond@users.noreply.github.com>
Found the root cause and fixed it in commit The bug was that Fix: reordered the checks so negative/zero timespans are handled first, removed the |
There was a problem hiding this comment.
Pull request overview
This pull request fixes a critical bug where calling DelayDeactivation(TimeSpan.FromMilliseconds(-1)) to cancel a previous keep-alive would incorrectly set an infinite keep-alive instead, preventing grains from ever being deactivated. The root cause was that Timeout.InfiniteTimeSpan (which equals TimeSpan.FromMilliseconds(-1)) was checked before the timespan <= TimeSpan.Zero condition, causing negative timespans to be misinterpreted.
Changes:
- Fixed the condition ordering in
ActivationData.DelayDeactivationto check negative timespans first, and added rescheduling logic to ensure grains are collected after canceling a keep-alive - Added
ActivationCollectortoInternalGrainRuntimeto enable the reschedule call fromActivationData - Added test grain interface
IKeepAliveActivationGcTestGrainand implementationKeepAliveActivationGcTestGrainto test keep-alive cancellation - Added integration test
ActivationCollectorShouldCollectAfterCancellingKeepAliveto validate the fix
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| src/Orleans.Runtime/Catalog/ActivationData.cs | Fixed DelayDeactivation to check negative timespans first and reschedule collection when canceling an active keep-alive |
| src/Orleans.Runtime/Core/InternalGrainRuntime.cs | Added ActivationCollector as a constructor parameter and property to enable rescheduling |
| test/Grains/TestInternalGrainInterfaces/ActivationGCTestGrainInterfaces.cs | Added IKeepAliveActivationGcTestGrain interface for testing keep-alive cancellation |
| test/Grains/TestInternalGrains/ActivationGCTestGrains.cs | Added KeepAliveActivationGcTestGrain implementation |
| test/Orleans.Runtime.Internal.Tests/ActivationsLifeCycleTests/ActivationCollectorTests.cs | Added integration test to verify grains are collected after canceling keep-alive |
|
@copilot In this case, I believe it's the docs that should be updated, since -1ms is interpreted as infinite in many places throughout .NET If DelayDeactivation(Timeout.InfiniteTimeSpan) caused the runtime to not delay deactivation indefinitely, that would be surprising to most readers. I propose we update the docs to remove the verbiage around negative timespans. So, there would be 3 typical cases: DelayDeactivation(TimeSpan.FromMinutes(5)) - delay for 5 minutes |
Per review feedback, Timeout.InfiniteTimeSpan (-1ms) should delay deactivation indefinitely, matching .NET convention. TimeSpan.Zero is the correct way to cancel a previous delay. - Restore original condition ordering in DelayDeactivation (check InfiniteTimeSpan before <= Zero) - Keep rescheduling fix when cancelling an active keep-alive - Update Grain.cs docs to describe the 3 typical cases: positive duration, Zero (cancel), InfiniteTimeSpan (indefinite) - Update test grain to use TimeSpan.Zero for cancellation Co-authored-by: ReubenBond <203839+ReubenBond@users.noreply.github.com>
Co-authored-by: ReubenBond <203839+ReubenBond@users.noreply.github.com>
Updated in |
| await grain.SetKeepAlive(TimeSpan.FromMinutes(5)); | ||
|
|
||
| // Verify the grain is not collected while it has an active keep-alive. | ||
| await Task.Delay(WAIT_TIME); |
There was a problem hiding this comment.
Fixed in a7019ff. Replaced the timing-based Task.Delay(WAIT_TIME) with ForceActivationCollection(DEFAULT_IDLE_TIMEOUT) after waiting DEFAULT_IDLE_TIMEOUT + DEFAULT_COLLECTION_QUANTUM for the grain to become idle. This deterministically verifies the grain is collectable instead of relying on the periodic quantum timer.
…ased wait Use ForceActivationCollection to deterministically verify the grain is collectable after cancelling a keep-alive, instead of relying on Task.Delay(WAIT_TIME) and the periodic quantum timer. This avoids timing-dependent flakiness on slow CI machines. Co-authored-by: ReubenBond <203839+ReubenBond@users.noreply.github.com>
|
@copilot test |
ActivationData.DelayDeactivation: restoreTimeout.InfiniteTimeSpanas infinite keep-alive, keep rescheduling fix when cancellingGrain.DelayDeactivationdocs to describe the 3 typical casesActivationCollectortoInternalGrainRuntimeActivationCollectorShouldCollectAfterCancellingKeepAliveby usingForceActivationCollectioninstead of timing-basedTask.Delayfor collection verificationOriginal prompt
💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.
Microsoft Reviewers: Open in CodeFlow