
Fix Shard remember-entities flag mismatch causing entity restart failures #8054

Merged
Aaronontheweb merged 2 commits into akkadotnet:dev from
Aaronontheweb:claude-wt-racy_shard_spec
Feb 25, 2026

@Aaronontheweb

Summary

  • Fixed a bug in Shard.cs where the Entities class was initialized with settings.RememberEntities (from HOCON config, default false) instead of _rememberEntities (derived from whether a rememberEntitiesProvider was passed to the constructor)
  • This mismatch meant the Entities._remembering HashSet was never populated, so OnUpdateDone saw no pending work and incorrectly transitioned the shard to Idle while a store write was still in flight
  • The store's UpdateDone reply then arrived while the shard was Idle and was dropped, which prevented entity restarts after transient failures (constructor/PreStart exceptions) and made the ShardEntityFailureSpec test flaky

Root Cause

The Shard constructor had two independent "remember entities enabled" flags:

  1. _rememberEntities = rememberEntitiesProvider != null (true when provider passed)
  2. Entities.RememberingEntities = settings.RememberEntities (from HOCON config)
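The divergence between these two flags can be sketched as a small Python model. ShardFlags, flags_before_fix, flags_after_fix and hocon_remember_entities are hypothetical names chosen for illustration; `provider` stands in for rememberEntitiesProvider and `hocon_remember_entities` for settings.RememberEntities. The real code is C# in Shard.cs and is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ShardFlags:
    shard_remember_entities: bool  # models Shard._rememberEntities
    entities_remembering: bool     # models Entities.RememberingEntities

    @property
    def consistent(self) -> bool:
        return self.shard_remember_entities == self.entities_remembering


def flags_before_fix(provider: Optional[object], hocon_remember_entities: bool) -> ShardFlags:
    # Two independent sources of truth: they disagree whenever a provider is
    # passed but the HOCON setting is left at its default of false.
    return ShardFlags(
        shard_remember_entities=provider is not None,
        entities_remembering=hocon_remember_entities,
    )


def flags_after_fix(provider: Optional[object], hocon_remember_entities: bool) -> ShardFlags:
    # Single source of truth: Entities receives the Shard's own flag.
    # hocon_remember_entities is kept as a parameter only to show that it no
    # longer feeds either flag inside the Shard.
    remember = provider is not None
    return ShardFlags(shard_remember_entities=remember, entities_remembering=remember)


provider = object()  # any non-None value plays the role of the provider
print(flags_before_fix(provider, False).consistent)  # False
print(flags_after_fix(provider, False).consistent)   # True
```

Deriving both values from one expression makes the mismatch unrepresentable, which is the design choice behind passing _rememberEntities to Entities.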

When a rememberEntitiesProvider was supplied but the config flag was not set, PassivateCompleted would trigger a store write and call Context.Become(WaitingForRememberEntitiesStore). OnUpdateDone's pending check then ran against the empty _remembering set and overwrote that behavior with Context.Become(Idle), so subsequent UpdateDone messages were dropped with "Id must not be empty".
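The interleaving described above can be replayed with a toy state machine. ShardModel, its methods and the replay ordering are assumptions made for illustration, not Akka.NET code; only the behavior names, the _remembering set, the pending check and the dropped UpdateDone come from the PR description.

```python
from typing import List, Set


class ShardModel:
    """Toy replay of the behavior switch (hypothetical names, not Akka.NET code)."""

    def __init__(self, entities_remembering: bool) -> None:
        self.entities_remembering = entities_remembering
        self.remembering: Set[str] = set()  # models Entities._remembering
        self.state = "Idle"
        self.handled: List[str] = []  # UpdateDone messages processed normally
        self.dropped: List[str] = []  # UpdateDone messages ignored in Idle

    def passivate_completed(self, entity_id: str) -> None:
        # A provider is present, so a store write is issued unconditionally,
        # but the id is only recorded when the Entities flag is on.
        if self.entities_remembering:
            self.remembering.add(entity_id)
        self.state = "WaitingForRememberEntitiesStore"

    def pending_check(self) -> None:
        # Models the check in OnUpdateDone: nothing pending means Idle.
        if not self.remembering:
            self.state = "Idle"

    def receive_update_done(self, entity_id: str) -> None:
        if self.state != "WaitingForRememberEntitiesStore":
            # The store's reply is not expected in Idle, so it is dropped.
            self.dropped.append(entity_id)
            return
        self.remembering.discard(entity_id)
        self.handled.append(entity_id)
        self.pending_check()


def replay(entities_remembering: bool) -> ShardModel:
    shard = ShardModel(entities_remembering)
    shard.passivate_completed("entity-1")  # write issued, shard waits
    shard.pending_check()                  # assumed to run before the store replies
    shard.receive_update_done("entity-1")  # the store's reply arrives
    return shard


print(replay(entities_remembering=False).dropped)  # ['entity-1']
print(replay(entities_remembering=True).handled)   # ['entity-1']
```

With the flag off, the pending check sees an empty set and switches to Idle before the reply lands; with it on, the recorded id keeps the shard waiting until the reply is handled.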

Test plan

  • ShardEntityFailureSpec passes (both ConstructorFailActor and PreStartFailActor variants)
  • All 24 shard-related tests pass
  • Stress-tested: 200 consecutive passes
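A consecutive-pass check like the one in the last bullet can be run with a loop that reruns a single test until it fails or the run budget is used up. This Python sketch is an assumption about how such a harness could look, not the script used for this PR; run_until_failure and command_passes are hypothetical helpers, and the dotnet test filter in the closing comment is illustrative.

```python
import subprocess
from typing import Callable, Sequence


def run_until_failure(run_once: Callable[[], bool], runs: int) -> int:
    """Call run_once up to `runs` times and return the number of consecutive passes.

    Stops at the first failure, so a result equal to `runs` means every
    iteration passed.
    """
    passes = 0
    for _ in range(runs):
        if not run_once():
            break
        passes += 1
    return passes


def command_passes(cmd: Sequence[str]) -> Callable[[], bool]:
    # A run counts as passing when the process exits with status 0.
    def run_once() -> bool:
        return subprocess.run(list(cmd), capture_output=True).returncode == 0
    return run_once


# Hypothetical invocation; adjust the filter to the spec's fully qualified name:
# passes = run_until_failure(
#     command_passes(["dotnet", "test", "--filter", "FullyQualifiedName~ShardEntityFailureSpec"]),
#     200,
# )
```

Stopping at the first failure reports the streak length directly, which is the number a flaky-test claim like "200 consecutive passes" rests on.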

…ures

The Shard constructor derived _rememberEntities from whether a provider
was passed, but passed settings.RememberEntities (from HOCON config) to
the Entities class. When a provider was supplied without the config flag,
the Entities._remembering set was never populated. This caused
OnUpdateDone to see no pending work and overwrite the
WaitingForRememberEntitiesStore behavior with Idle, dropping the
subsequent UpdateDone from the store and preventing entity restarts.
Aaronontheweb enabled auto-merge (squash) February 25, 2026 03:30
Aaronontheweb merged commit d332236 into akkadotnet:dev Feb 25, 2026
12 checks passed