Description
Describe the bug
When running Garnet, automatic slot migration fails.
Cluster: a two-node cluster (2 shards, no replicas). The cluster is initialized with all slots on one node, holding 5,000 to 10,000 keys. We then migrate some hash slots from one primary/shard to the other while continuously writing data and deleting keys, both explicitly and via TTLs. Some short-TTL keys are repeatedly modified and deleted throughout the migration.
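Whether a given key moves depends on its hash slot: only keys whose slot falls in the migrated range (8192 to 16383 in the reproduction) are scanned and sent. A minimal sketch for checking which test keys are affected, assuming Garnet uses the Redis Cluster key-to-slot mapping (CRC16/XMODEM of the key, or of its non-empty `{hash tag}`, modulo 16384). The function names are hypothetical helpers, not part of Garnet or valkey-cli.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM variant): poly 0x1021, init 0x0000, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: bytes) -> int:
    """Map a key to one of 16384 slots, honoring a non-empty {hash tag}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' is used.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key) % 16384


def in_migrated_range(key: bytes, first: int = 8192, last: int = 16383) -> bool:
    """True if the key's slot lies in the SLOTSRANGE being migrated."""
    return first <= key_hash_slot(key) <= last
```

Keys sharing a hash tag always land in the same slot, so tagging the short-TTL keys is one way to force them into (or out of) the migrated range when narrowing down the failure.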
Garnet Version: 1.0.94
Network: IPv6 with TLS
Steps to reproduce the bug
Command:
/usr/bin/valkey-cli -h primary-01.foo.bar -p 6379 --user myuser -a mypassword --tls --cacert /path/to/ca.crt MIGRATE ipv6-address-of-primary-02 6379 "" 0 -1 REPLACE AUTH2 myuser 'mypassword' SLOTSRANGE 8192 16383
When this command is executed, the sender reports that it is migrating the slots to the target, but it then logs an error, and cluster_slots_ok and cluster_slots_assigned both drop by the number of slots being migrated.
CLUSTER MTASKS reports 0 migration tasks; it should report 1 while the migration is in progress.
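The slot-counter drop can be quantified by comparing CLUSTER INFO output (CRLF-separated `field:value` lines) captured before and after the MIGRATE call. A minimal sketch; the helper names are hypothetical, and the snapshots are assumed to be the raw text returned by CLUSTER INFO on the sender.

```python
def parse_cluster_info(text: str) -> dict[str, str]:
    """Parse CLUSTER INFO output into a field -> value mapping."""
    info: dict[str, str] = {}
    for line in text.splitlines():  # splitlines() also handles \r\n
        line = line.strip()
        if not line or ":" not in line:
            continue
        field, _, value = line.partition(":")
        info[field] = value
    return info


def slot_loss(before: str, after: str) -> dict[str, int]:
    """How much each slot counter dropped between two CLUSTER INFO snapshots."""
    b, a = parse_cluster_info(before), parse_cluster_info(after)
    return {
        field: int(b[field]) - int(a[field])
        for field in ("cluster_slots_assigned", "cluster_slots_ok")
    }
```

After a successful migration both counters should be unchanged cluster-wide (the slots are reassigned, not dropped); a loss equal to the size of the migrated range matches the failure described above.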
Sender's Logs:
2026-02-13T06:24:04.987917+00:00 primary-02 GarnetServer[216135]: 06::24::04 fail: MigrateSession - 14333193[0] CreateAndRunMigrateTasks: Object 24 240 4096 System.NullReferenceException: Object reference not set to an instance of an object.
  at Garnet.cluster.ClusterSession.Expired(IGarnetObject& value) in /_/libs/cluster/Session/MigrateCommand.cs:line 18
  at Garnet.cluster.MigrateSession.ObjectStoreScan.SingleReader(Byte[]& key, IGarnetObject& value, RecordMetadata recordMetadata, Int64 numberOfRecords, CursorRecordResult& cursorRecordResult) in /_/libs/cluster/Server/Migration/MigrateScanFunctions.cs:line 78
  at Tsavorite.core.AllocatorBase`4.ScanLookup[TInput,TOutput,TScanFunctions,TScanIterator](TsavoriteKV`4 store, ScanCursorState`2 scanCursorState, Int64& cursor, Int64 count, TScanFunctions scanFunctions, TScanIterator iter, Boolean validateCursor, Int64 maxAddress, Boolean resetCursor, Boolean includeTombstones) in /_/libs/storage/Tsavorite/cs/src/core/Allocator/AllocatorScan.cs:line 197
  at Tsavorite.core.GenericAllocatorImpl`3.ScanCursor[TScanFunctions](TsavoriteKV`4 store, ScanCursorState`2 scanCursorState, Int64& cursor, Int64 count, TScanFunctions scanFunctions, Int64 endAddress, Boolean validateCursor, Int64 maxAddress, Boolean resetCursor, Boolean includeTombstones) in /_/libs/storage/Tsavorite/cs/src/core/Allocator/GenericAllocatorImpl.cs:line 1034
  at Tsavorite.core.ClientSession`8.ScanCursor[TScanFunctions](Int64& cursor, Int64 count, TScanFunctions scanFunctions, Int64 endAddress, Boolean validateCursor, Int64 maxAddress, Boolean resetCursor, Boolean includeTombstones) in /_/libs/storage/Tsavorite/cs/src/core/ClientSession/ClientSession.cs:line 503
  at Tsavorite.core.ClientSession`8.IterateLookup[TScanFunctions](TScanFunctions& scanFunctions, Int64& cursor, Int64 untilAddress, Boolean validateCursor, Int64 maxAddress, Boolean resetCursor, Boolean includeTombstones) in /_/libs/storage/Tsavorite/cs/src/core/ClientSession/ClientSession.cs:line 477
  at Garnet.server.StorageSession.IterateObjectStore[TScanFunctions](TScanFunctions& scanFunctions, Int64& cursor, Int64 untilAddress, Int64 maxAddress, Boolean validateCursor, Boolean includeTombstones) in /_/libs/server/Storage/Session/Common/ArrayKeyIterationFunctions.cs:line 172
  at Garnet.server.GarnetApi`2.IterateObjectStore[TScanFunctions](TScanFunctions& scanFunctions, Int64& cursor, Int64 untilAddress, Int64 maxAddress, Boolean includeTombstones) in /_/libs/server/API/GarnetApi.cs:line 463
  at Garnet.cluster.MigrateSession.MigrateOperation.Scan(StoreType storeType, Int64& currentAddress, Int64 endAddress) in /_/libs/cluster/Server/Migration/MigrateOperation.cs:line 67
  at Garnet.cluster.MigrateSession.<>c__DisplayClass61_0.<MigrateSlotsDriverInline>g__ScanStoreTask|1(Int32 taskId, StoreType storeType, Int64 beginAddress, Int64 tailAddress, Int32 pageSize) in /_/libs/cluster/Server/Migration/MigrateSessionSlots.cs:line 93
  at Garnet.cluster.MigrateSession.<>c__DisplayClass61_2.<MigrateSlotsDriverInline>b__2() in /_/libs/cluster/Server/Migration/MigrateSessionSlots.cs:line 57
  at System.Threading.Tasks.Task`1.InnerInvoke()
  at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
  --- End of stack trace from previous location ---
  at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
  at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
  --- End of stack trace from previous location ---
  at Garnet.cluster.MigrateSession.<>c__DisplayClass61_0.<<MigrateSlotsDriverInline>g__CreateAndRunMigrateTasks|0>d.MoveNext() in /_/libs/cluster/Server/Migration/MigrateSessionSlots.cs:line 63
2026-02-13T06:24:04.988258+00:00 primary-02 GarnetServer[216135]: 06::24::04 fail: MigrateSession - 14333193[0] MigrateSlotsDriver failed
2026-02-13T06:24:04.989179+00:00 primary-02 GarnetServer[216135]: 06::24::04 fail: MigrateSession - 14333193[0] An error occurred System.AggregateException: One or more errors occurred. (A task was canceled.)
 ---> System.Threading.Tasks.TaskCanceledException: A task was canceled.
  at System.Threading.Tasks.Task.GetExceptions(Boolean includeTaskCanceledExceptions)
  at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
  at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
  at Garnet.cluster.MigrateSession.TrySetSlotRanges(String nodeid, MigrateState state) in /_/libs/cluster/Server/Migration/MigrateSession.cs:line 270
  at Garnet.cluster.MigrateSession.TryRecoverFromFailure() in /_/libs/cluster/Server/Migration/MigrateSession.cs:line 331
  at Garnet.cluster.MigrateSession.BeginAsyncMigrationTask() in /_/libs/cluster/Server/Migration/MigrationDriver.cs:line 86
  at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.ExecutionContextCallback(Object s)
  at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
  at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext(Thread threadPoolThread)
  at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext()
  at System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(IAsyncStateMachineBox box, Boolean allowInlining)
  at System.Threading.Tasks.Task.RunContinuations(Object continuationObject)
  at System.Threading.Tasks.Task`1.TrySetResult(TResult result)
  at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetExistingTaskResult(Task`1 task, TResult result)
  at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetResult(TResult result)
  at Garnet.cluster.MigrateSession.MigrateSlotsDriverInline() in /_/libs/cluster/Server/Migration/MigrateSessionSlots.cs:line 122
  at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.ExecutionContextCallback(Object s)
  at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
  at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext(Thread threadPoolThread)
  at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext()
  at System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(IAsyncStateMachineBox box, Boolean allowInlining)
  at System.Threading.Tasks.Task.RunContinuations(Object continuationObject)
  at System.Threading.Tasks.Task`1.TrySetResult(TResult result)
  at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetExistingTaskResult(Task`1 task, TResult result)
  at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetResult(TResult result)
  at Garnet.cluster.MigrateSession.<>c__DisplayClass61_0.<<MigrateSlotsDriverInline>g__CreateAndRunMigrateTasks|0>d.MoveNext() in /_/libs/cluster/Server/Migration/MigrateSessionSlots.cs:line 72
  at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.ExecutionContextCallback(Object s)
  at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
  at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext(Thread threadPoolThread)
  at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext()
  at System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(IAsyncStateMachineBox box, Boolean allowInlining)
  at System.Threading.Tasks.Task.RunContinuations(Object continuationObject)
  at System.Threading.Tasks.Task.FinishSlow(Boolean userDelegateExecute)
  at System.Threading.Tasks.Task.TrySetException(Object exceptionObject)
  at System.Threading.Tasks.Task.RunOrQueueCompletionAction(ITaskCompletionAction completionAction, Boolean allowInlining)
  at System.Threading.Tasks.Task.RunContinuations(Object continuationObject)
  at System.Threading.Tasks.Task.FinishSlow(Boolean userDelegateExecute)
  at System.Threading.Tasks.Task.TrySetException(Object exceptionObject)
  at System.Threading.Tasks.Task.WhenAllPromise.Invoke(Task completedTask)
  at System.Threading.Tasks.Task.RunOrQueueCompletionAction(ITaskCompletionAction completionAction, Boolean allowInlining)
  at System.Threading.Tasks.Task.RunContinuations(Object continuationObject)
  at System.Threading.Tasks.Task.FinishSlow(Boolean userDelegateExecute)
  at System.Threading.Tasks.Task.TrySetException(Object exceptionObject)
  at System.Threading.Tasks.UnwrapPromise`1.TrySetFromTask(Task task, Boolean lookForOce)
  at System.Threading.Tasks.UnwrapPromise`1.ProcessCompletedOuterTask(Task task)
  at System.Threading.Tasks.UnwrapPromise`1.InvokeCore(Task completingTask)
  at System.Threading.Tasks.UnwrapPromise`1.Invoke(Task completingTask)
  at System.Threading.Tasks.Task.RunOrQueueCompletionAction(ITaskCompletionAction completionAction, Boolean allowInlining)
  at System.Threading.Tasks.Task.RunContinuations(Object continuationObject)
  at System.Threading.Tasks.Task.FinishSlow(Boolean userDelegateExecute)
  at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
  at System.Threading.ThreadPoolWorkQueue.Dispatch()
  at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
  --- End of stack trace from previous location ---
  --- End of inner exception stack trace ---
  at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
  at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
  at Garnet.cluster.MigrateSession.TrySetSlotRanges(String nodeid, MigrateState state) in /_/libs/cluster/Server/Migration/MigrateSession.cs:line 270
2026-02-13T06:24:04.989350+00:00 primary-02 GarnetServer[216135]: 06::24::04 fail: MigrateSession - 14333193[0] MigrateSession.RecoverFromFailure failed to make slots STABLE
Claude's Response:
What this means practically
- The migration of slot(s) from this node (session <removed>) failed mid-scan
- The slot(s) may be stuck in a MIGRATING state on this node since recovery also failed
- You likely need to manually reset the slot state (e.g., CLUSTER SETSLOT <slot> STABLE) on the affected node
- The root cause is a Garnet bug — the Expired() check in MigrateCommand.cs doesn't handle null objects during object store iteration
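Resetting the reproduction's range one slot at a time means 8,192 CLUSTER SETSLOT commands, so it is easier to generate them than type them. A minimal sketch, assuming per-slot `CLUSTER SETSLOT <slot> STABLE` (as suggested above) is the appropriate recovery for Garnet's post-failure state; `setslot_stable_commands` is a hypothetical helper, and the command text should be checked against the Garnet cluster docs before use.

```python
def setslot_stable_commands(first: int, last: int) -> list[str]:
    """Build one CLUSTER SETSLOT ... STABLE command per slot in [first, last]."""
    if not 0 <= first <= last <= 16383:
        raise ValueError("slot range must lie within 0..16383")
    return [f"CLUSTER SETSLOT {slot} STABLE" for slot in range(first, last + 1)]
```

For example, `"\n".join(setslot_stable_commands(8192, 16383))` can be written to a file and fed to valkey-cli on stdin (with the same `-h`, `--tls`, and auth flags used for the MIGRATE call), since the CLI executes one command per input line when stdin is not a terminal.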
This appears to be a bug in Garnet's object-store migration path. If you're hitting it consistently, it may be related to keys with TTLs expiring or being deleted during the migration scan. Check whether a Garnet release newer than 1.0.94 adds the missing null check in ClusterSession.Expired() (MigrateCommand.cs, line 18 in the trace above).
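The null check in question can be modeled outside Garnet to show the intended behavior. A minimal Python sketch, not Garnet's C# code: `StoredObject`, `expired`, and `collect_for_migration` are hypothetical names, the "0 means no TTL" convention is an assumption, and treating a null value seen during the scan as already deleted (and skipping it) is one plausible handling, not a confirmed fix.

```python
import time
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple


@dataclass
class StoredObject:
    """Hypothetical stand-in for an object-store value (hash, list, set, ...)."""
    payload: dict
    expiration_ms: int = 0  # assumption: 0 means "no TTL"


def expired(value: Optional[StoredObject], now_ms: int) -> bool:
    """Null-safe expiry check: a missing value counts as expired."""
    if value is None:
        return True  # record deleted/freed concurrently; nothing to migrate
    return value.expiration_ms != 0 and value.expiration_ms <= now_ms


def collect_for_migration(
    records: Iterable[Tuple[str, Optional[StoredObject]]],
    now_ms: Optional[int] = None,
) -> Iterator[Tuple[str, StoredObject]]:
    """Yield only live (key, value) pairs from a scan of the migrated slots."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    for key, value in records:
        if expired(value, now_ms):
            continue  # skip instead of dereferencing a null value
        yield key, value
```

The design point is that the scan callback tolerates records that vanish between being located and being read, which is exactly the situation the short-TTL churn in this report produces.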
Expected behavior
Migration should complete within a few seconds.
Screenshots
No response
Release version
1.0.94
IDE
No response
OS version
Distributor ID: Debian
Description: Debian GNU/Linux 12 (bookworm)
Release: 12
Codename: bookworm
Additional context
No response