PoC: Race condition prevention in worker #352
Merged
… it runs one at a time across all worker instances
Signed-off-by: Rayane Charif <rayane.charif@gonative.cc>
Reviewer's Guide
Adds a database-backed distributed locking mechanism for the Sui indexer cron job and wires it into the scheduled handler, including schema changes, storage APIs, and tests to prevent overlapping runs.
Sequence diagram for scheduled Sui indexer cron with distributed lock
sequenceDiagram
actor Scheduler
participant Worker
participant D1Storage
participant D1Database as D1_DB
participant SuiIndexer
participant RedeemSolver
participant Logger
Scheduler->>Worker: scheduled(event, env, ctx)
Worker->>D1Storage: new D1Storage(env.DB)
Worker->>D1Storage: acquireLock(sui-indexer-cron, ttlMs=5min)
D1Storage->>D1Database: INSERT INTO cron_locks ... ON CONFLICT ...
D1Database-->>D1Storage: acquired_at or null
D1Storage-->>Worker: lockToken
alt lock not acquired
Worker->>Logger: warn(Cron job already running, skipping)
Worker-->>Scheduler: return
else lock acquired
Worker->>D1Storage: getActiveNetworks()
Worker->>Worker: getSecret(NBTC_MINTING_SIGNER_MNEMONIC)
Worker->>Worker: createSuiClients(activeNetworks, mnemonic)
par parallel tasks
Worker->>SuiIndexer: runSuiIndexer(storage, activeNetworks, suiClients)
Worker->>RedeemSolver: runRedeemSolver(storage, env, suiClients, activeNetworks)
end
Worker->>Worker: reportErrors(results, scheduled, Scheduled task error, [SuiIndexer, RedeemSolver])
Worker->>D1Storage: releaseLock(sui-indexer-cron, lockToken)
D1Storage->>D1Database: DELETE FROM cron_locks WHERE lock_name = ... AND acquired_at = ...
D1Database-->>D1Storage: ok
D1Storage-->>Worker: void
Worker-->>Scheduler: return
end
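The flow in the sequence diagram can be sketched roughly as follows. This is a minimal illustration, not the PR's actual `scheduled()` handler: `LockStore`, `Task`, and `runWithCronLock` are hypothetical names, and the tasks stand in for `runSuiIndexer` and `runRedeemSolver`. The point it shows is that the release sits in `finally`, so the lock is dropped even when a task fails.

```typescript
// Hypothetical minimal view of the storage API; the real D1Storage has more methods.
interface LockStore {
  acquireLock(lockName: string, ttlMs: number): Promise<number | null>;
  releaseLock(lockName: string, acquiredAt: number): Promise<void>;
}

type Task = () => Promise<void>;

const LOCK_NAME = "sui-indexer-cron";
const LOCK_TTL_MS = 5 * 60 * 1000; // 5 minutes, as in the diagram

// Resolves to "skipped" when another instance holds the lock. Otherwise
// resolves to one entry per task: null on success, the thrown value on failure.
async function runWithCronLock(
  store: LockStore,
  tasks: Task[],
  warn: (msg: string) => void,
): Promise<"skipped" | unknown[]> {
  const lockToken = await store.acquireLock(LOCK_NAME, LOCK_TTL_MS);
  if (lockToken === null) {
    warn("Cron job already running, skipping");
    return "skipped";
  }
  try {
    // Run tasks in parallel and collect failures instead of aborting early.
    return await Promise.all(
      tasks.map((task) =>
        Promise.resolve()
          .then(task)
          .then(
            () => null,
            (err: unknown) => err,
          ),
      ),
    );
  } finally {
    // Always release, passing the token so only our own lock row is removed.
    await store.releaseLock(LOCK_NAME, lockToken);
  }
}
```

The caller can then pass the returned array to whatever error reporting it uses; the diagram's `reportErrors` step would sit between the parallel tasks and the release.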
ER diagram for new cron_locks table for distributed locks
erDiagram
cron_locks {
TEXT lock_name PK
INTEGER acquired_at
INTEGER expires_at
}
Class diagram for D1Storage with distributed lock methods
classDiagram
class D1Storage {
- any db
+ acquireLock(lockName string, ttlMs number) Promise~number|null~
+ releaseLock(lockName string, acquiredAt number) Promise~void~
}
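To make the intended semantics of these two methods concrete, here is a small in-memory model. It is a sketch only: `InMemoryLockStore` and its injectable `now` parameter are assumptions made for illustration, and a `Map` stands in for the `cron_locks` table. It mirrors the behaviour the diagrams describe: acquisition succeeds only when no unexpired row exists, and release deletes the row only when the caller's `acquiredAt` token matches.

```typescript
// One row of the modelled cron_locks table (lock_name is the Map key).
interface LockRow {
  acquiredAt: number;
  expiresAt: number;
}

// Hypothetical in-memory stand-in for the D1-backed lock methods.
class InMemoryLockStore {
  private rows = new Map<string, LockRow>();
  private now: () => number;

  // `now` is injectable (an assumption of this sketch) so time can be controlled.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  async acquireLock(lockName: string, ttlMs: number): Promise<number | null> {
    const now = this.now();
    const existing = this.rows.get(lockName);
    // Take the lock only if there is no row or the existing row has expired.
    if (existing !== undefined && existing.expiresAt > now) {
      return null;
    }
    this.rows.set(lockName, { acquiredAt: now, expiresAt: now + ttlMs });
    return now; // acquiredAt doubles as the lock token
  }

  async releaseLock(lockName: string, acquiredAt: number): Promise<void> {
    const existing = this.rows.get(lockName);
    // Delete only our own row; a stale token must not remove a newer holder's lock.
    if (existing !== undefined && existing.acquiredAt === acquiredAt) {
      this.rows.delete(lockName);
    }
  }
}
```

Using `acquiredAt` as the token means two acquisitions in the same millisecond would be indistinguishable; the model inherits that limitation from the timestamp-based design.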
Pull request overview
This PR introduces a DB-backed distributed lock to prevent overlapping scheduled executions of the Sui indexer worker (and its paired redeem-solver task) by persisting named cron locks in D1.
Changes:
- Add `D1Storage.acquireLock`/`releaseLock` methods backed by a new `cron_locks` table.
- Guard `scheduled()` execution in `packages/sui-indexer` behind the DB-backed lock and ensure unlock in `finally`.
- Add unit tests for lock acquisition/expiry/release behaviors and update test DB initialization to include the new table.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| packages/sui-indexer/src/storage.ts | Adds D1-backed lock acquire/release helpers. |
| packages/sui-indexer/src/storage.test.ts | Adds unit tests for the new locking behavior. |
| packages/sui-indexer/src/index.ts | Wraps scheduled cron work in the distributed lock. |
| packages/lib/src/test-helpers/init_db.ts | Ensures test DB teardown includes cron_locks. |
| packages/btcindexer/db/migrations/0001_initial_schema.sql | Adds cron_locks table to the schema. |
Contributor
robert-zaremba
left a comment
I'm not sure this is solving the issue
Hey - I've found 2 issues, and left some high level feedback:
- In `acquireLock`, any database error is treated the same as normal lock contention (returns `null`); consider distinguishing these cases (e.g., by rethrowing or returning a separate error state) so scheduler behavior and alerting can differentiate real failures from expected lock contention.
- The `"should not release another instance's lock after expiry"` test relies on real timing with `setTimeout(20)`, which can be flaky in CI; consider mocking `Date.now()` or using a controllable clock instead of wall time to make the timing-based lock expiry behavior deterministic.
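As an illustration of the second point, one way to take wall time out of such a test is to stub `Date.now()` for the duration of the test body. The helper below is a hypothetical sketch (`withFakeNow` is not part of the PR or of any test framework): it swaps in a controllable clock, hands the test an `advance` function, and restores the real `Date.now` afterwards even if the body throws.

```typescript
// Hypothetical test helper: runs `fn` with Date.now() replaced by a
// controllable clock starting at `startMs`, then restores the original.
async function withFakeNow<T>(
  startMs: number,
  fn: (advance: (ms: number) => void) => Promise<T>,
): Promise<T> {
  const realNow = Date.now;
  let current = startMs;
  Date.now = () => current;
  try {
    // `advance` moves the fake clock forward without any real waiting.
    return await fn((ms) => {
      current += ms;
    });
  } finally {
    Date.now = realNow;
  }
}
```

With a helper like this, the expiry test could acquire the lock with a short TTL, call `advance(20)` instead of `setTimeout(20)`, acquire again as a second instance, and then assert that releasing with the first token leaves the second lock in place. Many test runners also ship their own fake-timer facilities, which may be preferable where available.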
## Individual Comments
### Comment 1
<location> `packages/sui-indexer/src/storage.ts:858-877` </location>
<code_context>
};
}
+
+ async acquireLock(lockName: string, ttlMs: number): Promise<number | null> {
+ const now = Date.now();
+ try {
+ const result = await this.db
+ .prepare(
+ `INSERT INTO cron_locks (lock_name, acquired_at, expires_at)
+ VALUES (?, ?, ?)
+ ON CONFLICT(lock_name) DO UPDATE
+ SET acquired_at = excluded.acquired_at, expires_at = excluded.expires_at
+ WHERE cron_locks.expires_at <= excluded.acquired_at
+ RETURNING acquired_at`,
+ )
+ .bind(lockName, now, now + ttlMs)
+ .first<number>("acquired_at");
+ return result ?? null;
+ } catch (error) {
+ logError({ msg: "Failed to acquire lock", method: "acquireLock", lockName }, error);
+ return null;
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Consider distinguishing DB errors from an actually-contended lock instead of returning null in both cases.
Since `scheduled` treats `null` as “job already running,” DB failures are indistinguishable from normal lock contention and may silently mask storage issues. Consider propagating DB errors (or using a distinct error/sentinel type) so callers can handle contention vs. operational failure differently.
```suggestion
async acquireLock(lockName: string, ttlMs: number): Promise<number | null> {
  const now = Date.now();
  try {
    const result = await this.db
      .prepare(
        `INSERT INTO cron_locks (lock_name, acquired_at, expires_at)
         VALUES (?, ?, ?)
         ON CONFLICT(lock_name) DO UPDATE
         SET acquired_at = excluded.acquired_at, expires_at = excluded.expires_at
         WHERE cron_locks.expires_at <= excluded.acquired_at
         RETURNING acquired_at`,
      )
      .bind(lockName, now, now + ttlMs)
      .first<number>("acquired_at");
    // returns `null` when the lock is currently held (no row inserted/updated),
    // and throws on DB/operational errors (distinguishable by callers)
    return result ?? null;
  } catch (error) {
    logError({ msg: "Failed to acquire lock", method: "acquireLock", lockName }, error);
    throw error;
  }
}
```
</issue_to_address>
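The suggestion above rethrows. The other option the comment mentions, a distinct error state, could look roughly like the sketch below. `AcquireResult` and `acquireLockResult` are hypothetical names, and `tryAcquire` stands in for the real upsert: it is assumed to resolve to the token on success, to `null` on contention, and to reject on a database failure.

```typescript
// Hypothetical result type separating contention from operational failure.
type AcquireResult =
  | { status: "acquired"; token: number }
  | { status: "contended" }
  | { status: "error"; error: unknown };

// Wraps an acquire attempt so callers get an explicit three-way outcome
// instead of `null` meaning both "held by someone else" and "DB failed".
async function acquireLockResult(
  tryAcquire: () => Promise<number | null>,
): Promise<AcquireResult> {
  try {
    const token = await tryAcquire();
    return token === null ? { status: "contended" } : { status: "acquired", token };
  } catch (error) {
    return { status: "error", error };
  }
}
```

A caller could then log a warning and return on `"contended"`, but raise an alert on `"error"`. Rethrowing achieves the same separation with less code; the union is mainly useful if the scheduler should keep running without a `try`/`catch` at the call site.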
### Comment 2
<location> `packages/sui-indexer/src/storage.ts:879-884` </location>
<code_context>
+ }
+ }
+
+ async releaseLock(lockName: string, acquiredAt: number): Promise<void> {
+ await this.db
+ .prepare(`DELETE FROM cron_locks WHERE lock_name = ? AND acquired_at = ?`)
+ .bind(lockName, acquiredAt)
+ .run();
+ }
}
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Consider adding error handling/logging to `releaseLock` similar to `acquireLock`.
If this delete fails (e.g. due to a transient DB issue), the lock can be left behind and block future runs until expiry, with no visibility into why. Adding error logging here, similar to `acquireLock` (even if the error is rethrown), would make diagnosing stuck cron locks much easier.
```suggestion
async releaseLock(lockName: string, acquiredAt: number): Promise<void> {
  try {
    await this.db
      .prepare(`DELETE FROM cron_locks WHERE lock_name = ? AND acquired_at = ?`)
      .bind(lockName, acquiredAt)
      .run();
  } catch (error) {
    logError({ msg: "Failed to release lock", method: "releaseLock", lockName, acquiredAt }, error);
    throw error;
  }
}
```
</issue_to_address>
robert-zaremba
approved these changes
Feb 19, 2026
Closes: #309
Summary by Sourcery
Introduce a distributed locking mechanism to prevent concurrent executions of the Sui indexer scheduled job.
Summary by Sourcery
Introduce a database-backed distributed lock to prevent overlapping executions of the Sui indexer scheduled job.