Description
Severity: Informational | Likelihood: Low | Impact: Informational | Type: Vulnerability
Details
The replication publish worker does not honor context cancellation in a specific window: if cancellation occurs after the insert step but before the staged row is deleted, `publishStagedEnvelope` returns false and the inner retry loop continues without re-checking `ctx.Done()`, retrying indefinitely until the process exits. This does not block shutdown, preserves data integrity, and causes only minor operational/UX friction.
In pkg/api/message/publish_worker.go, `start()` processes each staged envelope with an inner loop: `for !p.publishStagedEnvelope(stagedEnv) { time.Sleep(...) }`. This loop does not select on `ctx.Done()`. Inside `publishStagedEnvelope()`, if `p.ctx.Err()` is non-nil immediately after the insert-and-increment step (and before the staged row is deleted), the function returns false. As a result, when shutdown cancels the context during this window, the inner loop keeps retrying and never re-enters the outer select that checks `ctx.Done()`.

The shutdown sequence in pkg/server/server.go does not wait for this worker goroutine, so shutdown is not blocked. Database insert semantics are idempotent (a duplicate insert returns `inserted == 0`), and any leftover staged row is cleaned up on the next attempt or after a restart. Client-side waits after staging are bounded by a 30-second timeout and are also canceled with the request context.

Overall impact is minor: transient retry looping and possible bounded client delay near shutdown, with no data corruption or lasting blockage.
Exploitation
Scenario 1
Operator initiates shutdown while a staged envelope is in-flight: the DB is closed before the shared context is canceled; publishStagedEnvelope sees cancellation or DB errors pre-delete and returns false; the inner loop retries every ~10ms until the process exits, causing transient log noise but not blocking shutdown.
Preconditions / Assumptions:
- (a) API enabled; replication publish worker running
- (b) A staged envelope is being processed when shutdown starts
- (c) BaseServer.Shutdown closes the DB before canceling the shared context
- (d) publishStagedEnvelope is between insert and delete when cancellation happens
Scenario 2
A client calls PublishPayerEnvelopes just before shutdown: the envelope is staged and the worker gets stuck pre-delete, so lastProcessed does not advance; waitForGatewayPublish waits up to 30 seconds (or until the request context is canceled), resulting in bounded extra latency for the client.
Preconditions / Assumptions:
- (a) Client request to PublishPayerEnvelopes in-flight near shutdown
- (b) Staged envelope successfully inserted for processing
- (c) Worker cancellation occurs pre-delete so lastProcessed is not updated
- (d) waitForGatewayPublish uses a 30-second timeout and respects request context cancellation
Scenario 3
An envelope is inserted into gateway tables but the staged row delete does not occur due to cancellation: after restart, the staged row is reprocessed; duplicate insert is ignored (inserted == 0) and the staged row is deleted safely without double-accounting.
Preconditions / Assumptions:
- (a) Envelope insertion succeeded but staged-row deletion did not run due to cancellation
- (b) Node is restarted and the publish worker resumes
- (c) Insert path is idempotent (duplicate insert returns inserted == 0) enabling safe deletion of the staged row
Files impacted
pkg/api/message/publish_worker.go
Lines 115-120:
```go
			// Infinite retry on failure to publish; we cannot
			// continue to the next envelope until this one is processed
			time.Sleep(p.sleepOnFailureTime)
		}
		p.lastProcessed.Store(stagedEnv.ID)
		metrics.EmitApiStagedEnvelopeProcessingDelay(time.Since(stagedEnv.OriginatorTime))
```