fix: add global deadline and mitigate PoC validation timeout attack by ouicate · Pull Request #827 · gonka-ai/gonka

ouicate · 2026-02-28T14:50:24Z

Malicious participants could force the off-chain PoC validation pipeline to exceed its on-chain time budget by delaying HTTP proof requests. Workers would continue retrying indefinitely past the on-chain submission window, producing validation results that could never be submitted.

Root causes addressed:

1. No global deadline linked to the on-chain window:
   - Added computeValidationDeadline(), which calculates the remaining time from the epoch's EndOfPoCValidation block height (~5.41s/block)
   - ValidateAll now uses context.WithTimeout instead of context.WithCancel
   - A 60s safety buffer (configurable via DeadlineBuffer) leaves time for the final submission before the window closes
   - The deadline-aware context propagates to all HTTP and ML node calls, cancelling in-flight requests when the window expires
2. Worker context not propagated to HTTP calls:
   - validateParticipant now receives the deadline-aware ctx parameter instead of creating its own context.Background()
   - HTTP proof fetches and ML node requests respect the global deadline
3. Busy-wait spin on retry-after:
   - Workers now sleep 100ms when encountering not-yet-ready items
   - Added a ctx.Done() check during retry re-queue to prevent deadlock on context cancellation
4. Retry-exhausted participants not reported:
   - Participants that exhaust all 15 retries are now reported to the chain as invalid, so attackers no longer escape without penalty


akup · 2026-03-05T03:12:32Z

decentralized-api/poc/validator.go

+				case workChan <- work:
+				case <-ctx.Done():
+					return
+				}


Cancelling here before we sleep is fine.
But the description says this prevents a deadlock on context cancellation. How could a deadlock happen here? After continue, won't execution go back to the for loop and check ctx.Done() there?

akup · 2026-03-05T03:12:48Z

Workers would continue retrying indefinitely past the on-chain submission window

Why can workers keep retrying indefinitely?

Retries in the validation workers are limited by MaxRetries:
if work.attempt < v.config.MaxRetries-1

Moreover, a block may take longer than 5.41s, so even with the 60-second buffer there is no guarantee that work will not be cancelled before the validation window actually closes.

It may be more precise to cancel ctx when the phase switches, rather than adding context.WithTimeout:

// Stop workers when the chain moves from PoCValidatePhase to the next phase (e.g. PoCValidateWindDown / Inference)
	phaseCheckInterval := v.config.PhaseCheckInterval
	if phaseCheckInterval <= 0 {
		phaseCheckInterval = 3 * time.Second
	}
	go func() {
		ticker := time.NewTicker(phaseCheckInterval)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				state := v.phaseTracker.GetCurrentEpochState()
				if state == nil {
					continue
				}
				if state.CurrentPhase != types.PoCValidatePhase {
					logging.Info("OffChainValidator: validation phase ended, stopping workers", types.PoC,
						"currentPhase", state.CurrentPhase, "blockHeight", state.CurrentBlock.Height)
					cancel()
					return
				}
			}
		}
	}()

ouicate changed the base branch from main to upgrade-v0.2.11 February 28, 2026 14:50

Merge branch 'upgrade-v0.2.11' into fix/poc-validation-timeout

26a8428

ouicate wants to merge 2 commits into gonka-ai:upgrade-v0.2.11 from ouicate:fix/poc-validation-timeout
