fix: add global deadline and mitigate PoC validation timeout attack #827
ouicate wants to merge 2 commits into gonka-ai:upgrade-v0.2.11
Malicious participants could force the off-chain PoC validation pipeline
to exceed its on-chain time budget by delaying HTTP proof requests.
Workers would continue retrying indefinitely past the on-chain submission
window, producing validation results that could never be submitted.
Root causes addressed:
1. No global deadline tied to the on-chain window:
- Added computeValidationDeadline(), which calculates the remaining time from
the epoch's EndOfPoCValidation block height (~5.41 s/block)
- ValidateAll now uses context.WithTimeout instead of context.WithCancel
- A 60 s safety buffer (configurable via DeadlineBuffer) leaves time for the
final submission before the window closes
- The deadline-aware context propagates to all HTTP and ML node calls,
cancelling in-flight requests when the window expires
2. Worker context not propagated to HTTP calls:
- validateParticipant now receives the deadline-aware ctx parameter
instead of creating its own context.Background()
- HTTP proof fetches and ML node requests respect the global deadline
3. Busy-wait spin on retry-after:
- Workers now sleep 100 ms when they encounter not-yet-ready items
- Added a ctx.Done() check to the retry re-queue to prevent a deadlock
on context cancellation
4. Retry-exhausted participants not reported:
- Participants that exhaust all 15 retries are now reported to the chain
as invalid, so attackers no longer go unpenalized
```go
select {
case workChan <- work:
case <-ctx.Done():
	return
}
```
Checking for cancellation here before we sleep is not a bad idea.
But the description says this prevents a deadlock on context cancellation. How could a deadlock happen here? After `continue`, doesn't the loop go back to the `for` and check `ctx.Done()` there?
Why can workers keep retrying indefinitely? Retries in the validation workers are limited to MaxRetries. Moreover, there could be situations when the block is not finished in … Maybe it would be more precise to cancel ctx when the phase switches, and not add …

```go
// Stop workers when the chain moves from PoCValidatePhase to the next phase (e.g. PoCValidateWindDown / Inference)
phaseCheckInterval := v.config.PhaseCheckInterval
if phaseCheckInterval <= 0 {
	phaseCheckInterval = 3 * time.Second
}
go func() {
	ticker := time.NewTicker(phaseCheckInterval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			state := v.phaseTracker.GetCurrentEpochState()
			if state == nil {
				continue
			}
			if state.CurrentPhase != types.PoCValidatePhase {
				logging.Info("OffChainValidator: validation phase ended, stopping workers", types.PoC,
					"currentPhase", state.CurrentPhase, "blockHeight", state.CurrentBlock.Height)
				cancel()
				return
			}
		}
	}
}()
```