Bug: DKG permanent failure — dealer consensus uses unweighted participant votes but quorum uses slot weights #849

@Mayveskii

Location

inference-chain/x/bls/keeper/phase_transitions.go — lines 74 and 295–318

Description

The DKG pipeline uses two independent threshold checks with different weighting schemes that are fundamentally inconsistent:

// 1. Transition to VERIFYING — slot-weighted quorum (correct)
if slotsWithDealerParts > epochBLSData.ITotalSlots/2 { ... }

// 2. Dealer consensus — unweighted (count of participants, NOT slots)
dealerIsValid := totalVotes > 0 && validVotes > totalVotes/2

Concrete failure scenario

Suppose:

  • Participant A holds 60% of total slots
  • Participants B, C, D each hold ~13% of slots
  1. A submits dealer parts → slotsWithDealerParts > ITotalSlots/2 → DKG transitions to VERIFYING
  2. B, C, D vote A's dealer invalid (3 vs 1 participant votes → majority by count)
  3. A votes B, C, D invalid (1 vs 3 — minority by count)
  4. DetermineValidDealersWithConsensus marks all dealers invalid
  5. ComputeGroupPublicKey returns "no valid dealers found"
  6. CompleteDKG returns an error → DKG permanently stuck for this epoch

B, C, D together hold only ~40% of slots and could never form a DKG quorum alone — yet they can destroy the epoch's DKG by voting as a participant-count majority.
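The first two steps of the scenario can be checked with a small standalone sketch. It does not use the module's real types: `participant`, `slotQuorumReached`, and `countMajority` are hypothetical names, and the slot counts (60/13/13/14 out of 100) are illustrative values chosen to match the percentages above. The two helpers only mirror the two conditions quoted in the Description.

```go
package main

import "fmt"

// participant is a hypothetical stand-in for the module's participant record.
type participant struct {
	name  string
	slots uint32
}

// slotQuorumReached mirrors the slot-weighted transition check:
// slotsWithDealerParts > ITotalSlots/2.
func slotQuorumReached(slotsWithDealerParts, totalSlots uint32) bool {
	return slotsWithDealerParts > totalSlots/2
}

// countMajority mirrors the unweighted dealer check:
// one vote per participant, strict majority of the votes cast.
func countMajority(votes []bool) bool {
	total, valid := 0, 0
	for _, v := range votes {
		total++
		if v {
			valid++
		}
	}
	return total > 0 && valid > total/2
}

func main() {
	// Illustrative slot distribution (assumption): A=60, B=13, C=13, D=14.
	ps := []participant{{"A", 60}, {"B", 13}, {"C", 13}, {"D", 14}}

	var totalSlots uint32
	for _, p := range ps {
		totalSlots += p.slots
	}

	// Step 1: only A has submitted dealer parts.
	fmt.Println("quorum reached:", slotQuorumReached(ps[0].slots, totalSlots))

	// Step 2: votes on dealer A, in participant order A, B, C, D.
	votesOnA := []bool{true, false, false, false}
	fmt.Println("dealer A valid by count:", countMajority(votesOnA))
}
```

With these numbers the slot-weighted check passes on A's 60 slots alone, while the count-based check rejects A with one valid vote out of four.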

Root cause

DetermineValidDealersWithConsensus counts one vote per participant regardless of slot weight:

for _, verification := range epochBLSData.VerificationSubmissions {
    if verification != nil && len(verification.DealerValidity) > 0 {
        totalVotes++
        if verification.DealerValidity[dealerIndex] {
            validVotes++
        }
    }
}
dealerIsValid := totalVotes > 0 && validVotes > totalVotes/2

Everywhere else in the DKG pipeline, thresholds are measured in slots, not participant count.

Impact

High (DoS) — a minority coalition of participants (by slot weight) can permanently break DKG for an epoch by voting down dealers that collectively hold a slot majority. This blocks threshold signing for the entire epoch.

Fix Direction

Replace the unweighted participant vote count with a slot-weighted vote in DetermineValidDealersWithConsensus, consistent with how quorum is measured everywhere else:

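// Accumulate slot weight instead of counting participants.
validSlots := uint32(0)
totalSlots := uint32(0)

for i, verification := range epochBLSData.VerificationSubmissions {
    if verification != nil && len(verification.DealerValidity) > 0 {
        participant := epochBLSData.Participants[i]
        // Slot range is inclusive on both ends, hence the +1.
        slots := participant.SlotEndIndex - participant.SlotStartIndex + 1
        totalSlots += slots
        // Bounds check: a short DealerValidity slice counts as "not valid".
        if dealerIndex < len(verification.DealerValidity) && verification.DealerValidity[dealerIndex] {
            validSlots += slots
        }
    }
}

// A dealer is valid only if a strict majority of voting slots approve it.
dealerIsValid := totalSlots > 0 && validSlots > totalSlots/2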
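To try the slot-weighted rule outside the keeper, here is a minimal self-contained sketch. The types `slotRange` and `verification` and the function `slotWeightedValid` are hypothetical simplifications, not the module's actual definitions; only the field names `SlotStartIndex`, `SlotEndIndex`, and `DealerValidity` are taken from the snippet above. It also assumes that `VerificationSubmissions[i]` corresponds to `Participants[i]`, as the snippet does, and adds defensive guards that the real code may or may not need.

```go
package main

import "fmt"

// slotRange is a hypothetical stand-in for a participant's slot assignment.
// Both indices are treated as inclusive, matching the "+ 1" in the fix above.
type slotRange struct {
	SlotStartIndex uint32
	SlotEndIndex   uint32
}

// verification is a hypothetical stand-in for one verification submission.
type verification struct {
	DealerValidity []bool
}

// slotWeightedValid reports whether the dealer at dealerIndex is approved by
// a strict majority of the slots held by participants who submitted a vote.
func slotWeightedValid(participants []slotRange, submissions []*verification, dealerIndex int) bool {
	var validSlots, totalSlots uint64
	for i, v := range submissions {
		// Skip missing or empty submissions, and indices with no participant.
		if v == nil || len(v.DealerValidity) == 0 || i >= len(participants) {
			continue
		}
		p := participants[i]
		if p.SlotEndIndex < p.SlotStartIndex {
			continue // malformed range; avoid unsigned underflow
		}
		slots := uint64(p.SlotEndIndex-p.SlotStartIndex) + 1
		totalSlots += slots
		if dealerIndex >= 0 && dealerIndex < len(v.DealerValidity) && v.DealerValidity[dealerIndex] {
			validSlots += slots
		}
	}
	return totalSlots > 0 && validSlots > totalSlots/2
}

func main() {
	// Illustrative distribution (assumption): A holds 60 slots; B, C, D hold 13, 13, 14.
	participants := []slotRange{{0, 59}, {60, 72}, {73, 85}, {86, 99}}

	// A approves only itself; B, C, D approve each other and reject A.
	submissions := []*verification{
		{DealerValidity: []bool{true, false, false, false}},
		{DealerValidity: []bool{false, true, true, true}},
		{DealerValidity: []bool{false, true, true, true}},
		{DealerValidity: []bool{false, true, true, true}},
	}

	for dealer := range participants {
		fmt.Printf("dealer %d valid: %v\n", dealer, slotWeightedValid(participants, submissions, dealer))
	}
}
```

In this sketch dealer A is accepted (60 of 100 voting slots) and B, C, D are rejected (40 of 100 each), so at least one valid dealer remains. It also shows a consequence worth weighing in review: under slot weighting, a participant holding more than half of the voting slots decides every dealer's validity on its own.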
