Skip to content

Commit 2062eec

Browse files
authored
fix(sync): stall headersub until Syncer has started (#317)
A better version of #316. This version should be easier to understand, but it also addresses a corruption case that I realized #316 doesn't cover. If, for whatever reason, Tail request during the very first start takes more time, then the duplicate subjective init, triggered by the network head from headersub, then the network head will be stored before tail, breaking the further syncing attempts irrecoverably. This will be removed with bsync once the requirement to store Tail before Head is removed. EDIT: [f569260](f569260) completely stalls headersub until started on every start, not just subj init. During a regular start, a headersub header may frontrun the Start routine and trigger Tail update first. This is problematic as the node assumes the Tail is up-to-date after Start finishes, but in the front-running case, only the headersub routine gets blocked (see #313). Extensively tested manually.
1 parent 8f75d4f commit 2062eec

File tree

1 file changed

+29
-0
lines changed

1 file changed

+29
-0
lines changed

sync/syncer.go

Lines changed: 29 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -53,6 +53,10 @@ type Syncer[H header.Header[H]] struct {
5353
cancel context.CancelFunc
5454

5555
Params *Parameters
56+
57+
// a hack to prevent headersub interfering during start
58+
// TODO(@Wondertan): Remove after bsync.
59+
started chan struct{}
5660
}
5761

5862
// NewSyncer creates a new instance of Syncer.
@@ -86,15 +90,35 @@ func NewSyncer[H header.Header[H]](
8690
metrics: metrics,
8791
triggerSync: make(chan struct{}, 1), // should be buffered
8892
Params: &params,
93+
started: make(chan struct{}),
8994
}, nil
9095
}
9196

9297
// Start starts the syncing routine.
9398
func (s *Syncer[H]) Start(ctx context.Context) error {
9499
s.ctx, s.cancel = context.WithCancel(context.Background())
100+
95101
// register validator for header subscriptions
96102
// syncer does not subscribe itself and syncs headers together with validation
97103
err := s.sub.SetVerifier(func(ctx context.Context, h H) error {
104+
// HACK(@Wondertan):
105+
// During subjective init we request the Head and then the Tail.
106+
// However, we store them in the opposite order, s.t before Tail arrives
107+
// there is a window where Head awaits Tail without being stored.
108+
// During this window, headersub may deliver new network head. As it can't
109+
// see the awaiting Head, it triggers duplicate subjective initialization
110+
// which may frontrun Tail, violating the storing order and corrupting the store.
111+
// This hack basically stalls headersub until Syncer has fully started, but it should
112+
// not be this way.
113+
//
114+
// This will be fixed with bsync as Syncer will learn how to verify Tail from Head backwards,
115+
// allowing Head to be stored first and then the Tail.
116+
select {
117+
case <-s.started:
118+
case <-ctx.Done():
119+
return ctx.Err()
120+
}
121+
98122
if err := s.incomingNetworkHead(ctx, h); err != nil {
99123
return err
100124
}
@@ -115,6 +139,11 @@ func (s *Syncer[H]) Start(ctx context.Context) error {
115139
}
116140
// start syncLoop only if Start is errorless
117141
go s.syncLoop()
142+
select {
143+
case <-s.started:
144+
default:
145+
close(s.started)
146+
}
118147
return nil
119148
}
120149

0 commit comments

Comments
 (0)