Commit 8ef4468

craig[bot] and iskettaneh committed
Merge #142997
142997: raft: ignore setting the lead field from a MsgDeFortify at current term r=iskettaneh a=iskettaneh

When receiving a MsgDeFortifyLeader at our current term, we should not set the lead field to the sender, for two reasons:

1) By definition, a MsgDeFortifyLeader is sent by an ex-leader until it hears of a new committed term. If we have forgotten the leader at the current term, we should not remember it again, since we use lead==None to indicate that this replica has been leaderless for some time. See the leaderlessWatcher for more details.

2) It could lead to a situation where no replica can win an election. Winning could require votes from replicas that think they know who the leader is (due to MsgDeFortifyLeader) and that have recently campaigned and lost, which reset their electionElapsed to 0. Such a replica is in a heartbeat lease and will reject MsgVotes.

Fixes: #142994

Release note: None

Co-authored-by: Ibrahim Kettaneh <[email protected]>
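
The second reason can be illustrated with a minimal sketch. It assumes a simplified lease rule (a replica refuses to vote while it knows a leader and its election timer has not yet run out), and the `replica` type, its fields, and `grantsVote` are hypothetical names for illustration, not the actual raft package API. The real check also involves quorum and fortification state that this sketch leaves out.

```go
package main

import "fmt"

// none is an assumption: 0 stands for "no known leader", matching the
// Lead:0 values printed in the test output of this commit.
const none = 0

// replica is a hypothetical, stripped-down stand-in for a raft follower.
type replica struct {
	id              int
	lead            int // ID of the leader this replica believes in, or none
	electionElapsed int // ticks since the election timer was last reset
	electionTimeout int // ticks after which the replica may campaign
}

// inLease reports whether the replica still considers itself bound to a
// leader: it knows a leader and has not yet gone a full election timeout
// without resetting its timer.
func (r *replica) inLease() bool {
	return r.lead != none && r.electionElapsed < r.electionTimeout
}

// grantsVote models only the lease check; log comparison is out of scope.
func (r *replica) grantsVote() bool {
	return !r.inLease()
}

func main() {
	// A replica that campaigned and lost (electionElapsed reset to 0) and
	// then re-learned the ex-leader from a MsgDeFortifyLeader.
	stuck := &replica{id: 2, lead: 1, electionElapsed: 0, electionTimeout: 3}
	// The same replica when the de-fortify message leaves lead untouched.
	free := &replica{id: 2, lead: none, electionElapsed: 0, electionTimeout: 3}

	fmt.Println("lead re-learned, grants vote:", stuck.grantsVote())
	fmt.Println("lead left unset, grants vote:", free.grantsVote())
}
```

Under these assumptions the first replica refuses to vote and the second one grants its vote, which is the difference the commit relies on.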
2 parents 8b14b92 + e8a9669 commit 8ef4468

2 files changed: +269 -1 lines changed


pkg/raft/raft.go

Lines changed: 1 addition & 1 deletion
@@ -2258,11 +2258,11 @@ func stepCandidate(r *raft, m pb.Message) error {
 
 func stepFollower(r *raft, m pb.Message) error {
 	if IsMsgFromLeader(m.Type) {
-		r.setLead(m.From)
 		if m.Type != pb.MsgDeFortifyLeader {
 			// If we receive any message from the leader except a MsgDeFortifyLeader,
 			// we know that the leader is still alive and still acting as the leader,
 			// so reset the election timer.
+			r.setLead(m.From)
 			r.electionElapsed = 0
 		}
 	}
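
The effect of moving `r.setLead(m.From)` inside the inner `if` can be seen in a small standalone sketch. Everything here (`follower`, `message`, `stepBefore`, `stepAfter`, and the message type constants) is a hypothetical simplification written for illustration; only the ordering of the two statements is taken from the diff above.

```go
package main

import "fmt"

type msgType int

// Hypothetical message types; only the ones needed for the sketch.
const (
	msgApp msgType = iota
	msgHeartbeat
	msgDeFortifyLeader
	msgVote
)

// none is an assumption: 0 stands for "no known leader".
const none = 0

type message struct {
	typ  msgType
	from int
}

type follower struct {
	lead            int
	electionElapsed int
}

// isMsgFromLeader is a simplified stand-in for IsMsgFromLeader.
func isMsgFromLeader(t msgType) bool {
	return t == msgApp || t == msgHeartbeat || t == msgDeFortifyLeader
}

// stepBefore mirrors the old ordering: any leader message sets lead.
func stepBefore(f *follower, m message) {
	if isMsgFromLeader(m.typ) {
		f.lead = m.from
		if m.typ != msgDeFortifyLeader {
			f.electionElapsed = 0
		}
	}
}

// stepAfter mirrors the new ordering: a de-fortify message changes nothing.
func stepAfter(f *follower, m message) {
	if isMsgFromLeader(m.typ) {
		if m.typ != msgDeFortifyLeader {
			f.lead = m.from
			f.electionElapsed = 0
		}
	}
}

func main() {
	deFortify := message{typ: msgDeFortifyLeader, from: 1}

	before := &follower{lead: none, electionElapsed: 2}
	after := &follower{lead: none, electionElapsed: 2}
	stepBefore(before, deFortify)
	stepAfter(after, deFortify)

	fmt.Printf("old ordering: lead=%d elapsed=%d\n", before.lead, before.electionElapsed)
	fmt.Printf("new ordering: lead=%d elapsed=%d\n", after.lead, after.electionElapsed)
}
```

In this model the old ordering makes a leaderless follower remember peer 1 again after a de-fortify message, while the new ordering leaves it leaderless. Messages that show the leader is still active, such as the hypothetical `msgApp`, set the lead and reset the timer in both versions.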
Lines changed: 268 additions & 0 deletions
@@ -0,0 +1,268 @@
+# This test ensures that if there is a temporary blip that causes loss of
+# quorum, the liveness of the raft group is restored once the blip is fixed.
+
+log-level none
+----
+ok
+
+add-nodes 2 voters=(1,2) index=10 checkquorum=true prevote=true
+----
+ok
+
+campaign 1
+----
+ok
+
+stabilize
+----
+ok
+
+# Fix the randomized election timeout to be one tick-election.
+set-randomized-election-timeout 1 timeout=3
+----
+ok
+
+set-randomized-election-timeout 2 timeout=3
+----
+ok
+
+log-level info
+----
+ok
+
+# Propose a data entry to peer 1. This makes it have the longest log.
+propose 1 data1
+----
+ok
+
+stabilize 1
+----
+> 1 handling Ready
+  Ready MustSync=true:
+  Entries:
+  1/12 EntryNormal "data1"
+  Messages:
+  1->2 MsgApp Term:1 Log:1/11 Commit:11 Entries:[1/12 EntryNormal "data1"]
+
+# Peer 2 has a temporary blip.
+deliver-msgs drop=(2)
+----
+dropped: 1->2 MsgApp Term:1 Log:1/11 Commit:11 Entries:[1/12 EntryNormal "data1"]
+
+# Peer 1's support will eventually expire.
+support-expired 1
+----
+ok
+
+# Peer 1 should detect that it no longer has a quorum and step down.
+tick-election 1
+----
+INFO 1 leader at term 1 does not support itself in the liveness fabric
+INFO 1 leader at term 1 does not support itself in the liveness fabric
+INFO 1 leader at term 1 does not support itself in the liveness fabric
+
+tick-election 1
+----
+INFO 1 leader at term 1 does not support itself in the liveness fabric
+INFO 1 leader at term 1 does not support itself in the liveness fabric
+WARN 1 stepped down to follower since quorum is not active
+INFO 1 became follower at term 1
+
+stabilize
+----
+> 1 handling Ready
+  Ready MustSync=true:
+  State:StateFollower
+  HardState Term:1 Vote:1 Commit:11 Lead:0 LeadEpoch:0
+  Messages:
+  1->2 MsgDeFortifyLeader Term:1 Log:0/0
+> 2 receiving messages
+  1->2 MsgDeFortifyLeader Term:1 Log:0/0
+> 2 handling Ready
+  Ready MustSync=true:
+  HardState Term:1 Vote:1 Commit:11 Lead:1 LeadEpoch:0
+
+# Fix the network blip.
+support-expired 1 reset
+----
+ok
+
+# Note that at this point, the quorum is active, and it should be possible for
+# both peers to campaign. However, only peer 1 can actually win the election
+# since it has the longer log.
+tick-election 2
+----
+INFO 2 is starting a new election at term 1
+INFO 2 became pre-candidate at term 1
+INFO 2 [logterm: 1, index: 11] sent MsgPreVote request to 1 at term 1
+
+stabilize
+----
+> 2 handling Ready
+  Ready MustSync=true:
+  State:StatePreCandidate
+  HardState Term:1 Vote:1 Commit:11 Lead:0 LeadEpoch:0
+  Messages:
+  2->1 MsgPreVote Term:2 Log:1/11
+  INFO 2 received MsgPreVoteResp from 2 at term 1
+  INFO 2 has received 1 MsgPreVoteResp votes and 0 vote rejections
+> 1 receiving messages
+  2->1 MsgPreVote Term:2 Log:1/11
+  INFO 1 [logterm: 1, index: 12, vote: 1] rejected MsgPreVote from 2 [logterm: 1, index: 11] at term 1
+> 1 handling Ready
+  Ready MustSync=false:
+  Messages:
+  1->2 MsgPreVoteResp Term:1 Log:0/0 Rejected (Hint: 0)
+> 2 receiving messages
+  1->2 MsgPreVoteResp Term:1 Log:0/0 Rejected (Hint: 0)
+  INFO 2 received MsgPreVoteResp rejection from 1 at term 1
+  INFO 2 has received 1 MsgPreVoteResp votes and 1 vote rejections
+  INFO 2 became follower at term 1
+> 2 handling Ready
+  Ready MustSync=false:
+  State:StateFollower
+
+# Note that both peers have successfully forgotten the leader for term 1.
+raft-state
+----
+1: StateFollower (Voter) Term:1 Lead:0 LeadEpoch:0
+2: StateFollower (Voter) Term:1 Lead:0 LeadEpoch:0
+
+# The leader will keep trying to broadcast a MsgDeFortifyLeader until it
+# hears about a new term that got committed at a higher term.
+send-de-fortify 1 2
+----
+ok
+
+stabilize
+----
+> 1 handling Ready
+  Ready MustSync=false:
+  Messages:
+  1->2 MsgDeFortifyLeader Term:1 Log:0/0
+> 2 receiving messages
+  1->2 MsgDeFortifyLeader Term:1 Log:0/0
+
+# Fix the randomized election timeout to be one tick-election.
+set-randomized-election-timeout 1 timeout=3
+----
+ok
+
+tick-election 1
+----
+INFO 1 is starting a new election at term 1
+INFO 1 became pre-candidate at term 1
+INFO 1 [logterm: 1, index: 12] sent MsgPreVote request to 2 at term 1
+
+stabilize
+----
+> 1 handling Ready
+  Ready MustSync=false:
+  State:StatePreCandidate
+  Messages:
+  1->2 MsgPreVote Term:2 Log:1/12
+  INFO 1 received MsgPreVoteResp from 1 at term 1
+  INFO 1 has received 1 MsgPreVoteResp votes and 0 vote rejections
+> 2 receiving messages
+  1->2 MsgPreVote Term:2 Log:1/12
+  INFO 2 [logterm: 1, index: 11, vote: 1] cast MsgPreVote for 1 [logterm: 1, index: 12] at term 1
+> 2 handling Ready
+  Ready MustSync=false:
+  Messages:
+  2->1 MsgPreVoteResp Term:2 Log:0/0
+> 1 receiving messages
+  2->1 MsgPreVoteResp Term:2 Log:0/0
+  INFO 1 received MsgPreVoteResp from 2 at term 1
+  INFO 1 has received 2 MsgPreVoteResp votes and 0 vote rejections
+  INFO 1 became candidate at term 2
+  INFO 1 [logterm: 1, index: 12] sent MsgVote request to 2 at term 2
+> 1 handling Ready
+  Ready MustSync=true:
+  State:StateCandidate
+  HardState Term:2 Vote:1 Commit:11 Lead:0 LeadEpoch:0
+  Messages:
+  1->2 MsgVote Term:2 Log:1/12
+  INFO 1 received MsgVoteResp from 1 at term 2
+  INFO 1 has received 1 MsgVoteResp votes and 0 vote rejections
+> 2 receiving messages
+  1->2 MsgVote Term:2 Log:1/12
+  INFO 2 [term: 1] received a MsgVote message with higher term from 1 [term: 2], advancing term
+  INFO 2 became follower at term 2
+  INFO 2 [logterm: 1, index: 11, vote: 0] cast MsgVote for 1 [logterm: 1, index: 12] at term 2
+> 2 handling Ready
+  Ready MustSync=true:
+  HardState Term:2 Vote:1 Commit:11 Lead:0 LeadEpoch:0
+  Messages:
+  2->1 MsgVoteResp Term:2 Log:0/0
+> 1 receiving messages
+  2->1 MsgVoteResp Term:2 Log:0/0
+  INFO 1 received MsgVoteResp from 2 at term 2
+  INFO 1 has received 2 MsgVoteResp votes and 0 vote rejections
+  INFO 1 became leader at term 2
+> 1 handling Ready
+  Ready MustSync=true:
+  State:StateLeader
+  HardState Term:2 Vote:1 Commit:11 Lead:1 LeadEpoch:1
+  Entries:
+  2/13 EntryNormal ""
+  Messages:
+  1->2 MsgFortifyLeader Term:2 Log:0/0
+  1->2 MsgApp Term:2 Log:1/12 Commit:11 Entries:[2/13 EntryNormal ""]
+> 2 receiving messages
+  1->2 MsgFortifyLeader Term:2 Log:0/0
+  1->2 MsgApp Term:2 Log:1/12 Commit:11 Entries:[2/13 EntryNormal ""]
+> 2 handling Ready
+  Ready MustSync=true:
+  HardState Term:2 Vote:1 Commit:11 Lead:1 LeadEpoch:1
+  Messages:
+  2->1 MsgFortifyLeaderResp Term:2 Log:0/0 LeadEpoch:1
+  2->1 MsgAppResp Term:2 Log:1/12 Rejected (Hint: 11) Commit:11
+> 1 receiving messages
+  2->1 MsgFortifyLeaderResp Term:2 Log:0/0 LeadEpoch:1
+  2->1 MsgAppResp Term:2 Log:1/12 Rejected (Hint: 11) Commit:11
+> 1 handling Ready
+  Ready MustSync=false:
+  Messages:
+  1->2 MsgApp Term:2 Log:1/11 Commit:11 Entries:[
+    1/12 EntryNormal "data1"
+    2/13 EntryNormal ""
+  ]
+> 2 receiving messages
+  1->2 MsgApp Term:2 Log:1/11 Commit:11 Entries:[
+    1/12 EntryNormal "data1"
+    2/13 EntryNormal ""
+  ]
+> 2 handling Ready
+  Ready MustSync=true:
+  Entries:
+  1/12 EntryNormal "data1"
+  2/13 EntryNormal ""
+  Messages:
+  2->1 MsgAppResp Term:2 Log:0/13 Commit:11
+> 1 receiving messages
+  2->1 MsgAppResp Term:2 Log:0/13 Commit:11
+> 1 handling Ready
+  Ready MustSync=true:
+  HardState Term:2 Vote:1 Commit:13 Lead:1 LeadEpoch:1
+  CommittedEntries:
+  1/12 EntryNormal "data1"
+  2/13 EntryNormal ""
+  Messages:
+  1->2 MsgApp Term:2 Log:2/13 Commit:13
+> 2 receiving messages
+  1->2 MsgApp Term:2 Log:2/13 Commit:13
+> 2 handling Ready
+  Ready MustSync=true:
+  HardState Term:2 Vote:1 Commit:13 Lead:1 LeadEpoch:1
+  CommittedEntries:
+  1/12 EntryNormal "data1"
+  2/13 EntryNormal ""
+  Messages:
+  2->1 MsgAppResp Term:2 Log:0/13 Commit:13
+> 1 receiving messages
+  2->1 MsgAppResp Term:2 Log:0/13 Commit:13
+
+raft-state
+----
+1: StateLeader (Voter) Term:2 Lead:1 LeadEpoch:1
+2: StateFollower (Voter) Term:2 Lead:1 LeadEpoch:1
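
The two pre-vote outcomes in the test (peer 1 rejects peer 2, peer 2 later votes for peer 1) follow from Raft's standard rule that a voter only supports a candidate whose log is at least as up to date as its own. The sketch below is a minimal illustration of that rule. The `logPos` type and `isUpToDate` function are hypothetical names, and the values are the `logterm` and `index` pairs printed in the test output.

```go
package main

import "fmt"

// logPos is a hypothetical pair describing the last entry of a log.
type logPos struct {
	term  uint64
	index uint64
}

// isUpToDate reports whether the candidate's log is at least as up to date
// as the voter's: a higher last term wins, and equal terms are broken by
// the longer log.
func isUpToDate(cand, voter logPos) bool {
	if cand.term != voter.term {
		return cand.term > voter.term
	}
	return cand.index >= voter.index
}

func main() {
	peer1 := logPos{term: 1, index: 12} // holds the "data1" entry
	peer2 := logPos{term: 1, index: 11} // missed the dropped MsgApp

	fmt.Println("peer 1 supports peer 2:", isUpToDate(peer2, peer1))
	fmt.Println("peer 2 supports peer 1:", isUpToDate(peer1, peer2))
}
```

With these values only peer 1 can collect a vote from the other peer, so liveness depends on peer 2 being willing to vote, which in turn requires that it has not re-learned the old leader.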
