Commit 7a7aab5

craig[bot], alicia-l2, Dev-Kyle, yuzefovich, and kyle-a-wong committed
150725: pcr: reorder time series dashboards r=alicia-l2 a=alicia-l2

Swapped order of Replication Lag and Logical Bytes graphs in Physical Cluster Replication DB Console

Fixes: #147621
Release note: none

151548: roachtest: fix overlapping failure injection bug r=Dev-Kyle a=Dev-Kyle

This change fixes a bug regarding failure injections overlapping over each other when not intended.

Marks the mutator step itself to also set unavailableNodes so sequential failure injections don't insert into them. Also considers `waitForStableClusterVersionStep` as incompatible when injecting a network partition recovery.

Epic: none
Release note: none
Fixes: #151522, #151384

151559: kvcoord: fix txnWriteBuffer for batches with limits and Dels r=yuzefovich a=yuzefovich

Previously, the txnWriteBuffer was oblivious to the fact that some transformed requests might be returned incomplete due to limits set on the BatchRequest (either TargetBytes or MaxSpanRequestKeys), so it would incorrectly think that it has acquired locks on some keys when it hasn't. Usage from SQL was only exposed to the bug via special delete-range fast-path where we used point Dels (i.e. a stmt of the form `DELETE FROM t WHERE k IN (<ids>)` where there are gaps between `id`s) since it always sets a key limit of 600.

This commit fixes this particular issue for Dels transformed into Gets and adds a couple of assertions that we don't see batches with CPuts and/or Puts with the limits set. Additionally, it adjusts the comment to indicate which requests are allowed in batches with limits.

Given that this feature is disabled by default and in the private preview AND it's limited to the DELETE fast-path when more than 600 keys are deleted, I decided to omit the release note.

Fixes: #151294.
Fixes: #151649.
Release note: None

151692: log: add KV_EXEC and CHANGEFEED logging channels r=kyle-a-wong a=kyle-a-wong

KV_EXEC is intended to contain KV events that don't fall into the KV_DISTRIBUTION channel

CHANGEFEED is for changefeed events that are currently being logged to the TELEMETRY channel

Part of: CRDB-53411
Epic: CRDB-53410
Release note (ops change): Introduces two new logging channels: KV_EXEC and CHANGEFEED. KV_EXEC will contain kv events that don't fall into the KV_DISTRIBUTION channel and CHANGEFEED will eventually contain changefeed events (This change doesn't add any logic to move existing changefeed logs to this channel).

151750: roachtest: update pgjdbc test for new portal behavior r=rafiss a=rafiss

We recently merged #151153 which makes certain statements invalid when used in a pausable portal. This causes a few PGJDBC tests to fail during the test setup phase.

fixes #151582
Release note: None

151752: roachtest: deflake hibernate timeouts r=rafiss a=rafiss

This patch increases the timeout to avoid hitting spurious failures when the test takes too long.

fixes #151591
fixes #151711
Release note: None

151753: sctestbackupccl: deflake schemachanger tests during BACKUP r=rafiss a=rafiss

Increase the timeout for waiting for a schema change to complete, which can take a longer time if there are concurrent backups.

fixes #151469
fixes #150842
Release note: None

Co-authored-by: Alicia Lu <[email protected]>
Co-authored-by: Kyle <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Kyle Wong <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
8 parents 36d351d + 5d8109e + 822cccb + 26fc47b + bd644e6 + fbcc5f4 + aea1736 + 9d6aa82 commit 7a7aab5

23 files changed: +925 -61 lines changed

docs/generated/logging.md

Lines changed: 9 additions & 0 deletions
@@ -177,3 +177,12 @@ The `KV_DISTRIBUTION` channel is used to report data distribution events, such a
 replicas between stores in the cluster, or adding (removing) replicas to
 ranges.

+### `CHANGEFEED`
+
+The `CHANGEFEED` channel is used to report changefeed events
+
+### `KV_EXEC`
+
+The `KV_EXEC` channel is used to report KV execution events that don't fall into the
+KV_DISTRIBUTION channel.
+

pkg/cli/testdata/logflags

Lines changed: 39 additions & 13 deletions
@@ -29,7 +29,9 @@ SQL_EXEC,
 SQL_PERF,
 SQL_INTERNAL_PERF,
 TELEMETRY,
-KV_DISTRIBUTION],<defaultLogDir>,true,crdb-v2)>,
+KV_DISTRIBUTION,
+CHANGEFEED,
+KV_EXEC],<defaultLogDir>,true,crdb-v2)>,
 health: <fileCfg(INFO: [HEALTH],<defaultLogDir>,true,crdb-v2)>,
 kv-distribution: <fileCfg(INFO: [KV_DISTRIBUTION],<defaultLogDir>,true,crdb-v2)>,
 pebble: <fileCfg(INFO: [STORAGE],<defaultLogDir>,true,crdb-v2)>,
@@ -66,7 +68,9 @@ SQL_EXEC,
 SQL_PERF,
 SQL_INTERNAL_PERF,
 TELEMETRY,
-KV_DISTRIBUTION],<defaultLogDir>,true,crdb-v2)>,
+KV_DISTRIBUTION,
+CHANGEFEED,
+KV_EXEC],<defaultLogDir>,true,crdb-v2)>,
 health: <fileCfg(INFO: [HEALTH],<defaultLogDir>,true,crdb-v2)>,
 kv-distribution: <fileCfg(INFO: [KV_DISTRIBUTION],<defaultLogDir>,true,crdb-v2)>,
 pebble: <fileCfg(INFO: [STORAGE],<defaultLogDir>,true,crdb-v2)>,
@@ -173,7 +177,9 @@ SQL_EXEC,
 SQL_PERF,
 SQL_INTERNAL_PERF,
 TELEMETRY,
-KV_DISTRIBUTION],/pathA/logs,true,crdb-v2)>,
+KV_DISTRIBUTION,
+CHANGEFEED,
+KV_EXEC],/pathA/logs,true,crdb-v2)>,
 health: <fileCfg(INFO: [HEALTH],/pathA/logs,true,crdb-v2)>,
 kv-distribution: <fileCfg(INFO: [KV_DISTRIBUTION],/pathA/logs,true,crdb-v2)>,
 pebble: <fileCfg(INFO: [STORAGE],/pathA/logs,true,crdb-v2)>,
@@ -212,7 +218,9 @@ SQL_EXEC,
 SQL_PERF,
 SQL_INTERNAL_PERF,
 TELEMETRY,
-KV_DISTRIBUTION],/mypath,true,crdb-v2)>,
+KV_DISTRIBUTION,
+CHANGEFEED,
+KV_EXEC],/mypath,true,crdb-v2)>,
 health: <fileCfg(INFO: [HEALTH],/mypath,true,crdb-v2)>,
 kv-distribution: <fileCfg(INFO: [KV_DISTRIBUTION],/mypath,true,crdb-v2)>,
 pebble: <fileCfg(INFO: [STORAGE],/mypath,true,crdb-v2)>,
@@ -252,7 +260,9 @@ SQL_EXEC,
 SQL_PERF,
 SQL_INTERNAL_PERF,
 TELEMETRY,
-KV_DISTRIBUTION],/pathA/logs,true,crdb-v2)>,
+KV_DISTRIBUTION,
+CHANGEFEED,
+KV_EXEC],/pathA/logs,true,crdb-v2)>,
 health: <fileCfg(INFO: [HEALTH],/pathA/logs,true,crdb-v2)>,
 kv-distribution: <fileCfg(INFO: [KV_DISTRIBUTION],/pathA/logs,true,crdb-v2)>,
 pebble: <fileCfg(INFO: [STORAGE],/pathA/logs,true,crdb-v2)>,
@@ -297,7 +307,9 @@ SQL_EXEC,
 SQL_PERF,
 SQL_INTERNAL_PERF,
 TELEMETRY,
-KV_DISTRIBUTION],/mypath,true,crdb-v2)>,
+KV_DISTRIBUTION,
+CHANGEFEED,
+KV_EXEC],/mypath,true,crdb-v2)>,
 health: <fileCfg(INFO: [HEALTH],/mypath,true,crdb-v2)>,
 kv-distribution: <fileCfg(INFO: [KV_DISTRIBUTION],/mypath,true,crdb-v2)>,
 pebble: <fileCfg(INFO: [STORAGE],/mypath,true,crdb-v2)>,
@@ -335,7 +347,9 @@ SQL_EXEC,
 SQL_PERF,
 SQL_INTERNAL_PERF,
 TELEMETRY,
-KV_DISTRIBUTION],<defaultLogDir>,true,crdb-v2)>,
+KV_DISTRIBUTION,
+CHANGEFEED,
+KV_EXEC],<defaultLogDir>,true,crdb-v2)>,
 health: <fileCfg(INFO: [HEALTH],<defaultLogDir>,true,crdb-v2)>,
 kv-distribution: <fileCfg(INFO: [KV_DISTRIBUTION],<defaultLogDir>,true,crdb-v2)>,
 pebble: <fileCfg(INFO: [STORAGE],<defaultLogDir>,true,crdb-v2)>,
@@ -374,7 +388,9 @@ SQL_EXEC,
 SQL_PERF,
 SQL_INTERNAL_PERF,
 TELEMETRY,
-KV_DISTRIBUTION],<defaultLogDir>,true,crdb-v2)>,
+KV_DISTRIBUTION,
+CHANGEFEED,
+KV_EXEC],<defaultLogDir>,true,crdb-v2)>,
 health: <fileCfg(INFO: [HEALTH],<defaultLogDir>,true,crdb-v2)>,
 kv-distribution: <fileCfg(INFO: [KV_DISTRIBUTION],<defaultLogDir>,true,crdb-v2)>,
 pebble: <fileCfg(INFO: [STORAGE],<defaultLogDir>,true,crdb-v2)>,
@@ -423,7 +439,9 @@ SQL_EXEC,
 SQL_PERF,
 SQL_INTERNAL_PERF,
 TELEMETRY,
-KV_DISTRIBUTION],<defaultLogDir>,true,crdb-v2)>,
+KV_DISTRIBUTION,
+CHANGEFEED,
+KV_EXEC],<defaultLogDir>,true,crdb-v2)>,
 health: <fileCfg(INFO: [HEALTH],<defaultLogDir>,true,crdb-v2)>,
 kv-distribution: <fileCfg(INFO: [KV_DISTRIBUTION],<defaultLogDir>,true,crdb-v2)>,
 pebble: <fileCfg(INFO: [STORAGE],<defaultLogDir>,true,crdb-v2)>,
@@ -495,7 +513,9 @@ SQL_EXEC,
 SQL_PERF,
 SQL_INTERNAL_PERF,
 TELEMETRY,
-KV_DISTRIBUTION],/mypath,true,crdb-v2)>,
+KV_DISTRIBUTION,
+CHANGEFEED,
+KV_EXEC],/mypath,true,crdb-v2)>,
 health: <fileCfg(INFO: [HEALTH],/mypath,true,crdb-v2)>,
 kv-distribution: <fileCfg(INFO: [KV_DISTRIBUTION],/mypath,true,crdb-v2)>,
 pebble: <fileCfg(INFO: [STORAGE],/mypath,true,crdb-v2)>,
@@ -535,7 +555,9 @@ SQL_EXEC,
 SQL_PERF,
 SQL_INTERNAL_PERF,
 TELEMETRY,
-KV_DISTRIBUTION],/pathA,true,crdb-v2)>,
+KV_DISTRIBUTION,
+CHANGEFEED,
+KV_EXEC],/pathA,true,crdb-v2)>,
 health: <fileCfg(INFO: [HEALTH],/pathA,true,crdb-v2)>,
 kv-distribution: <fileCfg(INFO: [KV_DISTRIBUTION],/pathA,true,crdb-v2)>,
 pebble: <fileCfg(INFO: [STORAGE],/pathA,true,crdb-v2)>,
@@ -595,7 +617,9 @@ SQL_EXEC,
 SQL_PERF,
 SQL_INTERNAL_PERF,
 TELEMETRY,
-KV_DISTRIBUTION],<defaultLogDir>,true,crdb-v2)>,
+KV_DISTRIBUTION,
+CHANGEFEED,
+KV_EXEC],<defaultLogDir>,true,crdb-v2)>,
 health: <fileCfg(INFO: [HEALTH],<defaultLogDir>,true,crdb-v2)>,
 kv-distribution: <fileCfg(INFO: [KV_DISTRIBUTION],<defaultLogDir>,true,crdb-v2)>,
 pebble: <fileCfg(INFO: [STORAGE],<defaultLogDir>,true,crdb-v2)>,
@@ -633,7 +657,9 @@ SQL_EXEC,
 SQL_PERF,
 SQL_INTERNAL_PERF,
 TELEMETRY,
-KV_DISTRIBUTION],<defaultLogDir>,true,crdb-v2)>,
+KV_DISTRIBUTION,
+CHANGEFEED,
+KV_EXEC],<defaultLogDir>,true,crdb-v2)>,
 health: <fileCfg(INFO: [HEALTH],<defaultLogDir>,true,crdb-v2)>,
 kv-distribution: <fileCfg(INFO: [KV_DISTRIBUTION],<defaultLogDir>,true,crdb-v2)>,
 pebble: <fileCfg(INFO: [STORAGE],<defaultLogDir>,true,crdb-v2)>,

pkg/cmd/roachtest/roachtestutil/mixedversion/BUILD.bazel

Lines changed: 1 addition & 0 deletions
@@ -66,6 +66,7 @@ go_test(
         "//pkg/cmd/roachtest/roachtestutil/task",
         "//pkg/cmd/roachtest/spec",
         "//pkg/roachpb",
+        "//pkg/roachprod/failureinjection/failures",
         "//pkg/roachprod/install",
         "//pkg/roachprod/logger",
         "//pkg/roachprod/vm",

pkg/cmd/roachtest/roachtestutil/mixedversion/mixedversion.go

Lines changed: 5 additions & 2 deletions
@@ -85,6 +85,7 @@ import (
 	"github.com/cockroachdb/cockroach/pkg/cmd/roachtest/roachtestutil/clusterupgrade"
 	"github.com/cockroachdb/cockroach/pkg/cmd/roachtest/spec"
 	"github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test"
+	"github.com/cockroachdb/cockroach/pkg/roachprod/failureinjection/failures"
 	"github.com/cockroachdb/cockroach/pkg/roachprod/install"
 	"github.com/cockroachdb/cockroach/pkg/roachprod/logger"
 	"github.com/cockroachdb/cockroach/pkg/roachprod/vm"
@@ -383,8 +384,9 @@ type (
 		// the following are test-only fields, allowing tests to simulate
 		// cluster properties without passing a cluster.Cluster
 		// implementation.
-		_arch    *vm.CPUArch
-		_isLocal *bool
+		_arch      *vm.CPUArch
+		_isLocal   *bool
+		_getFailer func(name string) (*failures.Failer, error)
 	}

 	shouldStop chan struct{}
@@ -964,6 +966,7 @@ func (t *Test) plan() (plan *TestPlan, retErr error) {
 		bgChans: t.bgChans,
 		logger: t.logger,
 		cluster: t.cluster,
+		_getFailer: t._getFailer,
 	}
 	// Let's generate a plan.
 	plan, err = planner.Plan()

pkg/cmd/roachtest/roachtestutil/mixedversion/mutators.go

Lines changed: 21 additions & 6 deletions
@@ -488,6 +488,8 @@ func (m panicNodeMutator) Generate(
 		return s == restartStep[0]
 	})
 	failureContextSteps.MarkNodesUnavailable(true, false)
+	addPanicStep[0].hasUnavailableSystemNodes = true
+	addRestartStep[0].hasUnavailableSystemNodes = true

 	mutations = append(mutations, addPanicStep...)
 	mutations = append(mutations, addRestartStep...)
@@ -496,14 +498,20 @@ func (m panicNodeMutator) Generate(
 	return mutations, nil
 }

+func GetFailer(planner *testPlanner, name string) (*failures.Failer, error) {
+	if planner._getFailer != nil {
+		return planner._getFailer(name)
+	}
+
+	return planner.cluster.GetFailer(planner.logger, planner.cluster.CRDBNodes(), name, false)
+}
+
 type networkPartitionMutator struct{}

 func (m networkPartitionMutator) Name() string { return failures.IPTablesNetworkPartitionName }

 func (m networkPartitionMutator) Probability() float64 {
-	// Temporarily set to 0 while we investigate a better way to handle
-	// intersecting failures.
-	return 0
+	return 0.3
 }

 func (m networkPartitionMutator) Generate(
@@ -514,8 +522,7 @@ func (m networkPartitionMutator) Generate(
 	idx := newStepIndex(plan)
 	nodeList := planner.currentContext.System.Descriptor.Nodes

-	failure := failures.GetFailureRegistry()
-	f, err := failure.GetFailer(planner.cluster.Name(), failures.IPTablesNetworkPartitionName, planner.logger, false)
+	f, err := GetFailer(planner, failures.IPTablesNetworkPartitionName)
 	if err != nil {
 		return nil, fmt.Errorf("failed to get failer for %s: %w", failures.IPTablesNetworkPartitionName, err)
 	}
@@ -557,6 +564,10 @@ func (m networkPartitionMutator) Generate(
 		// Many hook steps require communication between specific nodes, so we
 		// should recover the network partition before running them.
 		_, runHook := s.impl.(runHookStep)
+		// Waiting for stable cluster version requires communication between
+		// all nodes in the cluster, so we should recover the network partition
+		// before running it.
+		_, waitForStable := s.impl.(waitForStableClusterVersionStep)

 		if idx.IsConcurrent(s) {
 			if firstStepInConcurrentBlock == nil {
@@ -574,7 +585,7 @@ func (m networkPartitionMutator) Generate(
 		} else {
 			unavailableNodes = s.context.System.hasUnavailableNodes
 		}
-		return unavailableNodes || restartTenant || restartSystem || runHook
+		return unavailableNodes || restartTenant || restartSystem || runHook || waitForStable
 	}

 	_, validStartStep := upgrade.CutAfter(func(s *singleStep) bool {
@@ -619,6 +630,10 @@ func (m networkPartitionMutator) Generate(
 	})

 	failureContextSteps.MarkNodesUnavailable(true, true)
+	addPartition[0].hasUnavailableSystemNodes = true
+	addPartition[0].hasUnavailableTenantNodes = true
+	addRecoveryStep[0].hasUnavailableSystemNodes = true
+	addRecoveryStep[0].hasUnavailableTenantNodes = true

 	mutations = append(mutations, addPartition...)
 	mutations = append(mutations, addRecoveryStep...)

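The `GetFailer` helper in this file follows a common test-seam pattern: prefer a test-only override when one is set, otherwise fall back to the real cluster lookup. The sketch below illustrates that pattern in isolation. The `failer` and `planner` types and the `lookup` and `getFailerOverride` fields are hypothetical stand-ins, not the roachtest types.

```go
package main

import (
	"errors"
	"fmt"
)

// failer is a hypothetical stand-in for *failures.Failer.
type failer struct{ name string }

// planner is a hypothetical, simplified stand-in for testPlanner.
type planner struct {
	// lookup stands in for the production path (a real cluster lookup).
	lookup func(name string) (*failer, error)
	// getFailerOverride stands in for the test-only _getFailer field.
	getFailerOverride func(name string) (*failer, error)
}

// getFailer prefers the test-only override and otherwise falls back to
// the production lookup.
func getFailer(p *planner, name string) (*failer, error) {
	if p.getFailerOverride != nil {
		return p.getFailerOverride(name)
	}
	if p.lookup == nil {
		return nil, errors.New("no cluster available")
	}
	return p.lookup(name)
}

func main() {
	// Production-like planner: the lookup returns a named failer.
	prod := &planner{lookup: func(name string) (*failer, error) {
		return &failer{name: name}, nil
	}}
	f, err := getFailer(prod, "network-partition")
	fmt.Println(f.name, err)

	// Unit-test planner: there is no cluster, so the override supplies a stub.
	test := &planner{getFailerOverride: func(string) (*failer, error) {
		return nil, nil
	}}
	f, err = getFailer(test, "network-partition")
	fmt.Println(f == nil, err == nil)
}
```

Keeping the override on the planner rather than in a package-level variable means each test instance can install its own stub without affecting others.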
pkg/cmd/roachtest/roachtestutil/mixedversion/planner.go

Lines changed: 14 additions & 0 deletions
@@ -16,6 +16,7 @@ import (
 	"github.com/cockroachdb/cockroach/pkg/cmd/roachtest/option"
 	"github.com/cockroachdb/cockroach/pkg/cmd/roachtest/roachtestutil/clusterupgrade"
 	"github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test"
+	"github.com/cockroachdb/cockroach/pkg/roachprod/failureinjection/failures"
 	"github.com/cockroachdb/cockroach/pkg/roachprod/install"
 	"github.com/cockroachdb/cockroach/pkg/roachprod/logger"
 	"github.com/cockroachdb/cockroach/pkg/util/randutil"
@@ -100,6 +101,9 @@ type (

 		// State variables updated as the test plan is generated.
 		usingFixtures bool
+
+		// Unit test only fields.
+		_getFailer func(name string) (*failures.Failer, error)
 	}

 	// UpgradeStage encodes in what part of an upgrade a test step is in
@@ -151,6 +155,12 @@ type (
 		reference *singleStep
 		impl singleStepProtocol
 		op mutationOp
+		// unavailableNodes are marked for each step during the `Generate`
+		// method, but the mutator steps themselves are not created until
+		// `applyMutations` is called. These booleans denote whether
+		// the mutator sets any nodes to unavailable.
+		hasUnavailableSystemNodes bool
+		hasUnavailableTenantNodes bool
 	}

 	// stepSelector provides a high level API for mutator
@@ -1537,6 +1547,10 @@ func (plan *TestPlan) applyMutations(rng *rand.Rand, mutations []mutation) {
 				impl: mut.impl,
 				rng: rngFromRNG(rng),
 			}
+			newSingleStep.context.System.hasUnavailableNodes = mut.hasUnavailableSystemNodes
+			if newSingleStep.context.Tenant != nil {
+				newSingleStep.context.Tenant.hasUnavailableNodes = mut.hasUnavailableTenantNodes
+			}
 		}

 		switch mut.op {

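The new `mutation` fields exist because mutator steps are only materialized in `applyMutations`, after `Generate` has already marked the surrounding steps. A minimal sketch of that hand-off, assuming simplified hypothetical types (`serviceContext`, `stepContext`, `step`, and `newStepFromMutation` are illustrative names, not the planner's own):

```go
package main

import "fmt"

// serviceContext is a hypothetical stand-in for a per-service step context.
type serviceContext struct{ hasUnavailableNodes bool }

// stepContext holds the system context and an optional tenant context.
type stepContext struct {
	System *serviceContext
	Tenant *serviceContext // nil when there is no separate tenant
}

// mutation carries the availability flags chosen while the mutator runs.
type mutation struct {
	name                      string
	hasUnavailableSystemNodes bool
	hasUnavailableTenantNodes bool
}

// step is a hypothetical stand-in for a materialized plan step.
type step struct {
	name    string
	context stepContext
}

// newStepFromMutation materializes a step and copies the mutation's flags
// into its context, guarding the tenant context because it may be nil.
func newStepFromMutation(mut mutation, base stepContext) step {
	sys := *base.System // copy so the new step does not alias the base
	ctx := stepContext{System: &sys}
	if base.Tenant != nil {
		tenant := *base.Tenant
		ctx.Tenant = &tenant
	}
	ctx.System.hasUnavailableNodes = mut.hasUnavailableSystemNodes
	if ctx.Tenant != nil {
		ctx.Tenant.hasUnavailableNodes = mut.hasUnavailableTenantNodes
	}
	return step{name: mut.name, context: ctx}
}

func main() {
	base := stepContext{System: &serviceContext{}}
	partition := mutation{
		name:                      "inject partition",
		hasUnavailableSystemNodes: true,
		hasUnavailableTenantNodes: true,
	}
	s := newStepFromMutation(partition, base)
	fmt.Println(s.name, s.context.System.hasUnavailableNodes, s.context.Tenant == nil)
}
```

With the flag copied onto the materialized step, a later mutator that checks the step's context sees the nodes as unavailable and avoids inserting a second failure there.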
pkg/cmd/roachtest/roachtestutil/mixedversion/planner_test.go

Lines changed: 71 additions & 0 deletions
@@ -17,6 +17,7 @@ import (
 	"github.com/cockroachdb/cockroach/pkg/cmd/roachtest/option"
 	"github.com/cockroachdb/cockroach/pkg/cmd/roachtest/roachtestutil"
 	"github.com/cockroachdb/cockroach/pkg/cmd/roachtest/roachtestutil/clusterupgrade"
+	"github.com/cockroachdb/cockroach/pkg/roachprod/failureinjection/failures"
 	"github.com/cockroachdb/cockroach/pkg/roachprod/logger"
 	"github.com/cockroachdb/cockroach/pkg/roachprod/vm"
 	"github.com/cockroachdb/cockroach/pkg/testutils/datapathutils"
@@ -353,6 +354,76 @@ func Test_maxNumPlanSteps(t *testing.T) {
 	require.Nil(t, plan)
 }

+// TestNoConcurrentFailureInjections tests that failure injection
+// steps properly manage node availability. Specifically:
+// - Failure injection steps should only run if no other failure is currently injected.
+// - Failure recovery steps can only occur if there is an active failure injected.
+// - We can only bump the cluster version if no failures are currently injected.
+func TestNoConcurrentFailureInjections(t *testing.T) {
+	const numIterations = 500
+	rngSource := rand.NewSource(randutil.NewPseudoSeed())
+	// Set all failure injection mutator probabilities to 1.
+	var opts []CustomOption
+	for _, mutator := range failureInjectionMutators {
+		opts = append(opts, WithMutatorProbability(mutator.Name(), 1.0))
+	}
+	opts = append(opts, NumUpgrades(3))
+	getFailer := func(name string) (*failures.Failer, error) {
+		return nil, nil
+	}
+
+	for range numIterations {
+		mvt := newTest(opts...)
+		mvt._getFailer = getFailer
+		mvt.InMixedVersion("test hook", dummyHook)
+		// Use different seed for each iteration
+		mvt.prng = rand.New(rngSource)
+
+		plan, err := mvt.plan()
+		require.NoError(t, err)
+
+		isFailureInjected := false
+
+		var checkSteps func(steps []testStep)
+		checkSteps = func(steps []testStep) {
+			for _, step := range steps {
+				switch s := step.(type) {
+				case *singleStep:
+					switch s.impl.(type) {
+					case panicNodeStep:
+						require.False(t, isFailureInjected, "there should be no active failure when panicNodeStep runs")
+						isFailureInjected = true
+					case networkPartitionInjectStep:
+						require.False(t, isFailureInjected, "there should be no active failure when networkPartitionInjectStep runs")
+						isFailureInjected = true
+					case restartNodeStep:
+						require.True(t, isFailureInjected, "there is no active failure to recover from")
+						isFailureInjected = false
+					case networkPartitionRecoveryStep:
+						require.True(t, isFailureInjected, "there is no active failure to recover from")
+						isFailureInjected = false
+					case waitForStableClusterVersionStep:
+						require.False(t, isFailureInjected, "waitForStableClusterVersionStep cannot run under failure injection")
+					}
+				case sequentialRunStep:
+					checkSteps(s.steps)
+				case concurrentRunStep:
+					// Failure injection steps should never run concurrently with other steps, so treat concurrent
+					// steps as sequential for simplicity.
+					for _, delayedStepInterface := range s.delayedSteps {
+						ds := delayedStepInterface.(delayedStep)
+						checkSteps([]testStep{ds.step})
+					}
+				}
+			}
+		}
+
+		checkSteps(plan.Steps())
+
+		require.False(t, isFailureInjected, "all failure injections should be cleaned up at the end of the test")
+	}
+}
+
 // setDefaultVersions overrides the test's view of the current build
 // as well as the oldest supported version. This allows the test
 // output to remain stable as new versions are released and/or we bump

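The invariant that `TestNoConcurrentFailureInjections` checks amounts to a two-state machine over the flattened plan: at most one failure is active at a time, recovery requires an active failure, and waiting for a stable cluster version requires none. A standalone sketch of that check follows. The `stepKind` values and `validatePlan` are hypothetical names for illustration and do not exist in the package.

```go
package main

import "fmt"

// stepKind is a hypothetical, flattened classification of plan steps.
type stepKind int

const (
	injectFailure stepKind = iota
	recoverFailure
	waitForStableVersion
	otherStep
)

// validatePlan walks the steps in order and tracks whether a failure is
// currently injected, returning an error on the first violated rule.
func validatePlan(steps []stepKind) error {
	active := false
	for i, s := range steps {
		switch s {
		case injectFailure:
			if active {
				return fmt.Errorf("step %d: failure injected while another is active", i)
			}
			active = true
		case recoverFailure:
			if !active {
				return fmt.Errorf("step %d: no active failure to recover from", i)
			}
			active = false
		case waitForStableVersion:
			if active {
				return fmt.Errorf("step %d: cannot wait for stable version under failure injection", i)
			}
		}
	}
	if active {
		return fmt.Errorf("failure still injected at end of plan")
	}
	return nil
}

func main() {
	good := []stepKind{otherStep, injectFailure, otherStep, recoverFailure, waitForStableVersion}
	fmt.Println(validatePlan(good))

	overlapping := []stepKind{injectFailure, injectFailure, recoverFailure}
	fmt.Println(validatePlan(overlapping))
}
```

A single boolean is enough here only because the rules forbid overlapping injections; if overlapping failures were ever allowed, the state would need to become a counter or a set.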