Skip to content

Commit 1fd8581

Browse files
craig[bot]kvoli
andcommitted
100979: asim: extend datadriven test to support recovery r=kvoli a=kvoli Previously, only rebalancing was supported in the data driven simulation test. This commit extends the syntax to support recovery scenarios. As part of the extension, the state generator is split into range generation and cluster generation. Examples are added for each command, along with common testing scenarios such as decommissioning, IO overload, disk fullness and adding a node (with a store). The newly supported commands are listed below: ``` - "load_cluster": config=<name> Load a defined cluster configuration to be the generated cluster in the simulation. The available confiurations are: single_region: 15 nodes in region=US, 5 in each zone US_1/US_2/US_3. single_region_multi_store: 3 nodes, 5 stores per node with the same zone/region configuration as above. multi_region: 36 nodes, 12 in each region and 4 in each zone, regions having 3 zones. complex: 28 nodes, 3 regions with a skewed number of nodes per region. - "gen_ranges" [ranges=<int>] [placement_skew=<bool>] [repl_factor=<int>] [keyspace=<int>] [range_bytes=<int>] Initialize the range generator parameters. On the next call to eval, the range generator is called to assign an ranges and their replica placement. The default values are ranges=1 repl_factor=3 placement_skew=false keyspace=10000. - set_liveness node=<int> [delay=<duration>] status=(dead|decommissioning|draining|unavailable) Set the liveness status of the node with ID NodeID. This applies at the start of the simulation or with some delay after the simulation starts, if specified. - add_node: [stores=<int>] [locality=<string>] [delay=<duration>] Add a node to the cluster after initial generation with some delay, locality and number of stores on the node. The default values are stores=0 locality=none delay=0. - set_span_config [delay=<duration>] [startKey, endKey): <span_config> Provide a new line separated list of spans and span configurations e.g. [0,100): num_replicas=5 num_voters=3 constraints={'+region=US_East'} [100, 500): num_replicas=3 ... This will update the span config for the span [0,100) to specify 3 voting replicas and 2 non-voting replicas, with a constraint that all replicas are in the region US_East. - assertion extended to support two new assertion types: For type=stat assertions, if the stat (e.g. stat=replicas) value of the last ticks (e.g. ticks=5) duration is not exactly equal to threshold, the assertion fails. This applies for a specified store which must be provided with store=storeID. For type=conformance assertions, you may assert on the number of replicas that you expect to be under-replicated(under), over-replicated(over), unavailable(unavailable) and violating constraints(violating) at the end of the evaluation. - "topology" [sample=<int>] Print the cluster locality topology of the sample given (default=last). e.g. for the load_cluster config=single_region US ..US_1 ....└── [1 2 3 4 5] ..US_2 ....└── [6 7 8 9 10] ..US_3 ....└── [11 12 13 14 15] ``` Informs: cockroachdb#90137 Release note: None Co-authored-by: Austen McClernon <[email protected]>
2 parents 5449699 + e4ed3a9 commit 1fd8581

29 files changed

+1464
-148
lines changed

pkg/BUILD.bazel

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1280,6 +1280,7 @@ GO_TARGETS = [
12801280
"//pkg/kv/kvserver/apply:apply",
12811281
"//pkg/kv/kvserver/apply:apply_test",
12821282
"//pkg/kv/kvserver/asim/config:config",
1283+
"//pkg/kv/kvserver/asim/event:event",
12831284
"//pkg/kv/kvserver/asim/gen:gen",
12841285
"//pkg/kv/kvserver/asim/gossip:gossip",
12851286
"//pkg/kv/kvserver/asim/gossip:gossip_test",

pkg/kv/kvserver/asim/BUILD.bazel

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7,13 +7,15 @@ go_library(
77
visibility = ["//visibility:public"],
88
deps = [
99
"//pkg/kv/kvserver/asim/config",
10+
"//pkg/kv/kvserver/asim/event",
1011
"//pkg/kv/kvserver/asim/gossip",
1112
"//pkg/kv/kvserver/asim/metrics",
1213
"//pkg/kv/kvserver/asim/op",
1314
"//pkg/kv/kvserver/asim/queue",
1415
"//pkg/kv/kvserver/asim/state",
1516
"//pkg/kv/kvserver/asim/storerebalancer",
1617
"//pkg/kv/kvserver/asim/workload",
18+
"//pkg/util/log",
1719
],
1820
)
1921

pkg/kv/kvserver/asim/asim.go

Lines changed: 112 additions & 61 deletions
Original file line numberDiff line numberDiff line change
@@ -15,18 +15,21 @@ import (
1515
"time"
1616

1717
"github.com/cockroachdb/cockroach/pkg/kv/kvserver/asim/config"
18+
"github.com/cockroachdb/cockroach/pkg/kv/kvserver/asim/event"
1819
"github.com/cockroachdb/cockroach/pkg/kv/kvserver/asim/gossip"
1920
"github.com/cockroachdb/cockroach/pkg/kv/kvserver/asim/metrics"
2021
"github.com/cockroachdb/cockroach/pkg/kv/kvserver/asim/op"
2122
"github.com/cockroachdb/cockroach/pkg/kv/kvserver/asim/queue"
2223
"github.com/cockroachdb/cockroach/pkg/kv/kvserver/asim/state"
2324
"github.com/cockroachdb/cockroach/pkg/kv/kvserver/asim/storerebalancer"
2425
"github.com/cockroachdb/cockroach/pkg/kv/kvserver/asim/workload"
26+
"github.com/cockroachdb/cockroach/pkg/util/log"
2527
)
2628

2729
// Simulator simulates an entire cluster, and runs the allocator of each store
2830
// in that cluster.
2931
type Simulator struct {
32+
log.AmbientContext
3033
curr time.Time
3134
end time.Time
3235
// interval is the step between ticks for active simulaton components, such
@@ -36,6 +39,7 @@ type Simulator struct {
3639

3740
// The simulator can run multiple workload Generators in parallel.
3841
generators []workload.Generator
42+
events event.DelayedEventList
3943

4044
pacers map[state.StoreID]queue.ReplicaPacer
4145

@@ -53,6 +57,8 @@ type Simulator struct {
5357
gossip gossip.Gossip
5458
shuffler func(n int, swap func(i, j int))
5559

60+
settings *config.SimulationSettings
61+
5662
metrics *metrics.Tracker
5763
history History
5864
}
@@ -62,6 +68,7 @@ type Simulator struct {
6268
// TODO(kvoli): Add a range log like structure to the history.
6369
type History struct {
6470
Recorded [][]metrics.StoreMetrics
71+
S state.State
6572
}
6673

6774
// Listen implements the metrics.StoreMetricListener interface.
@@ -76,82 +83,101 @@ func NewSimulator(
7683
initialState state.State,
7784
settings *config.SimulationSettings,
7885
m *metrics.Tracker,
86+
events ...event.DelayedEvent,
7987
) *Simulator {
8088
pacers := make(map[state.StoreID]queue.ReplicaPacer)
8189
rqs := make(map[state.StoreID]queue.RangeQueue)
8290
sqs := make(map[state.StoreID]queue.RangeQueue)
8391
srs := make(map[state.StoreID]storerebalancer.StoreRebalancer)
8492
changer := state.NewReplicaChanger()
8593
controllers := make(map[state.StoreID]op.Controller)
94+
95+
s := &Simulator{
96+
AmbientContext: log.MakeTestingAmbientCtxWithNewTracer(),
97+
curr: settings.StartTime,
98+
end: settings.StartTime.Add(duration),
99+
interval: settings.TickInterval,
100+
generators: wgs,
101+
state: initialState,
102+
changer: changer,
103+
rqs: rqs,
104+
sqs: sqs,
105+
controllers: controllers,
106+
srs: srs,
107+
pacers: pacers,
108+
gossip: gossip.NewGossip(initialState, settings),
109+
metrics: m,
110+
shuffler: state.NewShuffler(settings.Seed),
111+
// TODO(kvoli): Keeping the state around is a bit hacky, find a better
112+
// method of reporting the ranges.
113+
history: History{Recorded: [][]metrics.StoreMetrics{}, S: initialState},
114+
events: events,
115+
settings: settings,
116+
}
117+
86118
for _, store := range initialState.Stores() {
87119
storeID := store.StoreID()
88-
allocator := initialState.MakeAllocator(storeID)
89-
storePool := initialState.StorePool(storeID)
90-
// TODO(kvoli): Instead of passing in individual settings to construct
91-
// the each ticking component, pass a pointer to the simulation
92-
// settings struct. That way, the settings may be adjusted dynamically
93-
// during a simulation.
94-
rqs[storeID] = queue.NewReplicateQueue(
95-
storeID,
96-
changer,
97-
settings.ReplicaChangeDelayFn(),
98-
allocator,
99-
storePool,
100-
settings.StartTime,
101-
)
102-
sqs[storeID] = queue.NewSplitQueue(
103-
storeID,
104-
changer,
105-
settings.RangeSplitDelayFn(),
106-
settings.RangeSizeSplitThreshold,
107-
settings.StartTime,
108-
)
109-
pacers[storeID] = queue.NewScannerReplicaPacer(
110-
initialState.NextReplicasFn(storeID),
111-
settings.PacerLoopInterval,
112-
settings.PacerMinIterInterval,
113-
settings.PacerMaxIterIterval,
114-
settings.Seed,
115-
)
116-
controllers[storeID] = op.NewController(
117-
changer,
118-
allocator,
119-
storePool,
120-
settings,
121-
storeID,
122-
)
123-
srs[storeID] = storerebalancer.NewStoreRebalancer(
124-
settings.StartTime,
125-
storeID,
126-
controllers[storeID],
127-
allocator,
128-
storePool,
129-
settings,
130-
storerebalancer.GetStateRaftStatusFn(initialState),
131-
)
120+
s.addStore(storeID, settings.StartTime)
132121
}
122+
s.state.RegisterConfigChangeListener(s)
133123

134-
s := &Simulator{
135-
curr: settings.StartTime,
136-
end: settings.StartTime.Add(duration),
137-
interval: settings.TickInterval,
138-
generators: wgs,
139-
state: initialState,
140-
changer: changer,
141-
rqs: rqs,
142-
sqs: sqs,
143-
controllers: controllers,
144-
srs: srs,
145-
pacers: pacers,
146-
gossip: gossip.NewGossip(initialState, settings),
147-
metrics: m,
148-
shuffler: state.NewShuffler(settings.Seed),
149-
history: History{Recorded: [][]metrics.StoreMetrics{}},
150-
}
151124
m.Register(&s.history)
125+
s.AddLogTag("asim", nil)
152126
return s
153127
}
154128

129+
// StoreAddNotify notifies that a new store has been added with ID storeID.
130+
func (s *Simulator) StoreAddNotify(storeID state.StoreID, _ state.State) {
131+
s.addStore(storeID, s.curr)
132+
}
133+
134+
func (s *Simulator) addStore(storeID state.StoreID, tick time.Time) {
135+
allocator := s.state.MakeAllocator(storeID)
136+
storePool := s.state.StorePool(storeID)
137+
// TODO(kvoli): Instead of passing in individual settings to construct
138+
// the each ticking component, pass a pointer to the simulation
139+
// settings struct. That way, the settings may be adjusted dynamically
140+
// during a simulation.
141+
s.rqs[storeID] = queue.NewReplicateQueue(
142+
storeID,
143+
s.changer,
144+
s.settings.ReplicaChangeDelayFn(),
145+
allocator,
146+
storePool,
147+
tick,
148+
)
149+
s.sqs[storeID] = queue.NewSplitQueue(
150+
storeID,
151+
s.changer,
152+
s.settings.RangeSplitDelayFn(),
153+
s.settings.RangeSizeSplitThreshold,
154+
tick,
155+
)
156+
s.pacers[storeID] = queue.NewScannerReplicaPacer(
157+
s.state.NextReplicasFn(storeID),
158+
s.settings.PacerLoopInterval,
159+
s.settings.PacerMinIterInterval,
160+
s.settings.PacerMaxIterIterval,
161+
s.settings.Seed,
162+
)
163+
s.controllers[storeID] = op.NewController(
164+
s.changer,
165+
allocator,
166+
storePool,
167+
s.settings,
168+
storeID,
169+
)
170+
s.srs[storeID] = storerebalancer.NewStoreRebalancer(
171+
tick,
172+
storeID,
173+
s.controllers[storeID],
174+
allocator,
175+
storePool,
176+
s.settings,
177+
storerebalancer.GetStateRaftStatusFn(s.state),
178+
)
179+
}
180+
155181
// GetNextTickTime returns a simulated tick time, or an indication that the
156182
// simulation is done.
157183
func (s *Simulator) GetNextTickTime() (done bool, tick time.Time) {
@@ -189,9 +215,15 @@ func (s *Simulator) RunSim(ctx context.Context) {
189215
break
190216
}
191217

218+
s.AddLogTag("tick", tick.Format(time.StampMilli))
219+
ctx = s.AmbientContext.AnnotateCtx(ctx)
220+
192221
// Update the store clocks with the current tick time.
193222
s.tickStoreClocks(tick)
194223

224+
// Tick any events.
225+
s.tickEvents(ctx, tick)
226+
195227
// Update the state with generated load.
196228
s.tickWorkload(ctx, tick)
197229

@@ -311,3 +343,22 @@ func (s *Simulator) tickStoreRebalancers(ctx context.Context, tick time.Time, st
311343
func (s *Simulator) tickMetrics(ctx context.Context, tick time.Time) {
312344
s.metrics.Tick(ctx, tick, s.state)
313345
}
346+
347+
// tickEvents ticks the registered simulation events.
348+
func (s *Simulator) tickEvents(ctx context.Context, tick time.Time) {
349+
var idx int
350+
// Assume the events are in sorted order and the event list is never added
351+
// to.
352+
for i := range s.events {
353+
if !tick.Before(s.events[i].At) {
354+
idx = i + 1
355+
log.Infof(ctx, "applying event (scheduled=%s tick=%s)", s.events[i].At, tick)
356+
s.events[i].EventFn(ctx, tick, s.state)
357+
} else {
358+
break
359+
}
360+
}
361+
if idx != 0 {
362+
s.events = s.events[idx:]
363+
}
364+
}

pkg/kv/kvserver/asim/asim_test.go

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -87,6 +87,6 @@ func TestAllocatorSimulatorDeterministic(t *testing.T) {
8787
refRun = history
8888
continue
8989
}
90-
require.Equal(t, refRun, history)
90+
require.Equal(t, refRun.Recorded, history.Recorded)
9191
}
9292
}
Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
load("@io_bazel_rules_go//go:def.bzl", "go_library")
2+
3+
go_library(
4+
name = "event",
5+
srcs = ["delayed_event.go"],
6+
importpath = "github.com/cockroachdb/cockroach/pkg/kv/kvserver/asim/event",
7+
visibility = ["//visibility:public"],
8+
deps = ["//pkg/kv/kvserver/asim/state"],
9+
)
Lines changed: 38 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,38 @@
1+
// Copyright 2023 The Cockroach Authors.
2+
//
3+
// Use of this software is governed by the Business Source License
4+
// included in the file licenses/BSL.txt.
5+
//
6+
// As of the Change Date specified in that file, in accordance with
7+
// the Business Source License, use of this software will be governed
8+
// by the Apache License, Version 2.0, included in the file
9+
// licenses/APL.txt.
10+
11+
package event
12+
13+
import (
14+
"context"
15+
"time"
16+
17+
"github.com/cockroachdb/cockroach/pkg/kv/kvserver/asim/state"
18+
)
19+
20+
type DelayedEventList []DelayedEvent
21+
22+
// Len implements sort.Interface.
23+
func (del DelayedEventList) Len() int { return len(del) }
24+
25+
// Less implements sort.Interface.
26+
func (del DelayedEventList) Less(i, j int) bool {
27+
return del[i].At.Before(del[j].At)
28+
}
29+
30+
// Swap implements sort.Interface.
31+
func (del DelayedEventList) Swap(i, j int) {
32+
del[i], del[j] = del[j], del[i]
33+
}
34+
35+
type DelayedEvent struct {
36+
At time.Time
37+
EventFn func(context.Context, time.Time, state.State)
38+
}

pkg/kv/kvserver/asim/gen/BUILD.bazel

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,7 @@ go_library(
88
deps = [
99
"//pkg/kv/kvserver/asim",
1010
"//pkg/kv/kvserver/asim/config",
11+
"//pkg/kv/kvserver/asim/event",
1112
"//pkg/kv/kvserver/asim/metrics",
1213
"//pkg/kv/kvserver/asim/state",
1314
"//pkg/kv/kvserver/asim/workload",

0 commit comments

Comments
 (0)