Commit 048cb14

craig[bot] and tbg committed
Merge #152506
152506: asim: measure thrashing r=tbg a=tbg

This commit adds a thrashing benchmark to each stat output.

"Thrashing" can be measured in many ways, but here we measure, for each per-store timeseries, a trend-discounting total variation. Roughly speaking, this computes the upward and downward total variation separately, then interpolates between the minimum of both and their sum. When one direction clearly dominates the other, it biases toward the minimum. When up and down are roughly comparable, it returns something approximating the sum, which is the "vanilla" total variation. In other words, thrashing is low when the time series has a clear trend, and it is high when the series moves around but doesn't seem to be trending anywhere.

When displaying the resulting value, we additionally normalize (divide by) the "range" of values seen (i.e. the max minus the min), and print the result as a percentage.

For example, the time series `[0, 1, 0, 1, 0]` has an upward and a downward variation of 2 each. Neither dominates, so the trend-discounting total variation equals the total variation, 4. The values are all in `[0,1]`, so the range is 1. The resulting trend-discounting total variation percentage would be `tdtv/range = 4/1 = 400%`. Intuitively, this means that the time series "sweeps" across the range four times. Additional oscillations would drive this number up further.

In contrast, the series `[0, 1, 0, 10, 100, 1000]` is mostly increasing, with the exception of the small downward jump between the second and third datapoints. Consequently, the trend-discounting total variation would (roughly) equal only the downward total variation, which equals 1. The range is 1000, so the percentage in this case would be `1/1000 = 0.1%`.

When printing the thrashing across a set of stores (but for a fixed stat), instead of computing the range within each store's time series, we compute it across the entire set. This makes it more likely that we discover the "true range".

All of this can be made arbitrarily more complicated (and additional normalization by the number of runs or ticks may be appropriate for some use cases), but this should be a good start.

Epic: CRDB-49117
Release note: none
Co-authored-by: Tobias Grieger <[email protected]>
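The description above can be sketched in a few lines of Go. This is only an illustration, not the code in thrashing.go (which is not shown here): the helper names `variations`, `tdtv` and `tdtvPercent` are hypothetical, and the blending weight `lo/hi` is an assumption chosen so that the result equals the sum when both directions are balanced and approaches the minimum when one direction dominates.

```go
package main

import (
	"fmt"
	"math"
)

// variations returns the upward and downward total variation of vs, i.e. the
// sum of all positive steps and the sum of all negative steps (as absolute
// values).
func variations(vs []float64) (up, down float64) {
	for i := 1; i < len(vs); i++ {
		if d := vs[i] - vs[i-1]; d > 0 {
			up += d
		} else {
			down -= d
		}
	}
	return up, down
}

// tdtv blends min(up, down) with up+down. The weight lo/hi is an assumption
// made for this sketch: it is 1 when both directions are balanced and tends
// to 0 when one direction dominates.
func tdtv(vs []float64) float64 {
	up, down := variations(vs)
	lo, hi := math.Min(up, down), math.Max(up, down)
	if hi == 0 {
		return 0 // empty or constant series
	}
	w := lo / hi
	return lo + w*((up+down)-lo)
}

// tdtvPercent normalizes tdtv by the range (max minus min) of the series and
// returns a percentage. Empty and constant series yield 0 in this sketch.
func tdtvPercent(vs []float64) float64 {
	if len(vs) == 0 {
		return 0
	}
	lo, hi := vs[0], vs[0]
	for _, v := range vs {
		lo, hi = math.Min(lo, v), math.Max(hi, v)
	}
	if hi == lo {
		return 0
	}
	return 100 * tdtv(vs) / (hi - lo)
}

func main() {
	fmt.Printf("%.0f%%\n", tdtvPercent([]float64{0, 1, 0, 1, 0})) // prints "400%"
	fmt.Printf("%.1f%%\n", tdtvPercent([]float64{0, 1, 0, 10, 100, 1000}))
}
```

With this assumed weight, the oscillating series reproduces the 400% from the example, and the mostly increasing series stays well below 1%. The exact value for dominated series depends on the interpolation used, so it need not match the figures produced by the real implementation.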
2 parents 223617d + 478c724 commit 048cb14

42 files changed: +791 -3 lines changed

pkg/BUILD.bazel

Lines changed: 2 additions & 0 deletions
@@ -240,6 +240,7 @@ ALL_TESTS = [
     "//pkg/kv/kvserver/allocator/storepool:storepool_test",
     "//pkg/kv/kvserver/apply:apply_test",
     "//pkg/kv/kvserver/asim/gossip:gossip_test",
+    "//pkg/kv/kvserver/asim/history:history_test",
     "//pkg/kv/kvserver/asim/metrics:metrics_test",
     "//pkg/kv/kvserver/asim/op:op_test",
     "//pkg/kv/kvserver/asim/queue:queue_test",
@@ -1506,6 +1507,7 @@ GO_TARGETS = [
     "//pkg/kv/kvserver/asim/gossip:gossip",
     "//pkg/kv/kvserver/asim/gossip:gossip_test",
     "//pkg/kv/kvserver/asim/history:history",
+    "//pkg/kv/kvserver/asim/history:history_test",
     "//pkg/kv/kvserver/asim/metrics:metrics",
     "//pkg/kv/kvserver/asim/metrics:metrics_test",
     "//pkg/kv/kvserver/asim/mmaintegration:mmaintegration",
Lines changed: 21 additions & 2 deletions
@@ -1,8 +1,11 @@
-load("@io_bazel_rules_go//go:def.bzl", "go_library")
+load("@io_bazel_rules_go//go:def.bzl", "go_library", "go_test")
 
 go_library(
     name = "history",
-    srcs = ["history.go"],
+    srcs = [
+        "history.go",
+        "thrashing.go",
+    ],
     importpath = "github.com/cockroachdb/cockroach/pkg/kv/kvserver/asim/history",
     visibility = ["//visibility:public"],
     deps = [
@@ -11,3 +14,19 @@ go_library(
         "@com_github_montanaflynn_stats//:stats",
     ],
 )
+
+go_test(
+    name = "history_test",
+    srcs = ["thrashing_test.go"],
+    data = glob(["testdata/**"]),
+    embed = [":history"],
+    deps = [
+        "//pkg/testutils",
+        "//pkg/testutils/datapathutils",
+        "//pkg/testutils/echotest",
+        "//pkg/util/randutil",
+        "@com_github_guptarohit_asciigraph//:asciigraph",
+        "@com_github_stretchr_testify//assert",
+        "@com_github_stretchr_testify//require",
+    ],
+)

pkg/kv/kvserver/asim/history/history.go

Lines changed: 66 additions & 0 deletions
@@ -47,6 +47,72 @@ func (h *History) PerStoreValuesAt(idx int, stat string) []float64 {
 	return values
 }
 
+// ThrashingForStat returns a per-store slice of thrashing measurements for the
+// provided stat.
+func (h *History) ThrashingForStat(stat string) ThrashingSlice {
+	if len(h.Recorded) == 0 {
+		return nil
+	}
+	numStores := len(h.PerStoreValuesAt(0, stat))
+	if numStores == 0 {
+		return nil
+	}
+
+	vsByStore := make([][]float64, numStores)
+	for tick := range h.Recorded {
+		for storeIdx, v := range h.PerStoreValuesAt(tick, stat) {
+			vsByStore[storeIdx] = append(vsByStore[storeIdx], v)
+		}
+	}
+
+	ths := make(ThrashingSlice, numStores)
+	for storeIdx := range vsByStore {
+		// HACK: we remove leading zeroes before computeThrashing. This works
+		// around the fact that some timeseries only show sensible values after an
+		// initial period of inactivity. For example, CPU usage is zero until the
+		// first stats tick. Without this hack, the large initial jump from zero to
+		// the first value would be interpreted as variation.
+		th := computeThrashing(stripLeaderingZeroes(vsByStore[storeIdx]))
+		ths[storeIdx] = th
+	}
+	ths.normalize()
+	return ths
+}
+
+func stripLeaderingZeroes(vs []float64) []float64 {
+	for i := range vs {
+		if vs[i] == 0 {
+			continue
+		}
+		return vs[i:]
+	}
+	return nil
+}
+
+// Thrashing returns a string representation of the thrashing for the given
+// stat.
+func (h *History) Thrashing(stat string) string {
+	var buf strings.Builder
+	_, _ = fmt.Fprintf(&buf, "[")
+
+	ths := h.ThrashingForStat(stat)
+	tvpcts := make([]float64, len(ths))
+	for i, th := range ths {
+		if i > 0 {
+			_, _ = fmt.Fprintf(&buf, ", ")
+		}
+		tvpct := th.TDTVPercent()
+		_, _ = fmt.Fprintf(&buf, "s%d=%.0f%%", i+1, tvpct)
+		tvpcts[i] = tvpct
+	}
+	_, _ = fmt.Fprintf(&buf, "] ")
+
+	sum, _ := stats.Sum(tvpcts)
+	_, _ = fmt.Fprintf(&buf, " (sum=%.0f%%)", sum)
+
+	return buf.String()
+}
+
 // ShowRecordedValueAt returns a string representation of the recorded values.
 // The returned boolean is false if (and only if) the recorded values were all
 // zero.
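The effect of the leading-zero workaround in ThrashingForStat can be seen in a small standalone sketch. `stripLeadingZeroes` below copies the logic of the helper in the diff, while `totalVariation` is a hypothetical stand-in for the real measurement (plain total variation, without trend discounting), and the CPU-like series is made up for illustration.

```go
package main

import "fmt"

// stripLeadingZeroes mirrors the helper in the diff: it drops the zero prefix
// of vs and returns nil if every value is zero.
func stripLeadingZeroes(vs []float64) []float64 {
	for i, v := range vs {
		if v != 0 {
			return vs[i:]
		}
	}
	return nil
}

// totalVariation is a hypothetical stand-in for the real measurement: the sum
// of the absolute step sizes between consecutive values.
func totalVariation(vs []float64) float64 {
	var tv float64
	for i := 1; i < len(vs); i++ {
		d := vs[i] - vs[i-1]
		if d < 0 {
			d = -d
		}
		tv += d
	}
	return tv
}

func main() {
	// A made-up CPU-like series that reports zero until the first stats tick.
	cpu := []float64{0, 0, 50, 51, 50}
	fmt.Println(totalVariation(cpu))                     // prints 52
	fmt.Println(totalVariation(stripLeadingZeroes(cpu))) // prints 2
}
```

Without stripping, the jump from 0 to 50 accounts for almost all of the measured variation, although the series is nearly flat once values start arriving.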
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+echo
+----
+[0.00 0.26 0.46 0.66 0.83 1.00 0.83 0.66 0.46 0.26 0.00]
[ASCII plot of the series: rises from 0.00 to a peak near 1.00, then falls back to 0.00]
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+echo
+----
+This sequence has high thrashing, since it ends up where it started.
+input: [10 8 4 2 1 2 4 8 10]
[ASCII plot of the input: falls from 10.00 to about 1, then rises back to 10.00]
+tdtv=200.00% (18.0/9.0) uptv=9.0 dntv=9.0 runs=2
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+echo
+----
+input: []
+tdtv=-0.00% (0.0/-Inf) uptv=0.0 dntv=0.0 runs=1
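The `-0.00%` and `-Inf` in the empty-input output are what IEEE 754 arithmetic produces if the range is computed from sentinel extremes. The sketch below assumes that the minimum starts at +Inf and the maximum at -Inf; this is a guess about the implementation, which is not shown here, and `valueRange` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math"
)

// valueRange returns max-min of vs, starting from sentinel extremes. Using
// the infinities as sentinels is an assumption about the real implementation.
func valueRange(vs []float64) float64 {
	lo, hi := math.Inf(1), math.Inf(-1)
	for _, v := range vs {
		lo, hi = math.Min(lo, v), math.Max(hi, v)
	}
	return hi - lo
}

func main() {
	r := valueRange(nil) // (-Inf) - (+Inf) = -Inf
	var tdtv float64     // no data points, so no variation
	// 0 divided by -Inf is negative zero, which fmt prints with a sign.
	fmt.Printf("tdtv=%.2f%% (%.1f/%.1f)\n", 100*tdtv/r, tdtv, r)
	// prints "tdtv=-0.00% (0.0/-Inf)"
}
```

If the negative zero is unwanted in the output, an explicit early return for empty input would avoid it.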
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+echo
+----
+An almost monotonic function (the final outlier dominates total variation),
+so it is assigned a small thrashing percentage.
+input: [1 3 2 1 2005]
[ASCII plot of the input: flat near 1, then a steep rise to 2005]
+tdtv=0.50% (10.0/2004.0) uptv=2006.0 dntv=2.0 runs=3
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+echo
+----
+This sequence has high thrashing, since it ends up where it started.
+input: [1 2 4 8 10 8 4 2 1]
[ASCII plot of the input: rises from 1.00 to about 10, then falls back to 1.00]
+tdtv=200.00% (18.0/9.0) uptv=9.0 dntv=9.0 runs=2
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+echo
+----
+An initial outlier leads to a large normalization factor,
+i.e. low thrashing percentage. This isn't necessarily good, if this becomes
+an issue we could use an inter-quantile range instead.
+input: [10250 13 12 1 2]
[ASCII plot of the input: steep drop from 10250, then flat near 1]
+tdtv=0.07% (7.3/10249.0) uptv=1.0 dntv=10249.0 runs=2
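The inter-quantile alternative mentioned in this test case could look roughly like the following. This is a sketch only: the commit does not implement it, the helper names are hypothetical, and the quantile definition (linear interpolation on the sorted values, quartiles at 25% and 75%) is one assumption among several common ones.

```go
package main

import (
	"fmt"
	"sort"
)

// quantile returns the q-quantile of ascending sorted data using linear
// interpolation between neighbors. Other quantile definitions exist; this
// choice is an assumption of the sketch.
func quantile(sorted []float64, q float64) float64 {
	if len(sorted) == 0 {
		return 0
	}
	pos := q * float64(len(sorted)-1)
	i := int(pos)
	if i+1 >= len(sorted) {
		return sorted[len(sorted)-1]
	}
	frac := pos - float64(i)
	return sorted[i] + frac*(sorted[i+1]-sorted[i])
}

// sortedCopy returns an ascending copy of vs, leaving vs untouched.
func sortedCopy(vs []float64) []float64 {
	s := append([]float64(nil), vs...)
	sort.Float64s(s)
	return s
}

// interQuartileRange returns Q3 minus Q1.
func interQuartileRange(vs []float64) float64 {
	s := sortedCopy(vs)
	return quantile(s, 0.75) - quantile(s, 0.25)
}

// fullRange returns max minus min, or 0 for an empty slice.
func fullRange(vs []float64) float64 {
	if len(vs) == 0 {
		return 0
	}
	s := sortedCopy(vs)
	return s[len(s)-1] - s[0]
}

func main() {
	vs := []float64{10250, 13, 12, 1, 2}
	fmt.Println(fullRange(vs), interQuartileRange(vs)) // prints "10249 11"
}
```

For this input the single outlier inflates the full range to 10249, while the inter-quartile range is 11, so normalizing by the latter would no longer hide the movement among the small values.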
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+echo
+----
+Regression test to make sure thrashing in only the last index registers.
+input: [2 1 2]
[ASCII plot of the input: falls from 2.00 to about 1, then rises back to 2.00]
+tdtv=200.00% (2.0/1.0) uptv=1.0 dntv=1.0 runs=2
