Commit 0d5068e

craig[bot] and wenyihu6 committed
Merge #154416
154416: asim: improve comments for tests under mma r=tbg a=wenyihu6

Epic: CRDB-49117
Resolves: #154361
Release note: none

---

**asim: rename constraint_satisfaction1 to constraint_satisfaction**

This commit renames constraint_satisfaction1_full_disk.txt to constraint_satisfaction_full_disk.txt.

---

**asim: improve comments for constraint_satisfaction_full_disk**

This commit improves comments and reduces the test duration for constraint_satisfaction_full_disk.txt.

---

**asim: delete redundant/not-really-useful tests**

This commit deletes a few redundant or not-really-useful tests.

1. For constraint_satisfaction1, the test sets up a 9-node cluster across 4 regions (a, b, c, d) with 3, 2, 1, and 3 nodes respectively. It creates 100 ranges with 5 replicas each, initially placed on stores s1(a), s2(a), s4(b), s5(b), and s6(c). The span config requires 2 replicas in region a, 2 in region b, and 1 in region c, with lease preferences for region a, so all constraints were already satisfied. The generated load was not very interesting either. We will likely need a similar test in the future, but this version is removed in favor of writing a better one later.
2. For constraint_satisfaction2, it appears to be leftover setup that used rebalance mode 0 with other queues disabled, predating the introduction of configurations.
3. For constraint_satisfaction_old_alloc, it also seems to be leftover and is roughly the same as constraint_satisfaction1.txt.
4. high_cpu_skewed_placement.txt appears redundant with high_cpu.txt, which also uses skewed range placement on s1–3 with load targeting that keyspace. The only difference is an additional CPU workload evenly distributed across nodes, but it doesn’t test anything meaningfully different. In addition, the test setup was wrong, and the load was hammering only one range.
5. load_distribution_movement_disabled_enable_later.txt was intended to test the switch between SMA and MMA mode, but the setup does not create an interesting scenario. We will likely need a similar test in the future, but this version is removed in favor of writing a better one later.

---

**asim: improve comments for tests under mma**

This commit improves the comments on the test setups. I’ve made a first pass to clarify the meaning of various parts, though it may require further review to fully understand each scenario. This is a starting point.

---

**asim: shorten duration for certain asim tests**

This commit shortens the duration of certain asim tests. Durations were chosen by looking at the output plots to identify when there is no more rebalancing activity or when thrashing patterns remain.

---

**asim: use mib for bytes of gen_ranges**

This commit renames the gen_ranges bytes argument to bytes_mib, which accepts values in MiB.

Co-authored-by: wenyihu6 <[email protected]>
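The claim in item 1 that the initial placement already satisfied every constraint can be checked mechanically. The following is a minimal sketch, not code from the asim package. The store-to-region map is an assumption: one store per node, numbered in region order, which matches the s1(a), s2(a), s4(b), s5(b), s6(c) labels in the message. The name `constraintsSatisfied` is hypothetical, and each constraint count is treated as a minimum, which simplifies the real allocator's semantics.

```go
package main

import "fmt"

// storeRegion is an assumed layout for the 9-node cluster: 3 nodes in a,
// 2 in b, 1 in c, 3 in d, with one store per node numbered in region order.
var storeRegion = map[int]string{
	1: "a", 2: "a", 3: "a",
	4: "b", 5: "b",
	6: "c",
	7: "d", 8: "d", 9: "d",
}

// constraintsSatisfied (hypothetical helper) reports whether the replicas on
// the given stores place at least the required number of replicas in each
// region. Treating the counts as minimums is a simplification.
func constraintsSatisfied(stores []int, required map[string]int) bool {
	perRegion := map[string]int{}
	for _, s := range stores {
		perRegion[storeRegion[s]]++
	}
	for region, n := range required {
		if perRegion[region] < n {
			return false
		}
	}
	return true
}

func main() {
	// Mirrors constraints={'+region=a':2,'+region=b':2,'+region=c':1}.
	required := map[string]int{"a": 2, "b": 2, "c": 1}

	fmt.Println(constraintsSatisfied([]int{1, 2, 4, 5, 6}, required)) // true: initial placement
	fmt.Println(constraintsSatisfied([]int{1, 3, 4, 5, 6}, required)) // true: s2 replica moved to s3
	fmt.Println(constraintsSatisfied([]int{1, 2, 4, 5, 7}, required)) // false: region c has no replica
}
```

The first call returning true is why the deleted test was uninteresting: the allocator had no constraint violation to repair.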
2 parents 45cc319 + c4b0469 commit 0d5068e

21 files changed: +371 -520 lines changed

pkg/kv/kvserver/asim/tests/datadriven_simulation_test.go

Lines changed: 4 additions & 4 deletions
@@ -78,7 +78,7 @@ var runAsimTests = envutil.EnvOrDefaultBool("COCKROACH_RUN_ASIM_TESTS", false)
 //
 // - "gen_ranges" [ranges=<int>]
 // [placement_type=(even|skewed|weighted|replica_placement)]
-// [repl_factor=<int>] [min_key=<int>] [max_key=<int>] [bytes=<int>]
+// [repl_factor=<int>] [min_key=<int>] [max_key=<int>] [bytes_mib=<int>]
 // [reset=<bool>]
 // Initialize the range generator parameters. On the next call to eval, the
 // range generator is called to assign an ranges and their replica
@@ -285,7 +285,7 @@ func TestDataDriven(t *testing.T) {
 case "gen_ranges":
 	var ranges, replFactor = 1, 3
 	var minKey, maxKey = int64(0), int64(defaultKeyspace)
-	var bytes int64 = 0
+	var bytesMiB int64 = 0
 	var replace bool
 	var placementTypeStr = "even"
 	buf := strings.Builder{}
@@ -294,7 +294,7 @@ func TestDataDriven(t *testing.T) {
 	scanIfExists(t, d, "placement_type", &placementTypeStr)
 	scanIfExists(t, d, "min_key", &minKey)
 	scanIfExists(t, d, "max_key", &maxKey)
-	scanIfExists(t, d, "bytes", &bytes)
+	scanIfExists(t, d, "bytes_mib", &bytesMiB)
 	scanIfExists(t, d, "replace", &replace)
 
 	placementType := gen.GetRangePlacementType(placementTypeStr)
@@ -310,7 +310,7 @@ func TestDataDriven(t *testing.T) {
 	MinKey: minKey,
 	MaxKey: maxKey,
 	ReplicationFactor: replFactor,
-	Bytes: bytes,
+	Bytes: bytesMiB << 20,
 	ReplicaPlacement: replicaPlacement,
 },
 PlacementType: placementType,
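
The `bytesMiB << 20` expression in the last hunk converts MiB to bytes, since a left shift by 20 bits multiplies by 2^20 = 1,048,576. A minimal standalone sketch of that conversion follows. The helper name `mibToBytes` is hypothetical and is not part of the asim package.

```go
package main

import "fmt"

// mibToBytes (hypothetical helper) converts a size in MiB to bytes.
// Shifting left by 20 bits multiplies by 2^20 = 1,048,576.
func mibToBytes(mib int64) int64 {
	return mib << 20
}

func main() {
	fmt.Println(mibToBytes(1))   // 1048576
	fmt.Println(mibToBytes(500)) // 524288000, the bytes_mib=500 used in the test file below
}
```

Taking the argument in MiB keeps test files readable: `bytes_mib=500` replaces a nine-digit byte count.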

pkg/kv/kvserver/asim/tests/testdata/non_rand/mma/constraint_satisfaction1.txt

Lines changed: 0 additions & 53 deletions
This file was deleted.

pkg/kv/kvserver/asim/tests/testdata/non_rand/mma/constraint_satisfaction1_full_disk.txt

Lines changed: 0 additions & 50 deletions
This file was deleted.

pkg/kv/kvserver/asim/tests/testdata/non_rand/mma/constraint_satisfaction2.txt

Lines changed: 0 additions & 42 deletions
This file was deleted.

pkg/kv/kvserver/asim/tests/testdata/non_rand/mma/constraint_satisfaction_full_disk.txt

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
+# This test verifies that the allocator can satisfy zone constraints when stores
+# have limited disk capacity and replicas need to be rebalanced due to disk
+# fullness. The test sets up a 9-node cluster across 4 regions (a, b, c, d) with
+# 3, 2, 1, and 3 nodes respectively. It creates 10 ranges with 5 replicas each,
+# initially placed only on stores s1, s2, s4, s5, s6 (regions a, b, c). Each store
+# has a 10GB capacity and each range is 500MiB, making stores s1, s2, s4, s5, s6
+# approximately 61% full initially. The span config requires 2 replicas in region
+# a, 2 in region b, and 1 in region c, with lease preferences for region a.
+#
+# Expected outcome: The allocator should rebalance replicas to satisfy the zone
+# constraints while managing disk capacity. Since stores s1 and s2 in region a are
+# initially full, some replicas should move to s3 (also in region a) to maintain
+# the constraint of 2 replicas in region a while reducing disk pressure. Leases
+# should be distributed within region a (s1, s2, s3) due to lease preferences when
+# count-based rebalancing is enabled.
+gen_cluster nodes=9 region=(a,b,c,d) nodes_per_region=(3,2,1,3) store_byte_capacity_gib=10
+----
+
+gen_ranges ranges=10 repl_factor=5 placement_type=replica_placement bytes_mib=500
+{s1,s2,s4,s5,s6}:1
+----
+{s1:*,s2,s4,s5,s6}:1
+
+set_span_config
+[0,9999999999): num_replicas=5 num_voters=5 constraints={'+region=a':2,'+region=b':2,'+region=c':1} lease_preferences=[['+region=a']]
+----
+
+setting split_queue_enabled=false
+----
+
+# TODO(wenyihu6): for mma-only, why didn't we balance more replicas to s3 before stabilizing?
+# TODO(wenyihu6): tests more cases here
+# 1. whats happens if zone config started being satisfied but the full-disk
+# target is against the goal of zone/lease constraints
+# 2. what happens if zone config started being un-satisfied but the full-disk
+# target is against the goal of zone/lease constraints
+# 3. what happens if count-based rebalancing is against any of these goals ^
+
+eval duration=3m samples=1 seed=42 cfgs=(sma-count,mma-count) metrics=(cpu,cpu_util,leases,replicas,disk_fraction_used)
+----
+disk_fraction_used#1: first: [s1=0.61, s2=0.61, s3=0.00, s4=0.61, s5=0.61, s6=0.61, s7=0.00, s8=0.00, s9=0.00] (stddev=0.30, mean=0.34, sum=3)
+disk_fraction_used#1: last: [s1=0.49, s2=0.37, s3=0.37, s4=0.61, s5=0.61, s6=0.61, s7=0.00, s8=0.00, s9=0.00] (stddev=0.26, mean=0.34, sum=3)
+disk_fraction_used#1: thrash_pct: [s1=48%, s2=26%, s3=0%, s4=0%, s5=0%, s6=0%, s7=0%, s8=0%, s9=0%] (sum=74%)
+leases#1: first: [s1=10, s2=0, s3=0, s4=0, s5=0, s6=0, s7=0, s8=0, s9=0] (stddev=3.14, mean=1.11, sum=10)
+leases#1: last: [s1=6, s2=0, s3=4, s4=0, s5=0, s6=0, s7=0, s8=0, s9=0] (stddev=2.13, mean=1.11, sum=10)
+leases#1: thrash_pct: [s1=24%, s2=21%, s3=0%, s4=0%, s5=0%, s6=0%, s7=0%, s8=0%, s9=0%] (sum=45%)
+replicas#1: first: [s1=10, s2=10, s3=0, s4=10, s5=10, s6=10, s7=0, s8=0, s9=0] (stddev=4.97, mean=5.56, sum=50)
+replicas#1: last: [s1=8, s2=6, s3=6, s4=10, s5=10, s6=10, s7=0, s8=0, s9=0] (stddev=4.19, mean=5.56, sum=50)
+replicas#1: thrash_pct: [s1=48%, s2=26%, s3=0%, s4=0%, s5=0%, s6=0%, s7=0%, s8=0%, s9=0%] (sum=74%)
+artifacts[sma-count]: 7ef09084dfb9e631
+==========================
+disk_fraction_used#1: first: [s1=0.61, s2=0.61, s3=0.00, s4=0.61, s5=0.61, s6=0.61, s7=0.00, s8=0.00, s9=0.00] (stddev=0.30, mean=0.34, sum=3)
+disk_fraction_used#1: last: [s1=0.43, s2=0.43, s3=0.37, s4=0.61, s5=0.61, s6=0.61, s7=0.00, s8=0.00, s9=0.00] (stddev=0.25, mean=0.34, sum=3)
+disk_fraction_used#1: thrash_pct: [s1=72%, s2=72%, s3=0%, s4=0%, s5=0%, s6=0%, s7=0%, s8=0%, s9=0%] (sum=143%)
+leases#1: first: [s1=10, s2=0, s3=0, s4=0, s5=0, s6=0, s7=0, s8=0, s9=0] (stddev=3.14, mean=1.11, sum=10)
+leases#1: last: [s1=5, s2=1, s3=4, s4=0, s5=0, s6=0, s7=0, s8=0, s9=0] (stddev=1.85, mean=1.11, sum=10)
+leases#1: thrash_pct: [s1=24%, s2=40%, s3=0%, s4=0%, s5=0%, s6=0%, s7=0%, s8=0%, s9=0%] (sum=64%)
+replicas#1: first: [s1=10, s2=10, s3=0, s4=10, s5=10, s6=10, s7=0, s8=0, s9=0] (stddev=4.97, mean=5.56, sum=50)
+replicas#1: last: [s1=7, s2=7, s3=6, s4=10, s5=10, s6=10, s7=0, s8=0, s9=0] (stddev=4.17, mean=5.56, sum=50)
+replicas#1: thrash_pct: [s1=72%, s2=72%, s3=0%, s4=0%, s5=0%, s6=0%, s7=0%, s8=0%, s9=0%] (sum=143%)
+artifacts[mma-count]: 4874ec1d0424aa31
+==========================

pkg/kv/kvserver/asim/tests/testdata/non_rand/mma/constraint_satisfaction_old_alloc.txt

Lines changed: 0 additions & 62 deletions
This file was deleted.
