Commit 6bfc8db

craig[bot], williamchoe3, DarrylWong, and rickystewart committed
153269: roachtest/mixedversion: sync workload binary versions with the cluster version r=DarrylWong a=williamchoe3

Fixes #147374

### Problem

The current workload binary is no longer backwards compatible with certain workloads, e.g. `bank` (see the original issue above for more information). Previously, workload commands used whatever default binary was staged on the workload node. A test author could explicitly download an older version of workload to circumvent version compatibility issues, or use `Workload`'s `override` ([PR](https://github.com/cockroachdb/cockroach/pull/151165/files#diff-11eb39229832689838a549c5cd62b0adeb799fc4e58468ab17268ffcc86c6bd4L829)) parameter to download a binary matching the cluster version to the workload node when the workload hook is executed.

#### Change

While the latter solution works, it makes more sense to always use a workload binary that matches the cluster. If a test wants to use a specific workload binary version, that remains possible.

During `startStep`, while we are staging the cluster at its initial version, we also stage ~~that version's binary~~ all the versioned binaries in the upgrade plan on the workload node(s). Previously we left the workload node alone at this step.

~~Then, introduced a new test framework specific step `stageWorkloadBinaryStep` which gets added in `test.Planner.afterUpgradeSteps` which get added to the plan after a cluster finalizes its upgrade. `stageWorkloadBinaryStep` stages the finalized cluster version's binary on the workload node(s).~~

Removed the call to `UploadCockroach` from `Workload` because, by the time the workload init or run hooks are called, the cluster binary already exists on the workload node.
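The staging behavior described above can be sketched roughly as follows. This is a minimal, self-contained model and not the roachtest implementation: `stagedBinaries`, `stageAll`, `binaryFor`, and the `./cockroach-<version>` path layout are all hypothetical names chosen for illustration. It only shows the idea of staging every version in the upgrade plan up front and then picking the binary that matches the cluster version when a workload command is built.

```go
package main

import "fmt"

// stagedBinaries is a hypothetical model of a workload node's disk: it maps a
// release version to the path of the cockroach binary staged for that version.
type stagedBinaries map[string]string

// stageAll stages one binary for every version in the upgrade plan, skipping
// versions that are already present so nothing is downloaded twice.
func stageAll(staged stagedBinaries, planVersions []string) {
	for _, v := range planVersions {
		if _, ok := staged[v]; ok {
			continue
		}
		staged[v] = "./cockroach-" + v // hypothetical path layout
	}
}

// binaryFor returns the staged binary matching the cluster's current version.
func (s stagedBinaries) binaryFor(clusterVersion string) (string, error) {
	path, ok := s[clusterVersion]
	if !ok {
		return "", fmt.Errorf("no binary staged for version %q", clusterVersion)
	}
	return path, nil
}

func main() {
	staged := stagedBinaries{}
	stageAll(staged, []string{"v25.2.5", "current"})

	bin, err := staged.binaryFor("v25.2.5")
	if err != nil {
		panic(err)
	}
	// Build the workload command from the versioned binary instead of a
	// hardcoded "./cockroach".
	fmt.Printf("%s workload init bank\n", bin)
}
```

Because every version is staged before any hook runs, a hook never has to download a binary itself; it only has to ask which binary matches the version the cluster is currently running.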
~~Kept `override` parameter, but now if override is set, `Workload` will simply just respect the binary that gets passed into the command instead of defaulting the current cluster version.~~

* If a test wants to use a specific Workload binary, it can do so without using the framework's helper function.

Added a new helper method to `Helper` which can be used in hooks to abstract a lower-level roachtest framework call:

```go
clusterupgrade.BinaryPathForVersion(t.rt, h.System.FromVersion, "cockroach")
-->
CockroachBinaryForWorkload(t)
```

Added a new mixedversion test option which adds the framework step that stages all versioned cockroach binaries in the upgrade plan on the workload node:

```go
mixedversion.WithWorkloadNodes(c.WorkloadNode())
```

### Example

Given a plan, e.g. during the `run startup hooks concurrently` set of steps:

```
Plan:
Upgrades: v25.2.5 → <current>
Deployment mode: system-only
Mutators: cluster_setting[kv.transaction.write_buffering.enabled], cluster_setting[kv.rangefeed.buffered_sender.enabled]
Plan:
├── install fixtures for version "v25.2.5" (1)
├── start cluster at version "v25.2.5" (2)
├── wait for all nodes (:1-4) to acknowledge cluster version '25.2' on system tenant (3)
├── stage workload binary on workload node(s) :5 for version(s) v25.2.5, <current> (4) <-- NEW
├── run startup hooks concurrently
│   ├── run "maybe enable tenant features", after 30s delay (5)
│   ├── run "load TPCC dataset", after 500ms delay (6)
│   ├── run "load bank dataset", after 30s delay (7)
│   └── set cluster setting "kv.rangefeed.buffered_sender.enabled" to 'true' on system tenant, after 3m0s delay (8)
└── upgrade cluster from "v25.2.5" to "<current>"
```

### Verification

Adhoc Nightly against all mixed version tests
https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_RoachtestNightlyGceBazel/20482572?buildTab=overview&expandBuildDeploymentsSection=false&expandBuildChangesSection=true&hideTestsFromDependencies=false&expandBuildTestsSection=true&hideProblemsFromDependencies=false&expandBuildProblemsSection=true

Added a `gomock` for Cluster in cluster/mock for unit tests

* Removed duplicate Cluster mock in clusterstats
* Wanted to colocate the mocks with the actual interface they are mocking. Each package that wants to use the mock then just imports it instead of creating a new mock, and a new generated file, in that package
* Added datadriven unit test that captures the new post upgrade internal framework step

While looking at `Test` I saw these test-specific fields (not sure what this methodology is called). I'm curious because my initial thought was to add a mock Cluster implementation, since I have workload-node-related logic that needs to call cluster interface methods. I could have gone with something like the below, but I personally don't like intertwining prod logic and test logic. Is there any reason we wouldn't want the mock cluster, besides the boilerplate it creates in the unit tests?

https://github.com/cockroachdb/cockroach/blob/b4738823bcc55547ff431b99684576370760e56d/pkg/cmd/roachtest/roachtestutil/mixedversion/planner_test.go

```
// Test is the main struct callers of this package interact with.
Test struct {
	// the following are test-only fields, allowing tests to simulate
	// cluster properties without passing a cluster.Cluster
	// implementation.
	_arch *vm.CPUArch
	_isLocal *bool
	_getFailer func(name string) (*failures.Failer, error)
```

### mixedversion test refactor

Key

* 👌: no workload, no change needed
* ➕: added mixedversion.WithWorkloadNodes(c.WorkloadNode()) test option to stage versioned binaries on workload node
* ✅: changed the hardcoded "./cockroach workload ..." call to something like "%s workload... ", h.CockroachBinaryForWorkload(t)
* ⚠️: can't add versioned call because workload logic not defined in a user hook.
  * Most of these tests use Workload() which will override the binary with the versioned one by default
  * Some of these tests have logic around the Run cmd so calling Workload doesn't provide an easy fix, but the test should continue to work with the unversioned (default) binary
* ❌: not refactoring because it's not using bank and the logic is tightly coupled with shared non-mixedversion roachtests, so it would need to be decoupled, which I think would make the test logic harder to understand and is not worth it for a one-line helper

Test Changes

```
👌acceptance/validate-system-schema-after-version-upgrade [sql-foundations] randomized,timeout: 1h0m0s
👌acceptance/version-upgrade [test-eng] randomized,timeout: 2h0m0s
➕✅admission-control/elastic-workload/mixed-version [kv] timeout: 3h0m0s
➕⚠️backup-restore/mixed-version [disaster-recovery] randomized,timeout: 8h0m0s
➕⚠️c2c/mixed-version [disaster-recovery]
* 2 clusters, 1 workload node; put the test option to upgrade the workload node in both clusters. Technically one of them might be redundant, but I want to be sure that by the time either cluster makes a workload call it's good. Not sure whether the 1st cluster, the 2nd cluster, or a random one goes first, so putting it in both. Binary existence is also checked before staging, so no redundant downloads will be done
➕✅cdc/mixed-version/checkpointing [cdc] randomized,timeout: 3h0m0s
➕✅cdc/mixed-versions [cdc] randomized,timeout: 3h0m0s
👌change-replicas/mixed-version [kv] randomized,timeout: 1h0m0s
👌db-console/mixed-version-cypress [obs-prs] timeout: 2h0m0s
* uses workload node for another task that's not running workload so no change is needed
➕✅db-console/mixed-version-endpoints [obs-prs] randomized,timeout: 1h0m0s
👌declarative_schema_changer/job-compatibility-mixed-version-V242-V243 [sql-foundations] randomized
👌~~➕✅~~decommission/mixed-versions [kv] randomized
* ~~this test didn't have a workload node previously and I didn't see a reason not to use one (unlike some other tests that want the extra pressure on the node)~~
* data loading is fine to do on a cluster node (see comments below)
👌follower-reads/mixed-version/single-region [kv] randomized
👌follower-reads/mixed-version/survival=region/locality=global/reads=strong [kv] randomized
👌http-register-routes/mixed-version [obs-prs] randomized,timeout: 1h0m0s
~~➕✅~~import/mixed-versions [sql-queries] (skipped: Issue #143870) randomized
* don't need to use an explicit workload node here (see comments below)
👌jobs/mixed-versions [disaster-recovery] randomized
➕✅⚠️ldr/mixed-version [disaster-recovery]
* the init is in a hook, so I can use the helper; the run isn't in a hook, it uses Workload, so I didn't replace it
➕❌multi-region/mixed-version [test-eng] randomized,timeout: 36h0m0s
❌multitenant-upgrade [server] randomized,timeout: 5h0m0s
❌rebalance/by-load/leases/mixed-version [kv] randomized
❌rebalance/by-load/replicas/mixed-version [kv] randomized
❌schemachange/mixed-versions [sql-foundations] randomized
❌schemachange/mixed-versions-compat [sql-foundations]
❌schemachange/secondary-index-multi-version [sql-foundations] randomized
➕⚠️sql-stats/mixed-version [obs-prs] randomized,timeout: 1h0m0s
➕✅tpcc/mixed-headroom/chaos/n6cpu16 [test-eng] randomized,timeout: 7h0m0s
➕✅tpcc/mixed-headroom/n5cpu16 [test-eng] randomized,timeout: 7h0m0s
👌validate-system-schema-after-version-upgrade/separate-process [sql-foundations]
```

#### Notes

`pkg/cmd/roachtest/clusterstats/BUILD.bazel`: I made some edits initially, removed them, and now ./dev gen is generating go_library / go_test deps in a different order 🤷‍♂️

* Saw `UploadWorkload` in the `clusterupgrade` pkg (`pkg/cmd/roachtest/roachtestutil/clusterupgrade/clusterupgrade.go`) but chose not to use it because it's a currently unused path that also eventually calls a deprecated Cluster API function, and `UploadCockroach` & `UploadWorkload` are both just wrappers around `uploadBinaryVersion`

153964: roachtest: allow specifying a set of valid architectures r=herkolategan a=DarrylWong

Previously, tests could only specify one valid architecture or enable all of them to be metamorphically chosen. This changes the cluster spec to allow multiple architectures to be specified.

This is now needed because we recently started running FIPS metamorphically on our nightlies. Some tests are not compatible with ARM or FIPS, and we want to only disable execution on those architectures.

Fixes: #153833
Fixes: #154110
Informs: #152781

154476: build: update patched Go to the latest commit r=rail,RaduBerinde a=rickystewart

To consume cockroachdb/go@889a976.

Co-authored-by: William Choe <[email protected]>
Co-authored-by: DarrylWong <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
4 parents 390f231 + 9b2f590 + ed14bb7 + ac56c72 commit 6bfc8db

48 files changed, +681 -236 lines changed
WORKSPACE

Lines changed: 9 additions & 10 deletions
@@ -167,14 +167,14 @@ load(
 go_download_sdk(
     name = "go_sdk",
     sdks = {
-        "darwin_amd64": ("go1.23.12.darwin-amd64.tar.gz", "34457131f14281e21e25493d68e7519ccf26342d176dac36a4fc5dbf5ef122d9"),
-        "darwin_arm64": ("go1.23.12.darwin-arm64.tar.gz", "30e0735ab9ccda203946536d24afe895abd1a1d3f35ad199f9768ccbdd5d60bc"),
-        "linux_amd64": ("go1.23.12.linux-amd64.tar.gz", "0cac0ac930ecb9458b8a0a7969cbf735c5884d24c879c97eb28a8997eca986fa"),
-        "linux_arm64": ("go1.23.12.linux-arm64.tar.gz", "528601fc8fb2c7e5ce8b7ae7651fd4fce2450bbef687beb96616edc5a9effa41"),
-        "linux_s390x": ("go1.23.12.linux-s390x.tar.gz", "f3f11bbb731da6716776d1c29a2db3d1063fa0a9f8c00636e6a77793ba79e2e3"),
-        "windows_amd64": ("go1.23.12.windows-amd64.tar.gz", "71b5b5b86b3a5ff9f124e21984abd874a6bfeb438f368de2eee7c60a25a19c94"),
+        "darwin_amd64": ("go1.23.12.darwin-amd64.tar.gz", "25b853b77448c6f196475c1ab44d4a617b719db34e9e072c72c927a905bdbce6"),
+        "darwin_arm64": ("go1.23.12.darwin-arm64.tar.gz", "fae1c6d45f72d559b9fe52224918321e7642cb75cd6a71186c7ccdf577189a4f"),
+        "linux_amd64": ("go1.23.12.linux-amd64.tar.gz", "03ac2c00dfde86a4beb3e94f063edffe1d2c089dfbbe1ec2776d1531c58e196b"),
+        "linux_arm64": ("go1.23.12.linux-arm64.tar.gz", "442f8d3510d141d6f479f05f7de00560aa1df59e0a5e1d02ca58ed9e4757d0d7"),
+        "linux_s390x": ("go1.23.12.linux-s390x.tar.gz", "6e3c09786b434bdd8d44de714656bef7c4fe484d2a2e6f8c98a797385e5f280e"),
+        "windows_amd64": ("go1.23.12.windows-amd64.tar.gz", "513af02afaa1c64501f1ec0a9c9445dbf88ea55c0b7a5a86e0b7fc94f0c59b39"),
     },
-    urls = ["https://storage.googleapis.com/public-bazel-artifacts/go/20250818-202337/{}"],
+    urls = ["https://storage.googleapis.com/public-bazel-artifacts/go/20250930-204932/{}"],
     version = "1.23.12",
 )

@@ -659,9 +659,8 @@ go_download_sdk(
     # able to provide additional diagnostic information such as the expected version of OpenSSL.
     experiments = ["boringcrypto"],
     sdks = {
-
-        "linux_amd64": ("go1.23.12fips.linux-amd64.tar.gz", "9c58fd7137b4c9d387a5c37fd2e728bc5d39357c7f8ba3358bcae513704c2983"),
+        "linux_amd64": ("go1.23.12fips.linux-amd64.tar.gz", "ae2d57fa43ef68aa70e6c0c1def065bd2806411acfc0d59591ddff81168be095"),
     },
-    urls = ["https://storage.googleapis.com/public-bazel-artifacts/go/20250818-202337/{}"],
+    urls = ["https://storage.googleapis.com/public-bazel-artifacts/go/20250930-204932/{}"],
     version = "1.23.12fips",
 )

build/bazelutil/distdir_files.bzl

Lines changed: 7 additions & 7 deletions
@@ -1188,13 +1188,13 @@ DISTDIR_FILES = {
     "https://storage.googleapis.com/public-bazel-artifacts/c-deps/20250801-193032/libproj_foreign.macos.20250801-193032.tar.gz": "8d28434cd175f0a32dfdd8ba8a5fa44c3d04d1e53cccfe9dbb3c6e301a03a47c",
     "https://storage.googleapis.com/public-bazel-artifacts/c-deps/20250801-193032/libproj_foreign.macosarm.20250801-193032.tar.gz": "a4b0bbb056bb462682b49ec34816f02c71047b38733d50d8de78b737c892db61",
     "https://storage.googleapis.com/public-bazel-artifacts/c-deps/20250801-193032/libproj_foreign.windows.20250801-193032.tar.gz": "a61f4faf7a7d017a194c64b453a38c982423ef3678fa049dbf114920759da59c",
-    "https://storage.googleapis.com/public-bazel-artifacts/go/20250818-202337/go1.23.12.darwin-amd64.tar.gz": "34457131f14281e21e25493d68e7519ccf26342d176dac36a4fc5dbf5ef122d9",
-    "https://storage.googleapis.com/public-bazel-artifacts/go/20250818-202337/go1.23.12.darwin-arm64.tar.gz": "30e0735ab9ccda203946536d24afe895abd1a1d3f35ad199f9768ccbdd5d60bc",
-    "https://storage.googleapis.com/public-bazel-artifacts/go/20250818-202337/go1.23.12.linux-amd64.tar.gz": "0cac0ac930ecb9458b8a0a7969cbf735c5884d24c879c97eb28a8997eca986fa",
-    "https://storage.googleapis.com/public-bazel-artifacts/go/20250818-202337/go1.23.12.linux-arm64.tar.gz": "528601fc8fb2c7e5ce8b7ae7651fd4fce2450bbef687beb96616edc5a9effa41",
-    "https://storage.googleapis.com/public-bazel-artifacts/go/20250818-202337/go1.23.12.linux-s390x.tar.gz": "f3f11bbb731da6716776d1c29a2db3d1063fa0a9f8c00636e6a77793ba79e2e3",
-    "https://storage.googleapis.com/public-bazel-artifacts/go/20250818-202337/go1.23.12.windows-amd64.tar.gz": "71b5b5b86b3a5ff9f124e21984abd874a6bfeb438f368de2eee7c60a25a19c94",
-    "https://storage.googleapis.com/public-bazel-artifacts/go/20250818-202337/go1.23.12fips.linux-amd64.tar.gz": "9c58fd7137b4c9d387a5c37fd2e728bc5d39357c7f8ba3358bcae513704c2983",
+    "https://storage.googleapis.com/public-bazel-artifacts/go/20250930-204932/go1.23.12.darwin-amd64.tar.gz": "25b853b77448c6f196475c1ab44d4a617b719db34e9e072c72c927a905bdbce6",
+    "https://storage.googleapis.com/public-bazel-artifacts/go/20250930-204932/go1.23.12.darwin-arm64.tar.gz": "fae1c6d45f72d559b9fe52224918321e7642cb75cd6a71186c7ccdf577189a4f",
+    "https://storage.googleapis.com/public-bazel-artifacts/go/20250930-204932/go1.23.12.linux-amd64.tar.gz": "03ac2c00dfde86a4beb3e94f063edffe1d2c089dfbbe1ec2776d1531c58e196b",
+    "https://storage.googleapis.com/public-bazel-artifacts/go/20250930-204932/go1.23.12.linux-arm64.tar.gz": "442f8d3510d141d6f479f05f7de00560aa1df59e0a5e1d02ca58ed9e4757d0d7",
+    "https://storage.googleapis.com/public-bazel-artifacts/go/20250930-204932/go1.23.12.linux-s390x.tar.gz": "6e3c09786b434bdd8d44de714656bef7c4fe484d2a2e6f8c98a797385e5f280e",
+    "https://storage.googleapis.com/public-bazel-artifacts/go/20250930-204932/go1.23.12.windows-amd64.tar.gz": "513af02afaa1c64501f1ec0a9c9445dbf88ea55c0b7a5a86e0b7fc94f0c59b39",
+    "https://storage.googleapis.com/public-bazel-artifacts/go/20250930-204932/go1.23.12fips.linux-amd64.tar.gz": "ae2d57fa43ef68aa70e6c0c1def065bd2806411acfc0d59591ddff81168be095",
     "https://storage.googleapis.com/public-bazel-artifacts/java/railroad/rr-1.63-java8.zip": "d2791cd7a44ea5be862f33f5a9b3d40aaad9858455828ebade7007ad7113fb41",
     "https://storage.googleapis.com/public-bazel-artifacts/js/rules_jest-v0.18.4.tar.gz": "d3bb833f74b8ad054e6bff5e41606ff10a62880cc99e4d480f4bdfa70add1ba7",
     "https://storage.googleapis.com/public-bazel-artifacts/js/rules_js-v1.42.3.tar.gz": "2cfb3875e1231cefd3fada6774f2c0c5a99db0070e0e48ea398acbff7c6c765b",
Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-309f11146b97839ffbba1ac245b7aa901e3dbbcb
+889a9764a479870035d3fb31d1910ce6c654533c

pkg/BUILD.bazel

Lines changed: 1 addition & 0 deletions
@@ -1266,6 +1266,7 @@ GO_TARGETS = [
     "//pkg/cmd/roachprod/grafana:grafana",
     "//pkg/cmd/roachprod:roachprod",
     "//pkg/cmd/roachprod:roachprod_lib",
+    "//pkg/cmd/roachtest/cluster/mock:mockcluster",
     "//pkg/cmd/roachtest/cluster:cluster",
     "//pkg/cmd/roachtest/clusterstats:clusterstats",
     "//pkg/cmd/roachtest/clusterstats:clusterstats_test",

pkg/cmd/roachtest/cluster.go

Lines changed: 62 additions & 18 deletions
@@ -3188,31 +3188,20 @@ func (c *clusterImpl) MaybeExtendCluster(
 // archForTest determines the CPU architecture to use for a test. If the test
 // doesn't specify it, one is chosen randomly depending on flags.
 func archForTest(ctx context.Context, l *logger.Logger, testSpec registry.TestSpec) vm.CPUArch {
-	if testSpec.Cluster.Arch != "" {
-		l.PrintfCtx(ctx, "Using specified arch=%q, %s", testSpec.Cluster.Arch, testSpec.Name)
-		return testSpec.Cluster.Arch
-	}
-
 	if roachtestflags.Cloud == spec.IBM {
 		// N.B. IBM only supports S390x on the "s390x" architecture.
 		l.PrintfCtx(ctx, "IBM Cloud: forcing arch=%q (only supported), %s", vm.ArchS390x, testSpec.Name)
 		return vm.ArchS390x
 	}

-	// CPU architecture is unspecified, choose one according to the
-	// probability distribution.
-	var arch vm.CPUArch
-	if prng.Float64() < roachtestflags.ARM64Probability {
-		arch = vm.ArchARM64
-	} else if prng.Float64() < roachtestflags.FIPSProbability {
-		// N.B. branch is taken with probability
-		// (1 - arm64Probability) * fipsProbability
-		// which is P(fips & amd64).
-		// N.B. FIPS is only supported on 'amd64' at this time.
-		arch = vm.ArchFIPS
-	} else {
-		arch = vm.ArchAMD64
+	validArchs := spec.AllArchs
+	if !testSpec.Cluster.CompatibleArchs.IsEmpty() {
+		l.PrintfCtx(ctx, "Selecting from architectures=%q, %s", testSpec.Cluster.CompatibleArchs.String(), testSpec.Name)
+		validArchs = testSpec.Cluster.CompatibleArchs
 	}
+
+	arch := randomArch(ctx, l, validArchs, prng, roachtestflags.ARM64Probability, roachtestflags.FIPSProbability)
+
 	if roachtestflags.Cloud == spec.GCE && arch == vm.ArchARM64 {
 		// N.B. T2A support is rather limited, both in terms of supported
 		// regions and no local SSDs. Thus, we must fall back to AMD64 in
@@ -3232,6 +3221,61 @@ func archForTest(ctx context.Context, l *logger.Logger, testSpec registry.TestSp
 	return arch
 }

+// randomArch chooses a random architecture, respecting the set of valid architectures
+// specified by the test as well as the provided architecture probability flags.
+func randomArch(
+	ctx context.Context,
+	l *logger.Logger,
+	validArchs spec.ArchSet,
+	prng *rand.Rand,
+	arm64Probability, fipsProbability float64,
+) vm.CPUArch {
+	baseProbabilities := map[vm.CPUArch]float64{
+		vm.ArchAMD64: (1.0 - arm64Probability) * (1.0 - fipsProbability),
+		vm.ArchARM64: arm64Probability,
+		// N.B. FIPS is only supported on 'amd64' at this time:
+		// FIPS is taken with probability
+		// (1 - arm64Probability) * fipsProbability
+		// which is P(fips & amd64)
+		vm.ArchFIPS: (1.0 - arm64Probability) * fipsProbability,
+	}
+
+	// Calculate total weight for valid architectures only.
+	totalValidWeight := 0.0
+	validArchsList := validArchs.List()
+	for _, arch := range validArchsList {
+		totalValidWeight += baseProbabilities[arch]
+	}
+
+	// This would happen if the set of valid compatible arches (set by cluster spec) is disjoint with the set of
+	// enabled arches (set by roachtest flags).
+	if totalValidWeight == 0.0 {
+		l.PrintfCtx(ctx, "Defaulting to %s; CompatibleArches %s yields no architectures after applying roachtest arch probability flags", vm.ArchAMD64, validArchs.String())
+		return vm.ArchAMD64
+	}
+
+	// Since we allow only a subset of architectures, our total probability
+	// may not add up to 1. We normalize the weights amongst the valid architectures
+	// and track cumulative weights that give us "probability buckets" for each
+	// architecture.
+	cumulativeWeights := make([]float64, 0, len(validArchsList))
+	runningTotal := 0.0
+	for _, arch := range validArchsList {
+		normalizedWeight := baseProbabilities[arch] / totalValidWeight
+		runningTotal += normalizedWeight
+		cumulativeWeights = append(cumulativeWeights, runningTotal)
+	}
+	x := prng.Float64()
+	for i, weight := range cumulativeWeights {
+		if x < weight {
+			return validArchsList[i]
+		}
+	}
+	// Since we are adding floating point numbers, it's possible that we
+	// don't quite add up to 1.0. In that case, return the last architecture.
+	return validArchsList[len(validArchsList)-1]
+}
+
 // bucketVMsByProvider buckets cachedCluster.VMs by provider.
 func bucketVMsByProvider(cachedCluster *cloud.Cluster) map[string][]vm.VM {
 	providerToVMs := make(map[string][]vm.VM)
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+load("@io_bazel_rules_go//go:def.bzl", "go_library", "gomock")
+
+gomock(
+    name = "mock_cluster",
+    testonly = True,
+    out = "mock_cluster_generated.go",
+    interfaces = ["Cluster"],
+    library = "//pkg/cmd/roachtest/cluster",
+    package = "mockcluster",
+    self_package = "github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster/mock",
+    visibility = [
+        "//pkg/cmd/roachtest/clusterstats:__pkg__",
+        "//pkg/cmd/roachtest/roachtestutil/mixedversion:__pkg__",
+        "//pkg/gen:__pkg__",
+    ],
+)
+
+go_library(
+    name = "mockcluster",
+    srcs = ["mock_cluster_generated.go"],
+    importpath = "github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster/mock",
+    visibility = [
+        "//pkg/cmd/roachtest/clusterstats:__pkg__",
+        "//pkg/cmd/roachtest/roachtestutil/mixedversion:__pkg__",
+        "//pkg/gen:__pkg__",
+    ],
+    deps = [
+        "//pkg/cmd/roachprod/grafana",
+        "//pkg/cmd/roachtest/cluster",
+        "//pkg/cmd/roachtest/option",
+        "//pkg/cmd/roachtest/spec",
+        "//pkg/roachprod",
+        "//pkg/roachprod/failureinjection/failures",
+        "//pkg/roachprod/install",
+        "//pkg/roachprod/logger",
+        "//pkg/roachprod/prometheus",
+        "//pkg/roachprod/vm",
+        "@com_github_golang_mock//gomock",
+    ],
+)

pkg/cmd/roachtest/clusterstats/mock_cluster_generated_test.go renamed to pkg/cmd/roachtest/cluster/mock/mock_cluster_generated.go

Lines changed: 2 additions & 2 deletions

pkg/cmd/roachtest/cluster_test.go

Lines changed: 116 additions & 0 deletions
@@ -909,3 +909,119 @@ func TestVerifyLibraries(t *testing.T) {
 		})
 	}
 }
+
+func TestRandomArchProbabilities(t *testing.T) {
+	ctx := context.Background()
+
+	tests := []struct {
+		validArchs           spec.ArchSet
+		arm64Probability     float64
+		fipsProbability      float64
+		expectedDistribution map[vm.CPUArch]float64
+	}{
+		{
+			validArchs:       spec.AllArchs,
+			arm64Probability: 0.3,
+			fipsProbability:  0.2,
+			expectedDistribution: map[vm.CPUArch]float64{
+				vm.ArchAMD64: 0.56,
+				vm.ArchARM64: 0.3,
+				vm.ArchFIPS:  0.14,
+			},
+		},
+		{
+			validArchs:       spec.AllExceptFIPS,
+			arm64Probability: 0.4,
+			fipsProbability:  0.1,
+			expectedDistribution: map[vm.CPUArch]float64{
+				vm.ArchAMD64: 0.57447,
+				vm.ArchARM64: 0.42553,
+			},
+		},
+		{
+			validArchs:       spec.OnlyAMD64,
+			arm64Probability: 0.5,
+			fipsProbability:  0.3,
+			expectedDistribution: map[vm.CPUArch]float64{
+				vm.ArchAMD64: 1.0, // Only valid architecture
+			},
+		},
+		{
+			validArchs:       spec.OnlyARM64,
+			arm64Probability: 0.5,
+			fipsProbability:  0.3,
+			expectedDistribution: map[vm.CPUArch]float64{
+				vm.ArchARM64: 1.0, // Only valid architecture
+			},
+		},
+		{
+			validArchs:       spec.OnlyFIPS,
+			arm64Probability: 0.2,
+			fipsProbability:  0.0,
+			expectedDistribution: map[vm.CPUArch]float64{
+				vm.ArchAMD64: 1.0, // Should fall back to AMD64
+			},
+		},
+		{
+			validArchs:       spec.AllExceptFIPS,
+			arm64Probability: 0.0,
+			fipsProbability:  1.0,
+			expectedDistribution: map[vm.CPUArch]float64{
+				vm.ArchAMD64: 1.0, // Should fall back to AMD64
+			},
+		},
+		{
+			validArchs:       spec.AllExceptFIPS,
+			arm64Probability: 0.5,
+			fipsProbability:  1.0,
+			expectedDistribution: map[vm.CPUArch]float64{
+				vm.ArchARM64: 1.0,
+			},
+		},
+		{
+			validArchs:       spec.Archs(spec.ArchAMD64, spec.ArchFIPS),
+			arm64Probability: 0.3,
+			fipsProbability:  0.4,
+			expectedDistribution: map[vm.CPUArch]float64{
+				vm.ArchAMD64: 0.6,
+				vm.ArchFIPS:  0.4,
+			},
+		},
+		{
+			validArchs:       spec.Archs(spec.ArchARM64, spec.ArchFIPS),
+			arm64Probability: 0.6,
+			fipsProbability:  0.2,
+			expectedDistribution: map[vm.CPUArch]float64{
+				vm.ArchARM64: 0.88235,
+				vm.ArchFIPS:  0.11765,
+			},
+		},
+	}
+
+	// Since this is a statistical test, we want to use a fixed seed to avoid flakes,
+	// i.e. a 99% confidence interval would be expected to fail when stressed 100 times.
+	//
+	// We can run this manually with a random seed for more confidence in our distribution:
+	// prng, _ := randutil.NewTestRand()
+	prng := rand.New(rand.NewSource(12345))
+	for _, test := range tests {
+		t.Run(fmt.Sprintf("%s/arm=%d%%/fips=%d%%", test.validArchs, int(test.arm64Probability*100), int(test.fipsProbability*100)), func(t *testing.T) {
+			const numSamples = 10000
+			counts := make(map[vm.CPUArch]int)
+
+			// Generate samples
+			for i := 0; i < numSamples; i++ {
+				arch := randomArch(ctx, nilLogger(), test.validArchs, prng, test.arm64Probability, test.fipsProbability)
+				counts[arch]++
+			}
+
+			for expectedArch, expectedProb := range test.expectedDistribution {
+				actualCount := float64(counts[expectedArch])
+				actualProb := actualCount / float64(numSamples)
+
+				tolerance := 0.02
+				require.InDelta(t, expectedProb, actualProb, tolerance)
+			}
+		})
+	}
+}

pkg/cmd/roachtest/clusterstats/BUILD.bazel

Lines changed: 3 additions & 16 deletions
@@ -15,7 +15,6 @@ go_library(
         "//pkg/cmd/roachprod-microbench/util",
         # Required for generated mocks
         "//pkg/cmd/roachprod/grafana", #keep
-        "//pkg/cmd/roachtest/cluster",
         "//pkg/cmd/roachtest/option",
         "//pkg/cmd/roachtest/test",
         # Required for generated mocks
@@ -36,6 +35,7 @@ go_library(
         "@com_github_prometheus_client_golang//api/prometheus/v1:prometheus",
         "@com_github_prometheus_common//model",
         "//pkg/cmd/roachtest/roachtestutil",
+        "//pkg/cmd/roachtest/cluster",
     ],
 )

@@ -45,15 +45,15 @@ go_test(
         "exporter_test.go",
         "streamer_test.go",
         ":mock_client", # keep
-        ":mock_cluster", # keep
         ":mock_test", # keep
     ],
     embed = [":clusterstats"],
     embedsrcs = ["openmetrics_expected.txt"],
     deps = [
         "//pkg/cmd/roachtest/cluster",
+        "//pkg/cmd/roachtest/cluster/mock:mockcluster", #keep
         "//pkg/cmd/roachtest/registry",
-        # Required for generated mocks
+        "//pkg/cmd/roachtest/roachtestutil",
         "//pkg/cmd/roachtest/roachtestutil/task", #keep
         "//pkg/cmd/roachtest/spec",
         "//pkg/cmd/roachtest/test",
@@ -64,7 +64,6 @@ go_test(
         "@com_github_prometheus_client_golang//api/prometheus/v1:prometheus",
         "@com_github_prometheus_common//model",
         "@com_github_stretchr_testify//require",
-        "//pkg/cmd/roachtest/roachtestutil",
     ],
 )

@@ -91,15 +90,3 @@ gomock(
         "//pkg/gen:__pkg__",
     ],
 )
-
-gomock(
-    name = "mock_cluster",
-    out = "mock_cluster_generated_test.go",
-    interfaces = ["Cluster"],
-    library = "//pkg/cmd/roachtest/cluster",
-    package = "clusterstats",
-    visibility = [
-        ":__pkg__",
-        "//pkg/gen:__pkg__",
-    ],
-)
