Commit 1eb6d7a

edsavage and cursoragent committed
[ML] Tune test parallelism: fix low-core and improve mid-core machines
Two changes to the test parallelism formula:

1. On <=4 cores (macOS CI Orka VMs), use numCpus-1 instead of numCpus. Full CPU saturation caused CKMostCorrelatedTest/testScale to fail because wall-clock complexity assertions became unreliable.
2. On >4 cores, use ceil(numCpus/2) instead of ceil(numCpus/3). The /3 divisor was too conservative on 8-core Linux aarch64 CI (52 min with -j 3 vs 39.5 min with -j 8). The /2 divisor gives -j 4 on 8 cores, a better balance of parallelism vs contention.

Also adds diagnostic logging of CPU count and parallelism settings.

Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent 1c0ee04 commit 1eb6d7a
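
The two formula changes in the commit message can be compared side by side with a small sketch. The function names old_parallel and new_parallel are hypothetical and do not appear in the repository; both model only the non-Windows branch, using integer arithmetic for the ceiling.

```shell
#!/bin/sh
# Sketch only: old_parallel and new_parallel are hypothetical names, not
# functions from the repository. Both model the non-Windows branch.

# Previous formula: all cores on <=4, otherwise ceil(n/3) with a floor of 2.
old_parallel() {
    n=$1
    if [ "$n" -le 4 ]; then
        p=$n
    else
        p=$(( (n + 2) / 3 ))   # integer form of ceil(n/3)
        if [ "$p" -lt 2 ]; then p=2; fi
    fi
    echo "$p"
}

# New formula: n-1 (floor of 2) on <=4 cores, otherwise ceil(n/2).
new_parallel() {
    n=$1
    if [ "$n" -le 4 ]; then
        p=$(( n - 1 ))
    else
        p=$(( (n + 1) / 2 ))   # integer form of ceil(n/2)
    fi
    if [ "$p" -lt 2 ]; then p=2; fi
    echo "$p"
}

# Print old and new values for a few illustrative core counts.
for cores in 2 4 8 16; do
    echo "cores=$cores old=$(old_parallel "$cores") new=$(new_parallel "$cores")"
done
```

For 8 cores this gives 3 under the old formula and 4 under the new one, matching the -j values quoted in the commit message; for 4 cores the value drops from 4 to 3.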

2 files changed: +12 -7 lines changed

build.gradle

Lines changed: 6 additions & 3 deletions
@@ -116,9 +116,12 @@ project.ext.numCpus = Runtime.runtime.availableProcessors()
 // Each suite spawns ctest --parallel <all_cpus> internally, so running too
 // many suites simultaneously causes resource contention. These values were
 // determined empirically (see PR #2900).
-// On low-core machines (<=4, e.g. macOS CI Orka VMs), use all cores since
-// CTest internal parallelism is modest enough that contention is minimal.
-project.ext.testParallel = isWindows ? 2 : (numCpus <= 4 ? numCpus : Math.max(2, (int) Math.ceil(numCpus / 3.0)))
+// On low-core machines (<=4, e.g. macOS CI Orka VMs with 4 cores),
+// cap at numCpus-1 to leave headroom for timing-sensitive tests
+// (e.g. CKMostCorrelatedTest/testScale). For higher core counts,
+// ceil(numCpus/2) balances parallelism vs contention — ceil(numCpus/3)
+// was too conservative on 8-core machines (52 min vs 39.5 min).
+project.ext.testParallel = isWindows ? 2 : (numCpus <= 4 ? Math.max(2, numCpus - 1) : Math.max(2, (int) Math.ceil(numCpus / 2.0)))
 project.ext.makeEnvironment = [ 'CPP_CROSS_COMPILE': cppCrossCompile,
 'VERSION_QUALIFIER': versionQualifier,
 'SNAPSHOT': (isSnapshot ? 'yes' : 'no'),

dev-tools/docker/docker_entrypoint.sh

Lines changed: 6 additions & 4 deletions
@@ -68,13 +68,15 @@ if [ "x$1" = "x--test" ] ; then
 echo passed > build/test_status.txt
 # Each test suite spawns ctest --parallel <nproc> internally, so limit
 # the number of suites running concurrently to avoid resource contention.
-# On low-core machines (<=4), use all cores since CTest internal
-# parallelism is modest enough that contention is minimal.
+# On low-core machines (<=4), cap at nproc-1 to leave headroom for
+# timing-sensitive tests (e.g. CKMostCorrelatedTest/testScale).
+# For higher core counts, ceil(nproc/2) balances parallelism vs
+# contention — ceil(nproc/3) was too conservative on 8-core machines.
 NCPUS=$(nproc)
 if [ "$NCPUS" -le 4 ]; then
-TEST_PARALLEL=$NCPUS
+TEST_PARALLEL=$(( NCPUS > 2 ? NCPUS - 1 : 2 ))
 else
-TEST_PARALLEL=$(( (NCPUS + 2) / 3 ))
+TEST_PARALLEL=$(( (NCPUS + 1) / 2 ))
 fi
 echo "Test parallelism: nproc=${NCPUS}, TEST_PARALLEL=${TEST_PARALLEL} (cmake --build -j ${TEST_PARALLEL})"
 cmake --build cmake-build-docker ${CMAKE_VERBOSE} -j ${TEST_PARALLEL} -t test_individually || echo failed > build/test_status.txt
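
The shell script and build.gradle write the same rule in different forms: a ternary and (NCPUS + 1) / 2 in the script, Math.max and Math.ceil in Gradle. The following sketch checks that the two forms agree over a range of core counts. The check_equivalence function is hypothetical, and the Gradle side is modelled here in shell integer arithmetic (ceil(n/2) as n/2 + n%2) rather than run through Gradle itself.

```shell
#!/bin/sh
# Sketch only: check_equivalence is a hypothetical helper, not part of the
# repository. It compares the docker_entrypoint.sh expressions against a
# shell model of the build.gradle expressions (non-Windows branch).
check_equivalence() {
    max=$1
    n=1
    while [ "$n" -le "$max" ]; do
        if [ "$n" -le 4 ]; then
            # Script form, as in docker_entrypoint.sh.
            script=$(( n > 2 ? n - 1 : 2 ))
            # Model of Math.max(2, numCpus - 1).
            gradle=$(( n - 1 ))
        else
            # Script form: integer ceiling of n/2.
            script=$(( (n + 1) / 2 ))
            # Model of Math.ceil(numCpus / 2.0): quotient plus remainder.
            gradle=$(( n / 2 + n % 2 ))
        fi
        # Both Gradle branches apply a floor of 2.
        if [ "$gradle" -lt 2 ]; then gradle=2; fi
        if [ "$script" -ne "$gradle" ]; then
            echo "mismatch at n=$n: script=$script gradle=$gradle"
            return 1
        fi
        n=$(( n + 1 ))
    done
    echo "formulas agree for 1..$max"
}

check_equivalence 128
```

A check like this is cheap to keep next to the two definitions, since the formula lives in two files and has to be updated in both whenever the divisor changes.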
