build: Optimize CI with test splitting and 32-core runner #16691
kgpai wants to merge 1 commit into facebookincubator:main
Conversation
✅ Deploy Preview for meta-velox canceled.
Force-pushed 32eab85 to 98fe7f6
@czentgr @pratikpugalia @majetideepak PTAL. This should reduce CI time by 20-30 minutes at a cost of roughly 75¢ per run.
Force-pushed 98fe7f6 to fdfbe6d
@kgpai Thanks, this looks good. There are a lot more executables, so there is some tradeoff in generating them versus a single binary, but overall it is certainly better to reduce the runtime through more parallelism. We can probably experiment later and increase MAX_LINK_JOBS to 16; on the release build without extra symbols we should be OK with that, and we might be able to increase it further still.
Force-pushed 158730b to 24bc35f
Split 5 monolithic test targets into 139 individual test binaries for better ctest parallelism, and upgrade both ubuntu-debug and adapters jobs to 32-core runners with tuned build settings.

Test target splits (CMakeLists.txt changes only):
- velox_exec_test -> 76 individual tests
- velox_exec_util_test -> 9 individual tests
- velox_aggregates_test -> 43 individual tests
- velox_cache_test -> 5 individual tests
- velox_serializer_test -> 5 individual tests

CI workflow changes (ubuntu-debug + adapters):
- Runner: 16-core/8-core -> 32-core (128 GB RAM)
- NUM_THREADS: 16/8 -> 32
- MAX_HIGH_MEM_JOBS: 4 -> 12 (128 GB / 6.8 GB per heavy TU)
- MAX_LINK_JOBS: 2/4 -> 12 (shared links use ~1 GB each)
- ctest: -j 8 -> -j 24 --timeout 900
- ulimit -n 65536 to prevent "Too many open files"
- Adapters: add -DVELOX_BUILD_SHARED=ON for faster linking

The test splits add ~135 new link targets. On the previous 8-core runner (32 GB RAM), these would OOM during concurrent compilation of heavy TUs (up to 6.8 GB each) plus parallel linking.

Validated on a 48-core test machine (throttled to 32 threads):
- Build: 26m 34s (cold, no ccache)
- Test: 17m 45s (631 tests, 625 passed)
- Total: 44m 28s vs estimated ~70 min current CI
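The per-file split described above can be sketched with a foreach loop; this is an illustrative sketch only, and the sample source file names and the `VELOX_EXEC_TEST_DEPS` variable are placeholders based on the snippets quoted later in the review, not the exact upstream CMakeLists.txt:

```cmake
# Hypothetical sketch of the foreach-based test split.
# Each .cpp becomes its own executable and ctest entry, so ctest -j can
# schedule them in parallel instead of running one monolithic binary.
set(EXEC_TEST_SOURCES AggregationTest.cpp HashJoinTest.cpp OrderByTest.cpp)

foreach(TEST_SOURCE ${EXEC_TEST_SOURCES})
  get_filename_component(TEST_NAME ${TEST_SOURCE} NAME_WE)
  set(TEST_TARGET "velox_exec_${TEST_NAME}")
  add_executable(${TEST_TARGET} ${TEST_SOURCE} Main.cpp)
  add_test(
    NAME ${TEST_TARGET}
    COMMAND ${TEST_TARGET}
    WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR})
  target_link_libraries(${TEST_TARGET} PRIVATE ${VELOX_EXEC_TEST_DEPS})
endforeach()
```

Compiling the shared `Main.cpp` into each binary avoids a separate static library at the cost of recompiling one small TU per target.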
@claude /pr-review

Claude finished @pratikpugalia's task in 3m 30s

Summary
This PR splits 5 monolithic test targets into ~139 individual binaries for improved ctest parallelism.

Issues Found
🟡 Suggestion: Missing documentation for the adapters job changes
| Setting | Before | After |
|---|---|---|
| Runner | `8-core-ubuntu-22.04` | `32-core-ubuntu` |
| `NUM_THREADS` | 8 | 32 |
| `MAX_HIGH_MEM_JOBS` | 4 | 12 |
| `MAX_LINK_JOBS` | 4 | 12 |
| `ctest -j` | 8 | 24 |
| `VELOX_BUILD_SHARED` | not set | `ON` |
This is a 4x runner upgrade (not 2x like ubuntu-debug) and changes the OS label from ubuntu-22.04 to generic ubuntu. The cost implications for the adapters job are not documented. Consider updating the description with a table for adapters changes as well, since this is a larger jump than ubuntu-debug.
🟡 Suggestion: velox_exec_test timeout regression risk
The old velox_exec_test had a custom TIMEOUT 6000 (100 minutes) via set_tests_properties, with a comment referencing issue #13879. This has been replaced by a global --timeout 900 (15 minutes). While individual split tests should run faster, the PR description notes SpillerTest needs 12+ minutes. If SpillerTest approaches that limit, the 900s timeout could be tight. Consider:
- Verifying `velox_exec_SpillerTest` completes well within 900s, or
- Adding a per-test timeout override for it if needed
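If the override turns out to be needed, a minimal sketch follows; the target name comes from the PR, while the 1800-second value is an arbitrary example, not a measured requirement:

```cmake
# Give the long-running spiller test its own budget; all other split
# tests keep the global `ctest --timeout 900` from the workflow.
set_tests_properties(velox_exec_SpillerTest PROPERTIES TIMEOUT 1800)
```

A `TIMEOUT` property set this way takes precedence over the command-line `--timeout` for that one test.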
🟡 Suggestion: VELOX_GFLAGS_URL bypasses SHA256 verification
The new VELOX_GFLAGS_URL: file:///velox/deps-sources/gflags-v2.2.2.tar.gz (line 296) causes velox_resolve_dependency_url to clear the SHA256 checksum (see CMake/ResolveDependency.cmake:132). This is acceptable for CI using a pre-bundled file in the container image, but worth noting — the gflags tarball will not be integrity-verified during the build. Adding VELOX_GFLAGS_SHA256 with the known checksum would restore verification.
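A hedged sketch of restoring verification; in CI these would normally be workflow environment variables, shown here as CMake cache settings for illustration, and the checksum placeholder must be replaced with the actual published gflags v2.2.2 tarball SHA256 (not reproduced here):

```cmake
# Overriding the URL alone disables checksum verification in
# velox_resolve_dependency_url; supplying the SHA256 alongside restores it.
set(VELOX_GFLAGS_URL "file:///velox/deps-sources/gflags-v2.2.2.tar.gz"
    CACHE STRING "")
set(VELOX_GFLAGS_SHA256 "<known-gflags-v2.2.2-sha256>" CACHE STRING "")
```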
🟢 Nitpick: OS label specificity
The adapters runner changed from 8-core-ubuntu-22.04 (pinned OS version) to 32-core-ubuntu (latest). This means the OS version could change across runs. This may be intentional, but differs from the original explicit pinning.
Positive Observations
- The foreach-based CMake pattern is clean, maintainable, and easy to extend when new test files are added.
- Correctly handling conditional tests (`SpatialJoinTest`, `GeometryAggregateTest`) outside the loop with their extra dependencies.
- Including `Main.cpp` as a source file in each split binary (for exec/aggregates tests) is the right approach, avoiding a separate static library.
- The `PrestoQueryRunnerIntermediateTypeTransformTestBase.cpp` base class inclusion in the util test loop is well-handled.
- The `ulimit -n 65536` addition is a good defensive measure for running many parallel test processes.
- No test target name collisions between the `velox_exec_*` (main) and `velox_exec_*` (util) namespaces: verified that all util tests have `PrestoQueryRunner*`-prefixed names.
CI is currently in progress: Linux Build using GCC
```cmake
set(TEST_TARGET "velox_exec_${TEST_NAME}")
add_executable(${TEST_TARGET} ${TEST_SOURCE} Main.cpp)
add_test(NAME ${TEST_TARGET} COMMAND ${TEST_TARGET} WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR})
target_link_libraries(${TEST_TARGET} ${VELOX_EXEC_TEST_DEPS})
```
The PRIVATE keyword is missing here. Other split targets (e.g., velox_cache_* in velox/common/caching/tests/CMakeLists.txt:51) correctly use PRIVATE. For test executables, PRIVATE is best practice to prevent unintended transitive dependency leakage.
Suggested change:

```diff
-target_link_libraries(${TEST_TARGET} ${VELOX_EXEC_TEST_DEPS})
+target_link_libraries(${TEST_TARGET} PRIVATE ${VELOX_EXEC_TEST_DEPS})
```
```cmake
set(TEST_TARGET "velox_aggregates_${TEST_NAME}")
add_executable(${TEST_TARGET} ${TEST_SOURCE} Main.cpp)
add_test(NAME ${TEST_TARGET} COMMAND ${TEST_TARGET} WORKING_DIRECTORY .)
target_link_libraries(${TEST_TARGET} ${VELOX_AGGREGATES_TEST_DEPS})
```
Same as exec tests: missing PRIVATE keyword for consistency and correctness.
Suggested change:

```diff
-target_link_libraries(${TEST_TARGET} ${VELOX_AGGREGATES_TEST_DEPS})
+target_link_libraries(${TEST_TARGET} PRIVATE ${VELOX_AGGREGATES_TEST_DEPS})
```
```cmake
set(TEST_TARGET "velox_serializer_${TEST_NAME}")
add_executable(${TEST_TARGET} ${TEST_SOURCE})
add_test(${TEST_TARGET} ${TEST_TARGET})
target_link_libraries(${TEST_TARGET} ${VELOX_SERIALIZER_TEST_DEPS})
```
Missing PRIVATE keyword, for consistency with velox_cache_* tests and CMake best practices.
Suggested change:

```diff
-target_link_libraries(${TEST_TARGET} ${VELOX_SERIALIZER_TEST_DEPS})
+target_link_libraries(${TEST_TARGET} PRIVATE ${VELOX_SERIALIZER_TEST_DEPS})
```
```cmake
  Main.cpp
)
add_test(NAME ${TEST_TARGET} COMMAND ${TEST_TARGET})
target_link_libraries(${TEST_TARGET} ${VELOX_EXEC_UTIL_TEST_DEPS})
```
Missing PRIVATE keyword here as well.
Suggested change:

```diff
-target_link_libraries(${TEST_TARGET} ${VELOX_EXEC_UTIL_TEST_DEPS})
+target_link_libraries(${TEST_TARGET} PRIVATE ${VELOX_EXEC_UTIL_TEST_DEPS})
```
```cmake
  COMMAND velox_exec_SpatialJoinTest
  WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
)
target_link_libraries(velox_exec_SpatialJoinTest ${VELOX_EXEC_TEST_DEPS})
```
PRIVATE keyword missing here as well, for consistency:
Suggested change:

```diff
-target_link_libraries(velox_exec_SpatialJoinTest ${VELOX_EXEC_TEST_DEPS})
+target_link_libraries(velox_exec_SpatialJoinTest PRIVATE ${VELOX_EXEC_TEST_DEPS})
```
Force-pushed 24bc35f to 5e00f5b
Summary
- Split 5 monolithic test targets into ~139 individual test binaries for better `ctest` parallelism
- Upgrade the `ubuntu-debug` job from `16-core-ubuntu` to `32-core-ubuntu` with tuned parallelism settings
- Add `-DVELOX_BUILD_SHARED=ON` to the adapters job for faster shared linking
- Add `ulimit -n 65536` and `--timeout 900` to prevent test infra failures

Test target splits (CMakeLists.txt only, no source changes)
- `velox_exec_test` -> 76 individual tests
- `velox_exec_util_test` -> 9 individual tests
- `velox_aggregates_test` -> 43 individual tests
- `velox_cache_test` -> 5 individual tests
- `velox_serializer_test` -> 5 individual tests

CI tuning changes (`ubuntu-debug`)

| Setting | Before | After |
|---|---|---|
| Runner | `16-core-ubuntu` | `32-core-ubuntu` |
| `NUM_THREADS` | 16 | 32 |
| `MAX_HIGH_MEM_JOBS` | 4 | 12 |
| `MAX_LINK_JOBS` | 2 | 12 |
| `ctest -j` | 8 | 24 |

Validated results (cold build, no ccache, 32-thread simulation)
- Build: 26m 34s
- Test: 17m 45s (631 tests, 625 passed)
- Total: 44m 28s vs estimated ~70 min current CI
With ccache (typical CI run), expected total is 25-35 min vs ~55 min current.
Cost per run increases from $2.94 to ~$3.69 (+26%), justified by 25-35 min savings per PR iteration.
Test plan