build: Optimize CI with test splitting and 32-core runner #16691

Open
kgpai wants to merge 1 commit into facebookincubator:main from kgpai:build/optimize-ci-split-tests
@kgpai (Contributor) commented Mar 9, 2026

Summary

  • Split 5 monolithic test targets into 139 individual test binaries for better ctest parallelism
  • Upgrade ubuntu-debug job from 16-core-ubuntu to 32-core-ubuntu with tuned parallelism settings
  • Add -DVELOX_BUILD_SHARED=ON to the adapters job for faster shared linking
  • Add ulimit -n 65536 and --timeout 900 to prevent test infra failures

Test target splits (CMakeLists.txt only, no source changes)

| Original Target | Individual Tests |
| --- | --- |
| velox_exec_test | 76 |
| velox_exec_util_test | 9 |
| velox_aggregates_test | 43 |
| velox_cache_test | 5 |
| velox_serializer_test | 5 |
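
The split pattern behind this table can be sketched as a `foreach` loop that builds one executable per test source. This is a minimal sketch reconstructed from the fragments quoted in the review comments further down, not the PR's actual file: the three-entry source list is a hypothetical stand-in for the real list, and deriving the name with `get_filename_component` is an assumption.

```cmake
# Hypothetical, shortened source list; the real list is much longer.
set(VELOX_EXEC_TEST_SOURCES AggregationTest.cpp HashJoinTest.cpp SpillerTest.cpp)

foreach(TEST_SOURCE IN LISTS VELOX_EXEC_TEST_SOURCES)
  # SpillerTest.cpp -> SpillerTest (assumed naming scheme).
  get_filename_component(TEST_NAME ${TEST_SOURCE} NAME_WE)
  set(TEST_TARGET "velox_exec_${TEST_NAME}")

  # Each binary compiles its own test file plus the shared gtest entry point.
  add_executable(${TEST_TARGET} ${TEST_SOURCE} Main.cpp)
  add_test(
    NAME ${TEST_TARGET}
    COMMAND ${TEST_TARGET}
    WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR})
  target_link_libraries(${TEST_TARGET} PRIVATE ${VELOX_EXEC_TEST_DEPS})
endforeach()
```

Because every source file becomes its own `add_test` entry, `ctest -j` can schedule them independently instead of waiting on one long-running binary.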

CI tuning changes (ubuntu-debug)

| Setting | Before | After | Rationale |
| --- | --- | --- | --- |
| Runner | 16-core-ubuntu | 32-core-ubuntu | 2x cores, 2x RAM (128 GB) |
| NUM_THREADS | 16 | 32 | Match core count |
| MAX_HIGH_MEM_JOBS | 4 | 12 | 128 GB / 6.8 GB per heavy TU = 18; use 12 |
| MAX_LINK_JOBS | 2 | 12 | Shared links use ~1 GB each; 12 is safe |
| ctest -j | 8 | 24 | 3/4 core count; leaves headroom |
| Test timeout | 600s (default) | 900s | SpillerTest needs 12m+, split tests need more time individually |
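
The rationale column can be checked with a little arithmetic. The sketch below uses only the figures quoted in the table (128 GB RAM, 6.8 GB per heavy TU, about 1 GB per shared link, 32 cores) and assumes the worst case, in which every high-memory compile slot and every link slot is occupied at the same time.

```latex
\begin{align*}
\left\lfloor \frac{128\ \text{GB}}{6.8\ \text{GB}} \right\rfloor &= 18
  && \text{(upper bound on concurrent heavy compiles)} \\
12 \times 6.8\ \text{GB} &= 81.6\ \text{GB}
  && \text{(chosen MAX\_HIGH\_MEM\_JOBS)} \\
12 \times 1\ \text{GB} &= 12\ \text{GB}
  && \text{(chosen MAX\_LINK\_JOBS)} \\
128 - 81.6 - 12 &= 34.4\ \text{GB}
  && \text{(left for ordinary compile jobs and the OS)} \\
\tfrac{3}{4} \times 32 &= 24
  && \text{(ctest -j)}
\end{align*}
```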

Validated results (cold build, no ccache, 32-thread simulation)

| Metric | Value |
| --- | --- |
| Build time | 26m 34s |
| Test time | 17m 45s |
| Total | 44m 28s |
| Tests | 631 total, 625 passed (99%) |

With ccache (typical CI run), expected total is 25-35 min vs ~55 min current.

Cost per run increases from $2.94 to ~$3.69 (+26%), justified by 25-35 min savings per PR iteration.
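
The quoted percentage follows directly from the two per-run prices; since the new price is itself approximate, the result is approximate too.

```latex
\[
3.69 - 2.94 = 0.75,
\qquad
\frac{0.75}{2.94} \approx 0.255 \;\Rightarrow\; \text{about } {+26\%} \text{ per run}
\]
```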

Test plan

  • CI passes on this PR (validates test splits compile and link correctly)
  • ubuntu-debug job completes faster than baseline (~55 min -> ~35 min with ccache)
  • All 631 test targets discovered and executed
  • No new test failures introduced by the splits (failures are pre-existing)
  • Adapters job builds successfully with shared linking

netlify bot commented Mar 9, 2026

Deploy Preview for meta-velox canceled.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 5e00f5b |
| 🔍 Latest deploy log | https://app.netlify.com/projects/meta-velox/deploys/69b34f0d2c9edd0008d969fb |

@meta-cla meta-cla bot added the CLA Signed label Mar 9, 2026
@kgpai kgpai force-pushed the build/optimize-ci-split-tests branch from 32eab85 to 98fe7f6 March 9, 2026 23:50

meta-codesync bot commented Mar 9, 2026

@kgpai has imported this pull request. If you are a Meta employee, you can view this in D95874902.

@kgpai kgpai requested a review from czentgr March 9, 2026 23:54
kgpai (Contributor, Author) commented Mar 9, 2026

@czentgr @pratikpugalia @majetideepak PTAL.

This should reduce CI time by 20-30 minutes at a cost of roughly 75 cents per run.
We can look at optimizing costs after getting run time down.

@kgpai kgpai force-pushed the build/optimize-ci-split-tests branch from 98fe7f6 to fdfbe6d March 10, 2026 00:04

czentgr (Collaborator) commented Mar 11, 2026

@kgpai Thanks, this looks good. Building many more executables has some cost compared with building a single binary, but the added parallelism should reduce overall test runtime.
As noted, turning on VELOX_BUILD_SHARED for the adapters image causes a SEGV in velox_functions_remote_client_test.

We can probably experiment later with increasing MAX_LINK_JOBS to 16. On the release build without extra symbols we should be OK with that, and we might be able to increase it further still.
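
For context on what MAX_LINK_JOBS and MAX_HIGH_MEM_JOBS limit: caps like these are commonly wired to Ninja job pools. The following is a minimal sketch of that general pattern, not a copy of the Velox build files. The pool names and the `heavy_tu_target` target are hypothetical, and it assumes the limits arrive as environment variables and that the Ninja generator is in use (other generators ignore job pools).

```cmake
# Assumption: limits come from the environment; defaults mirror this PR's values.
set(MAX_HIGH_MEM_JOBS 12)
set(MAX_LINK_JOBS 12)
if(DEFINED ENV{MAX_HIGH_MEM_JOBS})
  set(MAX_HIGH_MEM_JOBS $ENV{MAX_HIGH_MEM_JOBS})
endif()
if(DEFINED ENV{MAX_LINK_JOBS})
  set(MAX_LINK_JOBS $ENV{MAX_LINK_JOBS})
endif()

# Declare two job pools (hypothetical names) with those capacities.
set_property(GLOBAL PROPERTY JOB_POOLS
  high_memory_pool=${MAX_HIGH_MEM_JOBS}
  link_pool=${MAX_LINK_JOBS})

# Every link step in the project draws a slot from link_pool.
set(CMAKE_JOB_POOL_LINK link_pool)

# Only targets known to contain memory-hungry translation units are throttled.
# heavy_tu_target is a placeholder and must already exist at this point.
set_property(TARGET heavy_tu_target PROPERTY JOB_POOL_COMPILE high_memory_pool)
```

With this layout, raising the link cap from 12 to 16 is a one-variable change that leaves the compile throttle untouched.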

@kgpai kgpai force-pushed the build/optimize-ci-split-tests branch 2 times, most recently from 158730b to 24bc35f March 12, 2026 22:20
@facebookincubator facebookincubator deleted a comment from claude bot Mar 12, 2026
Split 5 monolithic test targets into 139 individual test binaries for
better ctest parallelism, and upgrade both ubuntu-debug and adapters
jobs to 32-core runners with tuned build settings.

Test target splits (CMakeLists.txt changes only):
- velox_exec_test -> 76 individual tests
- velox_exec_util_test -> 9 individual tests
- velox_aggregates_test -> 43 individual tests
- velox_cache_test -> 5 individual tests
- velox_serializer_test -> 5 individual tests

CI workflow changes (ubuntu-debug + adapters):
- Runner: 16-core/8-core -> 32-core (128 GB RAM)
- NUM_THREADS: 16/8 -> 32
- MAX_HIGH_MEM_JOBS: 4 -> 12 (128 GB / 6.8 GB per heavy TU)
- MAX_LINK_JOBS: 2/4 -> 12 (shared links use ~1 GB each)
- ctest: -j 8 -> -j 24 --timeout 900
- ulimit -n 65536 to prevent "Too many open files"
- Adapters: add -DVELOX_BUILD_SHARED=ON for faster linking

The test splits add ~135 new link targets. On the previous 8-core
runner (32 GB RAM), these would OOM during concurrent compilation
of heavy TUs (up to 6.8 GB each) + parallel linking.

Validated on 48-core test machine (throttled to 32 threads):
- Build: 26m 34s (cold, no ccache)
- Test: 17m 45s (631 tests, 625 passed)
- Total: 44m 28s vs estimated ~70 min current CI

@pratikpugalia (Contributor) commented

@claude /pr-review

claude bot commented Mar 12, 2026

Claude finished @pratikpugalia's task in 3m 30s.


Summary

This PR splits 5 monolithic test targets into ~139 individual binaries for improved ctest parallelism, upgrades both the adapters and ubuntu-debug jobs to 32-core runners, and adds ulimit/timeout hardening. The CMake refactoring pattern (foreach loop over source lists) is clean and idiomatic. The approach is sound and should meaningfully reduce CI wall-clock time.

Issues Found

🟡 Suggestion: Missing PRIVATE keyword in target_link_libraries (5 locations)

The velox_cache_* tests correctly use PRIVATE in target_link_libraries, but the other split targets do not. For executables this is less impactful than for libraries, but it's best practice to use PRIVATE consistently to prevent unintended transitive dependency exposure:

  • velox/exec/tests/CMakeLists.txt:168 (velox_exec_* loop)
  • velox/exec/tests/CMakeLists.txt:178 (velox_exec_SpatialJoinTest)
  • velox/exec/tests/CMakeLists.txt:275 (velox_exec_* util loop)
  • velox/functions/prestosql/aggregates/tests/CMakeLists.txt:92 (velox_aggregates_* loop)
  • velox/serializers/tests/CMakeLists.txt:62 (velox_serializer_* loop)

(Inline comments with suggested fixes have been posted on each.)
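
One practical reason to settle on the keyword form everywhere: CMake does not allow the plain and keyword signatures of `target_link_libraries` to be mixed on the same target, so a later call that adds a dependency with `PRIVATE` fails at configure time if the first call used the plain form. A minimal sketch, in which the target name, source file, and the extra `GTest::gtest` dependency are illustrative assumptions:

```cmake
# Hypothetical test target using the keyword signature from the start.
add_executable(velox_exec_ExampleTest ExampleTest.cpp Main.cpp)
target_link_libraries(velox_exec_ExampleTest PRIVATE ${VELOX_EXEC_TEST_DEPS})

# A later addition can use PRIVATE as well. Had the first call been the plain
# form, CMake would reject this call because signatures cannot be mixed
# on one target.
target_link_libraries(velox_exec_ExampleTest PRIVATE GTest::gtest)
```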

🟡 Suggestion: PR description under-reports adapters job changes

The PR description tables focus on ubuntu-debug, but the adapters job also received significant changes:

| Setting | Before | After |
| --- | --- | --- |
| Runner | 8-core-ubuntu-22.04 | 32-core-ubuntu |
| NUM_THREADS | 8 | 32 |
| MAX_HIGH_MEM_JOBS | 4 | 12 |
| MAX_LINK_JOBS | 4 | 12 |
| ctest -j | 8 | 24 |
| VELOX_BUILD_SHARED | not set | ON |

This is a 4x runner upgrade (not 2x like ubuntu-debug) and changes the OS label from ubuntu-22.04 to generic ubuntu. The cost implications for the adapters job are not documented. Consider updating the description with a table for adapters changes as well, since this is a larger jump than ubuntu-debug.

🟡 Suggestion: velox_exec_test timeout regression risk

The old velox_exec_test had a custom TIMEOUT 6000 (100 minutes) via set_tests_properties, with a comment referencing issue #13879. This has been replaced by a global --timeout 900 (15 minutes). While individual split tests should run faster, the PR description notes SpillerTest needs 12+ minutes. If SpillerTest approaches that limit, the 900s timeout could be tight. Consider:

  • Verifying velox_exec_SpillerTest completes well within 900s, or
  • Adding a per-test timeout override for it if needed
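
A per-test override along the lines of the second option could look like the sketch below. The 1800-second value is a hypothetical choice, not one taken from the PR. The `TIMEOUT` test property takes precedence over the default supplied on the command line with `ctest --timeout`, which only applies to tests that have no timeout of their own.

```cmake
# Must appear after the add_test() call that defines velox_exec_SpillerTest,
# in the same directory scope.
# 1800 s is an illustrative budget; the global default stays at 900 s.
set_tests_properties(velox_exec_SpillerTest PROPERTIES TIMEOUT 1800)
```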

🟡 Suggestion: VELOX_GFLAGS_URL bypasses SHA256 verification

The new VELOX_GFLAGS_URL: file:///velox/deps-sources/gflags-v2.2.2.tar.gz (line 296) causes velox_resolve_dependency_url to clear the SHA256 checksum (see CMake/ResolveDependency.cmake:132). This is acceptable for CI using a pre-bundled file in the container image, but worth noting — the gflags tarball will not be integrity-verified during the build. Adding VELOX_GFLAGS_SHA256 with the known checksum would restore verification.
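
To illustrate what integrity verification of a local tarball amounts to, here is a generic sketch using CMake's built-in hashing. It is not how `velox_resolve_dependency_url` is implemented; it only assumes that the expected digest is supplied through the `VELOX_GFLAGS_SHA256` environment variable as a lowercase hex string, and the digest itself is deliberately not reproduced here.

```cmake
# Path as quoted in the review comment above.
set(GFLAGS_TARBALL "/velox/deps-sources/gflags-v2.2.2.tar.gz")

# file(SHA256 ...) stores the lowercase hex digest of the file in the variable.
file(SHA256 "${GFLAGS_TARBALL}" ACTUAL_SHA256)

# Assumption: the known-good digest is provided via the environment.
if(NOT "${ACTUAL_SHA256}" STREQUAL "$ENV{VELOX_GFLAGS_SHA256}")
  message(FATAL_ERROR "gflags checksum mismatch: got ${ACTUAL_SHA256}")
endif()
```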

🟢 Nitpick: OS label specificity

The adapters runner changed from 8-core-ubuntu-22.04 (pinned OS version) to 32-core-ubuntu (latest). This means the OS version could change across runs. This may be intentional, but differs from the original explicit pinning.

Positive Observations

  • The foreach-based CMake pattern is clean, maintainable, and easy to extend when new test files are added.
  • Correctly handling conditional tests (SpatialJoinTest, GeometryAggregateTest) outside the loop with their extra dependencies.
  • Including Main.cpp as a source file in each split binary (for exec/aggregates tests) is the right approach, avoiding a separate static library.
  • The PrestoQueryRunnerIntermediateTypeTransformTestBase.cpp base class inclusion in the util test loop is well-handled.
  • The ulimit -n 65536 addition is a good defensive measure for running many parallel test processes.
  • No test target name collisions between the velox_exec_* (main) and velox_exec_* (util) namespaces — verified that all util tests have PrestoQueryRunner* prefixed names.

CI is currently in progress: Linux Build using GCC

set(TEST_TARGET "velox_exec_${TEST_NAME}")
add_executable(${TEST_TARGET} ${TEST_SOURCE} Main.cpp)
add_test(NAME ${TEST_TARGET} COMMAND ${TEST_TARGET} WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR})
target_link_libraries(${TEST_TARGET} ${VELOX_EXEC_TEST_DEPS})

The PRIVATE keyword is missing here. Other split targets (e.g., velox_cache_* in velox/common/caching/tests/CMakeLists.txt:51) correctly use PRIVATE. For test executables, PRIVATE is best practice to prevent unintended transitive dependency leakage.

Suggested change
- target_link_libraries(${TEST_TARGET} ${VELOX_EXEC_TEST_DEPS})
+ target_link_libraries(${TEST_TARGET} PRIVATE ${VELOX_EXEC_TEST_DEPS})

set(TEST_TARGET "velox_aggregates_${TEST_NAME}")
add_executable(${TEST_TARGET} ${TEST_SOURCE} Main.cpp)
add_test(NAME ${TEST_TARGET} COMMAND ${TEST_TARGET} WORKING_DIRECTORY .)
target_link_libraries(${TEST_TARGET} ${VELOX_AGGREGATES_TEST_DEPS})

Same as exec tests: missing PRIVATE keyword for consistency and correctness.

Suggested change
- target_link_libraries(${TEST_TARGET} ${VELOX_AGGREGATES_TEST_DEPS})
+ target_link_libraries(${TEST_TARGET} PRIVATE ${VELOX_AGGREGATES_TEST_DEPS})

set(TEST_TARGET "velox_serializer_${TEST_NAME}")
add_executable(${TEST_TARGET} ${TEST_SOURCE})
add_test(${TEST_TARGET} ${TEST_TARGET})
target_link_libraries(${TEST_TARGET} ${VELOX_SERIALIZER_TEST_DEPS})

Missing PRIVATE keyword, for consistency with velox_cache_* tests and CMake best practices.

Suggested change
- target_link_libraries(${TEST_TARGET} ${VELOX_SERIALIZER_TEST_DEPS})
+ target_link_libraries(${TEST_TARGET} PRIVATE ${VELOX_SERIALIZER_TEST_DEPS})

Main.cpp
)
add_test(NAME ${TEST_TARGET} COMMAND ${TEST_TARGET})
target_link_libraries(${TEST_TARGET} ${VELOX_EXEC_UTIL_TEST_DEPS})

Missing PRIVATE keyword here as well.

Suggested change
- target_link_libraries(${TEST_TARGET} ${VELOX_EXEC_UTIL_TEST_DEPS})
+ target_link_libraries(${TEST_TARGET} PRIVATE ${VELOX_EXEC_UTIL_TEST_DEPS})

COMMAND velox_exec_SpatialJoinTest
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
)
target_link_libraries(velox_exec_SpatialJoinTest ${VELOX_EXEC_TEST_DEPS})

PRIVATE keyword missing here as well, for consistency:

Suggested change
- target_link_libraries(velox_exec_SpatialJoinTest ${VELOX_EXEC_TEST_DEPS})
+ target_link_libraries(velox_exec_SpatialJoinTest PRIVATE ${VELOX_EXEC_TEST_DEPS})

@kgpai kgpai force-pushed the build/optimize-ci-split-tests branch from 24bc35f to 5e00f5b March 12, 2026 23:40
Labels: CLA Signed

3 participants