fix(build): Harden the merge to main builds#16424
fix(build): Harden the merge to main builds#16424kgpai wants to merge 12 commits intofacebookincubator:mainfrom
Conversation
✅ Deploy Preview for meta-velox canceled.
|
|
/claude-review |
Summary: The merge to main job with resolve_dependencies fails occasionaly since it tries to get all dependencies from the internet . As we are trying to make the builds more robust and green - this change sets the job to use SYSTEM dependency. We will still test the resolve dependency module out but run it via our scheduled jobs. Differential Revision: D93500381
4480c8a to
3f14e1f
Compare
Summary: Pull Request resolved: facebookincubator#16424 The merge to main job with resolve_dependencies fails occasionaly since it tries to get all dependencies from the internet . As we are trying to make the builds more robust and green - this change sets the job to use SYSTEM dependency. We will still test the resolve dependency module out but run it via our scheduled jobs. Differential Revision: D93500381
3f14e1f to
9bc6f5d
Compare
Summary: The merge to main job with resolve_dependencies fails occasionaly since it tries to get all dependencies from the internet . As we are trying to make the builds more robust and green - this change sets the job to use SYSTEM dependency. We will still test the resolve dependency module out but run it via our scheduled jobs. Differential Revision: D93500381
9bc6f5d to
ef85145
Compare
|
/claude-review |
1 similar comment
|
/claude-review |
✅ Claude Code ReviewRequested by @kgpai Now let me understand what changed in the diff better. Based on the code analysis, I can see that:
SummaryThis is a well-reasoned build system improvement that switches the merge-to-main builds from bundled dependencies (building from source) to system dependencies (using pre-installed packages) to improve build stability. The change maintains test coverage by adding bundled dependency testing to scheduled jobs. Issues FoundNone. The changes are appropriate and well-structured. Positive Observations✅ Good Build Strategy: The change addresses the stated problem of occasional build failures when fetching dependencies from the internet by switching to more reliable system dependencies for critical merge builds. ✅ Maintains Test Coverage: The PR thoughtfully preserves bundled dependency testing by moving it to scheduled jobs rather than eliminating it entirely. ✅ Consistent Structure: The new scheduled job follows the same pattern as existing jobs, maintaining consistency in the CI pipeline structure. ✅ Appropriate Environment Variables: Both ✅ Proper Conditional Guard: The new scheduled job includes the proper repository check ✅ Complete Job Definition: The new job includes all necessary steps including ccache management, dependency installation, build process, and testing. The changes align well with the goal of making builds more robust and reducing CI failures while maintaining comprehensive dependency testing coverage. ℹ️ About this reviewThis review was generated by Claude Code. Limitations:
To request another review, comment |
Summary: The merge to main job with resolve_dependencies fails occasionaly since it tries to get all dependencies from the internet . As we are trying to make the builds more robust and green - this change sets the job to use SYSTEM dependency. We will still test the resolve dependency module out but run it via our scheduled jobs. Differential Revision: D93500381
4f9f688 to
b27ab3e
Compare
Summary: - Extract standalone `install_grpc()` function from `install_gcs_sdk_cpp()` in `setup-common.sh` and refactor `install_gcs_sdk_cpp` to reuse it - Add `install_grpc` to `setup-ubuntu.sh`'s `install_velox_deps` so gRPC is built from source and available as a SYSTEM dependency in the container - Install `gh` and `jq` in the ubuntu-22.04 Dockerfile, required by `apache/infrastructure-actions/stash` CI action This enables using `VELOX_DEPENDENCY_SOURCE=SYSTEM` with `VELOX_BUILD_TESTING=ON` in the ubuntu-22.04 container without needing to override `gRPC_SOURCE=BUNDLED` or install CI tools at runtime. Pull Request resolved: #16701 Test Plan: - [ ] Verify the Docker image builds successfully (triggered by the `scripts/` path filter) - [ ] Verify the `install_gcs_sdk_cpp` refactor doesn't break GCS adapter builds - [ ] Once image is published, update PR #16424 to remove `gRPC_SOURCE: BUNDLED` override and runtime `gh`/`jq` installation Reviewed By: pratikpugalia Differential Revision: D95972509 Pulled By: kgpai fbshipit-source-id: c6b9f99ed221c97b8977351f3b859682be5a8198
…incubator#16701) Summary: - Extract standalone `install_grpc()` function from `install_gcs_sdk_cpp()` in `setup-common.sh` and refactor `install_gcs_sdk_cpp` to reuse it - Add `install_grpc` to `setup-ubuntu.sh`'s `install_velox_deps` so gRPC is built from source and available as a SYSTEM dependency in the container - Install `gh` and `jq` in the ubuntu-22.04 Dockerfile, required by `apache/infrastructure-actions/stash` CI action This enables using `VELOX_DEPENDENCY_SOURCE=SYSTEM` with `VELOX_BUILD_TESTING=ON` in the ubuntu-22.04 container without needing to override `gRPC_SOURCE=BUNDLED` or install CI tools at runtime. Pull Request resolved: facebookincubator#16701 Test Plan: - [ ] Verify the Docker image builds successfully (triggered by the `scripts/` path filter) - [ ] Verify the `install_gcs_sdk_cpp` refactor doesn't break GCS adapter builds - [ ] Once image is published, update PR facebookincubator#16424 to remove `gRPC_SOURCE: BUNDLED` override and runtime `gh`/`jq` installation Reviewed By: pratikpugalia Differential Revision: D95972509 Pulled By: kgpai fbshipit-source-id: c6b9f99ed221c97b8977351f3b859682be5a8198
Summary: - Pre-downloads the gflags v2.2.2 source tarball into the Ubuntu Docker image at `/velox/deps-sources/gflags-v2.2.2.tar.gz` - This allows CI jobs that use `gflags_SOURCE=BUNDLED` to set `VELOX_GFLAGS_URL` to the local copy, avoiding a GitHub download on every run - Needed by PR #16424 which switches ubuntu-debug to SYSTEM dependencies with BUNDLED gflags Pull Request resolved: #16713 Test Plan: - [ ] Docker image builds successfully (triggered automatically by this PR) - [ ] Verify the tarball exists at `/velox/deps-sources/gflags-v2.2.2.tar.gz` in the built image Reviewed By: pratikpugalia Differential Revision: D96080928 Pulled By: kgpai fbshipit-source-id: d25d0435ee6f1923408bdf83c45f8ee5fc451356
Switch ubuntu-debug from BUNDLED to SYSTEM dependency resolution to avoid flaky internet downloads during merge-to-main builds. Move bundled-deps testing to a new scheduled job to maintain coverage of the resolve_dependency module.
…andalone workflow - Switch ubuntu-debug job to use ghcr.io/facebookincubator/velox-dev:ubuntu-22.04 container with SYSTEM dependency resolution instead of BUNDLED on plain runner - Add conditional dependency rebuild when setup scripts change (matching fedora pattern) - Remove -DVELOX_BUILD_SHARED=ON (container has static deps) - Increase MAX_LINK_JOBS from 2 to 6 - Move bundled-deps job from scheduled.yml to standalone ubuntu-bundled-deps.yml workflow, triggered only on dependency/CMake changes and daily schedule
The apache/infrastructure-actions/stash action requires gh and jq which are not present in the ubuntu-22.04 container image (unlike the centos/fedora images which install them explicitly).
…buntu The apt package libc-ares-dev installs the library as libcares.so (name: cares), but the Find module only searched for c-ares. Add cares as an alternative library name.
gRPC is not installed as a system package in the ubuntu-22.04 container and has no Find module, only a FetchContent-based build. Override just gRPC to build from source while all other deps use SYSTEM resolution.
The build was OOM-killed (exit code 137) at 2697/2785 during the linking phase when 6 large static test executables were being linked simultaneously on the 16-core runner (64GB RAM). Reducing to 3 concurrent link jobs matches the release build configuration on the same hardware.
The velox_function_dynamic_link_test fails with SYSTEM dependencies because Ubuntu's libgflags-dev provides shared libgflags.so by default. When folly is statically linked into both the test binary and the dlopen'd .so plugin, both copies try to register gflags flags into the same shared gflags registry, causing a duplicate flag error. The --exclude-libs,ALL linker flag only hides symbols from static archives, not from shared library dependencies like libgflags.so. Setting VELOX_GFLAGS_TYPE=static tells find_package(gflags) to use the static libgflags.a (also shipped by libgflags-dev), which keeps gflags consistent with the static folly build and allows --exclude-libs,ALL to properly isolate the plugin's symbols.
Three changes for the ubuntu-debug SYSTEM deps build: 1. Remove runtime gh/jq installation - now baked into the container image (commit 8a4cf83). 2. Remove gRPC_SOURCE=BUNDLED override - gRPC is now installed in the container image. 3. Skip velox_function_dynamic_link_test in SYSTEM builds. The test fails because the container's folly is built as static (libfolly.a) but linked against system shared gflags (libgflags.so). Folly's exported CMake config hardcodes gflags_shared as a transitive dependency. When both the test binary and dlopen'd .so plugin statically link folly, both copies register the same gflags flags into the shared gflags registry, causing a duplicate flag error. The --exclude-libs,ALL linker flag cannot prevent this because it only hides symbols from static archives, not shared libraries. The test is still exercised in BUNDLED builds (Fedora, macOS).
When folly is static and gflags is shared, dlopen'ing .so plugins that also link static folly causes duplicate gflags flag registration (both copies of folly register the same flags in the shared gflags registry). Fix by: 1. Building folly with -DGFLAGS_SHARED=FALSE so its exported CMake config references gflags_static instead of gflags_shared. 2. Using gflags_SOURCE=BUNDLED in ubuntu-debug CI to provide a PIC-enabled static gflags (apt's libgflags.a lacks -fPIC and cannot be linked into shared libraries). This ensures --exclude-libs,ALL fully isolates gflags symbols in .so plugins, preventing the dual-registration crash. Verified on test machine: all 6 dynamic link tests pass.
Fedora's gflags-devel package only provides shared libraries, so passing -DGFLAGS_SHARED=FALSE unconditionally breaks the Fedora Docker image build. Gate the flag on VELOX_BUILD_SHARED != ON since the dual gflags registration issue only affects static folly builds.
- CMakeLists.txt: When gflags_SOURCE=BUNDLED, check ENV{glog_SOURCE}
before forcing glog to BUNDLED. This allows CI to keep system glog
(which is built against the same gflags) and avoids dual glog flag
registration at runtime.
- linux-build-base.yml: Set glog_SOURCE=SYSTEM for ubuntu-debug so
system glog is used alongside BUNDLED gflags.
- centos-multi.dockerfile: Add LD_LIBRARY_PATH for INSTALL_PREFIX so
thrift1 can find libgflags.so.2.2 at runtime when folly switches
from shared to static gflags (removing the RPATH entry).
…xistence
The previous commit checked DEFINED ENV{glog_SOURCE} but did not read
its value, leaving the glog_SOURCE CMake variable empty. This caused
velox_resolve_dependency(glog) to fail with "Invalid source for glog:".
Now reads the env var value with $ENV{glog_SOURCE} so that
glog_SOURCE=SYSTEM propagates correctly to the build.
Summary: The merge to main job with resolve_dependencies fails occasionaly since it tries to get all dependencies from the internet . As we are trying to make the builds more robust and green - this change sets the job to use SYSTEM dependency. We will still test the resolve dependency module out but run it via our scheduled jobs.
Differential Revision: D93500381