Fix nightly CI regressions, flaky tests, and Windows ccache from #14478#14508
Fix nightly CI regressions, flaky tests, and Windows ccache from #14478#14508xingbowang wants to merge 1 commit intofacebook:mainfrom
Conversation
✅ clang-tidy: No findings on changed linesCompleted in 273.8s. |
d2c8f32 to
ce80912
Compare
✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit ae9db09 Confirmed: Now I have all the context needed. Here's the review: SummaryA well-targeted fix addressing four distinct CI regressions. The changes are correct, low-risk, and well-scoped. One suggestion regarding the conditional glog linking deserves attention. Issues Found🟡 Suggestion: Conditional glog linking may mask real link failuresFile: The Suggested improvement: Add a status message so failures are diagnosable: find_library(GLOG_LIBRARY glog)
if(GLOG_LIBRARY)
list(APPEND THIRDPARTY_LIBS glog)
else()
message(STATUS "glog not found — USE_FOLLY_LITE will build without glog")
endif()🟡 Suggestion: TearDown resets LOW and HIGH but tests may also modify themFile: The TearDown correctly resets BOTTOM=0, LOW=1, HIGH=1 to match the values One subtlety: 🟢 Nitpick:
|
| Context | Assessment |
|---|---|
| Release builds (NDEBUG) | Fixed — SyncPoint guard correctly prevents compile error |
| CMake with lld | Fixed — conditional glog avoids link failure when absent |
| Sharded test execution | Fixed — TearDown prevents thread pool state leakage across tests |
| Windows CI caching | Fixed — stable key + content-based compiler check should dramatically improve hit rate |
| Make-based builds | Unaffected — folly.mk still unconditionally links glog, which is fine since make uses ld not lld |
| Non-DBCompactionTest suites | Unaffected — TearDown is scoped to DBCompactionTest only |
Positive Observations
- The TearDown approach is strictly better than per-test cleanup: it's defensive against both current and future test leakers, and it runs even after assertion failures.
- The PR description is exceptionally thorough with clear root-cause analysis for each fix.
- The 70-run shuffled validation (50 full + 20 on 4 cores) is strong evidence the flaky test fix is robust.
- The ccache fix (stable key +
compiler_check=content) follows the pattern already established for macOS in thesetup-ccacheaction, maintaining cross-platform consistency.
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
ce80912 to
b01a041
Compare
b01a041 to
8fd634a
Compare
Fix two build failures introduced by the accelerate-build PR (facebook#14478): 1. **Release build (NDEBUG) compile error**: The SyncPoint cleanup listener in testharness.cc referenced SyncPoint::GetInstance() which is only declared in debug builds. Guard the listener with #ifndef NDEBUG since it's only needed for debug test sharding anyway. 2. **Folly-lite link error**: The USE_FOLLY_LITE cmake path unconditionally linked -lglog, but glog may not be installed as a system library. Use find_library() to only link glog when available. Co-authored-by: Navi <navi@navibot.dev>
8fd634a to
ae9db09
Compare
1a1d04c to
ae9db09
Compare
|
@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D98380757. |
Summary
Fix build failures, flaky tests, and Windows ccache issues exposed by PR #14478.
1. Release build (NDEBUG) compile error
Job:
build-linux-release-with-follyThe SyncPoint cleanup listener in testharness.cc referenced
SyncPoint::GetInstance()unconditionally, but SyncPoint is only declared in debug builds. Wrapped with#ifndef NDEBUG.2. Folly-lite link error
Job:
build-linux-cmake-with-folly-liteThe
USE_FOLLY_LITEcmake path unconditionally linked-lglog, which fails with lld (added in #14478) when glog isn't installed. Usefind_library()to link only when available.3. Flaky tests — leaked
Env::Default()thread pool stateSharded execution runs multiple tests per process. Several tests in
db_compaction_test.ccmodified the globalEnv::Default()BOTTOM thread pool without resetting. With a leaked BOTTOM pool, subsequent tests' last-level compactions get forwarded to the bottom pool, freeing the LOW thread to pick additional compactions unexpectedly.Fix: add
TearDown()override toDBCompactionTestthat captures default thread pool sizes in the constructor and restores them after every test. This is more robust than per-test cleanup because:4. Windows ccache — 0.26% hit rate → should be ~99%
The Windows nightly build takes 57 minutes because ccache has near-zero hit rate. Two issues:
hendrikmuhs/ccache-actionused a timestamp-based key, so each nightly run created a unique key and never found the previous run's cache (No cache found.). Fixed by using a stable keyccache-windows-<workflow>with prefix-based restore.compiler_check=contentsetting, so MSVC path/version changes between runners invalidated all cache entries. Addedcompiler_check=content(same fix as macOS in Accelerate build and test infrastructure #14478).Test Plan