
@cachemeifyoucan
Collaborator

Add OnDiskGraphDB and OnDiskKeyValueDB, which can be used to implement
ObjectStore and ActionCache, respectively. These are on-disk persistent
stores that build upon OnDiskTrieHashMap and implement the key functions
required by the LLVMCAS interfaces.

This abstraction layer defines how objects are hashed and stored on
disk. OnDiskKeyValueDB is a basic OnDiskTrieHashMap, while OnDiskGraphDB
also defines:

  • How objects of various sizes are stored on disk and referenced by
    the trie nodes.
  • How the references from one stored object to the objects it refers
    to are stored.
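To make the two bullets above concrete, here is a minimal in-memory sketch under a toy design: a 64-bit FNV-1a hash stands in for the real content hash, std::map stands in for the on-disk trie, and a hypothetical size cut-off (ToyStandaloneThreshold) decides whether a payload is kept inline with its record or in a separate standalone table. All names here (ToyGraphDB, ToyRecord, store, getRefs, getData) are illustrative assumptions, not the OnDiskGraphDB API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model of a content-addressed object graph. The real DB
// uses a cryptographic hash, an on-disk trie and a data pool; here FNV-1a
// and std::map stand in for all of that.
using ToyObjectID = std::uint64_t;

// Hypothetical cut-off: payloads at or above this size go to a separate
// "standalone" table instead of being stored inline with the record.
constexpr std::size_t ToyStandaloneThreshold = 64;

struct ToyRecord {
  std::vector<ToyObjectID> Refs; // edges to other stored objects
  std::string InlineData;        // small payloads live in the record
  bool IsStandalone = false;     // large payloads live in Standalone
};

class ToyGraphDB {
public:
  // Hash the ref count, every ref, then the data, so (Refs, Data) is encoded
  // unambiguously and identical nodes always get identical IDs.
  static ToyObjectID computeID(const std::vector<ToyObjectID> &Refs,
                               const std::string &Data) {
    std::uint64_t H = 14695981039346656037ULL; // FNV-1a offset basis
    auto MixWord = [&H](std::uint64_t W) {
      for (int I = 0; I < 8; ++I) {
        H ^= (W >> (8 * I)) & 0xff;
        H *= 1099511628211ULL; // FNV-1a prime
      }
    };
    MixWord(Refs.size());
    for (ToyObjectID R : Refs)
      MixWord(R);
    for (unsigned char C : Data) {
      H ^= C;
      H *= 1099511628211ULL;
    }
    return H;
  }

  // Store a node. Every referenced object must already exist, so graphs are
  // built bottom-up. Returns std::nullopt on a dangling reference.
  std::optional<ToyObjectID> store(const std::vector<ToyObjectID> &Refs,
                                   const std::string &Data) {
    for (ToyObjectID R : Refs)
      if (!Records.count(R))
        return std::nullopt;
    ToyObjectID ID = computeID(Refs, Data);
    if (Records.count(ID))
      return ID; // already present: content addressing deduplicates
    ToyRecord Rec;
    Rec.Refs = Refs;
    if (Data.size() >= ToyStandaloneThreshold) {
      Rec.IsStandalone = true;
      Standalone[ID] = Data;
    } else {
      Rec.InlineData = Data;
    }
    Records.emplace(ID, std::move(Rec));
    return ID;
  }

  std::optional<std::string> getData(ToyObjectID ID) const {
    auto It = Records.find(ID);
    if (It == Records.end())
      return std::nullopt;
    if (It->second.IsStandalone)
      return Standalone.at(ID);
    return It->second.InlineData;
  }

  std::vector<ToyObjectID> getRefs(ToyObjectID ID) const {
    auto It = Records.find(ID);
    return It == Records.end() ? std::vector<ToyObjectID>{} : It->second.Refs;
  }

  bool isStandalone(ToyObjectID ID) const {
    auto It = Records.find(ID);
    return It != Records.end() && It->second.IsStandalone;
  }

  std::size_t size() const { return Records.size(); }

private:
  std::map<ToyObjectID, ToyRecord> Records;      // stands in for the trie
  std::map<ToyObjectID, std::string> Standalone; // large payloads
};
```

Because the ID covers both the references and the data, storing the same node twice yields the same ID, and a node can only point at objects that are already stored.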

In addition to the basic APIs for ObjectStore and ActionCache, other
advanced database configuration features can be implemented in this
layer without exposing them to users of the LLVMCAS interface. For
example, OnDiskGraphDB has a fault-in function that fetches data from an
upstream OnDiskGraphDB if the data is missing locally.
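The fault-in behaviour can be pictured with a small sketch, assuming a plain string key/value store in place of the real object graph: a lookup that misses locally consults an upstream store and copies any hit into the local one. ToyFaultInStore and its methods are hypothetical names; the real implementation imports graph nodes, not strings.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical toy store illustrating the fault-in pattern; the class and
// its methods are illustrative, not the OnDiskGraphDB API.
class ToyFaultInStore {
public:
  explicit ToyFaultInStore(ToyFaultInStore *UpstreamStore = nullptr)
      : Upstream(UpstreamStore) {}

  void put(const std::string &Key, const std::string &Value) {
    Local[Key] = Value;
  }

  bool hasLocally(const std::string &Key) const {
    return Local.count(Key) != 0;
  }

  // Look the key up locally first. On a miss, ask the upstream store (which
  // may in turn consult its own upstream), copy any hit into the local
  // store so the next lookup is served locally, and return it.
  std::optional<std::string> load(const std::string &Key) {
    auto It = Local.find(Key);
    if (It != Local.end())
      return It->second;
    if (!Upstream)
      return std::nullopt;
    std::optional<std::string> Found = Upstream->load(Key);
    if (Found)
      Local.emplace(Key, *Found); // fault the entry in
    return Found;
  }

private:
  std::map<std::string, std::string> Local;
  ToyFaultInStore *Upstream; // not owned; may be null
};
```

The upstream store is never modified by a downstream miss; only the local store grows, which keeps the upstream usable as a shared read source.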

@cachemeifyoucan cachemeifyoucan changed the base branch from users/cachemeifyoucan/spr/main.cas-add-ondiskgraphdb-and-ondiskkeyvaluedb to main October 7, 2025 19:49
@cachemeifyoucan cachemeifyoucan force-pushed the users/cachemeifyoucan/spr/cas-add-ondiskgraphdb-and-ondiskkeyvaluedb branch from e980f34 to c497eeb Compare October 7, 2025 19:49
cachemeifyoucan added a commit that referenced this pull request Oct 7, 2025

Reviewers: 

Pull Request: #114102
@github-actions

github-actions bot commented Oct 7, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@cachemeifyoucan cachemeifyoucan force-pushed the users/cachemeifyoucan/spr/cas-add-ondiskgraphdb-and-ondiskkeyvaluedb branch from a2bb4e4 to e2d20f8 Compare October 7, 2025 20:52
Created using spr 1.3.7
Created using spr 1.3.7
Created using spr 1.3.7
Created using spr 1.3.7
return InternalRef(Offset.get());
}

friend bool operator==(InternalRef LHS, InternalRef RHS) {
Collaborator

out of curiosity: Can modern C++ = default this since there is only one data member?

Collaborator Author

Isn't this C++20? I don't think we are there yet.
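For context on the exchange above: LLVM currently builds as C++17, and defaulted comparison operators (operator== declared as = default) were only added in C++20. A minimal sketch with a hypothetical single-member type, ToyRef (not the actual InternalRef), shows the hand-written C++17 form, with the C++20 spelling left as a comment.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical single-member reference type, loosely modelled on the
// InternalRef snippet under review; it is not the actual LLVM class.
class ToyRef {
public:
  static ToyRef getFromOffset(std::uint64_t Offset) { return ToyRef(Offset); }
  std::uint64_t getRawData() const { return Data; }

  // C++17: memberwise equality must be written by hand, and operator!=
  // has to be provided separately.
  friend bool operator==(ToyRef LHS, ToyRef RHS) {
    return LHS.Data == RHS.Data;
  }
  friend bool operator!=(ToyRef LHS, ToyRef RHS) { return !(LHS == RHS); }

  // C++20 alternative (ill-formed in C++17): let the compiler generate the
  // memberwise comparison; != is then rewritten in terms of ==.
  //   friend bool operator==(const ToyRef &, const ToyRef &) = default;

private:
  explicit ToyRef(std::uint64_t Raw) : Data(Raw) {}
  std::uint64_t Data;
};
```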


using StandaloneDataMapTy = StandaloneDataMap<16>;

struct InternalHandle {
Collaborator

Comment that explains what the "Internal" part of the name means?

Collaborator Author

@cachemeifyoucan cachemeifyoucan Oct 15, 2025

I don't think this class adds much in terms of readability. It basically provides some helper functions for decoding ObjectHandle. I can fold everything into ObjectHandle without exposing the detailed encoding scheme in the header. Removing InternalHandle.
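One way to fold such helpers into the handle while keeping the encoding out of the header is to make the handle an opaque integer and put the encode/decode functions in an anonymous namespace in the .cpp file. This sketch squeezes both halves into one file and assumes a hypothetical layout (bit 0 marks a standalone payload, the remaining bits hold an offset or index); ToyObjectHandle, encodeHandle, isStandalone and getOffsetOrIndex are illustrative names, not the layout or API used by OnDiskGraphDB.

```cpp
#include <cassert>
#include <cstdint>

// --- Header part: callers only see an opaque 64-bit value. ---
class ToyObjectHandle {
public:
  static ToyObjectHandle fromOpaqueData(std::uint64_t Opaque) {
    return ToyObjectHandle(Opaque);
  }
  std::uint64_t getOpaqueData() const { return Opaque; }

private:
  explicit ToyObjectHandle(std::uint64_t Raw) : Opaque(Raw) {}
  std::uint64_t Opaque;
};

// --- Implementation part (.cpp): the encoding scheme stays private. ---
namespace {
// Hypothetical layout: bit 0 set means the payload is stored standalone,
// clear means it lives in the shared data pool; bits 1..63 hold the offset
// or index, so values must fit in 63 bits.
constexpr std::uint64_t StandaloneBit = 1;

ToyObjectHandle encodeHandle(std::uint64_t OffsetOrIndex, bool IsStandalone) {
  assert(OffsetOrIndex < (std::uint64_t(1) << 63) && "value needs 63 bits");
  return ToyObjectHandle::fromOpaqueData((OffsetOrIndex << 1) |
                                         (IsStandalone ? StandaloneBit : 0));
}

bool isStandalone(ToyObjectHandle H) {
  return (H.getOpaqueData() & StandaloneBit) != 0;
}

std::uint64_t getOffsetOrIndex(ToyObjectHandle H) {
  return H.getOpaqueData() >> 1;
}
} // namespace
```

Since only the opaque value crosses the header boundary, the bit layout can change later without touching any caller.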

Created using spr 1.3.7
Created using spr 1.3.7
@cachemeifyoucan
Collaborator Author

The AArch64 bot failure is due to the AWS outage. Otherwise, I don't see any problems. Merging.

@cachemeifyoucan cachemeifyoucan merged commit be9c083 into main Oct 20, 2025
12 of 14 checks passed
@cachemeifyoucan cachemeifyoucan deleted the users/cachemeifyoucan/spr/cas-add-ondiskgraphdb-and-ondiskkeyvaluedb branch October 20, 2025 20:16
@llvm-ci
Collaborator

llvm-ci commented Oct 20, 2025

LLVM Buildbot has detected a new failure on builder sanitizer-x86_64-linux-fast running on sanitizer-buildbot4 while building llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/169/builds/16197

Here is the relevant piece of the build log for the reference
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using lld-link: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using ld.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using lld-link: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:531: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/lit/lit/main.py:74: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 93022 tests, 64 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
FAIL: LLVM-Unit :: CAS/./CASTests/4/22 (92951 of 93022)
******************** TEST 'LLVM-Unit :: CAS/./CASTests/4/22' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:/home/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/unittests/CAS/./CASTests-LLVM-Unit-2961543-4-22.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=22 GTEST_SHARD_INDEX=4 /home/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/unittests/CAS/./CASTests
--

Note: This is test shard 5 of 22.
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from OnDiskCASTest
[ RUN      ] OnDiskCASTest.OnDiskKeyValueDBTest
 #0 0x00005555556827a2 ___interceptor_backtrace /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/msan/../sanitizer_common/sanitizer_common_interceptors.inc:4530:13
 #1 0x000055555579881d llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Support/Unix/Signals.inc:0:13
 #2 0x00005555557961b7 llvm::sys::RunSignalHandlers() /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Support/Signals.cpp:0:5
 #3 0x0000555555799bd7 SignalHandler(int, siginfo_t*, void*) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Support/Unix/Signals.inc:426:38
 #4 0x00005555556b63ee IsInInterceptorScope /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/msan/msan_interceptors.cpp:78:10
 #5 0x00005555556b63ee SignalAction(int, void*, void*) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/msan/msan_interceptors.cpp:1167:3
 #6 0x00007ffff7a458d0 (/lib/x86_64-linux-gnu/libc.so.6+0x458d0)
 #7 0x00007ffff7aa49bc pthread_kill (/lib/x86_64-linux-gnu/libc.so.6+0xa49bc)
 #8 0x00007ffff7a4579e raise (/lib/x86_64-linux-gnu/libc.so.6+0x4579e)
 #9 0x00007ffff7a288cd abort (/lib/x86_64-linux-gnu/libc.so.6+0x288cd)
#10 0x0000555555643ecc (/home/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/unittests/CAS/./CASTests+0xefecc)
#11 0x000055555564295e __sanitizer::Die() /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_termination.cpp:52:5
#12 0x000055555566b367 (/home/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/unittests/CAS/./CASTests+0x117367)
#13 0x000055555566b4fd ~InterceptorScope /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/msan/msan_interceptors.cpp:74:25
#14 0x000055555566b4fd ___interceptor_bcmp /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/compiler-rt/lib/msan/../sanitizer_common/sanitizer_common_interceptors.inc:895:1
#15 0x00005555556e186a bool std::__1::__constexpr_memcmp_equal[abi:nn220000]<char, char>(char const*, char const*, std::__1::__element_count) /home/b/sanitizer-x86_64-linux-fast/build/libcxx_install_msan/include/c++/v1/__string/constexpr_c_functions.h:124:68
#16 0x00005555556fae13 bool llvm::operator==<char>(llvm::ArrayRef<char>, llvm::ArrayRef<char>) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/ADT/ArrayRef.h:546:5
#17 0x00005555556fad71 testing::AssertionResult testing::internal::CmpHelperEQ<llvm::ArrayRef<char>, llvm::ArrayRef<char>>(char const*, char const*, llvm::ArrayRef<char> const&, llvm::ArrayRef<char> const&) /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/third-party/unittest/googletest/include/gtest/gtest.h:1379:11
#18 0x00005555556f8a54 OnDiskCASTest_OnDiskKeyValueDBTest_Test::TestBody() /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/unittests/CAS/OnDiskKeyValueDBTest.cpp:38:3
#19 0x00005555557d7a4e testing::Test::Run() /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/third-party/unittest/googletest/src/gtest.cc:2695:9
#20 0x00005555557d8737 testing::TestInfo::Run() /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/third-party/unittest/googletest/src/gtest.cc:2842:11
#21 0x00005555557d9568 testing::TestSuite::Run() /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/third-party/unittest/googletest/src/gtest.cc:3018:35
#22 0x00005555557e7750 testing::internal::UnitTestImpl::RunAllTests() /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/third-party/unittest/googletest/src/gtest.cc:5922:15
#23 0x00005555557e70be testing::UnitTest::Run() /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/third-party/unittest/googletest/src/gtest.cc:5485:10
#24 0x00005555557c886f main /home/b/sanitizer-x86_64-linux-fast/build/llvm-project/third-party/unittest/UnitTestMain/TestMain.cpp:55:3
#25 0x00007ffff7a2a578 (/lib/x86_64-linux-gnu/libc.so.6+0x2a578)
#26 0x00007ffff7a2a63b __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a63b)
Step 14 (stage2/msan check) failure: stage2/msan check (failure)

@fmayer
Contributor

fmayer commented Oct 21, 2025

This broke the MSAN bot. Please fix forward or revert if you need extra time

@llvm-ci
Collaborator

llvm-ci commented Oct 21, 2025

LLVM Buildbot has detected a new failure on builder clang-ppc64le-linux-multistage running on ppc64le-clang-multistage-test while building llvm at step 11 "ninja check 2".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/76/builds/13253

Here is the relevant piece of the build log for the reference
Step 11 (ninja check 2) failure: 1200 seconds without output running [b'ninja', b'check-all'], attempting to kill
...
PASS: ThreadSanitizer-powerpc64le :: restore_stack.cpp (103051 of 103061)
PASS: LLVM :: tools/llvm-readobj/ELF/file-header-machine-types.test (103052 of 103061)
PASS: LLVM :: tools/llvm-objdump/ELF/AMDGPU/subtarget.ll (103053 of 103061)
PASS: LLVM :: tools/llvm-readobj/ELF/AMDGPU/elf-headers.test (103054 of 103061)
PASS: LLVM :: CodeGen/ARM/build-attributes.ll (103055 of 103061)
PASS: SanitizerCommon-msan-powerpc64le-Linux :: Linux/signal_segv_handler.cpp (103056 of 103061)
PASS: SanitizerCommon-asan-powerpc64le-Linux :: Linux/signal_segv_handler.cpp (103057 of 103061)
PASS: SanitizerCommon-ubsan-powerpc64le-Linux :: Linux/signal_segv_handler.cpp (103058 of 103061)
PASS: SanitizerCommon-lsan-powerpc64le-Linux :: Linux/signal_segv_handler.cpp (103059 of 103061)
PASS: SanitizerCommon-tsan-powerpc64le-Linux :: Linux/signal_segv_handler.cpp (103060 of 103061)
command timed out: 1200 seconds without output running [b'ninja', b'check-all'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=3924.479917

@llvm-ci
Collaborator

llvm-ci commented Oct 21, 2025

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-expensive-checks-win running on as-worker-93 while building llvm at step 7 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/14/builds/4508

Here is the relevant piece of the build log for the reference
Step 7 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM-Unit :: CAS/./CASTests.exe/4/24' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:C:\a\llvm-clang-x86_64-expensive-checks-win\build\unittests\CAS\.\CASTests.exe-LLVM-Unit-12972-4-24.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=24 GTEST_SHARD_INDEX=4 C:\a\llvm-clang-x86_64-expensive-checks-win\build\unittests\CAS\.\CASTests.exe
--

Note: This is test shard 5 of 24.
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from OnDiskCASTest
[ RUN      ] OnDiskCASTest.OnDiskGraphDBSpaceLimit
C:\a\llvm-clang-x86_64-expensive-checks-win\llvm-project\llvm\unittests\CAS\OnDiskGraphDBTest.cpp(301): error: Value of: llvm::detail::TakeError(store(*DB, Data, Refs).moveInto(ID))
Expected: succeeded
  Actual: failed  (invalid argument)

C:\a\llvm-clang-x86_64-expensive-checks-win\llvm-project\llvm\unittests\CAS\OnDiskGraphDBTest.cpp(301): error: Value of: llvm::detail::TakeError(store(*DB, Data, Refs).moveInto(ID))
Expected: succeeded
  Actual: failed  (invalid argument)

C:\a\llvm-clang-x86_64-expensive-checks-win\llvm-project\llvm\unittests\CAS\OnDiskGraphDBTest.cpp(301): error: Value of: llvm::detail::TakeError(store(*DB, Data, Refs).moveInto(ID))
Expected: succeeded
  Actual: failed  (invalid argument)

C:\a\llvm-clang-x86_64-expensive-checks-win\llvm-project\llvm\unittests\CAS\OnDiskGraphDBTest.cpp(301): error: Value of: llvm::detail::TakeError(store(*DB, Data, Refs).moveInto(ID))
Expected: succeeded
  Actual: failed  (invalid argument)

C:\a\llvm-clang-x86_64-expensive-checks-win\llvm-project\llvm\unittests\CAS\OnDiskGraphDBTest.cpp(301): error: Value of: llvm::detail::TakeError(store(*DB, Data, Refs).moveInto(ID))
...

@cachemeifyoucan
Collaborator Author

This broke the MSAN bot. Please fix forward or revert if you need extra time

Sorry, I didn't see the MSAN bot. The error is odd; maybe it is related to how ArrayRef interacts with move construction. Attempt a fix here: #164457

cachemeifyoucan added a commit that referenced this pull request Oct 21, 2025
Fix MSAN failure and expensive test failure.
@thurstond
Contributor

Attempt a fix here: #164457

#164457 landed in https://lab.llvm.org/buildbot/#/builders/164/builds/14720 but the test (CAS/./CASTests/4/22) is still failing.

@cachemeifyoucan
Collaborator Author

Attempt a fix here: #164457

#164457 landed in https://lab.llvm.org/buildbot/#/builders/164/builds/14720 but the test (CAS/./CASTests/4/22) is still failing.

Attempt 2 here: #164493

I didn't figure out the real cause of the failure last time. It was due to not all of the fields in a std::array being initialized.
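The class of bug described here can be illustrated with a hypothetical fixed-size buffer (ToyFixedValue, not the code touched by #164493): if a std::array member has no initializer, writing only a prefix and then comparing the whole array reads indeterminate bytes, which matches the bcmp call reached from ArrayRef's operator== in the MSAN stack trace above. Value-initializing the array with {} is one common way to avoid it.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstring>
#include <string_view>

// Hypothetical fixed-size buffer illustrating the class of bug the MSan
// report points at; it is not the code changed in the fix.
struct ToyFixedValue {
  // Buggy form:   std::array<char, 16> Bytes;
  // With no initializer, the elements of a local or heap-allocated object
  // are indeterminate. If only a prefix is written and the whole array is
  // compared later (memcmp/bcmp), MemorySanitizer reports a read of
  // uninitialized memory.
  //
  // Fixed form: the empty braces value-initialize every element to zero.
  std::array<char, 16> Bytes{};

  // Copy at most Bytes.size() characters; the tail keeps its zero value.
  void setPrefix(std::string_view S) {
    if (!S.empty())
      std::memcpy(Bytes.data(), S.data(), std::min(S.size(), Bytes.size()));
  }
};

// Compare the full fixed-size buffers, tail bytes included.
bool sameBytes(const ToyFixedValue &A, const ToyFixedValue &B) {
  return std::memcmp(A.Bytes.data(), B.Bytes.data(), A.Bytes.size()) == 0;
}
```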

@thurstond
Contributor

Attempt 2 here: #164493

I didn't figure out the real reason of failure last time. It was due to not all the fields in std::array is initialized.

Thanks for the quick fix!

OnDiskKeyValueDBTest.cpp
OnDiskTrieRawHashMapTest.cpp
ProgramTest.cpp
)
Contributor

I think most other places unconditionally include tests for conditional features and then ifdef out all the test code based on what's in config.h. Any reason not to do that here too?

Collaborator Author

I think this is cleaner than #ifdef-ing out the entire contents of those files. I can switch back to the old way if that is preferred.

I guess the other option is to put all of those tests under one test fixture and call GTEST_SKIP() in SetUp() if the build configuration doesn't support them.

@pawosm-arm
Contributor

The test cases introduced here are failing in our CI on Red Hat Enterprise Linux 10 as of last night, despite all of the corrections merged later. These are the systems on which these tests do not fail:

  • Amazon Linux 2023
  • Red Hat Enterprise Linux 8
  • Red Hat Enterprise Linux 9
  • Ubuntu 22
  • Ubuntu 24
  • SLES15sp6
  • SLES15sp7

Does anyone know what needs to be configured on RHEL10, but not on RHEL8, RHEL9, AL2023, SLES15sp6, SLES15sp7, Ubuntu22, or Ubuntu24, to make these tests pass on RHEL10?

@pawosm-arm
Contributor

The error log on RHEL10:

23:14:23  llvm-lit: /workspace/src/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libunwind-static.cfg.in) Using %{cxx} substitution: '/workspace/build/stage/bootstrap_compiler/bin/clang++'
23:14:23  llvm-lit: /workspace/src/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libunwind-static.cfg.in) Using %{flags} substitution: ' --target=aarch64-unknown-linux-gnu'
23:14:23  llvm-lit: /workspace/src/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libunwind-static.cfg.in) Using %{compile_flags} substitution: '-nostdinc++ -I %{include} -funwind-tables -std=c++26 -Werror -Wall -Wctad-maybe-unsupported -Wextra -Wshadow -Wundef -Wunused-template -Wno-unused-command-line-argument -Wno-attributes -Wno-pessimizing-move -Wno-noexcept-type -Wno-atomic-alignment -Wno-reserved-module-identifier -Wdeprecated-copy -Wdeprecated-copy-dtor -Wshift-negative-value -Wno-user-defined-literals -Wno-tautological-compare -Wsign-compare -Wunused-variable -Wunused-parameter -Wunreachable-code -Wno-unused-local-typedef -Wno-local-type-template-args -Wno-c++11-extensions -Wno-unknown-pragmas -Wno-pass-failed -Wno-mismatched-new-delete -Wno-redundant-move -Wno-self-move -Wno-nullability-completeness -flax-vector-conversions=none -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -Wuser-defined-warnings'
23:14:23  llvm-lit: /workspace/src/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libunwind-static.cfg.in) Using %{link_flags} substitution: '%{lib}/libunwind.a -lpthread -Wl,--export-dynamic -ldl -latomic'
23:14:23  llvm-lit: /workspace/src/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libunwind-static.cfg.in) Using %{benchmark_flags} substitution: ''
23:14:23  llvm-lit: /workspace/src/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libunwind-static.cfg.in) Using %{exec} substitution: '%{executor} --execdir %{temp} -- '
23:14:23  llvm-lit: /workspace/src/libcxx/utils/libcxx/test/config.py:24: note: (llvm-libunwind-static.cfg.in) All available features: add-latomic-workaround, buildhost=linux, c++26, can-create-symlinks, character-conversion-warnings, clang, clang-22, clang-22.0, clang-22.0.0, diagnose-if-support, enable-benchmarks=run, gcc-style-warnings, glibc-old-ru_RU-decimal-point, has-fblocks, has-fconstexpr-steps, has-unix-headers, large_tests, libcpp-has-no-experimental-hardening-observe-semantic, libcpp-has-no-experimental-syncstream, libcpp-has-no-experimental-tzdb, libcpp-has-no-incomplete-pstl, linux, long_tests, objcopy-available, objective-c++, optimization=none, std-at-least-c++03, std-at-least-c++11, std-at-least-c++14, std-at-least-c++17, std-at-least-c++20, std-at-least-c++23, std-at-least-c++26, stdlib=libc++, stdlib=llvm-libc++, target=aarch64-unknown-linux-gnu, verify-support, win32-broken-utf8-wchar-ctype
23:14:24  llvm-lit: /workspace/src/llvm/utils/lit/lit/llvm/config.py:531: note: using ld.lld: /workspace/build/stage/bootstrap_compiler/bin/ld.lld
23:14:24  llvm-lit: /workspace/src/llvm/utils/lit/lit/llvm/config.py:531: note: using lld-link: /workspace/build/stage/bootstrap_compiler/bin/lld-link
23:14:24  llvm-lit: /workspace/src/llvm/utils/lit/lit/llvm/config.py:531: note: using ld64.lld: /workspace/build/stage/bootstrap_compiler/bin/ld64.lld
23:14:24  llvm-lit: /workspace/src/llvm/utils/lit/lit/llvm/config.py:531: note: using wasm-ld: /workspace/build/stage/bootstrap_compiler/bin/wasm-ld
23:14:25  -- Testing: 87987 tests, 64 workers --
23:15:37  Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90
23:15:37  FAIL: LLVM-Unit :: CAS/./CASTests/1/22 (82348 of 87987)
23:15:37  ******************** TEST 'LLVM-Unit :: CAS/./CASTests/1/22' FAILED ********************
23:15:37  Script(shard):
23:15:37  --
23:15:37  GTEST_OUTPUT=json:/workspace/build/stage/bootstrap_compiler/unittests/CAS/./CASTests-LLVM-Unit-19360-1-22.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=22 GTEST_SHARD_INDEX=1 /workspace/build/stage/bootstrap_compiler/unittests/CAS/./CASTests
23:15:37  --
23:15:37  
23:15:37  Note: This is test shard 2 of 22.
23:15:37  [==========] Running 1 test from 1 test suite.
23:15:37  [----------] Global test environment set-up.
23:15:37  [----------] 1 test from OnDiskCASTest
23:15:37  [ RUN      ] OnDiskCASTest.OnDiskGraphDBFaultInSingleNode
23:15:37  Failure value returned from cantFail wrapped call
23:15:37  data record span passed the end of the data pool
23:15:37  UNREACHABLE executed at /workspace/src/llvm/include/llvm/Support/Error.h:810!
23:15:37   #0 0x00000000004673b0 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/workspace/build/stage/bootstrap_compiler/unittests/CAS/./CASTests+0x4673b0)
23:15:37   #1 0x0000000000464f98 SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
23:15:37   #2 0x0000f3550d546968 (linux-vdso.so.1+0x968)
23:15:37   #3 0x0000f3550d06b880 __pthread_kill_implementation (/lib64/libc.so.6+0x8b880)
23:15:37   #4 0x0000f3550d01aa40 gsignal (/lib64/libc.so.6+0x3aa40)
23:15:37   #5 0x0000f3550d005988 abort (/lib64/libc.so.6+0x25988)
23:15:37   #6 0x000000000044ffac (/workspace/build/stage/bootstrap_compiler/unittests/CAS/./CASTests+0x44ffac)
23:15:37   #7 0x0000000000494684 getContentFromHandle(llvm::cas::OnDiskDataAllocator const&, llvm::cas::ondisk::ObjectHandle) OnDiskGraphDB.cpp:0:0
23:15:37   #8 0x000000000049477c llvm::cas::ondisk::OnDiskGraphDB::getObjectData(llvm::cas::ondisk::ObjectHandle) const (/workspace/build/stage/bootstrap_compiler/unittests/CAS/./CASTests+0x49477c)
23:15:37   #9 0x0000000000498e7c llvm::cas::ondisk::OnDiskGraphDB::importSingleNode(llvm::cas::ondisk::ObjectID, llvm::cas::ondisk::ObjectHandle) (/workspace/build/stage/bootstrap_compiler/unittests/CAS/./CASTests+0x498e7c)
23:15:37  #10 0x000000000049a224 llvm::cas::ondisk::OnDiskGraphDB::faultInFromUpstream(llvm::cas::ondisk::ObjectID) (/workspace/build/stage/bootstrap_compiler/unittests/CAS/./CASTests+0x49a224)
23:15:37  #11 0x0000000000499304 llvm::cas::ondisk::OnDiskGraphDB::load(llvm::cas::ondisk::ObjectID) (/workspace/build/stage/bootstrap_compiler/unittests/CAS/./CASTests+0x499304)
23:15:37  #12 0x0000000000426a6c OnDiskCASTest_OnDiskGraphDBFaultInSingleNode_Test::TestBody() (/workspace/build/stage/bootstrap_compiler/unittests/CAS/./CASTests+0x426a6c)
23:15:37  #13 0x00000000004b7b10 testing::Test::Run() (.part.0) gtest-all.cc:0:0
23:15:37  #14 0x00000000004be3b0 testing::TestInfo::Run() (/workspace/build/stage/bootstrap_compiler/unittests/CAS/./CASTests+0x4be3b0)
23:15:37  #15 0x00000000004c88c0 testing::TestSuite::Run() (.part.0) gtest-all.cc:0:0
23:15:37  #16 0x00000000004c93a4 testing::internal::UnitTestImpl::RunAllTests() (/workspace/build/stage/bootstrap_compiler/unittests/CAS/./CASTests+0x4c93a4)
23:15:37  #17 0x00000000004c993c testing::UnitTest::Run() (/workspace/build/stage/bootstrap_compiler/unittests/CAS/./CASTests+0x4c993c)
23:15:37  #18 0x0000000000408d70 main (/workspace/build/stage/bootstrap_compiler/unittests/CAS/./CASTests+0x408d70)
23:15:37  #19 0x0000f3550d00609c __libc_start_call_main (/lib64/libc.so.6+0x2609c)
23:15:37  #20 0x0000f3550d00617c __libc_start_main@GLIBC_2.17 (/lib64/libc.so.6+0x2617c)
23:15:37  #21 0x00000000004093f0 _start (/workspace/build/stage/bootstrap_compiler/unittests/CAS/./CASTests+0x4093f0)
23:15:37  
23:15:37  --
23:15:37  exit: -6
23:15:37  --
23:15:37  shard JSON output does not exist: /workspace/build/stage/bootstrap_compiler/unittests/CAS/./CASTests-LLVM-Unit-19360-1-22.json
23:15:37  ********************
23:15:37  Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90
23:15:37  FAIL: LLVM-Unit :: CAS/./CASTests/2/22 (82363 of 87987)
23:15:37  ******************** TEST 'LLVM-Unit :: CAS/./CASTests/2/22' FAILED ********************
23:15:37  Script(shard):
23:15:37  --
23:15:37  GTEST_OUTPUT=json:/workspace/build/stage/bootstrap_compiler/unittests/CAS/./CASTests-LLVM-Unit-19360-2-22.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=22 GTEST_SHARD_INDEX=2 /workspace/build/stage/bootstrap_compiler/unittests/CAS/./CASTests
23:15:37  --
23:15:37  
23:15:37  Note: This is test shard 3 of 22.
23:15:37  [==========] Running 1 test from 1 test suite.
23:15:37  [----------] Global test environment set-up.
23:15:37  [----------] 1 test from OnDiskCASTest
23:15:37  [ RUN      ] OnDiskCASTest.OnDiskGraphDBFaultInFullTree
23:15:37  Failure value returned from cantFail wrapped call
23:15:37  data record span passed the end of the data pool
23:15:37  UNREACHABLE executed at /workspace/src/llvm/include/llvm/Support/Error.h:810!
23:15:37   #0 0x00000000004673b0 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/workspace/build/stage/bootstrap_compiler/unittests/CAS/./CASTests+0x4673b0)
23:15:37   #1 0x0000000000464f98 SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
23:15:37   #2 0x0000fa0cabe31968 (linux-vdso.so.1+0x968)
23:15:37   #3 0x0000fa0cab94b880 __pthread_kill_implementation (/lib64/libc.so.6+0x8b880)
23:15:37   #4 0x0000fa0cab8faa40 gsignal (/lib64/libc.so.6+0x3aa40)
23:15:37   #5 0x0000fa0cab8e5988 abort (/lib64/libc.so.6+0x25988)
23:15:37   #6 0x000000000044ffac (/workspace/build/stage/bootstrap_compiler/unittests/CAS/./CASTests+0x44ffac)
23:15:37   #7 0x0000000000494684 getContentFromHandle(llvm::cas::OnDiskDataAllocator const&, llvm::cas::ondisk::ObjectHandle) OnDiskGraphDB.cpp:0:0
23:15:37   #8 0x0000000000494858 llvm::cas::ondisk::OnDiskGraphDB::getInternalRefs(llvm::cas::ondisk::ObjectHandle) const (/workspace/build/stage/bootstrap_compiler/unittests/CAS/./CASTests+0x494858)
23:15:37   #9 0x0000000000494a44 llvm::cas::ondisk::OnDiskGraphDB::importFullTree(llvm::cas::ondisk::ObjectID, llvm::cas::ondisk::ObjectHandle)::'lambda'(llvm::cas::ondisk::ObjectID, std::optional<llvm::cas::ondisk::ObjectHandle>)::operator()(llvm::cas::ondisk::ObjectID, std::optional<llvm::cas::ondisk::ObjectHandle>) const (.isra.0) OnDiskGraphDB.cpp:0:0
23:15:37  #10 0x0000000000499a4c llvm::cas::ondisk::OnDiskGraphDB::importFullTree(llvm::cas::ondisk::ObjectID, llvm::cas::ondisk::ObjectHandle) (/workspace/build/stage/bootstrap_compiler/unittests/CAS/./CASTests+0x499a4c)
23:15:37  #11 0x000000000049a2f8 llvm::cas::ondisk::OnDiskGraphDB::faultInFromUpstream(llvm::cas::ondisk::ObjectID) (/workspace/build/stage/bootstrap_compiler/unittests/CAS/./CASTests+0x49a2f8)
23:15:37  #12 0x0000000000499304 llvm::cas::ondisk::OnDiskGraphDB::load(llvm::cas::ondisk::ObjectID) (/workspace/build/stage/bootstrap_compiler/unittests/CAS/./CASTests+0x499304)
23:15:37  #13 0x0000000000428bec OnDiskCASTest_OnDiskGraphDBFaultInFullTree_Test::TestBody() (/workspace/build/stage/bootstrap_compiler/unittests/CAS/./CASTests+0x428bec)
23:15:37  #14 0x00000000004b7b10 testing::Test::Run() (.part.0) gtest-all.cc:0:0
23:15:37  #15 0x00000000004be3b0 testing::TestInfo::Run() (/workspace/build/stage/bootstrap_compiler/unittests/CAS/./CASTests+0x4be3b0)
23:15:37  #16 0x00000000004c88c0 testing::TestSuite::Run() (.part.0) gtest-all.cc:0:0
23:15:37  #17 0x00000000004c93a4 testing::internal::UnitTestImpl::RunAllTests() (/workspace/build/stage/bootstrap_compiler/unittests/CAS/./CASTests+0x4c93a4)
23:15:37  #18 0x00000000004c993c testing::UnitTest::Run() (/workspace/build/stage/bootstrap_compiler/unittests/CAS/./CASTests+0x4c993c)
23:15:37  #19 0x0000000000408d70 main (/workspace/build/stage/bootstrap_compiler/unittests/CAS/./CASTests+0x408d70)
23:15:37  #20 0x0000fa0cab8e609c __libc_start_call_main (/lib64/libc.so.6+0x2609c)
23:15:37  #21 0x0000fa0cab8e617c __libc_start_main@GLIBC_2.17 (/lib64/libc.so.6+0x2617c)
23:15:37  #22 0x00000000004093f0 _start (/workspace/build/stage/bootstrap_compiler/unittests/CAS/./CASTests+0x4093f0)
23:15:37  
23:15:37  --
23:15:37  exit: -6
23:15:37  --
23:15:37  shard JSON output does not exist: /workspace/build/stage/bootstrap_compiler/unittests/CAS/./CASTests-LLVM-Unit-19360-2-22.json
23:15:37  ********************
23:16:02  Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 
23:16:02  ********************
23:16:02  Failed Tests (2):
23:16:02    LLVM-Unit :: CAS/./CASTests/1/22
23:16:02    LLVM-Unit :: CAS/./CASTests/2/22
23:16:02  
23:16:02  
23:16:02  Testing Time: 95.25s
23:16:02  
23:16:02  Total Discovered Tests: 121755
23:16:02    Skipped          :   218 (0.18%)
23:16:02    Unsupported      : 44287 (36.37%)
23:16:02    Passed           : 77170 (63.38%)
23:16:02    Expectedly Failed:    78 (0.06%)
23:16:02    Failed           :     2 (0.00%)
23:16:02  FAILED: CMakeFiles/check-all /workspace/build/stage/bootstrap_compiler/CMakeFiles/check-all 

@cachemeifyoucan
Collaborator Author

cachemeifyoucan commented Oct 24, 2025

@pawosm-arm what is your target triple for those systems? What is the file system format? Do you have a way to reproduce it, e.g. from a Docker image?

@cachemeifyoucan
Collaborator Author

We have downstream CI running on Ubuntu and do not observe such errors. Let me know how I can help fix it (or figure out the limitations).

@pawosm-arm
Contributor

pawosm-arm commented Oct 24, 2025

@pawosm-arm what is your target triple for those systems? What is the file system format? Do you have a way to reproduce it, e.g. from a Docker image?

It's the same as in Ubuntu, and as in the previous versions of RHEL: aarch64-unknown-linux-gnu. It's only Amazon Linux 2023 where the target triple is different (aarch64-amazon-linux), but, as I mentioned earlier, this problem does not occur on Amazon Linux. The fact that it does not occur on RHEL8 or RHEL9 and occurs only on RHEL10 makes it extremely weird.

The filesystem is a standard ext4, and our building method is publicly visible: https://github.com/arm/arm-toolchain/blob/arm-software/arm-software/linux/build.sh (https://github.com/arm/arm-toolchain/blob/arm-software/arm-software/linux/build.sh-HOWTO.md for more details)

@cachemeifyoucan
Collaborator Author

The fact that it does not occur on RHEL8 or RHEL9 and occurs only on RHEL10 makes it extremely weird.

That is weird. I am having trouble getting a RHEL10 container to build LLVM. The error looks like some difference in file-system memory mapping, but it is hard to say without reproducing it. Let me know how you want to proceed. This is a unit test, so we can disable it for now if we have a good macro to guard it with.
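Purely as an illustration of what such a guard could look like: the condition below (GCC rather than Clang) is an assumption, since at this point it was not known what distinguishes the failing setups. `isBuiltWithGCC` is a hypothetical helper, not something that exists in LLVM.

```cpp
#include <cassert>

// Hypothetical guard: true when this translation unit is compiled by GCC.
// Clang also defines __GNUC__, so it has to be excluded explicitly.
constexpr bool isBuiltWithGCC() {
#if defined(__GNUC__) && !defined(__clang__)
  return true;
#else
  return false;
#endif
}

// Sanity check: a Clang build must never be reported as GCC.
void checkGuard() {
#if defined(__clang__)
  assert(!isBuiltWithGCC());
#endif
}
```

Inside an affected test body one could then write `if (isBuiltWithGCC()) GTEST_SKIP() << "known failure, see PR discussion";` (GoogleTest's `GTEST_SKIP()` marks the test as skipped rather than failed). A runtime `if` on a `constexpr` helper keeps the test compiled on every toolchain, which is usually preferable to removing it with `#if`.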

@cachemeifyoucan
Collaborator Author

I managed to build inside a container, but I am not able to reproduce it:

[  PASSED  ] 22 tests.
[root@767c9162-505f-4e6b-ad0e-8218fd3e74dd build]#  cat /etc/redhat-release
Red Hat Enterprise Linux release 10.0 (Coughlan)
[root@767c9162-505f-4e6b-ad0e-8218fd3e74dd build]# uname -a
Linux 767c9162-505f-4e6b-ad0e-8218fd3e74dd 6.6.9 #1 SMP Fri Sep  5 23:08:25 UTC 2025 aarch64 GNU/Linux
[root@767c9162-505f-4e6b-ad0e-8218fd3e74dd build]# mount
/dev/vda on / type ext4 (rw,relatime)

@pawosm-arm
Contributor

The error looks like some difference in file-system memory mapping, but it is hard to say without reproducing it.

That is extremely concerning. This test might have exposed something serious that has the potential to haunt userspace programs doing specific or otherwise extensive filesystem I/O.

@pawosm-arm
Contributor

I managed to build inside a container, but I am not able to reproduce it:

[  PASSED  ] 22 tests.
[root@767c9162-505f-4e6b-ad0e-8218fd3e74dd build]#  cat /etc/redhat-release
Red Hat Enterprise Linux release 10.0 (Coughlan)
[root@767c9162-505f-4e6b-ad0e-8218fd3e74dd build]# uname -a
Linux 767c9162-505f-4e6b-ad0e-8218fd3e74dd 6.6.9 #1 SMP Fri Sep  5 23:08:25 UTC 2025 aarch64 GNU/Linux
[root@767c9162-505f-4e6b-ad0e-8218fd3e74dd build]# mount
/dev/vda on / type ext4 (rw,relatime)

I need to get more information from our DevOps team: every single detail that could make a difference, even the slightest ones they may consider insignificant, that could explain this unexpected difference in file-system memory mapping.

Also, could you write a higher-level explanation of the purpose of the code (and the test) that's failing? Maybe this could give them a hint about where to look...

@cachemeifyoucan
Collaborator Author

Also, could you write a higher-level explanation of the purpose of the code (and the test) that's failing? Maybe this could give them a hint about where to look...

There is an error message, "data record span passed the end of the data pool". If you grep for the message, you'll find a check that the data record does not extend past the end of the memory-mapped region. The test itself copies structured data from one database file to another, so the failure means one of three things: the size of the memory-mapped region is not tracked correctly, the stored data is corrupted, or the lookup reads from the wrong part of the file. I hope it is not one of the latter two, since those would be logic errors, and it is not failing on other systems.
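To make the kind of check concrete, here is a minimal sketch. It is hypothetical and not the actual OnDiskGraphDB code: `RecordSpan` and `spanFitsInPool` are made-up names, and the real implementation may encode offsets and sizes differently. The point it illustrates is comparing `Size` against the space remaining after `Offset` instead of computing `Offset + Size`, so a corrupted size field cannot wrap around and slip past the check.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a record's location inside the data pool;
// not the real OnDiskGraphDB types.
struct RecordSpan {
  uint64_t Offset; // where the record starts within the pool
  uint64_t Size;   // how many bytes the record claims to occupy
};

// True if [Offset, Offset + Size) lies entirely inside a pool of PoolSize
// bytes. Size is compared against the space remaining after Offset instead
// of computing Offset + Size, so a garbage Size cannot wrap around.
bool spanFitsInPool(RecordSpan R, uint64_t PoolSize) {
  if (R.Offset > PoolSize)
    return false;
  return R.Size <= PoolSize - R.Offset;
}

// A well-formed record passes; one with a corrupted size field is rejected.
void demoSpanCheck() {
  assert(spanFitsInPool({0, 16}, 16));
  assert(!spanFitsInPool({8, UINT64_MAX}, 16));
}
```

With a naive `R.Offset + R.Size <= PoolSize`, the `{8, UINT64_MAX}` case would wrap to 7 and be accepted.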

@pawosm-arm
Contributor

Also, could you write a higher-level explanation of the purpose of the code (and the test) that's failing? Maybe this could give them a hint about where to look...

There is an error message, "data record span passed the end of the data pool". If you grep for the message, you'll find a check that the data record does not extend past the end of the memory-mapped region. The test itself copies structured data from one database file to another, so the failure means one of three things: the size of the memory-mapped region is not tracked correctly, the stored data is corrupted, or the lookup reads from the wrong part of the file. I hope it is not one of the latter two, since those would be logic errors, and it is not failing on other systems.

A naïve question: could the filesystem size affect that? Or a lack of free space?

@cachemeifyoucan
Collaborator Author

I would expect running out of space to trigger a different error, and this error also happens on the read-only side. It's highly unlikely, but I can't say for sure.

@pawosm-arm
Contributor

The good news is that I've managed to reproduce it manually, outside of our CI, which means the problem isn't some CI-specific magic:

******************** TEST 'LLVM-Unit :: CAS/./CASTests/1/22' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:/home/testuser/arm-toolchain/arm-software/linux/build/stage/bootstrap_compiler/unittests/CAS/./CASTests-LLVM-Unit-20248-1-22.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=22 GTEST_SHARD_INDEX=1 /home/testuser/arm-toolchain/arm-software/linux/build/stage/bootstrap_compiler/unittests/CAS/./CASTests
--

Note: This is test shard 2 of 22.
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from OnDiskCASTest
[ RUN      ] OnDiskCASTest.OnDiskGraphDBFaultInSingleNode
Failure value returned from cantFail wrapped call
data record span passed the end of the data pool
UNREACHABLE executed at /home/testuser/arm-toolchain/llvm/include/llvm/Support/Error.h:810!
 #0 0x0000000000466d30 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/testuser/arm-toolchain/arm-software/linux/build/stage/bootstrap_compiler/unittests/CAS/./CASTests+0x466d30)
 #1 0x0000000000464918 SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
 #2 0x0000e7c3b8b408f8 (linux-vdso.so.1+0x8f8)
 #3 0x0000e7c3b865b880 __pthread_kill_implementation (/lib64/libc.so.6+0x8b880)
 #4 0x0000e7c3b860aa40 gsignal (/lib64/libc.so.6+0x3aa40)
 #5 0x0000e7c3b85f5988 abort (/lib64/libc.so.6+0x25988)
 #6 0x000000000044f92c (/home/testuser/arm-toolchain/arm-software/linux/build/stage/bootstrap_compiler/unittests/CAS/./CASTests+0x44f92c)
 #7 0x0000000000494044 getContentFromHandle(llvm::cas::OnDiskDataAllocator const&, llvm::cas::ondisk::ObjectHandle) OnDiskGraphDB.cpp:0:0
 #8 0x000000000049413c llvm::cas::ondisk::OnDiskGraphDB::getObjectData(llvm::cas::ondisk::ObjectHandle) const (/home/testuser/arm-toolchain/arm-software/linux/build/stage/bootstrap_compiler/unittests/CAS/./CASTests+0x49413c)
 #9 0x000000000049883c llvm::cas::ondisk::OnDiskGraphDB::importSingleNode(llvm::cas::ondisk::ObjectID, llvm::cas::ondisk::ObjectHandle) (/home/testuser/arm-toolchain/arm-software/linux/build/stage/bootstrap_compiler/unittests/CAS/./CASTests+0x49883c)
#10 0x0000000000499be4 llvm::cas::ondisk::OnDiskGraphDB::faultInFromUpstream(llvm::cas::ondisk::ObjectID) (/home/testuser/arm-toolchain/arm-software/linux/build/stage/bootstrap_compiler/unittests/CAS/./CASTests+0x499be4)
#11 0x0000000000498cc4 llvm::cas::ondisk::OnDiskGraphDB::load(llvm::cas::ondisk::ObjectID) (/home/testuser/arm-toolchain/arm-software/linux/build/stage/bootstrap_compiler/unittests/CAS/./CASTests+0x498cc4)
#12 0x0000000000426a6c OnDiskCASTest_OnDiskGraphDBFaultInSingleNode_Test::TestBody() (/home/testuser/arm-toolchain/arm-software/linux/build/stage/bootstrap_compiler/unittests/CAS/./CASTests+0x426a6c)
#13 0x00000000004b74d0 testing::Test::Run() (.part.0) gtest-all.cc:0:0
#14 0x00000000004bdd70 testing::TestInfo::Run() (/home/testuser/arm-toolchain/arm-software/linux/build/stage/bootstrap_compiler/unittests/CAS/./CASTests+0x4bdd70)
#15 0x00000000004c8280 testing::TestSuite::Run() (.part.0) gtest-all.cc:0:0
#16 0x00000000004c8d64 testing::internal::UnitTestImpl::RunAllTests() (/home/testuser/arm-toolchain/arm-software/linux/build/stage/bootstrap_compiler/unittests/CAS/./CASTests+0x4c8d64)
#17 0x00000000004c92fc testing::UnitTest::Run() (/home/testuser/arm-toolchain/arm-software/linux/build/stage/bootstrap_compiler/unittests/CAS/./CASTests+0x4c92fc)
#18 0x0000000000408d70 main (/home/testuser/arm-toolchain/arm-software/linux/build/stage/bootstrap_compiler/unittests/CAS/./CASTests+0x408d70)
#19 0x0000e7c3b85f609c __libc_start_call_main (/lib64/libc.so.6+0x2609c)
#20 0x0000e7c3b85f617c __libc_start_main@GLIBC_2.17 (/lib64/libc.so.6+0x2617c)
#21 0x00000000004093f0 _start (/home/testuser/arm-toolchain/arm-software/linux/build/stage/bootstrap_compiler/unittests/CAS/./CASTests+0x4093f0)

The bad news is that I still don't know what the root cause could be.

@cachemeifyoucan
Collaborator Author

Is this failing deterministically? Is there any way for me to reproduce it myself?

@pawosm-arm
Contributor

Trouble is, there is nothing unusual in the docker image I've tried: https://catalog.redhat.com/en/software/containers/ubi10/ubi/66f2b46b122803e4937d11ae#packages

I've also managed to reproduce it again on a much simpler AArch64 machine, so it's not a matter of sophisticated hardware.

Have you got a smaller piece of code that does the same thing so I could compile and run it in there?

Could it be a glibc bug?

$ rpm -q glibc
glibc-2.39-46.el10_0.aarch64

@cachemeifyoucan
Collaborator Author

Oh, I can reproduce it now. It seems to happen only when using gcc, not clang. Let me investigate:

RecordSize: 3355443222  Offset: 144 Size: 168

Looks like a corrupted data record somehow.
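As an aside, the bogus `RecordSize` is easier to read in hex: 3355443222 is 0xC8000016, so apart from the top byte (0xC8) the value is just 0x16 (22). The tiny helper below is a hypothetical debugging aid, not anything in OnDiskGraphDB, that splits a 32-bit value into bytes so that kind of pattern is visible at a glance. It assumes the value fits in 32 bits, which the real field may not be limited to, and it makes no claim about what the pattern means.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Split a 32-bit value into its bytes, most significant byte first, so
// unexpected high bits in what should be a small size are easy to spot.
std::array<uint8_t, 4> bytesMSBFirst(uint32_t V) {
  return {{static_cast<uint8_t>(V >> 24), static_cast<uint8_t>(V >> 16),
           static_cast<uint8_t>(V >> 8), static_cast<uint8_t>(V)}};
}

// The RecordSize from the debug output: top byte 0xC8, low byte 0x16.
void demoRecordSize() {
  static_assert(3355443222u == 0xC8000016u, "decimal and hex forms agree");
  auto B = bytesMSBFirst(3355443222u);
  assert(B[0] == 0xC8 && B[3] == 0x16);
}
```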

@pawosm-arm
Contributor

It seems to happen only when using gcc, not clang

Makes sense: note that for us it happens while testing the bootstrap compiler, which was built with gcc/g++.

@cachemeifyoucan
Collaborator Author

I don't know whether it is a gcc bug. It only reproduces under -O3, not -O2.

@cachemeifyoucan
Collaborator Author

I can't see any obvious logic mistakes in the code. I haven't dug too deep, but it seems DataRecordHandle::getRefsRelOffset() is miscompiled when LayoutFlag::getLayoutFlag() and its unpack function are inlined.
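When chasing a suspected miscompile in bit-packing code like this, one cheap check is an exhaustive pack/unpack round-trip test, built once with gcc at -O2 and once at -O3 and compared. The sketch below assumes a made-up `LayoutFlag` with three small fields packed into one byte; the field names and widths are invented, the real `LayoutFlag` and `getRefsRelOffset()` in OnDiskGraphDB.cpp are not reproduced here, and nothing in it claims to identify the actual bug.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout descriptor: field names and bit widths are invented
// for illustration and do not match the real OnDiskGraphDB LayoutFlag.
struct LayoutFlag {
  uint8_t NumRefsWidth;  // 2 bits
  uint8_t DataSizeWidth; // 2 bits
  uint8_t RefKind;       // 1 bit

  // Pack into one byte using only shifts and masks on unsigned values.
  uint8_t pack() const {
    assert(NumRefsWidth < 4 && DataSizeWidth < 4 && RefKind < 2);
    return static_cast<uint8_t>(NumRefsWidth | (DataSizeWidth << 2) |
                                (RefKind << 4));
  }

  // Inverse of pack(): extract each field with its mask.
  static LayoutFlag unpack(uint8_t Byte) {
    return LayoutFlag{static_cast<uint8_t>(Byte & 0x3),
                      static_cast<uint8_t>((Byte >> 2) & 0x3),
                      static_cast<uint8_t>((Byte >> 4) & 0x1)};
  }
};

// Exhaustively check that every valid flag survives pack() then unpack().
bool roundTripsForAllFlags() {
  for (uint8_t N = 0; N < 4; ++N)
    for (uint8_t D = 0; D < 4; ++D)
      for (uint8_t K = 0; K < 2; ++K) {
        LayoutFlag F{N, D, K};
        LayoutFlag G = LayoutFlag::unpack(F.pack());
        if (G.NumRefsWidth != N || G.DataSizeWidth != D || G.RefKind != K)
          return false;
      }
  return true;
}
```

If a round trip like this passes at -O2 but fails at -O3 with the same compiler, that narrows things down to a reduced test case worth reporting upstream.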

Lukacma pushed a commit to Lukacma/llvm-project that referenced this pull request Oct 29, 2025
Add OnDiskGraphDB and OnDiskKeyValueDB that can be used to implement
ObjectStore and ActionCache respectively. These are on-disk persistent
stores that build upon OnDiskTrieHashMap and implement the key functions
required by the LLVMCAS interfaces.

This abstraction layer defines how objects are hashed and stored on
disk. OnDiskKeyValueDB is a basic OnDiskTrieHashMap, while OnDiskGraphDB
also defines:
* How objects of various sizes are stored on disk and referenced by
  the trie nodes.
* How to store references from one stored object to the objects it
  references.

In addition to the basic APIs for ObjectStore and ActionCache, other
advanced database configuration features can be implemented in this
layer without exposing them to users of the LLVMCAS interface. For
example, OnDiskGraphDB has a fault-in function to fetch data from an
upstream OnDiskGraphDB if the data is missing.
Lukacma pushed a commit to Lukacma/llvm-project that referenced this pull request Oct 29, 2025
Fix MSAN failure and expensive test failure.
aokblast pushed a commit to aokblast/llvm-project that referenced this pull request Oct 30, 2025
aokblast pushed a commit to aokblast/llvm-project that referenced this pull request Oct 30, 2025
Fix MSAN failure and expensive test failure.