Description
Sort of a blocker for #587 (I can work around it by building with Setup, but that is really slow).
Problem: GHC crashes while building Glean for macOS when using the CXX_MODE=make configuration:
[8 of 8] Compiling Glean.Write ( glean/client/hs/Glean/Write.hs, dist/build/client-hs/Glean/Write.o, dist/build/client-hs/Glean/Write.dyn_o )
ghc-9.10.1: Failed to lookup symbol: __ZN8facebook5glean3rts10raiseErrorERKNSt3__112basic_stringIcNS2_11char_traitsIcEENS2_9allocatorIcEEEE
....
GHC runtime linker: fatal error: I found a duplicate definition for symbol
__ZN6google7logging8internal13CheckOpStringD1Ev
whilst processing object file
/Users/jade/dev/mwb/tmp-build/source/.build/def/cxx/lib/libglean_cpp_rts.a
The symbol was previously defined in
/Users/jade/dev/mwb/tmp-build/source/.build/def/cxx/lib/libglean_cpp_rts.a(#5:cache.o)
This could be caused by:
* Loading two different object files which export the same symbol
* Specifying the same object file twice on the GHCi command line
* An incorrect `package.conf' entry, causing some object to be
loaded twice.
GHC runtime linker: fatal error: I found a duplicate definition for symbol
....
ghc-9.10.1: internal error: Relocation of type: 7 not (yet) supported!
(GHC version 9.10.1 for aarch64_apple_darwin)
Please report this as a GHC bug: https://www.haskell.org/ghc/reportabug
This looks sort of like the following GHC issues:
- GHC runtime linker cannot link with duplicate symbols (on Windows)
- GHCi runtime linker cannot link with duplicate common symbols
This may well be a GHC deficiency!
Summary
After investigating the problem, I am reasonably confident that:
- CXX_MODE=make only builds .a files, so GHC's built-in runtime linker has to load them when TemplateHaskell needs one of these libraries
- Cabal also builds .dylib files, which GHC loads preferentially, so the Cabal build doesn't have this problem
- The GHC runtime linker on macOS doesn't cope with some completely normal C++ duplicate-definition patterns that come from defining things (inline functions, templates) in headers
There are two (and a half) paths forward that I can think of for fixing this:
- Build a dylib in the CXX_MODE=make build. This is known to work.
- Fix GHC?
- Stop importing the offending library from TH so it never tries to link the offending symbols to begin with? I don't think this is actually possible without Explicit Level Imports though.
Background
From the past year and a bit of working on Haskell build engineering, I know that one of the hard problems in building Haskell is loading dependency libraries in order to run TemplateHaskell splices and the like.
This means loading dylibs into the GHC process itself, which has Problems(tm) such as slow loading when there are many of them; in the longer term, bytecode linking might be the better approach. I also know that GHC has a very rudimentary linker for loading .a static libraries, but it is very much Not Preferred and somewhat janky.
My guess is that TemplateHaskell is why GHC tries to load the object at all.
Examination
tmp-build/source » for f in .build/def/cxx/obj/glean/**/*.o; do if nm $f | grep -q __ZNKSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE20__throw_length_errorB8ne190107Ev; then echo $f; fi; done
.build/def/cxx/obj/glean/client/hs/cpp/write.o
.build/def/cxx/obj/glean/cpp/filewriter.o
.build/def/cxx/obj/glean/cpp/glean.o
.build/def/cxx/obj/glean/interprocess/cpp/worklist.o
.......
There's a million of these...
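Rather than grepping for one mangled name at a time, a small filter over nm output can list all of them at once. This is a minimal sketch, assuming nm's usual "address type name" layout where defined text symbols are marked T; dup_text_syms is a hypothetical helper name, not something in the Glean tree.

```shell
#!/bin/sh
# dup_text_syms (hypothetical helper): read `nm` output on stdin and print,
# sorted, every symbol name that has more than one "T" (defined, text section)
# entry. Archive member headers, blank lines and "U" (undefined) entries are
# skipped because their second field is not "T".
dup_text_syms() {
  awk '$2 == "T" { n[$3]++ } END { for (s in n) if (n[s] > 1) print s }' | sort
}

# Example usage on the static archive from the failing build:
#   nm .build/def/cxx/lib/libglean_cpp_rts.a | dup_text_syms | wc -l
```

Piping through wc -l gives a rough count of how many symbols the runtime linker could trip over in that one archive.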
Example:
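namespace google {
namespace logging {
namespace internal {
// A container for a string pointer which can be evaluated to a bool -
// true iff the pointer is nullptr.
struct CheckOpString {
  CheckOpString(std::unique_ptr<std::string> str) : str_(std::move(str)) {}
  explicit operator bool() const noexcept {
    return GOOGLE_PREDICT_BRANCH_NOT_TAKEN(str_ != nullptr);
  }
  std::unique_ptr<std::string> str_;
};
}  // namespace internal
}  // namespace logging
}  // namespace google
This looks like entirely cromulent C++, but it defines a T-type symbol in every object file that includes it. The destructor is implicitly inline, so this is presumably a weak definition (Mach-O nm prints those as T too) that the system linker would normally coalesce: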
tmp-build/source » nm .build/def/cxx/obj/glean/cpp/glean.o | grep CheckOpString
U __ZN6google15LogMessageFatalC1EPKciRKNS_7logging8internal13CheckOpStringE
0000000000009430 T __ZN6google7logging8internal13CheckOpStringD1Ev
00000000000094e0 T __ZN6google7logging8internal17MakeCheckOpStringIPN5folly18threadlocal_detail11ThreadEntryES6_EENSt3__110unique_ptrINS7_12basic_stringIcNS7_11char_traitsIcEENS7_9allocatorIcEEEENS7_14default_deleteISE_EEEERKT_RKT0_PKc
000000000000b5d0 T __ZN6google7logging8internal17MakeCheckOpStringIPvS3_EENSt3__110unique_ptrINS4_12basic_stringIcNS4_11char_traitsIcEENS4_9allocatorIcEEEENS4_14default_deleteISB_EEEERKT_RKT0_PKc
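If these T entries really are weak definitions, the system linker is allowed to merge them, and the question becomes whether GHC's runtime linker treats them that way. A sketch for checking, assuming Apple's nm -m output layout (symbol name as the last field, and the word "weak" on weak definitions; the exact wording may differ between toolchain versions). classify_defs is a hypothetical helper.

```shell
#!/bin/sh
# classify_defs (hypothetical helper): read macOS `nm -m` output on stdin and
# print "weak NAME" or "strong NAME" for each defined external symbol.
# Assumes lines shaped like
#   0000000000009430 (__TEXT,__text) weak external __ZN...CheckOpStringD1Ev
classify_defs() {
  awk '
    /\(undefined\)/ { next }                 # references, not definitions
    / weak /        { print "weak", $NF; next }
    / external /    { print "strong", $NF }  # non-external (local) symbols do not match
  '
}

# Example usage:
#   nm -m .build/def/cxx/obj/glean/cpp/glean.o | classify_defs | grep CheckOpString
```

A "strong" result for one of the duplicated symbols would point at the C++ build; a "weak" result would point at how the runtime linker handles weak definitions.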
This seems like it would be at odds with using Haskell with C++ at all, and, more notably, it only happens when we use Nix to compile the Glean in question.
Attempt to isolate the repro
- -O2 makes no difference
- Running export CXX_MODE=make, then make cxx-libraries glean.cabal && cabal build client-hs, reproduces the same failure in Cabal
Attempt to examine differences
The offending symbols are also multiply defined in the .a file built by Cabal:
co/glean » nm dist-ok/build/aarch64-osx/ghc-9.10.1/glean-0.2.0.0/l/rts/build/rts/libHSglean-0.2.0.0-inplace-rts.a | grep __ZNKSt3__16vectorIN8facebook5glean3rts6WordIdINS3_7Fid_tagEEENS_9allocatorIS6_EEE20__throw_length_errorB8ne190107Ev
0000000000000b84 T __ZNKSt3__16vectorIN8facebook5glean3rts6WordIdINS3_7Fid_tagEEENS_9allocatorIS6_EEE20__throw_length_errorB8ne190107Ev
0000000000008a10 T __ZNKSt3__16vectorIN8facebook5glean3rts6WordIdINS3_7Fid_tagEEENS_9allocatorIS6_EEE20__throw_length_errorB8ne190107Ev
000000000000b9ac T
The failing Cabal build invokes ghc exactly the same way as the successful build, so any divergence must be in a package db?
co/glean » diff --color=always -u dist-newstyle/packagedb/ghc-9.10.1/glean-0.2.0.0-inplace-rts.conf dist-ok/packagedb/ghc-9.10.1/glean-0.2.0.0-inplace-rts.conf
--- dist-newstyle/packagedb/ghc-9.10.1/glean-0.2.0.0-inplace-rts.conf 2026-01-21 21:18:04.601771074 -0800
+++ dist-ok/packagedb/ghc-9.10.1/glean-0.2.0.0-inplace-rts.conf 2026-01-21 21:15:21.546817304 -0800
@@ -16,20 +16,29 @@
abi: inplace
library-dirs:
- /Users/jade/co/Glean/.build/def/cxx/lib
+ /Users/jade/co/Glean/dist-newstyle/build/aarch64-osx/ghc-9.10.1/glean-0.2.0.0/l/rts/build/rts
/nix/store/qfrll4yw58mlghi5mqas78h1djyrq2dz-gflags-2.2.2/lib
/nix/store/xj733471zi0pv48hiqb0gvps8ycw4icx-icu4c-76.1/lib
/nix/store/8f3wh88hragbwzjpd857r1gl517lxzd4-folly-2025.04.21.00/lib
/nix/store/6wni88mr88zs32l3bmr2sv55l6zjpqq9-xxHash-0.8.3/lib
library-dirs-static:
+ /Users/jade/co/Glean/dist-newstyle/build/aarch64-osx/ghc-9.10.1/glean-0.2.0.0/l/rts/build/rts
+ /nix/store/qfrll4yw58mlghi5mqas78h1djyrq2dz-gflags-2.2.2/lib
+ /nix/store/xj733471zi0pv48hiqb0gvps8ycw4icx-icu4c-76.1/lib
+ /nix/store/8f3wh88hragbwzjpd857r1gl517lxzd4-folly-2025.04.21.00/lib
+ /nix/store/6wni88mr88zs32l3bmr2sv55l6zjpqq9-xxHash-0.8.3/lib
+
+dynamic-library-dirs:
+ /Users/jade/co/Glean/dist-newstyle/build/aarch64-osx/ghc-9.10.1/glean-0.2.0.0/l/rts/build/rts
/nix/store/qfrll4yw58mlghi5mqas78h1djyrq2dz-gflags-2.2.2/lib
/nix/store/xj733471zi0pv48hiqb0gvps8ycw4icx-icu4c-76.1/lib
/nix/store/8f3wh88hragbwzjpd857r1gl517lxzd4-folly-2025.04.21.00/lib
/nix/store/6wni88mr88zs32l3bmr2sv55l6zjpqq9-xxHash-0.8.3/lib
data-dir: /Users/jade/co/Glean/.
-extra-libraries: glean_cpp_rts folly icuuc gflags xxhash
+hs-libraries: HSglean-0.2.0.0-inplace-rts
+extra-libraries: folly icuuc gflags xxhash
extra-libraries-static:
folly c++abi icuuc icudata pthread m gflags pthread xxhash
Is it allowed to do that because the library is listed under hs-libraries?
WAIT A MINUTE
Is this because the working build can load a dylib, while here the GHC linker (which is Very Simple) has to load a static library of C++, and that goes badly?
Evidence:
co/glean » ls dist-ok/build/aarch64-osx/ghc-9.10.1/glean-0.2.0.0/l/rts/build/rts
autogen glean libHSglean-0.2.0.0-inplace-rts-ghc9.10.1.dylib libHSglean-0.2.0.0-inplace-rts.a
co/glean » ls .build/def/cxx/lib
libglean_client_hs.a libglean_cpp_rocksdb.a libglean_cpp_rts.a
libglean_cpp_client.a libglean_cpp_rocksdb_stats.a libglean_cpp_storage.a
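To check this for every library named in a package db entry rather than by eye, a sketch like the following reports, per library name, whether a directory holds a shared build or only a static archive. report_lib_kinds is a hypothetical helper; the idea that GHC falls back to its own loader when only a .a exists is the hypothesis above, not something this script verifies.

```shell
#!/bin/sh
# report_lib_kinds (hypothetical helper): for each library name, report whether
# DIR holds a shared build (lib<name>.dylib or lib<name>.so), only a static
# archive (lib<name>.a), or neither.
report_lib_kinds() {
  dir=$1
  shift
  for lib in "$@"; do
    if [ -e "$dir/lib$lib.dylib" ] || [ -e "$dir/lib$lib.so" ]; then
      echo "$lib: shared"
    elif [ -e "$dir/lib$lib.a" ]; then
      echo "$lib: static-only"
    else
      echo "$lib: missing"
    fi
  done
}

# Example usage, with a name from the failing extra-libraries line:
#   report_lib_kinds .build/def/cxx/lib glean_cpp_rts
```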
Can we fix this by building a dylib of the rts so the Haskell linker isn't used?
(how bad is that?? probably pretty horrible)
And indeed it fixes it:
diff --git a/mk/cxx-make.mk b/mk/cxx-make.mk
index bea7006163..458cf2bf89 100644
--- a/mk/cxx-make.mk
+++ b/mk/cxx-make.mk
@@ -33,6 +33,20 @@
# define_library below
cxx-libraries: $(CXX_LIBRARIES)
+UNAME_S := $(shell uname -s)
+ifeq ($(UNAME_S),Darwin)
+ DYLIB_EXT := dylib
+else
+ DYLIB_EXT := so
+endif
+
+GLEAN_PKGCONFIG_DEPS = libfolly libunwind icu-uc libxxhash gflags
+GLEAN_PKGCONFIG_LIBS := $(shell pkg-config --libs $(GLEAN_PKGCONFIG_DEPS))
+
+# TODO(jade): obviously you can't hardcode this here, but I don't want to write
+# an evil awk script to patch this until we decide on an approach.
+LDFLAGS_DYNAMIC := -undefined dynamic_lookup /nix/store/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-fb-util-0.2.0.1/lib/ghc-9.10.1/lib/aarch64-osx-ghc-9.10.1-e176/libHSfb-util-0.2.0.1-CkfMjuHa4mjBJn6htj6bu6-ghc9.10.1.dylib
+
# For each library xxx, we define some variables and a target xxx which builds
# the actual library. We also track include dependencies via the usual
# mechanism.
@@ -49,13 +63,18 @@
define define_library
CXX_OBJECTS_$1 = $$(addprefix $$(CXX_OBJ_DIR)/,$$(CXX_SOURCES_$1:.cpp=.o))
CXX_LIB_$1 = $$(CXX_LIB_DIR)/lib$1.a
+ CXX_DYLIB_$1 = $$(CXX_LIB_DIR)/lib$1.$$(DYLIB_EXT)
.PHONY: $1
- $1:: $$(CXX_LIB_$1)
+ $1:: $$(CXX_LIB_$1) $$(CXX_DYLIB_$1)
$$(CXX_LIB_$1): $$(CXX_OBJECTS_$1)
@mkdir -p $$(@D)
$$(AR) rcs $$@ $$^
+ $$(CXX_DYLIB_$1): $$(CXX_OBJECTS_$1)
+ @mkdir -p $$(@D)
+ $$(CXX) $$(LDFLAGS_DYNAMIC) $$(GLEAN_PKGCONFIG_LIBS) -shared -o $$@ $$^
+
$$(CXX_OBJECTS_$1): $(CXX_OBJ_DIR)/%.o: %.cpp
@mkdir -p $$(@D)
$$(CXX) -o $$@ $$(CXXFLAGS) $$(CXX_FLAGS_$1) -fPIC -MMD -MP -c $$<
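The TODO in the patch hardcodes the fb-util dylib path. One possible way to derive it instead, sketched under the assumption that fb-util is registered in a package db that ghc-pkg can see and that the file follows the lib<hs-library>-ghc<version>.dylib naming visible in the hardcoded path. hs_dylib_path is a hypothetical helper, and the ghc-pkg invocation in the comments is untested here.

```shell
#!/bin/sh
# hs_dylib_path (hypothetical helper): build the path of a Haskell package's
# shared library from its dynamic-library-dirs entry, its hs-libraries entry,
# the GHC version and the platform extension, using the
# lib<hs-library>-ghc<version>.<ext> naming seen in the patch above.
hs_dylib_path() {
  printf '%s/lib%s-ghc%s.%s\n' "$1" "$2" "$3" "$4"
}

# Example usage (under Nix or cabal you may need --package-db):
#   dir=$(ghc-pkg field fb-util dynamic-library-dirs --simple-output | awk 'NR==1 {print $1}')
#   hslib=$(ghc-pkg field fb-util hs-libraries --simple-output | awk 'NR==1 {print $1}')
#   hs_dylib_path "$dir" "$hslib" "$(ghc --numeric-version)" dylib
```

The result could then be fed into LDFLAGS_DYNAMIC via $(shell ...) instead of the literal store path.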