
Build dynamic libraries with CXX_MODE=make to make GHC happy on macOS #688

@lf-

Blocker to #587 (sorta; I can work around it by using Setup to build, but that's really slow).

Problem: GHC crashes while building Glean for macOS when using the CXX_MODE=make configuration:

[8 of 8] Compiling Glean.Write      ( glean/client/hs/Glean/Write.hs, dist/build/client-hs/Glean/Write.o, dist/build/client-hs/Glean/Write.dyn_o )
ghc-9.10.1: Failed to lookup symbol: __ZN8facebook5glean3rts10raiseErrorERKNSt3__112basic_stringIcNS2_11char_traitsIcEENS2_9allocatorIcEEEE
....
GHC runtime linker: fatal error: I found a duplicate definition for symbol
   __ZN6google7logging8internal13CheckOpStringD1Ev
whilst processing object file
   /Users/jade/dev/mwb/tmp-build/source/.build/def/cxx/lib/libglean_cpp_rts.a
The symbol was previously defined in
   /Users/jade/dev/mwb/tmp-build/source/.build/def/cxx/lib/libglean_cpp_rts.a(#5:cache.o)
This could be caused by:
   * Loading two different object files which export the same symbol
   * Specifying the same object file twice on the GHCi command line
   * An incorrect `package.conf' entry, causing some object to be
     loaded twice.
GHC runtime linker: fatal error: I found a duplicate definition for symbol
....
ghc-9.10.1: internal error: Relocation of type: 7 not (yet) supported!

    (GHC version 9.10.1 for aarch64_apple_darwin)
    Please report this as a GHC bug:  https://www.haskell.org/ghc/reportabug

This looks sort of like the following GHC issues:

This may well be a GHC deficiency!

Summary

After investigating the problem I am reasonably confident it's that:

  • CXX_MODE=make only builds .a files, so the GHC integrated dynamic linker is used when TemplateHaskell needs some library
    • Cabal builds .dylib files which GHC loads preferentially and doesn't have this problem as a result
  • The GHC linker on macOS doesn't like some very normal C++ duplicate definition patterns caused by putting things in headers

There are two (and a half) paths forward that I can think of for fixing this:

  • Build a dylib in the CXX_MODE=make configuration. This is known to work.
  • Fix GHC?
  • Stop importing the offending library from TH so it never tries to link the offending symbols to begin with? I don't think this is actually possible without Explicit Level Imports though.

Background

My experience working with Haskell build engineering for the past year and a bit reminds me that one of the hard problems with building Haskell is the loading of dependency libraries for running TemplateHaskell splices or similar.
This leads to loading dylibs into the GHC process itself, which has Problems(tm) such as slow loading when there are many of them; bytecode linking might be a better fit in the longer term. I also know that GHC has a very rudimentary linker of its own for loading .a static libraries, but it's very much Not Preferred and is kind of janky.

TemplateHaskell is my speculation as to why it was even trying to load the object in the first place.

Examination

tmp-build/source » for f in .build/def/cxx/obj/glean/**/*.o; do if nm $f | grep -q __ZNKSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE20__throw_length_errorB8ne190107Ev; then echo $f; fi; done
.build/def/cxx/obj/glean/client/hs/cpp/write.o
.build/def/cxx/obj/glean/cpp/filewriter.o
.build/def/cxx/obj/glean/cpp/glean.o
.build/def/cxx/obj/glean/interprocess/cpp/worklist.o
.......

There's a million of these...
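
To get a feel for how widespread this is, the loop above can be generalized into a rough sketch that lists every symbol with a T definition in more than one object file. The helper names `defined_text_symbols` and `dup_symbols` are made up for illustration, and the sketch assumes object paths without spaces and the default three-column `nm` output:

```shell
# Emit "object-file symbol" pairs for every text-section (T) definition.
defined_text_symbols() {
  for f in "$@"; do
    nm "$f" 2>/dev/null | awk -v f="$f" '$2 == "T" { print f, $3 }'
  done
}

# Read "object-file symbol" pairs on stdin and print "count symbol" for each
# symbol that is defined in more than one distinct object file.
dup_symbols() {
  sort -u |
    awk '{ n[$2]++ } END { for (s in n) if (n[s] > 1) print n[s], s }' |
    sort -k2
}
```

Usage would be something like `defined_text_symbols .build/def/cxx/obj/glean/**/*.o | dup_symbols` (the `**` glob needs zsh, or bash with globstar enabled).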

Example:

namespace logging {
namespace internal {

// A container for a string pointer which can be evaluated to a bool -
// true iff the pointer is nullptr.
struct CheckOpString {
  CheckOpString(std::unique_ptr<std::string> str) : str_(std::move(str)) {}
  explicit operator bool() const noexcept {
    return GOOGLE_PREDICT_BRANCH_NOT_TAKEN(str_ != nullptr);
  }
  std::unique_ptr<std::string> str_;
};

This looks like entirely cromulent C++ but defines a T type symbol in files that include it:

tmp-build/source » nm .build/def/cxx/obj/glean/cpp/glean.o | grep CheckOpString
                 U __ZN6google15LogMessageFatalC1EPKciRKNS_7logging8internal13CheckOpStringE
0000000000009430 T __ZN6google7logging8internal13CheckOpStringD1Ev
00000000000094e0 T __ZN6google7logging8internal17MakeCheckOpStringIPN5folly18threadlocal_detail11ThreadEntryES6_EENSt3__110unique_ptrINS7_12basic_stringIcNS7_11char_traitsIcEENS7_9allocatorIcEEEENS7_14default_deleteISE_EEEERKT_RKT0_PKc
000000000000b5d0 T __ZN6google7logging8internal17MakeCheckOpStringIPvS3_EENSt3__110unique_ptrINS4_12basic_stringIcNS4_11char_traitsIcEENS4_9allocatorIcEEEENS4_14default_deleteISB_EEEERKT_RKT0_PKc

It seems like this must be at odds with using Haskell with C++ at all, yet, more notably, it only happens when we use Nix to compile the Glean in question.


Attempt to isolate the repro

  • -O2 makes no difference
  • export CXX_MODE=make and then make cxx-libraries glean.cabal && cabal build client-hs does repro the same thing in cabal

Attempt to examine differences

The offending symbols are also multi defined in the .a file built by Cabal:

co/glean » nm dist-ok/build/aarch64-osx/ghc-9.10.1/glean-0.2.0.0/l/rts/build/rts/libHSglean-0.2.0.0-inplace-rts.a | grep __ZNKSt3__16vectorIN8facebook5glean3rts6WordIdINS3_7Fid_tagEEENS_9allocatorIS6_EEE20__throw_length_errorB8ne190107Ev
0000000000000b84 T __ZNKSt3__16vectorIN8facebook5glean3rts6WordIdINS3_7Fid_tagEEENS_9allocatorIS6_EEE20__throw_length_errorB8ne190107Ev
0000000000008a10 T __ZNKSt3__16vectorIN8facebook5glean3rts6WordIdINS3_7Fid_tagEEENS_9allocatorIS6_EEE20__throw_length_errorB8ne190107Ev
000000000000b9ac T

The failing cabal build is invoking ghc exactly the same as with the successful build: any divergences must be in a package db?

co/glean » diff --color=always -u dist-newstyle/packagedb/ghc-9.10.1/glean-0.2.0.0-inplace-rts.conf dist-ok/packagedb/ghc-9.10.1/glean-0.2.0.0-inplace-rts.conf 
--- dist-newstyle/packagedb/ghc-9.10.1/glean-0.2.0.0-inplace-rts.conf   2026-01-21 21:18:04.601771074 -0800
+++ dist-ok/packagedb/ghc-9.10.1/glean-0.2.0.0-inplace-rts.conf 2026-01-21 21:15:21.546817304 -0800
@@ -16,20 +16,29 @@
 
 abi:                    inplace
 library-dirs:
-    /Users/jade/co/Glean/.build/def/cxx/lib
+    /Users/jade/co/Glean/dist-newstyle/build/aarch64-osx/ghc-9.10.1/glean-0.2.0.0/l/rts/build/rts
     /nix/store/qfrll4yw58mlghi5mqas78h1djyrq2dz-gflags-2.2.2/lib
     /nix/store/xj733471zi0pv48hiqb0gvps8ycw4icx-icu4c-76.1/lib
     /nix/store/8f3wh88hragbwzjpd857r1gl517lxzd4-folly-2025.04.21.00/lib
     /nix/store/6wni88mr88zs32l3bmr2sv55l6zjpqq9-xxHash-0.8.3/lib
 
 library-dirs-static:
+    /Users/jade/co/Glean/dist-newstyle/build/aarch64-osx/ghc-9.10.1/glean-0.2.0.0/l/rts/build/rts
+    /nix/store/qfrll4yw58mlghi5mqas78h1djyrq2dz-gflags-2.2.2/lib
+    /nix/store/xj733471zi0pv48hiqb0gvps8ycw4icx-icu4c-76.1/lib
+    /nix/store/8f3wh88hragbwzjpd857r1gl517lxzd4-folly-2025.04.21.00/lib
+    /nix/store/6wni88mr88zs32l3bmr2sv55l6zjpqq9-xxHash-0.8.3/lib
+
+dynamic-library-dirs:
+    /Users/jade/co/Glean/dist-newstyle/build/aarch64-osx/ghc-9.10.1/glean-0.2.0.0/l/rts/build/rts
     /nix/store/qfrll4yw58mlghi5mqas78h1djyrq2dz-gflags-2.2.2/lib
     /nix/store/xj733471zi0pv48hiqb0gvps8ycw4icx-icu4c-76.1/lib
     /nix/store/8f3wh88hragbwzjpd857r1gl517lxzd4-folly-2025.04.21.00/lib
     /nix/store/6wni88mr88zs32l3bmr2sv55l6zjpqq9-xxHash-0.8.3/lib
 
 data-dir:               /Users/jade/co/Glean/.
-extra-libraries:        glean_cpp_rts folly icuuc gflags xxhash
+hs-libraries:           HSglean-0.2.0.0-inplace-rts
+extra-libraries:        folly icuuc gflags xxhash
 extra-libraries-static:
     folly c++abi icuuc icudata pthread m gflags pthread xxhash
 

Is it allowed to do that because it's a hs-libraries entry?
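
For comparing individual fields between the two package db entries without reading the whole diff, a small awk sketch along these lines might help. `conf_field` is a hypothetical helper, and it assumes the .conf layout shown above, where a field's continuation lines are indented:

```shell
# Print one field of a GHC package .conf file, including its indented
# continuation lines. Usage: conf_field <field-name> <conf-file>
conf_field() {
  awk -v key="$1:" '
    $1 == key      { on = 1; print; next }  # the field header line
    on && /^[ \t]/ { print; next }          # indented continuation line
                   { on = 0 }               # anything else ends the field
  ' "$2"
}
```

For example, `conf_field extra-libraries dist-newstyle/packagedb/ghc-9.10.1/glean-0.2.0.0-inplace-rts.conf` should print the line that still lists glean_cpp_rts in the failing build.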


WAIT A MINUTE
Is this because the Cabal build can load a dylib, whereas here the GHC linker, which is Very Simple, is loading a C++ staticlib, and that goes badly?

Evidence:

co/glean » ls dist-ok/build/aarch64-osx/ghc-9.10.1/glean-0.2.0.0/l/rts/build/rts
autogen  glean  libHSglean-0.2.0.0-inplace-rts-ghc9.10.1.dylib  libHSglean-0.2.0.0-inplace-rts.a
co/glean » ls .build/def/cxx/lib
libglean_client_hs.a   libglean_cpp_rocksdb.a        libglean_cpp_rts.a
libglean_cpp_client.a  libglean_cpp_rocksdb_stats.a  libglean_cpp_storage.a

Can we fix this by building a dylib of the rts so the Haskell linker isn't used?
(how bad is that?? probably pretty horrible)

And indeed it fixes it:

diff --git a/mk/cxx-make.mk b/mk/cxx-make.mk
index bea7006163..458cf2bf89 100644
--- a/mk/cxx-make.mk
+++ b/mk/cxx-make.mk
@@ -33,6 +33,20 @@
 # define_library below
 cxx-libraries: $(CXX_LIBRARIES)
 
+UNAME_S := $(shell uname -s)
+ifeq ($(UNAME_S),Darwin)
+	DYLIB_EXT := dylib
+else
+	DYLIB_EXT := so
+endif
+
+GLEAN_PKGCONFIG_DEPS = libfolly libunwind icu-uc libxxhash gflags
+GLEAN_PKGCONFIG_LIBS := $(shell pkg-config --libs $(GLEAN_PKGCONFIG_DEPS))
+
+# TODO(jade): obviously you can't hardcode this here, but I don't want to write
+# an evil awk script to patch this until we decide on an approach.
+LDFLAGS_DYNAMIC := -undefined dynamic_lookup /nix/store/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-fb-util-0.2.0.1/lib/ghc-9.10.1/lib/aarch64-osx-ghc-9.10.1-e176/libHSfb-util-0.2.0.1-CkfMjuHa4mjBJn6htj6bu6-ghc9.10.1.dylib
+
 # For each library xxx, we define some variables and a target xxx which builds
 # the actual library. We also track include dependencies via the usual
 # mechanism.
@@ -49,13 +63,18 @@
 define define_library
  CXX_OBJECTS_$1 = $$(addprefix $$(CXX_OBJ_DIR)/,$$(CXX_SOURCES_$1:.cpp=.o))
  CXX_LIB_$1 = $$(CXX_LIB_DIR)/lib$1.a
+ CXX_DYLIB_$1 = $$(CXX_LIB_DIR)/lib$1.$$(DYLIB_EXT)
  .PHONY: $1
- $1:: $$(CXX_LIB_$1)
+ $1:: $$(CXX_LIB_$1) $$(CXX_DYLIB_$1)
 
  $$(CXX_LIB_$1): $$(CXX_OBJECTS_$1)
 	@mkdir -p $$(@D)
 	$$(AR) rcs $$@ $$^
 
+ $$(CXX_DYLIB_$1): $$(CXX_OBJECTS_$1)
+	@mkdir -p $$(@D)
+	$$(CXX) $$(LDFLAGS_DYNAMIC) $$(GLEAN_PKGCONFIG_LIBS) -shared -o $$@ $$^
+
  $$(CXX_OBJECTS_$1): $(CXX_OBJ_DIR)/%.o: %.cpp
 	@mkdir -p $$(@D)
 	$$(CXX) -o $$@ $$(CXXFLAGS) $$(CXX_FLAGS_$1) -fPIC -MMD -MP -c $$<
