Drop flat namespace option (#1109)

Baltoli · web-flow · commit 039902614e5f · 2024-07-19T11:25:50.000-06:00
This PR fixes two subtle, related issues that are blocking updates from going through downstream in the Kasmer project. At a high level, the issues are: - Flat namespace linking on macOS produces incorrect symbol lookups in dynamic libraries. - #1097 misses a subtle edge case related to tail-call optimisation. The actual code changes required are small, but warrant some detailed explanation. ## Flat Namespaces For a long time, macOS has implemented a system known as _two-level_ namespaces, whereby undefined symbol names in a dynamic library are prefixed with the name of the library in which the loader expects to be able to find them at run-time. This is a conservative behaviour; even if a symbol with the same name exists in a different library, it won't be selected. For example, the dynamic libraries built by `llvm-kompile` in `c` mode link against `libgmp`. Two-level namespaces produce dynamic symbol tables that look like: ```console $ dyld_info test/c/Output/flat-namespace.kore.tmp.dir/libtest.so -symbolic_fixups | grep gmpz_clear +0x2B28 bind pointer libgmp.10.dylib/___gmpz_clear ``` This behaviour is different to Linux, which does not have a notion of two-level namespaces. For legacy compatibility purposes, Apple supply a linker flag `-flat_namespace` that behaves more similarly to Linux behaviour. Its use is discouraged in new code, but we had enabled it to work around an issue in the Python bindings (python/cpython#97524) that should be fixed in a future CPython / macOS combination.[^1] When enabled, the symbol table looks something like this for the same example: ```console $ dyld_info test/c/Output/flat-namespace.kore.tmp.dir/libtest.so -symbolic_fixups | grep gmpz_clear +0x2EE8 bind pointer flat-namespace/___gmpz_clear ``` As a consequence of this, if the symbol `___gmpz_clear` exists in multiple dynamic libraries loaded by the same process, then the order in which they will be selected by the dynamic loader is not clearly well-defined,[^2] and when it's referenced we could end up loading either the correct or the incorrect symbol. This caused the initial bug observed as follows:[^3] - The Haskell backend statically links the `kore-rpc-booster` executable against `libgmp`, meaning that some GMP symbols appear in that binary. - The backend compiles shared libraries that dynamically link against `libgmp`. - `kore-rpc-booster` dynamically loads one of these libraries, and when resolving symbols to load, the flat namespace environment selects the static version for some and the dynamic version for others. - A call to `__gmpz_clear` from a backend hook ends up referencing the statically linked symbol, rather than the dynamically linked version. Generally, I think this situation is harmless - GMP is very stable and it's plausible that doing this for most symbols is not observable. - However, the dynamically-linked GMP library has been set up to use the KORE memory management functions. When the static version is called, it tries to `free()` a pointer allocated by the backend's GC, and crashes. The fix for this issue is to drop our usage of `-flat_namespace` for C shared libraries compiled by the backend. This breaks a few places we were relying on the old (incorrect) behaviour in the presence of C++ RTTI; having multiple instances of identically-named typeinfo symbols in a process is known to be broken there: - `libunwind` is actually implicitly linked via the macOS system library; if we explicitly link it as well, then code that handles exceptions will break. - The `k-rule-apply` tool linked two copies of the KORE AST library, causing `dynamic_cast` to break. #1110 addresses this. ## Tail-Call Optimisation In #1097, we made some changes that explicitly mark K functions as `musttail` when we know they're tail recursive. In doing so, we removed the need to use the `-tailcallopt` flag in most cases. However, the change in that PR missed that as well as IR-level transformations, `-tailcallopt` sets a lower-level flag in the backend[^4] code generator that guarantees tail-call code generation. For large programs, this meant I could observe stack overflows when traversing large terms. The fix is just to enforce that this internal option gets set properly; doing so is just a restoration of the behaviour we got from `-tailcallopt` before. [^1]: But isn't yet fixed, unfortunately - the underlying bug is still present on my system. Should be revisited in the future, ideally! [^2]: It might be defined somewhere, but the initial manifestation of this bug appeared in an apparently unrelated commit, so I think we were just getting lucky previously. The fix in this PR is morally correct whether or not things worked accidentally beforehand. [^3]: I intend to write this up fully later in a separate issue. [^4]: As in the X86 or arm backend of LLVM itself.
diff --git a/bin/llvm-kompile-clang b/bin/llvm-kompile-clang
@@ -136,7 +136,6 @@ if [[ "$OSTYPE" == "darwin"* ]]; then
   flags=(
     "-L@BREW_PREFIX@/opt/libffi/lib"
     "-L@BREW_PREFIX@/lib"
-    "-L@LLVM_LIBRARY_DIR@"
     "-Wl,-u,_table_getArgumentSortsForTag"
     "-I" "@BREW_PREFIX@/include"
   )
@@ -158,6 +157,19 @@ else
   set_visibility_hidden="$LIBDIR/libSetVisibilityHidden.so"
 fi
 
+# On macOS, we get libunwind supplied as part of the developer tools in the OS,
+# and so don't need to link it directly. If we instead try to explictly link
+# against the libunwind that's part of Homebrew-supplied LLVM, it's easy to end
+# up in a situation where exceptions thrown in a shared library are not
+# compatible with the unwinding machinery in the main application binary. This
+# then manifests as BAD_ACCESS errors (or similar) when an exception is thrown,
+# _even if the exception should in principle be caught_.
+if [[ "$OSTYPE" == "darwin"* ]]; then
+  libunwind=""
+else
+  libunwind="-lunwind"
+fi
+
 # When building the Python AST module, there's no runtime and no main file, so
 # we skip this entire step. The library code is just C++, so we can skip
 # straight to invoking the C++ compiler.
@@ -207,7 +219,7 @@ if [[ "$OSTYPE" == "darwin"* ]]; then
   start_whole_archive="-force_load"
   end_whole_archive=""
 
-  flags+=("-Wl,-flat_namespace" "-Wl,-undefined" "-Wl,dynamic_lookup")
+  flags+=("-Wl,-undefined" "-Wl,dynamic_lookup")
 else
   start_whole_archive="-Wl,--whole-archive"
   end_whole_archive="-Wl,--no-whole-archive"
@@ -218,9 +230,13 @@ if [ "$main" = "static" ]; then
 elif [[ "$main" =~ "python" ]]; then
   # Don't link jemalloc when building a python library; it clashes with the
   # pymalloc implementation that Python expects you to use.
-  all_libraries=("${libraries[@]}" "-lgmp" "-lgmpxx" "-lmpfr" "-lpthread" "-ldl" "-lffi" "-lunwind")
+  all_libraries=("${libraries[@]}" "-lgmp" "-lgmpxx" "-lmpfr" "-lpthread" "-ldl" "-lffi" "$libunwind")
   flags+=("-fPIC" "-shared" "-I${INCDIR}" "-fvisibility=hidden")
 
+  if [[ "$OSTYPE" == "darwin"* ]]; then
+    flags+=("-Wl,-flat_namespace")
+  fi
+
   read -r -a python_include_flags <<< "$("${python_cmd}" -m pybind11 --includes)"
   flags+=("${python_include_flags[@]}")
 
@@ -240,11 +256,11 @@ elif [ "$main" = "c" ]; then
 
   # Avoid jemalloc for similar reasons as Python; we don't know who is loading
   # this library so don't want to impose it.
-  all_libraries=("${libraries[@]}" "-lgmp" "-lgmpxx" "-lmpfr" "-lpthread" "-ldl" "-lffi" "-lunwind")
+  all_libraries=("${libraries[@]}" "-lgmp" "-lgmpxx" "-lmpfr" "-lpthread" "-ldl" "-lffi" "$libunwind")
   flags+=("-fPIC" "-shared" "$start_whole_archive" "$LIBDIR/libkllvmcruntime.a" "$end_whole_archive")
   clangpp_args+=("-o" "${output_file}")
 else
-  all_libraries=("${libraries[@]}" "-lgmp" "-lgmpxx" "-lmpfr" "-lpthread" "-ldl" "-lffi" "-ljemalloc" "-lunwind")
+  all_libraries=("${libraries[@]}" "-lgmp" "-lgmpxx" "-lmpfr" "-lpthread" "-ldl" "-lffi" "-ljemalloc" "$libunwind")
 fi
 
 if $link; then
diff --git a/lib/codegen/ApplyPasses.cpp b/lib/codegen/ApplyPasses.cpp
@@ -123,6 +123,7 @@ void generate_object_file(llvm::Module &mod, llvm::raw_ostream &os) {
 
   auto features_string = features.getString();
   auto options = TargetOptions{};
+  options.GuaranteedTailCallOpt = true;
 
 #if LLVM_VERSION_MAJOR >= 16
   std::optional<CodeModel::Model> model = std::nullopt;
diff --git a/nix/llvm-backend.nix b/nix/llvm-backend.nix
@@ -40,7 +40,7 @@ stdenv.mkDerivation {
       --replace '"-liconv"' '"-L${libiconv}/lib" "-liconv"' \
       --replace '"-lncurses"' '"-L${ncurses}/lib" "-lncurses"' \
       --replace '"-ltinfo"' '"-L${ncurses}/lib" "-ltinfo"' \
-      --replace '"-lunwind"' '"-L${libunwind}/lib" "-lunwind"' \
+      --replace '"$libunwind"' '"-L${libunwind}/lib" "-lunwind"' \
       --replace '"-L@BREW_PREFIX@/opt/libffi/lib"' ' ' \
       --replace '"-L@LLVM_LIBRARY_DIR@"' ' ' \
       --replace '-L@BREW_PREFIX@/lib' '-L${libcxx}/lib' \
diff --git a/test/c/Inputs/flat-namespace.c b/test/c/Inputs/flat-namespace.c
@@ -0,0 +1,40 @@
+#include "api.h"
+
+#include <stdio.h>
+#include <stdlib.h>
+
+/**
+ * The K program corresponding to this test will evaluate the `Int2Bytes` hooked
+ * function when it is run; this hook ends up allocating and freeing a temporary
+ * MPZ integer locally. If we're in a situation where `-flat_namespace` has been
+ * enabled on macOS, it's possible for the hook to end up resolving
+ * `__gmpz_clear` to a symbol defined in the host binary (rather than libgmp's
+ * dynamic library!).
+ *
+ * This originally manifested when the HB booster loaded a C bindings library
+ * (the booster statically links libgmp and so contains a symbol with this
+ * name); this test is a minimised reproduction of that issue.
+ */
+void __gmpz_clear(void *p) {
+  abort();
+}
+
+int main(int argc, char **argv) {
+  if (argc <= 1) {
+    return 1;
+  }
+
+  struct kllvm_c_api api = load_c_api(argv[1]);
+
+  api.kllvm_init();
+
+  kore_sort *sort_foo = api.kore_composite_sort_new("SortFoo");
+  kore_pattern *pat = api.kore_composite_pattern_new("Lblfoo");
+
+  kore_pattern *input = api.kore_pattern_make_interpreter_input(pat, sort_foo);
+
+  block *term = api.kore_pattern_construct(input);
+  block *after = api.take_steps(-1, term);
+
+  printf("%s", api.kore_block_dump(after));
+}
diff --git a/test/c/flat-namespace.kore b/test/c/flat-namespace.kore
diff --git a/test/c/k-files/flat-namespace.k b/test/c/k-files/flat-namespace.k