[OpenMP] Change build of OpenMP device runtime to be a separate runtime #136729

Status: Open. Wants to merge 1 commit into base: main.


jhuber6 (Contributor) commented Apr 22, 2025

Summary:
Currently we build the OpenMP device runtime as part of the offload/
project. This is problematic because the device runtime has several
restrictions that the normal offloading runtime does not: it can only be
built with an up-to-date clang, and the target must be set appropriately.
Currently we hack around this by creating the compiler invocation
manually; this patch moves it into a separate runtimes build instead.

This follows the same build we use for libc, libc++, compiler-rt, and
flang-rt. This also moves it from offload/ into openmp/ because it
is still the openmp/ runtime and I feel that is the more appropriate
home. We do want a generic offload/ library at some point, but adding
it as a separate library later would be trivial now that this
infrastructure exists.
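
As a rough illustration of how the top-level openmp/ build picks between the host and device runtimes after this change, the triple check can be sketched in shell. The function name `runtime_kind` is hypothetical; the real logic is the CMake regex match `^amdgcn|^nvptx` against `LLVM_DEFAULT_TARGET_TRIPLE` and `CMAKE_CXX_COMPILER_TARGET` in openmp/CMakeLists.txt, so treat this only as a model of that decision.

```shell
#!/bin/sh
# Hypothetical helper: mirrors the "^amdgcn|^nvptx" match from the patch.
# Pass any number of triples (e.g. the default target triple and the
# compiler target); if any of them is a GPU triple, the device runtime
# would be built, otherwise the host runtime.
runtime_kind() {
  for triple in "$@"; do
    case "$triple" in
      amdgcn*|nvptx*) echo device; return 0 ;;
    esac
  done
  echo host
}

runtime_kind x86_64-unknown-linux-gnu
runtime_kind "" nvptx64-nvidia-cuda
```

An empty argument stands in for a variable that is unset in a given configuration, which is why both variables are checked in the patch.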

Most importantly, this requires users to update their build configs,
adding at minimum the following lines. I debated whether to
'auto-upgrade' existing configs, but went with a warning instead.

    -DLLVM_RUNTIME_TARGETS='default;amdgcn-amd-amdhsa;nvptx64-nvidia-cuda' \
    -DRUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES=openmp \
    -DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES=openmp
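
For anyone scripting their configure step, the flags above follow a regular pattern, so they can be generated per GPU triple. The helper below is a minimal sketch; `device_runtime_flags` is a hypothetical name, and only the `LLVM_RUNTIME_TARGETS` and `RUNTIMES_<triple>_LLVM_ENABLE_RUNTIMES` option names come from this PR.

```shell
#!/bin/sh
# Hypothetical helper: prints the runtimes-related CMake flags for the
# GPU triples given as arguments. Runs in a subshell so its variables
# do not leak into the caller.
device_runtime_flags() (
  targets="default"
  per_target=""
  for triple in "$@"; do
    targets="${targets};${triple}"
    per_target="${per_target} -DRUNTIMES_${triple}_LLVM_ENABLE_RUNTIMES=openmp"
  done
  printf '%s%s\n' "-DLLVM_RUNTIME_TARGETS=${targets}" "${per_target}"
)

device_runtime_flags amdgcn-amd-amdhsa nvptx64-nvidia-cuda
```

The output can be spliced into an existing configure command with an unquoted `$(device_runtime_flags ...)`, so the shell splits it into separate arguments. Note that this sets each per-target runtime list to `openmp` only; a config that already builds other runtimes for the GPU targets (as Offload.cmake does) would need to extend its list instead.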

This also changes where the .bc version of the library lives, but it is
still created.
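
To make the new location concrete, here is a small sketch of the extra directory the driver change adds to its search, `<clang dir>/../lib/<triple>`, combined with the `libomptarget-<name>.bc` output name from openmp/device/CMakeLists.txt. The function `device_rtl_candidate` is hypothetical, and it assumes the per-target install directory coincides with that driver path; the driver also searches other directories that this does not model.

```shell
#!/bin/sh
# Hypothetical helper: given a clang binary path and a GPU triple, print
# the bitcode path under the triple-specific lib directory that the
# driver patch adds to its search list.
device_rtl_candidate() (
  clang_bin="$1"
  triple="$2"
  case "$triple" in
    amdgcn*) name=amdgpu ;;
    nvptx*)  name=nvptx ;;
    *) echo "unsupported triple: $triple" >&2; exit 1 ;;
  esac
  bindir="$(dirname "$clang_bin")"
  printf '%s/../lib/%s/libomptarget-%s.bc\n' "$bindir" "$triple" "$name"
)

device_rtl_candidate /opt/llvm/bin/clang amdgcn-amd-amdhsa
```

The `/opt/llvm` prefix is only an example install location. Checking the printed path with `test -f` is a quick way to see whether a given install actually contains the device library.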

@llvmbot added the clang, clang:driver, openmp:libomp, openmp:libomptarget, and offload labels Apr 22, 2025
llvmbot (Member) commented Apr 22, 2025

@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-offload

@llvm/pr-subscribers-clang

Author: Joseph Huber (jhuber6)

Changes

Patch is 24.72 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/136729.diff

36 Files Affected:

  • (modified) clang/lib/Driver/ToolChains/CommonArgs.cpp (+5)
  • (modified) offload/CMakeLists.txt (+7-1)
  • (removed) offload/DeviceRTL/CMakeLists.txt (-181)
  • (modified) offload/cmake/caches/Offload.cmake (+2-2)
  • (modified) openmp/CMakeLists.txt (+45-31)
  • (added) openmp/device/CMakeLists.txt (+99)
  • (renamed) openmp/device/include/Allocator.h ()
  • (renamed) openmp/device/include/Configuration.h ()
  • (renamed) openmp/device/include/Debug.h ()
  • (renamed) openmp/device/include/DeviceTypes.h ()
  • (renamed) openmp/device/include/DeviceUtils.h ()
  • (renamed) openmp/device/include/Interface.h ()
  • (renamed) openmp/device/include/LibC.h ()
  • (renamed) openmp/device/include/Mapping.h ()
  • (renamed) openmp/device/include/Profiling.h ()
  • (renamed) openmp/device/include/State.h ()
  • (renamed) openmp/device/include/Synchronization.h ()
  • (renamed) openmp/device/include/Workshare.h ()
  • (renamed) openmp/device/include/generated_microtask_cases.gen ()
  • (renamed) openmp/device/src/Allocator.cpp ()
  • (renamed) openmp/device/src/Configuration.cpp ()
  • (renamed) openmp/device/src/Debug.cpp ()
  • (renamed) openmp/device/src/DeviceUtils.cpp ()
  • (renamed) openmp/device/src/Kernel.cpp ()
  • (renamed) openmp/device/src/LibC.cpp ()
  • (renamed) openmp/device/src/Mapping.cpp ()
  • (renamed) openmp/device/src/Misc.cpp ()
  • (renamed) openmp/device/src/Parallelism.cpp ()
  • (renamed) openmp/device/src/Profiling.cpp ()
  • (renamed) openmp/device/src/Reduction.cpp ()
  • (renamed) openmp/device/src/State.cpp ()
  • (renamed) openmp/device/src/Stub.cpp ()
  • (renamed) openmp/device/src/Synchronization.cpp ()
  • (renamed) openmp/device/src/Tasking.cpp ()
  • (renamed) openmp/device/src/Workshare.cpp ()
  • (modified) openmp/docs/SupportAndFAQ.rst (+7)
diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index 8646c55060b17..7cc4008ec1f2b 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -2794,6 +2794,11 @@ void tools::addOpenMPDeviceRTL(const Driver &D,
   for (const auto &LibPath : HostTC.getFilePaths())
     LibraryPaths.emplace_back(LibPath);
 
+  // Check the target specific library path for the triple as well.
+  SmallString<128> P(D.Dir);
+  llvm::sys::path::append(P, "..", "lib", Triple.getTriple());
+  LibraryPaths.emplace_back(P);
+
   OptSpecifier LibomptargetBCPathOpt =
       Triple.isAMDGCN()  ? options::OPT_libomptarget_amdgpu_bc_path_EQ
       : Triple.isNVPTX() ? options::OPT_libomptarget_nvptx_bc_path_EQ
diff --git a/offload/CMakeLists.txt b/offload/CMakeLists.txt
index 25c879710645c..70ac6a6d1e6c3 100644
--- a/offload/CMakeLists.txt
+++ b/offload/CMakeLists.txt
@@ -113,6 +113,13 @@ else()
   set(CMAKE_CXX_EXTENSIONS NO)
 endif()
 
+# Emit a warning for people who haven't updated their build.
+if(NOT "openmp" IN_LIST RUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES AND
+   NOT "openmp" IN_LIST RUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES)
+  message(WARNING "Building the offloading runtime with no device library. See "
+                  "https://openmp.llvm.org/SupportAndFAQ.html for help.")
+endif()
+
 # Set the path of all resulting libraries to a unified location so that it can
 # be used for testing.
 set(LIBOMPTARGET_LIBRARY_DIR ${CMAKE_CURRENT_BINARY_DIR})
@@ -373,7 +380,6 @@ set(LIBOMPTARGET_LLVM_LIBRARY_INTDIR "${LIBOMPTARGET_INTDIR}" CACHE STRING
 
 # Build offloading plugins and device RTLs if they are available.
 add_subdirectory(plugins-nextgen)
-add_subdirectory(DeviceRTL)
 add_subdirectory(tools)
 
 # Build target agnostic offloading library.
diff --git a/offload/DeviceRTL/CMakeLists.txt b/offload/DeviceRTL/CMakeLists.txt
deleted file mode 100644
index 12f53a30761f3..0000000000000
--- a/offload/DeviceRTL/CMakeLists.txt
+++ /dev/null
@@ -1,181 +0,0 @@
-set(LIBOMPTARGET_BUILD_DEVICERTL_BCLIB TRUE CACHE BOOL
-  "Can be set to false to disable building this library.")
-
-if (NOT LIBOMPTARGET_BUILD_DEVICERTL_BCLIB)
-  message(STATUS "Not building DeviceRTL: Disabled by LIBOMPTARGET_BUILD_DEVICERTL_BCLIB")
-  return()
-endif()
-
-# Check to ensure the host system is a supported host architecture.
-if(NOT ${CMAKE_SIZEOF_VOID_P} EQUAL "8")
-  message(STATUS "Not building DeviceRTL: Runtime does not support 32-bit hosts")
-  return()
-endif()
-
-if (LLVM_DIR)
-  # Builds that use pre-installed LLVM have LLVM_DIR set.
-  # A standalone or LLVM_ENABLE_RUNTIMES=openmp build takes this route
-  find_program(CLANG_TOOL clang PATHS ${LLVM_TOOLS_BINARY_DIR} NO_DEFAULT_PATH)
-elseif (LLVM_TOOL_CLANG_BUILD AND NOT CMAKE_CROSSCOMPILING AND NOT OPENMP_STANDALONE_BUILD)
-  # LLVM in-tree builds may use CMake target names to discover the tools.
-  # A LLVM_ENABLE_PROJECTS=openmp build takes this route
-  set(CLANG_TOOL $<TARGET_FILE:clang>)
-else()
-  message(STATUS "Not building DeviceRTL. No appropriate clang found")
-  return()
-endif()
-
-set(devicertl_base_directory ${CMAKE_CURRENT_SOURCE_DIR})
-set(include_directory ${devicertl_base_directory}/include)
-set(source_directory ${devicertl_base_directory}/src)
-
-set(include_files
-  ${include_directory}/Allocator.h
-  ${include_directory}/Configuration.h
-  ${include_directory}/Debug.h
-  ${include_directory}/Interface.h
-  ${include_directory}/LibC.h
-  ${include_directory}/Mapping.h
-  ${include_directory}/Profiling.h
-  ${include_directory}/State.h
-  ${include_directory}/Synchronization.h
-  ${include_directory}/DeviceTypes.h
-  ${include_directory}/DeviceUtils.h
-  ${include_directory}/Workshare.h
-)
-
-set(src_files
-  ${source_directory}/Allocator.cpp
-  ${source_directory}/Configuration.cpp
-  ${source_directory}/Debug.cpp
-  ${source_directory}/Kernel.cpp
-  ${source_directory}/LibC.cpp
-  ${source_directory}/Mapping.cpp
-  ${source_directory}/Misc.cpp
-  ${source_directory}/Parallelism.cpp
-  ${source_directory}/Profiling.cpp
-  ${source_directory}/Reduction.cpp
-  ${source_directory}/State.cpp
-  ${source_directory}/Synchronization.cpp
-  ${source_directory}/Tasking.cpp
-  ${source_directory}/DeviceUtils.cpp
-  ${source_directory}/Workshare.cpp
-)
-
-# We disable the slp vectorizer during the runtime optimization to avoid
-# vectorized accesses to the shared state. Generally, those are "good" but
-# the optimizer pipeline (esp. Attributor) does not fully support vectorized
-# instructions yet and we end up missing out on way more important constant
-# propagation. That said, we will run the vectorizer again after the runtime
-# has been linked into the user program.
-set(clang_opt_flags -O3 -mllvm -openmp-opt-disable -DSHARED_SCRATCHPAD_SIZE=512 -mllvm -vectorize-slp=false )
-
-# If the user built with the GPU C library enabled we will use that instead.
-if(${LIBOMPTARGET_GPU_LIBC_SUPPORT})
-  list(APPEND clang_opt_flags -DOMPTARGET_HAS_LIBC)
-endif()
-
-# Set flags for LLVM Bitcode compilation.
-set(bc_flags -c -flto -std=c++17 -fvisibility=hidden
-             ${clang_opt_flags} -nogpulib -nostdlibinc
-             -fno-rtti -fno-exceptions -fconvergent-functions
-             -Wno-unknown-cuda-version
-             -DOMPTARGET_DEVICE_RUNTIME
-             -I${include_directory}
-             -I${devicertl_base_directory}/../include
-             -I${devicertl_base_directory}/../../libc
-)
-
-# first create an object target
-function(compileDeviceRTLLibrary target_name target_triple)
-  set(target_bc_flags ${ARGN})
-
-  foreach(src ${src_files})
-    get_filename_component(infile ${src} ABSOLUTE)
-    get_filename_component(outfile ${src} NAME)
-    set(outfile "${outfile}-${target_name}.o")
-    set(depfile "${outfile}.d")
-
-    # Passing an empty CPU to -march= suppressed target specific metadata.
-    add_custom_command(OUTPUT ${outfile}
-      COMMAND ${CLANG_TOOL}
-      ${bc_flags}
-      --target=${target_triple}
-      ${target_bc_flags}
-      -MD -MF ${depfile}
-      ${infile} -o ${outfile}
-      DEPENDS ${infile}
-      DEPFILE ${depfile}
-      COMMENT "Building LLVM bitcode ${outfile}"
-      VERBATIM
-    )
-    if(TARGET clang)
-      # Add a file-level dependency to ensure that clang is up-to-date.
-      # By default, add_custom_command only builds clang if the
-      # executable is missing.
-      add_custom_command(OUTPUT ${outfile}
-        DEPENDS clang
-        APPEND
-      )
-    endif()
-    set_property(DIRECTORY APPEND PROPERTY ADDITIONAL_MAKE_CLEAN_FILES ${outfile})
-
-    list(APPEND obj_files ${CMAKE_CURRENT_BINARY_DIR}/${outfile})
-  endforeach()
-  # Trick to combine these into a bitcode file via the linker's LTO pass. This
-  # is used to provide the legacy `libomptarget-<name>.bc` files. Hack this
-  # through as an executable to get it to use the relocatable link.
-  add_executable(libomptarget-${target_name} ${obj_files})
-  set_target_properties(libomptarget-${target_name} PROPERTIES
-    RUNTIME_OUTPUT_DIRECTORY ${LIBOMPTARGET_LLVM_LIBRARY_INTDIR}
-    LINKER_LANGUAGE CXX
-    BUILD_RPATH ""
-    INSTALL_RPATH ""
-    RUNTIME_OUTPUT_NAME libomptarget-${target_name}.bc)
-  target_compile_options(libomptarget-${target_name} PRIVATE "--target=${target_triple}" "-march=")
-  target_link_options(libomptarget-${target_name} PRIVATE "--target=${target_triple}"
-                      "-r" "-nostdlib" "-flto" "-Wl,--lto-emit-llvm" "-march=")
-  install(TARGETS libomptarget-${target_name}
-          PERMISSIONS OWNER_WRITE OWNER_READ GROUP_READ WORLD_READ
-          DESTINATION ${OFFLOAD_INSTALL_LIBDIR})
-
-  add_library(omptarget.${target_name}.all_objs OBJECT IMPORTED)
-  set_property(TARGET omptarget.${target_name}.all_objs APPEND PROPERTY IMPORTED_OBJECTS
-               ${LIBOMPTARGET_LLVM_LIBRARY_INTDIR}/libomptarget-${target_name}.bc)
-
-  # Archive all the object files generated above into a static library
-  add_library(omptarget.${target_name} STATIC)
-  set_target_properties(omptarget.${target_name} PROPERTIES
-    ARCHIVE_OUTPUT_DIRECTORY "${LIBOMPTARGET_LLVM_LIBRARY_INTDIR}/${target_triple}"
-    ARCHIVE_OUTPUT_NAME ompdevice
-    LINKER_LANGUAGE CXX
-  )
-  target_link_libraries(omptarget.${target_name} PRIVATE omptarget.${target_name}.all_objs)
-
-  install(TARGETS omptarget.${target_name}
-          ARCHIVE DESTINATION "lib${LLVM_LIBDIR_SUFFIX}/${target_triple}")
-
-  if (CMAKE_EXPORT_COMPILE_COMMANDS)
-    set(ide_target_name omptarget-ide-${target_name})
-    add_library(${ide_target_name} STATIC EXCLUDE_FROM_ALL ${src_files})
-    target_compile_options(${ide_target_name} PRIVATE
-      -fvisibility=hidden --target=${target_triple}
-      -nogpulib -nostdlibinc -Wno-unknown-cuda-version
-    )
-    target_compile_definitions(${ide_target_name} PRIVATE SHARED_SCRATCHPAD_SIZE=512)
-    target_include_directories(${ide_target_name} PRIVATE
-      ${include_directory}
-      ${devicertl_base_directory}/../../libc
-      ${devicertl_base_directory}/../include
-    )
-    install(TARGETS ${ide_target_name} EXCLUDE_FROM_ALL)
-  endif()
-endfunction()
-
-if(NOT LLVM_TARGETS_TO_BUILD OR "AMDGPU" IN_LIST LLVM_TARGETS_TO_BUILD)
-  compileDeviceRTLLibrary(amdgpu amdgcn-amd-amdhsa -Xclang -mcode-object-version=none)
-endif()
-
-if(NOT LLVM_TARGETS_TO_BUILD OR "NVPTX" IN_LIST LLVM_TARGETS_TO_BUILD)
-  compileDeviceRTLLibrary(nvptx nvptx64-nvidia-cuda --cuda-feature=+ptx63)
-endif()
diff --git a/offload/cmake/caches/Offload.cmake b/offload/cmake/caches/Offload.cmake
index 5533a6508f5d5..3747a1d3eb299 100644
--- a/offload/cmake/caches/Offload.cmake
+++ b/offload/cmake/caches/Offload.cmake
@@ -5,5 +5,5 @@ set(LLVM_ENABLE_PER_TARGET_RUNTIME_DIR ON CACHE BOOL "")
 set(LLVM_RUNTIME_TARGETS default;amdgcn-amd-amdhsa;nvptx64-nvidia-cuda CACHE STRING "") 
 set(RUNTIMES_nvptx64-nvidia-cuda_CACHE_FILES "${CMAKE_SOURCE_DIR}/../libcxx/cmake/caches/NVPTX.cmake" CACHE STRING "")
 set(RUNTIMES_amdgcn-amd-amdhsa_CACHE_FILES "${CMAKE_SOURCE_DIR}/../libcxx/cmake/caches/AMDGPU.cmake" CACHE STRING "")
-set(RUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES "compiler-rt;libc;libcxx;libcxxabi" CACHE STRING "")
-set(RUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES "compiler-rt;libc;libcxx;libcxxabi" CACHE STRING "")
+set(RUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES "compiler-rt;libc;openmp;libcxx;libcxxabi" CACHE STRING "")
+set(RUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES "compiler-rt;libc;openmp;libcxx;libcxxabi" CACHE STRING "")
diff --git a/openmp/CMakeLists.txt b/openmp/CMakeLists.txt
index c206386fa6b61..c1c533d00f8bb 100644
--- a/openmp/CMakeLists.txt
+++ b/openmp/CMakeLists.txt
@@ -88,6 +88,14 @@ else()
   set(CMAKE_CXX_EXTENSIONS NO)
 endif()
 
+# Targeting the GPU directly requires a few flags to make CMake happy.
+if("${CMAKE_CXX_COMPILER_TARGET}" MATCHES "^amdgcn")
+  set(CMAKE_REQUIRED_FLAGS "${CMAKE_REQUIRED_FLAGS} -nogpulib")
+elseif("${CMAKE_CXX_COMPILER_TARGET}" MATCHES "^nvptx")
+  set(CMAKE_REQUIRED_FLAGS
+      "${CMAKE_REQUIRED_FLAGS} -flto -c -Wno-unused-command-line-argument")
+endif()
+
 # Check and set up common compiler flags.
 include(config-ix)
 include(HandleOpenMPOptions)
@@ -122,35 +130,41 @@ else()
   get_clang_resource_dir(LIBOMP_HEADERS_INSTALL_PATH SUBDIR include)
 endif()
 
-# Build host runtime library, after LIBOMPTARGET variables are set since they are needed
-# to enable time profiling support in the OpenMP runtime.
-add_subdirectory(runtime)
-
-set(ENABLE_OMPT_TOOLS ON)
-# Currently tools are not tested well on Windows or MacOS X.
-if (APPLE OR WIN32)
-  set(ENABLE_OMPT_TOOLS OFF)
-endif()
-
-option(OPENMP_ENABLE_OMPT_TOOLS "Enable building ompt based tools for OpenMP."
-       ${ENABLE_OMPT_TOOLS})
-if (OPENMP_ENABLE_OMPT_TOOLS)
-  add_subdirectory(tools)
-endif()
-
-# Propagate OMPT support to offload
-if(NOT ${OPENMP_STANDALONE_BUILD})
-  set(LIBOMP_HAVE_OMPT_SUPPORT ${LIBOMP_HAVE_OMPT_SUPPORT} PARENT_SCOPE)
-  set(LIBOMP_OMP_TOOLS_INCLUDE_DIR ${LIBOMP_OMP_TOOLS_INCLUDE_DIR} PARENT_SCOPE)
+# Use the current compiler target to determine the appropriate runtime to build.
+if("${LLVM_DEFAULT_TARGET_TRIPLE}" MATCHES "^amdgcn|^nvptx" OR
+   "${CMAKE_CXX_COMPILER_TARGET}" MATCHES "^amdgcn|^nvptx")
+  add_subdirectory(device)
+else()
+  # Build host runtime library, after LIBOMPTARGET variables are set since they
+  # are needed to enable time profiling support in the OpenMP runtime.
+  add_subdirectory(runtime)
+  
+  set(ENABLE_OMPT_TOOLS ON)
+  # Currently tools are not tested well on Windows or MacOS X.
+  if (APPLE OR WIN32)
+    set(ENABLE_OMPT_TOOLS OFF)
+  endif()
+  
+  option(OPENMP_ENABLE_OMPT_TOOLS "Enable building ompt based tools for OpenMP."
+         ${ENABLE_OMPT_TOOLS})
+  if (OPENMP_ENABLE_OMPT_TOOLS)
+    add_subdirectory(tools)
+  endif()
+  
+  # Propagate OMPT support to offload
+  if(NOT ${OPENMP_STANDALONE_BUILD})
+    set(LIBOMP_HAVE_OMPT_SUPPORT ${LIBOMP_HAVE_OMPT_SUPPORT} PARENT_SCOPE)
+    set(LIBOMP_OMP_TOOLS_INCLUDE_DIR ${LIBOMP_OMP_TOOLS_INCLUDE_DIR} PARENT_SCOPE)
+  endif()
+  
+  option(OPENMP_MSVC_NAME_SCHEME "Build dll with MSVC naming scheme." OFF)
+  
+  # Build libompd.so
+  add_subdirectory(libompd)
+  
+  # Build documentation
+  add_subdirectory(docs)
+  
+  # Now that we have seen all testsuites, create the check-openmp target.
+  construct_check_openmp_target()
 endif()
-
-option(OPENMP_MSVC_NAME_SCHEME "Build dll with MSVC naming scheme." OFF)
-
-# Build libompd.so
-add_subdirectory(libompd)
-
-# Build documentation
-add_subdirectory(docs)
-
-# Now that we have seen all testsuites, create the check-openmp target.
-construct_check_openmp_target()
diff --git a/openmp/device/CMakeLists.txt b/openmp/device/CMakeLists.txt
new file mode 100644
index 0000000000000..9211186f4012a
--- /dev/null
+++ b/openmp/device/CMakeLists.txt
@@ -0,0 +1,99 @@
+# Ensure the compiler is a valid clang when building the GPU target.
+set(req_ver "${LLVM_VERSION_MAJOR}.${LLVM_VERSION_MINOR}.${LLVM_VERSION_PATCH}")
+if(LLVM_VERSION_MAJOR AND NOT (CMAKE_CXX_COMPILER_ID MATCHES "[Cc]lang" AND
+   ${CMAKE_CXX_COMPILER_VERSION} VERSION_EQUAL "${req_ver}"))
+  message(FATAL_ERROR "Cannot build GPU device runtime. CMake compiler "
+                      "'${CMAKE_CXX_COMPILER_ID} ${CMAKE_CXX_COMPILER_VERSION}' "
+                      "is not 'Clang ${req_ver}'.")
+endif()
+
+set(src_files
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Allocator.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Configuration.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Debug.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Kernel.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/LibC.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Mapping.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Misc.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Parallelism.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Profiling.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Reduction.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/State.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Synchronization.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Tasking.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/DeviceUtils.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Workshare.cpp
+)
+
+list(APPEND compile_options -flto)
+list(APPEND compile_options -fvisibility=hidden)
+list(APPEND compile_options -nogpulib)
+list(APPEND compile_options -nostdlibinc)
+list(APPEND compile_options -fno-rtti)
+list(APPEND compile_options -fno-exceptions)
+list(APPEND compile_options -fconvergent-functions)
+list(APPEND compile_options -Wno-unknown-cuda-version)
+if(LLVM_DEFAULT_TARGET_TRIPLE)
+  list(APPEND compile_options --target=${LLVM_DEFAULT_TARGET_TRIPLE})
+endif()
+
+# We disable the slp vectorizer during the runtime optimization to avoid
+# vectorized accesses to the shared state. Generally, those are "good" but
+# the optimizer pipeline (esp. Attributor) does not fully support vectorized
+# instructions yet and we end up missing out on way more important constant
+# propagation. That said, we will run the vectorizer again after the runtime
+# has been linked into the user program.
+list(APPEND compile_options "SHELL:-mllvm -vectorize-slp=false")
+if("${LLVM_DEFAULT_TARGET_TRIPLE}" MATCHES "^amdgcn" OR
+   "${CMAKE_CXX_COMPILER_TARGET}" MATCHES "^amdgcn")
+  set(target_name "amdgpu")
+  list(APPEND compile_options "SHELL:-Xclang -mcode-object-version=none")
+elseif("${LLVM_DEFAULT_TARGET_TRIPLE}" MATCHES "^nvptx" OR
+       "${CMAKE_CXX_COMPILER_TARGET}" MATCHES "^nvptx")
+  set(target_name "nvptx")
+  list(APPEND compile_options --cuda-feature=+ptx63)
+endif()
+
+# Trick to combine these into a bitcode file via the linker's LTO pass.
+add_executable(libompdevice ${src_files})
+set_target_properties(libompdevice PROPERTIES
+  RUNTIME_OUTPUT_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
+  LINKER_LANGUAGE CXX
+  BUILD_RPATH ""
+  INSTALL_RPATH ""
+  RUNTIME_OUTPUT_NAME libomptarget-${target_name}.bc)
+
+# If the user built with the GPU C library enabled we will use that instead.
+if(LIBOMPTARGET_GPU_LIBC_SUPPORT)
+  target_compile_definitions(libompdevice PRIVATE OMPTARGET_HAS_LIBC)
+endif()
+target_compile_definitions(libompdevice PRIVATE SHARED_SCRATCHPAD_SIZE=512)
+
+target_include_directories(libompdevice PRIVATE 
+                           ${CMAKE_CURRENT_SOURCE_DIR}/include
+                           ${CMAKE_CURRENT_SOURCE_DIR}/../../libc
+                           ${CMAKE_CURRENT_SOURCE_DIR}/../../offload/include)
+target_compile_options(libompdevice PRIVATE ${compile_options})
+target_link_options(libompdevice PRIVATE
+                    "-flto" "-r" "-nostdlib" "-Wl,--lto-emit-llvm")
+if(LLVM_DEFAULT_TARGET_TRIPLE)
+  target_link_options(libompdevice PRIVATE "--target=${LLVM_DEFAULT_TARGET_TRIPLE}")
+endif()
+install(TARGETS libompdevice
+        PERMISSIONS OWNER_WRITE OWNER_READ GROUP_READ WORLD_READ
+        DESTINATION ${OPENMP_INSTALL_LIBDIR})
+
+add_library(ompdevice.all_objs OBJECT IMPORTED)
+set_property(TARGET ompdevice.all_objs APPEND PROPERTY IMPORTED_OBJECTS
+             ${CMAKE_CURRENT_BINARY_DIR}/libomptarget-${target_name}.bc)
+
+# Archive all the object files generated above into a static library
+add_library(ompdevice STATIC)
+add_dependencies(ompdevice libompdevice)
+set_target_properties(ompdevice PROPERTIES
+  ARCHIVE_OUTPUT_DIRECTORY "${OPENMP_INSTALL_LIBDIR}"
+  ARCHIVE_OUTPUT_NAME ompdevice
+  LINKER_LANGUAGE CXX
+)
+target_link_libraries(ompdevice PRIVATE ompdevice.all_objs)
+install(TARGETS ompdevice ARCHIVE DESTINATION "${OPENMP_INSTALL_LIBDIR}")
diff --git a/offload/DeviceRTL/include/Allocator.h b/openmp/device/include/Allocator.h
similarity index 100%
rename from offload/DeviceRTL/include/Allocator.h
rename to openmp/device/include/Allocator.h
diff --git a/offload/DeviceRTL/include/Configuration.h b/openmp/device/include/Configuration.h
similarity index 100%
rename from offload/DeviceRTL/include/Configuration.h
rename to openmp/device/include/Configuration.h
diff --git a/offload/DeviceRTL/include/Debug.h b/openmp/device/include/Debug.h
similarity index 100%
rename from offload/DeviceRTL/include/Debug.h
rename to openmp/device/include/Debug.h
diff --git a/offload/DeviceRTL/include/DeviceTypes.h b/openmp/device/include/DeviceTypes.h
similarity index 100%
rename from offload/DeviceRTL/include/DeviceTypes.h
rename to openmp/device/include/DeviceTypes.h
diff --git a/offload/DeviceRTL/include/DeviceUtils.h b/openmp/device/include/DeviceUtils.h
similarity index 100%
rename from offload/DeviceRTL/include/DeviceUtils.h
rename to openmp/device/include/DeviceUtils.h
diff --git a/offload/DeviceRTL/include/Interface.h b/openmp/device/include/Interface.h
similarity index 100%
rename from offload/DeviceRTL/include/Interface.h
rename to openmp/device/include/Interface.h
diff --git a/offload/DeviceRTL/include/LibC.h b/openmp/device/include/LibC.h
similarity index 100%
rename from offload/DeviceRTL/include/LibC.h
rename to openmp/device/include/LibC.h
diff --git a/offload/DeviceRTL/include/Mapping.h b/openmp/device/include/Mapping.h
similarity index 100%
rename from offload/DeviceRTL/include/Mapping.h
rename to openmp/device/include/Mapping.h
diff --git a/offload/DeviceRTL/include/Profiling.h b/openmp/device/include/Profiling.h
similarity index 100%
rename from offload/DeviceRTL/include/Profiling.h
rename to openmp/device/include/Profiling.h
diff --git a/offload/DeviceRTL/include/State.h b/openmp/device/include/State.h
similarity index 100%
rename from offload/Dev...
[truncated]

-# the optimizer pipeline (esp. Attributor) does not fully support vectorized
-# instructions yet and we end up missing out on way more important constant
-# propagation. That said, we will run the vectorizer again after the runtime
-# has been linked into the user program.
-set(clang_opt_flags -O3 -mllvm -openmp-opt-disable -DSHARED_SCRATCHPAD_SIZE=512 -mllvm -vectorize-slp=false )
-
-# If the user built with the GPU C library enabled we will use that instead.
-if(${LIBOMPTARGET_GPU_LIBC_SUPPORT})
-  list(APPEND clang_opt_flags -DOMPTARGET_HAS_LIBC)
-endif()
-
-# Set flags for LLVM Bitcode compilation.
-set(bc_flags -c -flto -std=c++17 -fvisibility=hidden
-             ${clang_opt_flags} -nogpulib -nostdlibinc
-             -fno-rtti -fno-exceptions -fconvergent-functions
-             -Wno-unknown-cuda-version
-             -DOMPTARGET_DEVICE_RUNTIME
-             -I${include_directory}
-             -I${devicertl_base_directory}/../include
-             -I${devicertl_base_directory}/../../libc
-)
-
-# first create an object target
-function(compileDeviceRTLLibrary target_name target_triple)
-  set(target_bc_flags ${ARGN})
-
-  foreach(src ${src_files})
-    get_filename_component(infile ${src} ABSOLUTE)
-    get_filename_component(outfile ${src} NAME)
-    set(outfile "${outfile}-${target_name}.o")
-    set(depfile "${outfile}.d")
-
-    # Passing an empty CPU to -march= suppressed target specific metadata.
-    add_custom_command(OUTPUT ${outfile}
-      COMMAND ${CLANG_TOOL}
-      ${bc_flags}
-      --target=${target_triple}
-      ${target_bc_flags}
-      -MD -MF ${depfile}
-      ${infile} -o ${outfile}
-      DEPENDS ${infile}
-      DEPFILE ${depfile}
-      COMMENT "Building LLVM bitcode ${outfile}"
-      VERBATIM
-    )
-    if(TARGET clang)
-      # Add a file-level dependency to ensure that clang is up-to-date.
-      # By default, add_custom_command only builds clang if the
-      # executable is missing.
-      add_custom_command(OUTPUT ${outfile}
-        DEPENDS clang
-        APPEND
-      )
-    endif()
-    set_property(DIRECTORY APPEND PROPERTY ADDITIONAL_MAKE_CLEAN_FILES ${outfile})
-
-    list(APPEND obj_files ${CMAKE_CURRENT_BINARY_DIR}/${outfile})
-  endforeach()
-  # Trick to combine these into a bitcode file via the linker's LTO pass. This
-  # is used to provide the legacy `libomptarget-<name>.bc` files. Hack this
-  # through as an executable to get it to use the relocatable link.
-  add_executable(libomptarget-${target_name} ${obj_files})
-  set_target_properties(libomptarget-${target_name} PROPERTIES
-    RUNTIME_OUTPUT_DIRECTORY ${LIBOMPTARGET_LLVM_LIBRARY_INTDIR}
-    LINKER_LANGUAGE CXX
-    BUILD_RPATH ""
-    INSTALL_RPATH ""
-    RUNTIME_OUTPUT_NAME libomptarget-${target_name}.bc)
-  target_compile_options(libomptarget-${target_name} PRIVATE "--target=${target_triple}" "-march=")
-  target_link_options(libomptarget-${target_name} PRIVATE "--target=${target_triple}"
-                      "-r" "-nostdlib" "-flto" "-Wl,--lto-emit-llvm" "-march=")
-  install(TARGETS libomptarget-${target_name}
-          PERMISSIONS OWNER_WRITE OWNER_READ GROUP_READ WORLD_READ
-          DESTINATION ${OFFLOAD_INSTALL_LIBDIR})
-
-  add_library(omptarget.${target_name}.all_objs OBJECT IMPORTED)
-  set_property(TARGET omptarget.${target_name}.all_objs APPEND PROPERTY IMPORTED_OBJECTS
-               ${LIBOMPTARGET_LLVM_LIBRARY_INTDIR}/libomptarget-${target_name}.bc)
-
-  # Archive all the object files generated above into a static library
-  add_library(omptarget.${target_name} STATIC)
-  set_target_properties(omptarget.${target_name} PROPERTIES
-    ARCHIVE_OUTPUT_DIRECTORY "${LIBOMPTARGET_LLVM_LIBRARY_INTDIR}/${target_triple}"
-    ARCHIVE_OUTPUT_NAME ompdevice
-    LINKER_LANGUAGE CXX
-  )
-  target_link_libraries(omptarget.${target_name} PRIVATE omptarget.${target_name}.all_objs)
-
-  install(TARGETS omptarget.${target_name}
-          ARCHIVE DESTINATION "lib${LLVM_LIBDIR_SUFFIX}/${target_triple}")
-
-  if (CMAKE_EXPORT_COMPILE_COMMANDS)
-    set(ide_target_name omptarget-ide-${target_name})
-    add_library(${ide_target_name} STATIC EXCLUDE_FROM_ALL ${src_files})
-    target_compile_options(${ide_target_name} PRIVATE
-      -fvisibility=hidden --target=${target_triple}
-      -nogpulib -nostdlibinc -Wno-unknown-cuda-version
-    )
-    target_compile_definitions(${ide_target_name} PRIVATE SHARED_SCRATCHPAD_SIZE=512)
-    target_include_directories(${ide_target_name} PRIVATE
-      ${include_directory}
-      ${devicertl_base_directory}/../../libc
-      ${devicertl_base_directory}/../include
-    )
-    install(TARGETS ${ide_target_name} EXCLUDE_FROM_ALL)
-  endif()
-endfunction()
-
-if(NOT LLVM_TARGETS_TO_BUILD OR "AMDGPU" IN_LIST LLVM_TARGETS_TO_BUILD)
-  compileDeviceRTLLibrary(amdgpu amdgcn-amd-amdhsa -Xclang -mcode-object-version=none)
-endif()
-
-if(NOT LLVM_TARGETS_TO_BUILD OR "NVPTX" IN_LIST LLVM_TARGETS_TO_BUILD)
-  compileDeviceRTLLibrary(nvptx nvptx64-nvidia-cuda --cuda-feature=+ptx63)
-endif()
diff --git a/offload/cmake/caches/Offload.cmake b/offload/cmake/caches/Offload.cmake
index 5533a6508f5d5..3747a1d3eb299 100644
--- a/offload/cmake/caches/Offload.cmake
+++ b/offload/cmake/caches/Offload.cmake
@@ -5,5 +5,5 @@ set(LLVM_ENABLE_PER_TARGET_RUNTIME_DIR ON CACHE BOOL "")
 set(LLVM_RUNTIME_TARGETS default;amdgcn-amd-amdhsa;nvptx64-nvidia-cuda CACHE STRING "") 
 set(RUNTIMES_nvptx64-nvidia-cuda_CACHE_FILES "${CMAKE_SOURCE_DIR}/../libcxx/cmake/caches/NVPTX.cmake" CACHE STRING "")
 set(RUNTIMES_amdgcn-amd-amdhsa_CACHE_FILES "${CMAKE_SOURCE_DIR}/../libcxx/cmake/caches/AMDGPU.cmake" CACHE STRING "")
-set(RUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES "compiler-rt;libc;libcxx;libcxxabi" CACHE STRING "")
-set(RUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES "compiler-rt;libc;libcxx;libcxxabi" CACHE STRING "")
+set(RUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES "compiler-rt;libc;openmp;libcxx;libcxxabi" CACHE STRING "")
+set(RUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES "compiler-rt;libc;openmp;libcxx;libcxxabi" CACHE STRING "")
diff --git a/openmp/CMakeLists.txt b/openmp/CMakeLists.txt
index c206386fa6b61..c1c533d00f8bb 100644
--- a/openmp/CMakeLists.txt
+++ b/openmp/CMakeLists.txt
@@ -88,6 +88,14 @@ else()
   set(CMAKE_CXX_EXTENSIONS NO)
 endif()
 
+# Targeting the GPU directly requires a few flags to make CMake happy.
+if("${CMAKE_CXX_COMPILER_TARGET}" MATCHES "^amdgcn")
+  set(CMAKE_REQUIRED_FLAGS "${CMAKE_REQUIRED_FLAGS} -nogpulib")
+elseif("${CMAKE_CXX_COMPILER_TARGET}" MATCHES "^nvptx")
+  set(CMAKE_REQUIRED_FLAGS
+      "${CMAKE_REQUIRED_FLAGS} -flto -c -Wno-unused-command-line-argument")
+endif()
+
 # Check and set up common compiler flags.
 include(config-ix)
 include(HandleOpenMPOptions)
@@ -122,35 +130,41 @@ else()
   get_clang_resource_dir(LIBOMP_HEADERS_INSTALL_PATH SUBDIR include)
 endif()
 
-# Build host runtime library, after LIBOMPTARGET variables are set since they are needed
-# to enable time profiling support in the OpenMP runtime.
-add_subdirectory(runtime)
-
-set(ENABLE_OMPT_TOOLS ON)
-# Currently tools are not tested well on Windows or MacOS X.
-if (APPLE OR WIN32)
-  set(ENABLE_OMPT_TOOLS OFF)
-endif()
-
-option(OPENMP_ENABLE_OMPT_TOOLS "Enable building ompt based tools for OpenMP."
-       ${ENABLE_OMPT_TOOLS})
-if (OPENMP_ENABLE_OMPT_TOOLS)
-  add_subdirectory(tools)
-endif()
-
-# Propagate OMPT support to offload
-if(NOT ${OPENMP_STANDALONE_BUILD})
-  set(LIBOMP_HAVE_OMPT_SUPPORT ${LIBOMP_HAVE_OMPT_SUPPORT} PARENT_SCOPE)
-  set(LIBOMP_OMP_TOOLS_INCLUDE_DIR ${LIBOMP_OMP_TOOLS_INCLUDE_DIR} PARENT_SCOPE)
+# Use the current compiler target to determine the appropriate runtime to build.
+if("${LLVM_DEFAULT_TARGET_TRIPLE}" MATCHES "^amdgcn|^nvptx" OR
+   "${CMAKE_CXX_COMPILER_TARGET}" MATCHES "^amdgcn|^nvptx")
+  add_subdirectory(device)
+else()
+  # Build host runtime library, after LIBOMPTARGET variables are set since they
+  # are needed to enable time profiling support in the OpenMP runtime.
+  add_subdirectory(runtime)
+  
+  set(ENABLE_OMPT_TOOLS ON)
+  # Currently tools are not tested well on Windows or MacOS X.
+  if (APPLE OR WIN32)
+    set(ENABLE_OMPT_TOOLS OFF)
+  endif()
+  
+  option(OPENMP_ENABLE_OMPT_TOOLS "Enable building ompt based tools for OpenMP."
+         ${ENABLE_OMPT_TOOLS})
+  if (OPENMP_ENABLE_OMPT_TOOLS)
+    add_subdirectory(tools)
+  endif()
+  
+  # Propagate OMPT support to offload
+  if(NOT ${OPENMP_STANDALONE_BUILD})
+    set(LIBOMP_HAVE_OMPT_SUPPORT ${LIBOMP_HAVE_OMPT_SUPPORT} PARENT_SCOPE)
+    set(LIBOMP_OMP_TOOLS_INCLUDE_DIR ${LIBOMP_OMP_TOOLS_INCLUDE_DIR} PARENT_SCOPE)
+  endif()
+  
+  option(OPENMP_MSVC_NAME_SCHEME "Build dll with MSVC naming scheme." OFF)
+  
+  # Build libompd.so
+  add_subdirectory(libompd)
+  
+  # Build documentation
+  add_subdirectory(docs)
+  
+  # Now that we have seen all testsuites, create the check-openmp target.
+  construct_check_openmp_target()
 endif()
-
-option(OPENMP_MSVC_NAME_SCHEME "Build dll with MSVC naming scheme." OFF)
-
-# Build libompd.so
-add_subdirectory(libompd)
-
-# Build documentation
-add_subdirectory(docs)
-
-# Now that we have seen all testsuites, create the check-openmp target.
-construct_check_openmp_target()
diff --git a/openmp/device/CMakeLists.txt b/openmp/device/CMakeLists.txt
new file mode 100644
index 0000000000000..9211186f4012a
--- /dev/null
+++ b/openmp/device/CMakeLists.txt
@@ -0,0 +1,99 @@
+# Ensure the compiler is a valid clang when building the GPU target.
+set(req_ver "${LLVM_VERSION_MAJOR}.${LLVM_VERSION_MINOR}.${LLVM_VERSION_PATCH}")
+if(LLVM_VERSION_MAJOR AND NOT (CMAKE_CXX_COMPILER_ID MATCHES "[Cc]lang" AND
+   ${CMAKE_CXX_COMPILER_VERSION} VERSION_EQUAL "${req_ver}"))
+  message(FATAL_ERROR "Cannot build GPU device runtime. CMake compiler "
+                      "'${CMAKE_CXX_COMPILER_ID} ${CMAKE_CXX_COMPILER_VERSION}' "
+                      "is not 'Clang ${req_ver}'.")
+endif()
+
+set(src_files
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Allocator.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Configuration.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Debug.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Kernel.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/LibC.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Mapping.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Misc.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Parallelism.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Profiling.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Reduction.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/State.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Synchronization.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Tasking.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/DeviceUtils.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Workshare.cpp
+)
+
+list(APPEND compile_options -flto)
+list(APPEND compile_options -fvisibility=hidden)
+list(APPEND compile_options -nogpulib)
+list(APPEND compile_options -nostdlibinc)
+list(APPEND compile_options -fno-rtti)
+list(APPEND compile_options -fno-exceptions)
+list(APPEND compile_options -fconvergent-functions)
+list(APPEND compile_options -Wno-unknown-cuda-version)
+if(LLVM_DEFAULT_TARGET_TRIPLE)
+  list(APPEND compile_options --target=${LLVM_DEFAULT_TARGET_TRIPLE})
+endif()
+
+# We disable the slp vectorizer during the runtime optimization to avoid
+# vectorized accesses to the shared state. Generally, those are "good" but
+# the optimizer pipeline (esp. Attributor) does not fully support vectorized
+# instructions yet and we end up missing out on way more important constant
+# propagation. That said, we will run the vectorizer again after the runtime
+# has been linked into the user program.
+list(APPEND compile_options "SHELL:-mllvm -vectorize-slp=false")
+if("${LLVM_DEFAULT_TARGET_TRIPLE}" MATCHES "^amdgcn" OR
+   "${CMAKE_CXX_COMPILER_TARGET}" MATCHES "^amdgcn")
+  set(target_name "amdgpu")
+  list(APPEND compile_options "SHELL:-Xclang -mcode-object-version=none")
+elseif("${LLVM_DEFAULT_TARGET_TRIPLE}" MATCHES "^nvptx" OR
+       "${CMAKE_CXX_COMPILER_TARGET}" MATCHES "^nvptx")
+  set(target_name "nvptx")
+  list(APPEND compile_options --cuda-feature=+ptx63)
+endif()
+
+# Trick to combine these into a bitcode file via the linker's LTO pass.
+add_executable(libompdevice ${src_files})
+set_target_properties(libompdevice PROPERTIES
+  RUNTIME_OUTPUT_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
+  LINKER_LANGUAGE CXX
+  BUILD_RPATH ""
+  INSTALL_RPATH ""
+  RUNTIME_OUTPUT_NAME libomptarget-${target_name}.bc)
+
+# If the user built with the GPU C library enabled we will use that instead.
+if(LIBOMPTARGET_GPU_LIBC_SUPPORT)
+  target_compile_definitions(libompdevice PRIVATE OMPTARGET_HAS_LIBC)
+endif()
+target_compile_definitions(libompdevice PRIVATE SHARED_SCRATCHPAD_SIZE=512)
+
+target_include_directories(libompdevice PRIVATE 
+                           ${CMAKE_CURRENT_SOURCE_DIR}/include
+                           ${CMAKE_CURRENT_SOURCE_DIR}/../../libc
+                           ${CMAKE_CURRENT_SOURCE_DIR}/../../offload/include)
+target_compile_options(libompdevice PRIVATE ${compile_options})
+target_link_options(libompdevice PRIVATE
+                    "-flto" "-r" "-nostdlib" "-Wl,--lto-emit-llvm")
+if(LLVM_DEFAULT_TARGET_TRIPLE)
+  target_link_options(libompdevice PRIVATE "--target=${LLVM_DEFAULT_TARGET_TRIPLE}")
+endif()
+install(TARGETS libompdevice
+        PERMISSIONS OWNER_WRITE OWNER_READ GROUP_READ WORLD_READ
+        DESTINATION ${OPENMP_INSTALL_LIBDIR})
+
+add_library(ompdevice.all_objs OBJECT IMPORTED)
+set_property(TARGET ompdevice.all_objs APPEND PROPERTY IMPORTED_OBJECTS
+             ${CMAKE_CURRENT_BINARY_DIR}/libomptarget-${target_name}.bc)
+
+# Archive all the object files generated above into a static library
+add_library(ompdevice STATIC)
+add_dependencies(ompdevice libompdevice)
+set_target_properties(ompdevice PROPERTIES
+  ARCHIVE_OUTPUT_DIRECTORY "${OPENMP_INSTALL_LIBDIR}"
+  ARCHIVE_OUTPUT_NAME ompdevice
+  LINKER_LANGUAGE CXX
+)
+target_link_libraries(ompdevice PRIVATE ompdevice.all_objs)
+install(TARGETS ompdevice ARCHIVE DESTINATION "${OPENMP_INSTALL_LIBDIR}")
diff --git a/offload/DeviceRTL/include/Allocator.h b/openmp/device/include/Allocator.h
similarity index 100%
rename from offload/DeviceRTL/include/Allocator.h
rename to openmp/device/include/Allocator.h
diff --git a/offload/DeviceRTL/include/Configuration.h b/openmp/device/include/Configuration.h
similarity index 100%
rename from offload/DeviceRTL/include/Configuration.h
rename to openmp/device/include/Configuration.h
diff --git a/offload/DeviceRTL/include/Debug.h b/openmp/device/include/Debug.h
similarity index 100%
rename from offload/DeviceRTL/include/Debug.h
rename to openmp/device/include/Debug.h
diff --git a/offload/DeviceRTL/include/DeviceTypes.h b/openmp/device/include/DeviceTypes.h
similarity index 100%
rename from offload/DeviceRTL/include/DeviceTypes.h
rename to openmp/device/include/DeviceTypes.h
diff --git a/offload/DeviceRTL/include/DeviceUtils.h b/openmp/device/include/DeviceUtils.h
similarity index 100%
rename from offload/DeviceRTL/include/DeviceUtils.h
rename to openmp/device/include/DeviceUtils.h
diff --git a/offload/DeviceRTL/include/Interface.h b/openmp/device/include/Interface.h
similarity index 100%
rename from offload/DeviceRTL/include/Interface.h
rename to openmp/device/include/Interface.h
diff --git a/offload/DeviceRTL/include/LibC.h b/openmp/device/include/LibC.h
similarity index 100%
rename from offload/DeviceRTL/include/LibC.h
rename to openmp/device/include/LibC.h
diff --git a/offload/DeviceRTL/include/Mapping.h b/openmp/device/include/Mapping.h
similarity index 100%
rename from offload/DeviceRTL/include/Mapping.h
rename to openmp/device/include/Mapping.h
diff --git a/offload/DeviceRTL/include/Profiling.h b/openmp/device/include/Profiling.h
similarity index 100%
rename from offload/DeviceRTL/include/Profiling.h
rename to openmp/device/include/Profiling.h
diff --git a/offload/DeviceRTL/include/State.h b/openmp/device/include/State.h
similarity index 100%
rename from offload/Dev...
[truncated]

@jhuber6 jhuber6 force-pushed the OpenMPGPURuntime branch 2 times, most recently from ee6ca95 to 748a7f7 Compare April 22, 2025 17:54
jhuber6 added a commit to jhuber6/llvm-project that referenced this pull request Apr 22, 2025
Summary:
This was accidentally kept in the old location when we moved to the
new `lib/<triple>/` location for the DeviceRTL. Move this to reduce the
delta with llvm#136729.
@jhuber6 jhuber6 requested a review from Meinersbur April 22, 2025 19:59
Member

@Meinersbur Meinersbur left a comment

I think using the LLVM_ENABLE_RUNTIMES mechanism is a great idea.
Regarding the move back to openmp/device, I don't really have an opinion. However, there are some arguments to make:

  1. The same arguments apply to libomptarget as well
  2. Definitions such as those in Interface.h are indeed OpenMP-only
  3. Some definitions could be useful for other languages as well, such as Synchronization.h. However, they are also in the ompx namespace

Comment on lines +134 to +136
if("${LLVM_DEFAULT_TARGET_TRIPLE}" MATCHES "^amdgcn|^nvptx" OR
"${CMAKE_CXX_COMPILER_TARGET}" MATCHES "^amdgcn|^nvptx")
add_subdirectory(device)
Member

[serious] What happens with host offloading? They also need device-like functions such as omp_get_device_num(). The device-side implementation and host-side implementation are different. This also matters when e.g. offloading to a remote cluster (non-GPU) node via MPI.

I don't think we should (or can) assume that the triple determines whether it is executing on the host or device.

Contributor Author

Host offloading uses `libomp.so`. The way I think about it is that this `ompdevice` is basically `libomp` for GPUs.

Member

The device-side omp_get_device_num() (defined in libomptarget.so, not libomp.so) only returns omp_get_initial_device(), which is wrong for any kind of offloading.

After trying out what actually happens, I found that it executes the Fortran wrapper (in libomp.so). It also incorrectly assumes it is always executing on the host. That looks like a bug.

@mgorny
Member

mgorny commented Apr 23, 2025

Honestly, I am thoroughly confused about all that openmp ↔ offload moving. But if these don't share much code with the current openmp, perhaps the cleanest approach would be to make it entirely separate?

jhuber6 added a commit that referenced this pull request Apr 23, 2025
Summary:
This was accidentally kept in the old location when we moved to the
new `lib/<triple>/` location for the DeviceRTL. Move this to reduce the
delta with #136729.
@jhuber6
Contributor Author

jhuber6 commented Apr 23, 2025

> I think using the LLVM_ENABLE_RUNTIMES mechanism is a great idea. Regarding the move back to openmp/device, I don't really have an opinion. However, there are some arguments to make:
>
> 1. The same arguments apply to `libomptarget` as well
>
> 2. Definitions such as those in `Interface.h` are indeed OpenMP-only
>
> 3. Some definitions could be useful for other languages as well, such as `Synchronization.h`. However, they are also in the `ompx` namespace

Yes, I strongly believe that libomptarget should eventually be moved back into openmp/. Long term I think offload/ should contain the generic 'plugins' that provide an API for offloading to various GPUs. libomptarget then becomes the OpenMP runtime using that interface. There are arguments that some things in the current runtime are generically useful, but my assertion is that these should just be put in a separate library in offload/ if that's the case. Combining everything into a single library is a holdover from before we had the appropriate infrastructure to easily create these; now it's trivial to just make a liboffload.a for the GPU.

> Honestly, I am thoroughly confused about all that openmp ↔ offload moving. But if these don't share much code with the current openmp, perhaps the cleanest approach would be to make it entirely separate?

Yeah, it's a little confusing because right now offload/ has a direct dependency on openmp so they're effectively the same project.

@jhuber6 jhuber6 force-pushed the OpenMPGPURuntime branch 2 times, most recently from d8eeb33 to 145b566 Compare April 23, 2025 13:27
@jhuber6
Contributor Author

jhuber6 commented Apr 28, 2025

> > As I understand, we already have a pretty strong tendency toward the former. We have right now flang-rt, compiler-rt, libclc, and openmp.
>
> My understanding (which might be incorrect), is that flang-rt and compiler-rt are host-only libraries, libclc is device-only, and openmp has both host and device components with the location of the device-only component being the crux of this discussion. A policy of using top level directories for host-only RTs and the host portions of RTs that span host/device, and placing device-only RT libraries under offload makes sense to me. However...

I don't really like to make a distinction between 'host' and 'device' here. As shown by the libc project, we should be able to treat the GPU as just another target. OpenMP is a little special here because it does enforce different semantics on the host vs. device, but everything else is just some flavor of compiling some utility functions for that target. Wasn't OpenCL designed with execution on CPUs in mind as well? It's probably easier to think of just having some utility library that works correctly w/ cross-compiling.

jhuber6 added a commit to jhuber6/llvm-project that referenced this pull request May 2, 2025
Summary:
Another hacky fix done until
llvm#136729 lands. This time for
`-mcpu`.
@jdoerfert
Member

> > So, I think it should go back in openmp/ as with libomptarget. That makes offload/ a generic interface that languages inherit from to make their own language runtimes, which I think is how most people expect offload/ to work.
>
> I find this argument compelling as well.
>
> Perhaps it would make sense to keep offload generic and minimal and to co-locate the device RTs under a top level device-rt directory that contains openmp, openacc, cuda, etc...

This addresses one of my main concerns: spreading device runtimes all over the place or introducing N new top-level folders. I don't think we want either, but keeping the device code together in a new top-level device-rt directory is, for me, almost as good as having that device-rt folder live under offload. I don't see the benefit of it not being in offload, at least until we have device runtimes that work without offload, or at least have plans to have them. Moving it to openmp will open up the question of where to put the rest, hence my conceptual objection to it. Not to mention that device runtimes have more connection to one another, and to the offload infrastructure, than to their host runtime, at least for now. (Again, there is nothing in DeviceRTL.openmp.a that connects to the openmp folder/host code but, for now, various things that connect to the offload folder/host code.)

@jdoerfert
Member

FWIW, this PR contains two conceptual changes, and my objection + comments have all been targeting one of them: the code move.
Wrt. the second change, I support building the device runtimes per triple in a way that aligns more with cross-compiling other runtimes. I understand from @jhuber6 that he bundled them to avoid two cmake changes if both are merged, but that bundling is what, for now, stalls the second part.
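
As a rough illustration of the per-triple build described above, here is a minimal cache-file sketch modeled on the `offload/cmake/caches/Offload.cmake` change in this PR. The `LLVM_ENABLE_PROJECTS` and top-level `LLVM_ENABLE_RUNTIMES` values are assumptions for illustration and are not taken from the patch; the file itself is hypothetical.

```cmake
# Hypothetical cache file, passed to the top-level LLVM configure with
# `cmake -C <this-file> ...`. Values marked "assumed" are not from the PR.
set(LLVM_ENABLE_PROJECTS "clang;lld" CACHE STRING "")      # assumed
set(LLVM_ENABLE_RUNTIMES "openmp;offload" CACHE STRING "") # assumed
set(LLVM_ENABLE_PER_TARGET_RUNTIME_DIR ON CACHE BOOL "")

# One runtimes sub-build per entry; "default" is the host.
set(LLVM_RUNTIME_TARGETS
    "default;amdgcn-amd-amdhsa;nvptx64-nvidia-cuda" CACHE STRING "")

# Each GPU triple enables the OpenMP runtime, which is what the warning
# added to offload/CMakeLists.txt checks for.
set(RUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES "openmp" CACHE STRING "")
set(RUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES "openmp" CACHE STRING "")
```

This mirrors the three `-D` options from the PR description; the cache file in the patch additionally enables `compiler-rt`, `libc`, `libcxx`, and `libcxxabi` for the GPU triples.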

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request May 6, 2025
…h (#136754)

Summary:
This was accidentally kept in the old location when we moved to the
new `lib/<triple>/` location for the DeviceRTL. Move this to reduce the
delta with llvm/llvm-project#136729.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request May 6, 2025
Summary:
Override the default linker in case the user is passing it separately.
This requires `lld` but it always did. This will be fixed *properly*
when llvm/llvm-project#136729 lands.

Fixes llvm/llvm-project#136822
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
)

Summary:
This was accidentally kept in the old location when we moved to the
new `lib/<triple>/` location for the DeviceRTL. Move this to reduce the
delta with llvm#136729.
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
Summary:
Override the default linker in case the user is passing it separately.
This requires `lld` but it always did. This will be fixed *properly*
when llvm#136729 lands.

Fixes llvm#136822
jhuber6 added a commit that referenced this pull request May 6, 2025
Summary:
Another hacky fix done until
#136729 lands. This time for
`-mcpu`.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request May 7, 2025
Summary:
Another hacky fix done until
llvm/llvm-project#136729 lands. This time for
`-mcpu`.
GeorgeARM pushed a commit to GeorgeARM/llvm-project that referenced this pull request May 7, 2025
Summary:
Another hacky fix done until
llvm#136729 lands. This time for
`-mcpu`.
Ankur-0429 pushed a commit to Ankur-0429/llvm-project that referenced this pull request May 9, 2025
Summary:
Override the default linker in case the user is passing it separately.
This requires `lld` but it always did. This will be fixed *properly*
when llvm#136729 lands.

Fixes llvm#136822
@jhuber6 jhuber6 force-pushed the OpenMPGPURuntime branch from 145b566 to e94fe4a Compare July 31, 2025 14:59

github-actions bot commented Jul 31, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@jhuber6 jhuber6 force-pushed the OpenMPGPURuntime branch from e94fe4a to 461bdc1 Compare July 31, 2025 15:02
@jhuber6
Contributor Author

jhuber6 commented Jul 31, 2025

Were there any other concerns about this? I'm hopefully going to be able to move forward now that the SYCL team made it clear that they'd prefer separate directories for the offloading languages.

@mgorny
Member

mgorny commented Aug 1, 2025

Well, I still haven't switched the Gentoo OpenMP build from direct standalone to runtimes, but I'll be doing a fresh snapshot tomorrow, so hopefully I'll try that.

@mgorny
Member

mgorny commented Aug 2, 2025

Well, after wasting an hour on this, I've only rediscovered that I already tried it over a year ago and, unsurprisingly, the runtimes build is still completely broken since it still adds -nostdlib++ to the C compiler: #90332.

@jhuber6
Contributor Author

jhuber6 commented Aug 2, 2025

> Well, after wasting an hour on this, I've only rediscovered that I already tried it over a year ago and, unsurprisingly, the runtimes build is still completely broken since it still adds -nostdlib++ to the C compiler: #90332.

Is this for libomp? This only affects the GPU build. That bug you linked sounds weird, I'm guessing we add that to the required flags for flag detection?

@mgorny
Member

mgorny commented Aug 2, 2025

Is this for libomp?

Yes, -DLLVM_ENABLE_RUNTIMES=openmp.

> This only affects the GPU build. That bug you linked sounds weird, I'm guessing we add that to the required flags for flag detection?

I guess so. It's pretty clear to me that adding C++-specific flags to CMAKE_REQUIRED_FLAGS is wrong, but I presume that there is a reason that the code didn't use CMAKE_CXX_FLAGS instead.

@jhuber6
Contributor Author

jhuber6 commented Aug 2, 2025

> > Is this for libomp?
>
> Yes, `-DLLVM_ENABLE_RUNTIMES=openmp`.
>
> > This only affects the GPU build. That bug you linked sounds weird, I'm guessing we add that to the required flags for flag detection?
>
> I guess so. It's pretty clear to me that adding C++-specific flags to CMAKE_REQUIRED_FLAGS is wrong, but I presume that there is a reason that the code didn't use CMAKE_CXX_FLAGS instead.

CMake's handling of compiler flag checks is very unfortunate. It just uses `CMAKE_REQUIRED_FLAGS` as global state and passes it to `cxx ${CMAKE_REQUIRED_FLAGS} ${FLAG_TO_CHECK}`, which runs both the compile step and the link step. I wish there were a more customizable way to pass flags on a per-check basis.
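
To illustrate the global-state problem, here is a deliberately simplified model in shell. It is not how CMake is implemented (real checks go through `try_compile`); `REQUIRED_FLAGS` merely stands in for `CMAKE_REQUIRED_FLAGS`, and `check_command_line` is a hypothetical helper that prints the command a check would run.

```shell
#!/bin/sh
# Simplified model only, not actual CMake behaviour.
# REQUIRED_FLAGS stands in for CMAKE_REQUIRED_FLAGS: one global value
# shared by every check, regardless of language.
REQUIRED_FLAGS="-nostdlib++"

# Hypothetical helper: print the command line a flag check would run.
check_command_line() {
    compiler=$1         # e.g. cc or c++
    flag_to_check=$2    # the flag whose support is being probed
    printf '%s\n' "$compiler $REQUIRED_FLAGS $flag_to_check"
}
```

In this model, `check_command_line cc -Wall` yields a C compiler invocation that carries the C++-only flag, which is the situation described in #90332.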

@mgorny
Member

mgorny commented Aug 3, 2025

I wonder if we could change that logic to only add these flags if checks actually fail, i.e. presumably when we are missing the standard C++ library. We could even go as far as to:

  1. Try a check without the flags.
  2. If it fails, try adding the flags and try again.
  3. If it still fails, error out instead of trying to proceed with broken check results.
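
The three steps above can be sketched as a small shell helper. This only illustrates the control flow; it is not CMake code and is not taken from any patch. `check_with_fallback` is a hypothetical name, and the check command is assumed to be any command or function that takes the extra flags as its single argument and exits 0 on success.

```shell
#!/bin/sh
# Hypothetical helper illustrating the retry logic.
# $1: command or function that runs one check; it receives the extra
#     flags (possibly empty) as its only argument and exits 0 on success.
# $2: flags to add only if the plain check fails.
check_with_fallback() {
    check_cmd=$1
    extra_flags=$2

    # Step 1: try the check without the extra flags.
    if "$check_cmd" ""; then
        echo "ok-without-flags"
        return 0
    fi

    # Step 2: the plain check failed, so retry with the extra flags.
    if "$check_cmd" "$extra_flags"; then
        echo "ok-with-flags"
        return 0
    fi

    # Step 3: still failing, so report an error instead of continuing
    # with unreliable check results.
    echo "error: check failed with and without '$extra_flags'" >&2
    return 1
}
```

A caller would pass a probe that actually invokes the compiler. With this shape, flags such as `-nostdlib++` are only ever applied when the unmodified check cannot succeed.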

@mgorny
Member

mgorny commented Aug 4, 2025

I've filed #151930 as a possible workaround. If that is merged and doesn't cause any regressions, I can look at the other issues, such as the broken search for test dependencies.
