[Clang] Permit `-Xarch_` to be used with `--offload-arch` #131884

jhuber6 · 2025-03-18T18:58:06Z

Summary:
The --offload-arch option is very complicated, but roughly behaves as
the -march option for several compilations at once. This creates
problems when we try to combine multiple separate architectures into
one, as happens with SYCL, OpenMP, and HIP w/ SPIR-V.

The existing solution used by OpenMP is the -Xopenmp-target option,
which lets you select which --offload-arch options go to which
toolchain. This patch permits -Xarch_ to be used in the same way.

There are concerns about whether or not this falls under the -Xarch_
umbrella because it changes the driver behaviour, but I think this is the
easiest way to handle this problem. The existing alternative seems to be
prefixing things and adding more magic handling into --offload-arch.
For example, SYCL uses nvidia_gpu_sm_89 instead of just -Xarch_nvptx64 --offload-arch=sm_89.

The only reason this is more complicated than just doing -Xarch_sm_89 -march=... is that the driver needs to know to create a separate job for each
architecture.
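To make the intended semantics concrete, here is a toy Python model of how the two spellings scope --offload-arch values to toolchains. It is not the driver's implementation. It assumes that -Xopenmp-target=<triple> matches a toolchain by its full triple, that -Xarch_<arch> matches by the first (architecture) component of the triple (e.g. nvptx64, amdgcn), and that every selected architecture becomes its own device job. The function name device_jobs is hypothetical.

```python
def device_jobs(argv, triples):
    """Toy model: return one (triple, arch) device job per requested arch.

    Only scoped --offload-arch arguments are modelled (assumed semantics):
      -Xopenmp-target=<triple> <arg>  applies <arg> to that exact triple
      -Xarch_<arch> <arg>             applies <arg> to triples whose first
                                      component equals <arch>
    """
    jobs = []
    it = iter(argv)
    for arg in it:
        if arg.startswith("-Xopenmp-target="):
            kind, want = "triple", arg.split("=", 1)[1]
        elif arg.startswith("-Xarch_"):
            kind, want = "arch", arg[len("-Xarch_"):]
        else:
            continue  # unscoped arguments are ignored in this sketch
        scoped = next(it, "")  # both options apply to the following argument
        if not scoped.startswith("--offload-arch="):
            continue
        for gpu in scoped.split("=", 1)[1].split(","):
            for triple in triples:
                key = triple if kind == "triple" else triple.split("-")[0]
                if key == want:
                    jobs.append((triple, gpu))
    return jobs


TRIPLES = ["nvptx64-nvidia-cuda", "amdgcn-amd-amdhsa"]
OLD = ["-Xopenmp-target=nvptx64-nvidia-cuda", "--offload-arch=sm_52,sm_60",
       "-Xopenmp-target=amdgcn-amd-amdhsa", "--offload-arch=gfx90a,gfx1030"]
NEW = ["-Xarch_nvptx64", "--offload-arch=sm_52,sm_60",
       "-Xarch_amdgcn", "--offload-arch=gfx90a,gfx1030"]
```

Under these assumptions, the -Xopenmp-target and -Xarch_ command lines from the offload-Xarch.c test both expand to the same four (triple, arch) jobs, which is why the new RUN line can reuse the existing OPENMP check prefix.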

llvmbot · 2025-03-18T19:01:04Z

@llvm/pr-subscribers-clang

Author: Joseph Huber (jhuber6)

Changes

Summary:
The --offload-arch option is very complicated, but roughly behaves as
the -march option for several compilations at once. This creates
problems when we try to combine multiple separate architectures into
one, as happens with SYCL, OpenMP, and HIP w/ SPIR-V.

The existing solution used by OpenMP is the -Xopenmp-target option,
which lets you select which --offload-arch options go to which
toolchain. This patch permits -Xarch_ to be used in the same way.

There are concerns about whether or not this falls into the -Xarch_
umbrella because it changes the driver behavior, but I think this is the
easiest way to handle this problem. The existing solution seems to be
prefixing things and adding more magic handling into --offload-arch.
Like SPIRV is doing nvidia_gpu_sm_89 instead of just -Xarch_nvptx64 --offload-arch=sm_89.

The only reason this is more complicated than just doing -Xarch_sm_89 -march=... is that the driver needs to know to create a separate job for each
architecture.

Full diff: https://github.com/llvm/llvm-project/pull/131884.diff

2 Files Affected:

(modified) clang/include/clang/Driver/Options.td (+1-2)
(modified) clang/test/Driver/offload-Xarch.c (+4)

diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 66ae8f1c7f064..05fc6aaa266b5 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -1129,13 +1129,12 @@ def fno_convergent_functions : Flag<["-"], "fno-convergent-functions">,
 // Common offloading options
 let Group = offload_Group in {
 def offload_arch_EQ : Joined<["--"], "offload-arch=">,
-  Visibility<[ClangOption, FlangOption]>, Flags<[NoXarchOption]>,
+  Visibility<[ClangOption, FlangOption]>,
   HelpText<"Specify an offloading device architecture for CUDA, HIP, or OpenMP. (e.g. sm_35). "
            "If 'native' is used the compiler will detect locally installed architectures. "
            "For HIP offloading, the device architecture can be followed by target ID features "
            "delimited by a colon (e.g. gfx908:xnack+:sramecc-). May be specified more than once.">;
 def no_offload_arch_EQ : Joined<["--"], "no-offload-arch=">,
-  Flags<[NoXarchOption]>,
   Visibility<[ClangOption, FlangOption]>,
   HelpText<"Remove CUDA/HIP offloading device architecture (e.g. sm_35, gfx906) from the list of devices to compile for. "
            "'all' resets the list to its default value.">;
diff --git a/clang/test/Driver/offload-Xarch.c b/clang/test/Driver/offload-Xarch.c
index 8856dac198465..8106dcfcd1354 100644
--- a/clang/test/Driver/offload-Xarch.c
+++ b/clang/test/Driver/offload-Xarch.c
@@ -14,6 +14,10 @@
 // RUN:   --target=x86_64-unknown-linux-gnu -Xopenmp-target=nvptx64-nvidia-cuda --offload-arch=sm_52,sm_60 -nogpuinc \
 // RUN:   -Xopenmp-target=amdgcn-amd-amdhsa --offload-arch=gfx90a,gfx1030 -ccc-print-bindings -### %s 2>&1 \
 // RUN: | FileCheck -check-prefix=OPENMP %s
+// RUN: %clang -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda,amdgcn-amd-amdhsa -nogpulib \
+// RUN:   --target=x86_64-unknown-linux-gnu -Xarch_nvptx64 --offload-arch=sm_52,sm_60 -nogpuinc \
+// RUN:   -Xarch_amdgcn --offload-arch=gfx90a,gfx1030 -ccc-print-bindings -### %s 2>&1 \
+// RUN: | FileCheck -check-prefix=OPENMP %s
 
 // OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[HOST_BC:.+]]"
 // OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX1030_BC:.+]]"
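The Options.td hunk above only removes Flags<[NoXarchOption]> from offload_arch_EQ and no_offload_arch_EQ. As I understand the driver, an argument passed through -Xarch_ is rejected with an error when its option carries NoXarchOption, because such options steer the driver itself rather than being forwarded to a job. The toy model below illustrates what dropping the flag changes; the option tables, the function name check_xarch_argument, and the error wording are all hypothetical, not Clang's data structures or diagnostic text.

```python
# Hypothetical option tables: option spelling prefix -> set of flag names.
BEFORE_PATCH = {
    "--offload-arch=": {"NoXarchOption"},
    "--no-offload-arch=": {"NoXarchOption"},
}
AFTER_PATCH = {
    "--offload-arch=": set(),
    "--no-offload-arch=": set(),
}


def check_xarch_argument(arg, option_table):
    """Return an error string if `arg` may not follow -Xarch_, else None.

    Models only the NoXarchOption check; unknown options are accepted.
    """
    for prefix, flags in option_table.items():
        if arg.startswith(prefix):
            if "NoXarchOption" in flags:
                # Illustrative wording, not Clang's exact diagnostic text.
                return f"invalid Xarch argument: '{arg}'"
            return None
    return None
```

In this model, the same -Xarch_nvptx64 --offload-arch=sm_52,sm_60 pair is an error against BEFORE_PATCH and accepted against AFTER_PATCH; nothing else about the option changes.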

llvmbot · 2025-03-18T19:01:05Z

@llvm/pr-subscribers-clang-driver

bader · 2025-03-18T19:02:26Z

Like SPIRV is doing nvidia_gpu_sm_89 instead of just -Xarch_nvptx64 --offload-arch=sm_89.

Like SYCL?

bader

@jhuber6, thank you for helping with the common offload infrastructure!

It seems that if I want to target an NVIDIA RTX 4080, I have to provide at least four flags:

- offloading mode (e.g. -fopenmp)
- offloading target (e.g. -fopenmp-targets=nvptx64)
- offloading architecture, which technically takes two flags: -Xarch_ and --offload-arch= (e.g. -Xarch_nvptx64 --offload-arch=sm_89)
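For reference, the four flags in the list above can be assembled mechanically. The helper below is a hypothetical illustration for the OpenMP case only, using exactly the spellings from the comment; it assumes `target` is a bare architecture name such as nvptx64 (not a full triple), so that it can also serve as the -Xarch_ suffix.

```python
def openmp_offload_flags(target, gpu):
    """Hypothetical helper: the OpenMP flags needed to offload to one GPU.

    `target` is a bare offloading architecture (e.g. "nvptx64") and `gpu`
    the device architecture passed to --offload-arch (e.g. "sm_89").
    """
    return [
        "-fopenmp",                    # offloading mode
        f"-fopenmp-targets={target}",  # offloading target
        f"-Xarch_{target}",            # scope the next flag to that toolchain
        f"--offload-arch={gpu}",       # device architecture
    ]


print(" ".join(openmp_offload_flags("nvptx64", "sm_89")))
```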

As a user, I wish to have a simpler command-line interface when I don't need to configure the device toolchain and just want to specify the exact device to tune for. At the same time, I agree that we need this interface for configuring device toolchains.

Tagging @mdtoguchi, @Naghasan for awareness.

jhuber6 · 2025-03-18T21:57:17Z

Thanks. @Artem-B had the initial hang-ups, so I'll defer to him for the final +1. I'd prefer this solution to continuously prefixing things in --offload-arch, however.

As a user, I wish to have a simpler command-line interface when I don't need to configure the device toolchain and just want to specify the exact device to tune for. At the same time, I agree that we need this interface for configuring device toolchains.

Yeah, I think things necessarily start getting complicated when you combine many different architectures into one clang job. We could theoretically just keep putting things in --offload-arch, but the complexity would soon be about the same.

llvm-ci · 2025-03-21T13:32:19Z

LLVM Buildbot has detected a new failure on builder openmp-offload-amdgpu-runtime running on omp-vega20-0 while building clang at step 7 "Add check check-offload".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/30/builds/18117

Here is the relevant piece of the build log for reference:

Step 7 (Add check check-offload) failure: test (failure)
******************** TEST 'libomptarget :: amdgcn-amd-amdhsa :: offloading/gpupgo/pgo2.c' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 1
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang -fopenmp    -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib  -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/gpupgo/Output/pgo2.c.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a -fprofile-generate
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang -fopenmp -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/gpupgo/Output/pgo2.c.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a -fprofile-generate
# note: command had no output on stdout or stderr
# RUN: at line 2
env LLVM_PROFILE_FILE=pgo2.c.llvm.profraw      /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/gpupgo/Output/pgo2.c.tmp 2>&1
# executed command: env LLVM_PROFILE_FILE=pgo2.c.llvm.profraw /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/gpupgo/Output/pgo2.c.tmp
# note: command had no output on stdout or stderr
# RUN: at line 4
llvm-profdata show --all-functions --counts      pgo2.c.llvm.profraw | /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c      --check-prefix="LLVM-HOST"
# executed command: llvm-profdata show --all-functions --counts pgo2.c.llvm.profraw
# note: command had no output on stdout or stderr
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c --check-prefix=LLVM-HOST
# note: command had no output on stdout or stderr
# RUN: at line 7
llvm-profdata show --all-functions --counts      amdgcn-amd-amdhsa.pgo2.c.llvm.profraw      | /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c --check-prefix="LLVM-DEVICE"
# executed command: llvm-profdata show --all-functions --counts amdgcn-amd-amdhsa.pgo2.c.llvm.profraw
# note: command had no output on stdout or stderr
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c --check-prefix=LLVM-DEVICE
# .---command stderr------------
# | /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c:81:17: error: LLVM-DEVICE: expected string not found in input
# | // LLVM-DEVICE: Block counts: [10, 2, 1]
# |                 ^
# | <stdin>:4:13: note: scanning from here
# |  Counters: 3
# |             ^
# | <stdin>:5:2: note: possible intended match here
# |  Block counts: [10, 3, 1]
# |  ^
# | 
# | Input file: <stdin>
# | Check file: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/gpupgo/pgo2.c
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |             1: Counters: 
# |             2:  __omp_offloading_802_b3a8121_main_l61: 
# |             3:  Hash: 0x07735b6a1ad4d6e5 
# |             4:  Counters: 3 
# | check:81'0                 X error: no match found
# |             5:  Block counts: [10, 3, 1] 
# | check:81'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~
# | check:81'1      ?                         possible intended match
...

jhuber6 requested review from AlexVlx, Artem-B, arsenm, bader, sarnex and yxsamliu March 18, 2025 18:58

llvmbot added the clang and clang:driver labels Mar 18, 2025

bader approved these changes Mar 18, 2025

yxsamliu approved these changes Mar 19, 2025

sarnex approved these changes Mar 19, 2025

jhuber6 merged commit 561dcb2 into llvm:main Mar 21, 2025
14 checks passed

jhuber6 commented Mar 18, 2025 (edited by JonChesterfield)

6 participants
