[Clang] Permit -Xarch_ to be used with --offload-arch
#131884
Summary: The `--offload-arch` option is very complicated, but roughly behaves as the `-march` option for several compilations at once. This creates problems when we try to combine multiple separate architectures into one, as happens with SYCL, OpenMP, and HIP w/ SPIR-V. The existing solution used by OpenMP is the `-Xopenmp-target` option, which lets you select which `--offload-arch` options go to which toolchain. This patch permits `-Xarch_` to be used in the same way. There are concerns about whether or not this falls under the `-Xarch_` umbrella because it changes the driver behavior, but I think this is the easiest way to handle this problem. The existing solution seems to be prefixing things and adding more magic handling into `--offload-arch`. For example, SPIRV is doing `nvidia_gpu_sm_89` instead of just `-Xarch_nvptx64 --offload-arch=sm_89`. The only reason this is more complicated than just doing `-Xarch_sm_89 -march=...` is that we need to know to create multiple jobs, one for each architecture.
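To make the intended routing concrete, here is a rough sketch of the behavior described above. It is not the driver code: `DeviceJob`, `splitArchList`, and `planDeviceJobs` are hypothetical names, and matching the `-Xarch_<name>` suffix against the first component of the toolchain triple is an assumption made for illustration. It only shows why each `--offload-arch` value ends up as a separate job on the selected toolchain.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical model of one device compilation: a toolchain triple plus a
// single architecture. Not a real Clang driver type.
struct DeviceJob {
  std::string Triple;
  std::string Arch;
};

// Split a comma-separated list such as "sm_52,sm_60" into its entries.
static std::vector<std::string> splitArchList(const std::string &List) {
  std::vector<std::string> Out;
  std::stringstream SS(List);
  std::string Item;
  while (std::getline(SS, Item, ','))
    if (!Item.empty())
      Out.push_back(Item);
  return Out;
}

// Route each `-Xarch_<name> --offload-arch=<list>` pair to the toolchain whose
// triple starts with <name> (assumption), creating one job per architecture.
std::vector<DeviceJob> planDeviceJobs(const std::vector<std::string> &Args,
                                      const std::vector<std::string> &Triples) {
  const std::string XarchPrefix = "-Xarch_";
  const std::string OffloadPrefix = "--offload-arch=";
  std::vector<DeviceJob> Jobs;
  for (std::size_t I = 0; I + 1 < Args.size(); ++I) {
    if (Args[I].compare(0, XarchPrefix.size(), XarchPrefix) != 0)
      continue;
    const std::string &Next = Args[I + 1];
    if (Next.compare(0, OffloadPrefix.size(), OffloadPrefix) != 0)
      continue;
    const std::string Name = Args[I].substr(XarchPrefix.size());
    for (const std::string &Triple : Triples) {
      // The architecture component is the text before the first '-'.
      if (Triple.substr(0, Triple.find('-')) != Name)
        continue;
      for (const std::string &Arch :
           splitArchList(Next.substr(OffloadPrefix.size())))
        Jobs.push_back({Triple, Arch});
    }
    ++I; // Skip the --offload-arch= argument that was just consumed.
  }
  return Jobs;
}
```

Under these assumptions, passing the arguments from the new RUN line (`-Xarch_nvptx64 --offload-arch=sm_52,sm_60 -Xarch_amdgcn --offload-arch=gfx90a,gfx1030`) with the two OpenMP target triples would plan four device jobs, two per toolchain.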
@llvm/pr-subscribers-clang
Author: Joseph Huber (jhuber6)
Changes
Full diff: https://github.com/llvm/llvm-project/pull/131884.diff
2 Files Affected:
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 66ae8f1c7f064..05fc6aaa266b5 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -1129,13 +1129,12 @@ def fno_convergent_functions : Flag<["-"], "fno-convergent-functions">,
// Common offloading options
let Group = offload_Group in {
def offload_arch_EQ : Joined<["--"], "offload-arch=">,
- Visibility<[ClangOption, FlangOption]>, Flags<[NoXarchOption]>,
+ Visibility<[ClangOption, FlangOption]>,
HelpText<"Specify an offloading device architecture for CUDA, HIP, or OpenMP. (e.g. sm_35). "
"If 'native' is used the compiler will detect locally installed architectures. "
"For HIP offloading, the device architecture can be followed by target ID features "
"delimited by a colon (e.g. gfx908:xnack+:sramecc-). May be specified more than once.">;
def no_offload_arch_EQ : Joined<["--"], "no-offload-arch=">,
- Flags<[NoXarchOption]>,
Visibility<[ClangOption, FlangOption]>,
HelpText<"Remove CUDA/HIP offloading device architecture (e.g. sm_35, gfx906) from the list of devices to compile for. "
"'all' resets the list to its default value.">;
diff --git a/clang/test/Driver/offload-Xarch.c b/clang/test/Driver/offload-Xarch.c
index 8856dac198465..8106dcfcd1354 100644
--- a/clang/test/Driver/offload-Xarch.c
+++ b/clang/test/Driver/offload-Xarch.c
@@ -14,6 +14,10 @@
// RUN: --target=x86_64-unknown-linux-gnu -Xopenmp-target=nvptx64-nvidia-cuda --offload-arch=sm_52,sm_60 -nogpuinc \
// RUN: -Xopenmp-target=amdgcn-amd-amdhsa --offload-arch=gfx90a,gfx1030 -ccc-print-bindings -### %s 2>&1 \
// RUN: | FileCheck -check-prefix=OPENMP %s
+// RUN: %clang -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda,amdgcn-amd-amdhsa -nogpulib \
+// RUN: --target=x86_64-unknown-linux-gnu -Xarch_nvptx64 --offload-arch=sm_52,sm_60 -nogpuinc \
+// RUN: -Xarch_amdgcn --offload-arch=gfx90a,gfx1030 -ccc-print-bindings -### %s 2>&1 \
+// RUN: | FileCheck -check-prefix=OPENMP %s
// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[HOST_BC:.+]]"
// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX1030_BC:.+]]"
Like SYCL?
bader
left a comment
@jhuber6, thank you for helping with the common offload infrastructure!
It seems that if I want to target an NVIDIA RTX 4080, I have to provide at least four flags:
- offloading mode (e.g. -fopenmp)
- offloading target (e.g. -fopenmp-targets=nvptx64)
- offloading architecture, which technically takes two flags: -Xarch_ and --offload-arch= (e.g. -Xarch_nvptx64 --offload-arch=sm_89)
As a user, I would like a simpler command-line interface for the case where I don't need to configure the device toolchain and only want to specify the exact device to tune for. At the same time, I agree that we need this interface for configuring device toolchains.
Tagging @mdtoguchi, @Naghasan for awareness.
|
Thanks, @Artem-B had the initial hangups, so I'll defer to him for the final +1. I'd prefer this solution to continuously prefixing things in
Yeah, I think things necessarily start getting complicated when you combine many different architectures into one clang job. We could theoretically just keep putting things in |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/30/builds/18117 Here is the relevant piece of the build log for the reference |
Summary:
The `--offload-arch` option is very complicated, but roughly behaves as the `-march` option for several compilations at once. This creates problems when we try to combine multiple separate architectures into one, as happens with SYCL, OpenMP, and HIP w/ SPIR-V.

The existing solution used by OpenMP is the `-Xopenmp-target` option, this lets you select which `--offload-arch` options go to which toolchain. This patch permits `-Xarch_` to be used in the same way.

There are concerns about whether or not this falls into the `-Xarch_` umbrella because it changes the driver behaviour, but I think this is the easiest way to handle this problem. The existing solution seems to be prefixing things and adding more magic handling into `--offload-arch`. Like SYCL is doing `nvidia_gpu_sm_89` instead of just `-Xarch_nvptx64 --offload-arch=sm_89`.

The only reason this is more complicated than just doing `-Xarch_sm_89 -march=...` is because we need to know to create multiple jobs for each architecture.