@ivanradanov
Contributor

When clang-linker-wrapper is invoked with --cuda-path, the option is not passed on to the child device linking processes. Device linking therefore fails when CUDA linking is involved and nvlink is not in $PATH. This patch forwards --cuda-path so that the child linking process can find nvlink.

@ivanradanov ivanradanov requested a review from jhuber6 July 16, 2025 14:35
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Jul 16, 2025
@llvmbot
Member

llvmbot commented Jul 16, 2025

@llvm/pr-subscribers-clang

Author: Ivan R. Ivanov (ivanradanov)

Full diff: https://github.com/llvm/llvm-project/pull/149107.diff

1 Files Affected:

  • (modified) clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp (+2)
diff --git a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
index 9d34b62da20f5..b9e20a6534bf6 100644
--- a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
+++ b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
@@ -567,6 +567,8 @@ Expected<StringRef> clang(ArrayRef<StringRef> InputFiles, const ArgList &Args,
     CmdArgs.append({"-Xlinker", Args.MakeArgString(Arg)});
   for (StringRef Arg : Args.getAllArgValues(OPT_compiler_arg_EQ))
     CmdArgs.push_back(Args.MakeArgString(Arg));
+  for (StringRef Arg : Args.getAllArgValues(OPT_cuda_path_EQ))
+    CmdArgs.push_back(Args.MakeArgString("--cuda-path=" + Arg));
 
   if (Error Err = executeCommands(*ClangPath, CmdArgs))
     return std::move(Err);
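
For readers without an LLVM checkout, the effect of the two added lines can be sketched with standard-library types only. This is a simplified stand-in, not the wrapper's code: `WrapperArgs` and `appendForwardedArgs` are hypothetical names, and the real function works on `llvm::opt::ArgList` and `StringRef` values.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the parsed wrapper options. The real code queries
// an llvm::opt::ArgList via OPT_compiler_arg_EQ and OPT_cuda_path_EQ.
struct WrapperArgs {
  std::vector<std::string> CompilerArgs; // values forwarded verbatim
  std::vector<std::string> CudaPaths;    // values given via --cuda-path=
};

// Appends the forwarded arguments to the child clang command line, mirroring
// the existing compiler-arg loop and the loop added by the patch.
void appendForwardedArgs(const WrapperArgs &Args,
                         std::vector<std::string> &CmdArgs) {
  for (const std::string &Arg : Args.CompilerArgs)
    CmdArgs.push_back(Arg);
  // New in the patch: re-attach the option name so the child sees
  // "--cuda-path=<dir>" rather than a bare directory.
  for (const std::string &Path : Args.CudaPaths)
    CmdArgs.push_back("--cuda-path=" + Path);
}
```

With `CudaPaths = {"/usr/local/cuda"}`, the child command gains a trailing `--cuda-path=/usr/local/cuda` argument.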

Contributor

@jhuber6 jhuber6 left a comment

This is supposed to be handled by the forwarding interface in the Clang toolchain. If you run something like this, you should see it:

$ clang input.c -fopenmp --offload-arch=sm_89 --cuda-path=/opt/cuda -###
"clang-linker-wrapper"  "--device-compiler=nvptx64-nvidia-cuda=--cuda-path=/opt/cuda"

Are you not getting this? That's probably a bug so it'd help to have some examples with how you're invoking this.
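
To make the forwarding format above concrete, here is a minimal sketch of how a value such as `nvptx64-nvidia-cuda=--cuda-path=/opt/cuda` could be taken apart. It assumes the value is split at the first `=` into a triple and an argument; the helper names are hypothetical and the wrapper's actual parsing may differ.

```cpp
#include <string>
#include <utility>

// Splits "<triple>=<arg>" at the first '='. Later '=' characters stay in the
// argument, so "--cuda-path=/opt/cuda" survives intact.
std::pair<std::string, std::string>
splitDeviceCompilerValue(const std::string &Value) {
  std::string::size_type Pos = Value.find('=');
  if (Pos == std::string::npos)
    return {Value, ""};
  return {Value.substr(0, Pos), Value.substr(Pos + 1)};
}

// Returns the forwarded argument if the value targets the given triple,
// otherwise an empty string.
std::string argForTriple(const std::string &Value, const std::string &Triple) {
  std::pair<std::string, std::string> Parts = splitDeviceCompilerValue(Value);
  return Parts.first == Triple ? Parts.second : std::string();
}
```

Under that assumption, only device jobs for the matching triple would receive `--cuda-path=/opt/cuda`.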

@ivanradanov
Contributor Author

I see. I was doing this

clang -foffload-via-llvm --cuda-path=/usr/local/cuda input.o  -o a.out

which only passes --cuda-path to clang-linker-wrapper directly:

"clang-linker-wrapper" ... "--cuda-path=/usr/local/cuda" ...

I suppose under the current infra, all the offloading compilation arguments need to be present when linking as well, so as to invoke the appropriate toolchains, and the above usage is not recommended?

@jhuber6
Contributor

jhuber6 commented Jul 16, 2025

It's passed to the linker wrapper because it needs fatbinary for CUDA, but it should also be forwarded to the embedded clang job. If you compile with -v do you not see it on the clang --target=nvptx64-nvidia-cuda job after the linker wrapper?

@ivanradanov
Contributor Author

clang --verbose -foffload-via-llvm --cuda-path=/usr/local/cuda input.o  -o a.out

Gives me

"clang-linker-wrapper" "--host-triple=x86_64-unknown-linux-gnu" "--cuda-path=/usr/local/cuda" "--linker-path=/usr/bin/ld" "-z" .....
"clang" --no-default-config -o /tmp/a.out.nvptx64.sm_89-954e0b.img --target=nvptx64-nvidia-cuda -march=sm_89 /tmp/input-nvptx64-nvidia-cuda-sm_89-945b9a.o

so the child of clang-linker-wrapper does not get the --cuda-path.

@jhuber6
Contributor

jhuber6 commented Jul 16, 2025

That's odd, can you give me the full output of clang -v here?

@ivanradanov
Contributor Author

ivanradanov commented Jul 16, 2025

(ins)$ /opt/llvm/install/release/bin/clang --verbose -foffload-via-llvm --cuda-path=/usr/local/cuda input.o  -o a.out  |& head -n 50
clang version 21.0.0git (https://github.com/llvm/llvm-project.git 6ac286cd491b419dd18a6e8de3aaef4caa44e093)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/llvm/install/release/bin
Build config: +assertions
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/11
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/12
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/12
Candidate multilib: .;@m64
Selected multilib: .;@m64
Found CUDA installation: /usr/local/cuda, version 12.8
 "/opt/llvm/install/release/bin/clang-linker-wrapper" --host-triple=x86_64-unknown-linux-gnu --wrapper-verbose --cuda-path=/usr/local/cuda --linker-path=/usr/bin/ld -z relro --hash-style=gnu --eh-frame-hdr -m elf_x86_64 -pie -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o a.out /lib/x86_64-linux-gnu/Scrt1.o /lib/x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/12/crtbeginS.o -L/opt/llvm/install/release/bin/../lib/x86_64-unknown-linux-gnu -L/opt/llvm/install/release/lib/clang/21/lib/x86_64-unknown-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/12 -L/usr/lib/gcc/x86_64-linux-gnu/12/../../../../lib64 -L/lib/x86_64-linux-gnu -L/lib/../lib64 -L/usr/lib/x86_64-linux-gnu -L/usr/lib64 -L/lib -L/usr/lib -L/usr/local/cuda-12.8/lib/clang/21/lib/x86_64-unknown-linux-gnu/ -L/usr/local/cuda-12.8/lib/clang/20/lib/x86_64-unknown-linux-gnu/ -L/usr/local/cuda-12.8/lib/clang/19/lib/x86_64-unknown-linux-gnu/ -L/usr/local/cuda-12.8/lib/clang/18/lib/x86_64-unknown-linux-gnu/ -L/usr/local/cuda-12.8/lib/linux/ -L/usr/local/cuda-12.8/lib/ -L/opt/prebuilt/clang/llvm-17.0.6/Linux_x86_64/lib/clang/21/lib/x86_64-unknown-linux-gnu/ -L/opt/prebuilt/clang/llvm-17.0.6/Linux_x86_64/lib/clang/20/lib/x86_64-unknown-linux-gnu/ -L/opt/prebuilt/clang/llvm-17.0.6/Linux_x86_64/lib/clang/19/lib/x86_64-unknown-linux-gnu/ -L/opt/prebuilt/clang/llvm-17.0.6/Linux_x86_64/lib/clang/18/lib/x86_64-unknown-linux-gnu/ -L/opt/prebuilt/clang/llvm-17.0.6/Linux_x86_64/lib/linux/ -L/opt/prebuilt/clang/llvm-17.0.6/Linux_x86_64/lib/ -L. input.o -lomptarget -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/x86_64-linux-gnu/12/crtendS.o /lib/x86_64-linux-gnu/crtn.o -verbose
 "/opt/llvm/install/release/bin/clang" --no-default-config -o /tmp/a.out.nvptx64.sm_89-9e8377.img --target=nvptx64-nvidia-cuda -march=sm_89 /tmp/input-nvptx64-nvidia-cuda-sm_89-58597a.o
 "/usr/bin/ld" -z relro --hash-style=gnu --eh-frame-hdr -m elf_x86_64 -pie -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o a.out /tmp/a.out.openmp.image.wrapper-e1b944.o /lib/x86_64-linux-gnu/Scrt1.o /lib/x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/12/crtbeginS.o -L /opt/llvm/install/release/bin/../lib/x86_64-unknown-linux-gnu -L /opt/llvm/install/release/lib/clang/21/lib/x86_64-unknown-linux-gnu -L /usr/lib/gcc/x86_64-linux-gnu/12 -L /usr/lib/gcc/x86_64-linux-gnu/12/../../../../lib64 -L /lib/x86_64-linux-gnu -L /lib/../lib64 -L /usr/lib/x86_64-linux-gnu -L /usr/lib64 -L /lib -L /usr/lib -L /usr/local/cuda-12.8/lib/clang/21/lib/x86_64-unknown-linux-gnu/ -L /usr/local/cuda-12.8/lib/clang/20/lib/x86_64-unknown-linux-gnu/ -L /usr/local/cuda-12.8/lib/clang/19/lib/x86_64-unknown-linux-gnu/ -L /usr/local/cuda-12.8/lib/clang/18/lib/x86_64-unknown-linux-gnu/ -L /usr/local/cuda-12.8/lib/linux/ -L /usr/local/cuda-12.8/lib/ -L /opt/prebuilt/clang/llvm-17.0.6/Linux_x86_64/lib/clang/21/lib/x86_64-unknown-linux-gnu/ -L /opt/prebuilt/clang/llvm-17.0.6/Linux_x86_64/lib/clang/20/lib/x86_64-unknown-linux-gnu/ -L /opt/prebuilt/clang/llvm-17.0.6/Linux_x86_64/lib/clang/19/lib/x86_64-unknown-linux-gnu/ -L /opt/prebuilt/clang/llvm-17.0.6/Linux_x86_64/lib/clang/18/lib/x86_64-unknown-linux-gnu/ -L /opt/prebuilt/clang/llvm-17.0.6/Linux_x86_64/lib/linux/ -L /opt/prebuilt/clang/llvm-17.0.6/Linux_x86_64/lib/ -L . input.o -l omptarget -l gcc --as-needed -l gcc_s --no-as-needed -l c -l gcc --as-needed -l gcc_s --no-as-needed /usr/lib/gcc/x86_64-linux-gnu/12/crtendS.o /lib/x86_64-linux-gnu/crtn.o -verbose
GNU ld (GNU Binutils for Ubuntu) 2.38
<rest is ld output>

@jhuber6
Contributor

jhuber6 commented Jul 16, 2025

I see, this is because no CUDA toolchain is created in this case. Normally the .cu file will create an offloading toolchain, but we don't get that here because the driver doesn't recognize the .o as CUDA and thus never creates the toolchain. Hm, I wonder if there's a really satisfactory solution to that.
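
The behaviour described in this thread can be summarised as a toy decision rule. This is only a model of what the comments report (a `.cu` input creates an offloading toolchain, as does `-fopenmp` with `--offload-arch=`, while a bare `.o` does not); it is not the Clang driver's logic, and `Invocation` and `createsDeviceToolchain` are hypothetical names.

```cpp
#include <string>
#include <vector>

// Hypothetical summary of a driver invocation.
struct Invocation {
  std::vector<std::string> Inputs;       // e.g. "input.o", "kernel.cu"
  bool OpenMP = false;                   // -fopenmp was passed
  std::vector<std::string> OffloadArchs; // values of --offload-arch=
};

static bool hasSuffix(const std::string &S, const std::string &Suffix) {
  return S.size() >= Suffix.size() &&
         S.compare(S.size() - Suffix.size(), Suffix.size(), Suffix) == 0;
}

// Toy rule: a device toolchain exists if some input is CUDA source, or if
// OpenMP offloading was requested with an explicit architecture.
bool createsDeviceToolchain(const Invocation &I) {
  for (const std::string &In : I.Inputs)
    if (hasSuffix(In, ".cu"))
      return true;
  return I.OpenMP && !I.OffloadArchs.empty();
}
```

In this model the link-only command `clang -foffload-via-llvm --cuda-path=... input.o` yields no device toolchain, which matches the missing `--cuda-path` on the embedded clang job.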

@ivanradanov
Contributor Author

ivanradanov commented Jul 17, 2025

Passing -fopenmp --offload-arch=sm_80, like so

clang -fopenmp --offload-arch=sm_80 --verbose -foffload-via-llvm --cuda-path=/usr/local/cuda input.o  -o a.out

gives us the appropriate flags. That means the CUDA toolchain was created, correct?

I wonder if we need a step in clang that scans all the .o files for sections that need device linking, collects their architectures, and reinvokes itself with --offload-arch=<all_collected_arches>. (It is clang-linker-wrapper's job to parse the .o files for that, so it would be a bit odd to have clang do it.) In theory the appropriate toolchains would then be created. Perhaps this should only kick in when -foffload-via-llvm is on but no --offload-arch is specified, i.e. when we are asking clang to figure out the appropriate offload archs.

That step could actually be handled by clang-linker-wrapper; you would get

clang -foffload-via-llvm <args>
  -> clang-linker-wrapper --detect-archs-and-exec=clang <args>
    -> clang -foffload-via-llvm --offload-arch=<detected_archs> <args>
      -> clang-linker-wrapper (same as until now)

Pretty convoluted so I don't know if it's appropriate. Anyways, @jhuber6 thank you for taking a look, I think I will close this for now.
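
The reinvocation step in the flow proposed in this comment could look roughly like the sketch below. Nothing like it exists in the tree: `makeOffloadArchFlag` and `buildReinvocation` are hypothetical helpers, the detected architectures are assumed to have been extracted from the object files already, and the flag is assumed to accept a comma-separated list.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Joins the unique architectures (in first-seen order) into a single
// --offload-arch= flag.
std::string makeOffloadArchFlag(const std::vector<std::string> &Detected) {
  std::vector<std::string> Unique;
  for (const std::string &Arch : Detected)
    if (std::find(Unique.begin(), Unique.end(), Arch) == Unique.end())
      Unique.push_back(Arch);

  std::string Flag = "--offload-arch=";
  for (std::size_t I = 0; I < Unique.size(); ++I) {
    if (I != 0)
      Flag += ',';
    Flag += Unique[I];
  }
  return Flag;
}

// Builds the command for the second clang invocation: the original arguments
// with the detected architectures prepended. If nothing was detected, the
// arguments are passed through unchanged.
std::vector<std::string>
buildReinvocation(const std::vector<std::string> &OrigArgs,
                  const std::vector<std::string> &Detected) {
  std::vector<std::string> Cmd = {"clang"};
  if (!Detected.empty())
    Cmd.push_back(makeOffloadArchFlag(Detected));
  Cmd.insert(Cmd.end(), OrigArgs.begin(), OrigArgs.end());
  return Cmd;
}
```

Deduplicating before joining keeps the second invocation from requesting the same device toolchain twice when several objects target one architecture.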

@jhuber6
Contributor

jhuber6 commented Jul 17, 2025

Yeah that's definitely not ideal, I'll need to think of a way to handle that more gracefully.
