@@ -34,10 +34,10 @@ described in the `clang documentation
 by the OpenMP toolchain, but is currently opt-in for the CUDA and HIP toolchains
 through the ``--offload-new-driver`` and ``-fgpu-rdc`` flags.
 
-In order to link the GPU runtime, we simply pass this library to the embedded
-device linker job. This can be done using the ``-Xoffload-linker`` option, which
-forwards an argument to a ``clang`` job used to create the final GPU executable.
-The toolchain should pick up the C libraries automatically in most cases, so
+In order to link the GPU runtime, we simply pass this library to the embedded
+device linker job. This can be done using the ``-Xoffload-linker`` option, which
+forwards an argument to a ``clang`` job used to create the final GPU executable.
+The toolchain should pick up the C libraries automatically in most cases, so
 this shouldn't be necessary.
 
 .. code-block:: sh
@@ -189,7 +189,7 @@ final executable.
 
   #include <stdio.h>
 
-  int main() { fputs("Hello from AMDGPU!\n", stdout); }
+  int main() { printf("Hello from AMDGPU!\n"); }
 
 This program can then be compiled using the ``clang`` compiler. Note that
 ``-flto`` and ``-mcpu=`` should be defined. This is because the GPU
@@ -227,28 +227,26 @@ Building for NVPTX targets
 ^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 The infrastructure is the same as the AMDGPU example. However, the NVPTX binary
-utilities are very limited and must be targeted directly. There is no linker
-support for static libraries so we need to link in the ``libc.bc`` bitcode and
-inform the compiler driver of the file's contents.
+utilities are very limited and must be targeted directly. A utility called
+``clang-nvlink-wrapper`` instead wraps around the standard link job to give the
+illusion that ``nvlink`` is a functional linker.
 
 .. code-block:: c++
 
   #include <stdio.h>
 
   int main(int argc, char **argv, char **envp) {
-    fputs("Hello from NVPTX!\n", stdout);
+    printf("Hello from NVPTX!\n");
   }
 
 Additionally, the NVPTX ABI requires that every function signature matches. This
 requires us to pass the full prototype from ``main``. The installation will
 contain the ``nvptx-loader`` utility if the CUDA driver was found during
-compilation.
+compilation. Using link time optimization will help hide this.
 
 .. code-block:: sh
 
-  $> clang hello.c --target=nvptx64-nvidia-cuda -march=native \
-       -x ir <install>/lib/nvptx64-nvidia-cuda/libc.bc \
-       -x ir <install>/lib/nvptx64-nvidia-cuda/crt1.o
+  $> clang hello.c --target=nvptx64-nvidia-cuda -mcpu=native -flto -lc <install>/lib/nvptx64-nvidia-cuda/crt1.o
   $> nvptx-loader --threads 2 --blocks 2 a.out
   Hello from NVPTX!
   Hello from NVPTX!