From 039bc391abcc4fb3e4dfab4a295596d3871786b8 Mon Sep 17 00:00:00 2001
From: "Balaji V. Iyer"
Date: Wed, 27 May 2026 11:32:54 -0700
Subject: [PATCH 1/5] Added documentation for compiling FAT binaries with multiple device-architectures.

---
 sycl/doc/design/OffloadDesign.md | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/sycl/doc/design/OffloadDesign.md b/sycl/doc/design/OffloadDesign.md
index 6d64a53b4e4e6..a37d3f596d851 100644
--- a/sycl/doc/design/OffloadDesign.md
+++ b/sycl/doc/design/OffloadDesign.md
@@ -251,6 +251,21 @@ The `--device-compiler` option uses the format `--device-compiler=[:][` and `` are matched against the current compilation target. Only arguments that match both the offloading kind and target triple will be passed to the backend compiler. If `` is not specified, the arguments will match any offloading kind; if `` is not specified, the arguments will match any target triple; and if neither is specified, the arguments will be applied to all targets.

+To support multiple device architectures, a new `--device-compiler` option must be specified for each device. For example, to compile for Ponte Vecchio (PVC) and Skylake (SKL) architectures and put them in a FAT binary, the user must add the following two `--device-compiler` options:
+
+`--device-compiler=sycl:spir64_gen-unknown-unknown=-device pvc -options ...`
+
+`--device-compiler=sycl:spir64_gen-unknown-unknown=-device skl -options ...`
+
+Device specific optimizations for each of the device architectures should be specified after `-device `.
+
+Here is an example of a clang-linker-wrapper invocation where ther user wants to create a FAT binary with PVC and SKL architectures to be run on a x86_64 Linux host. In addition, they would like to enable aggressive mathematical optimizations and are tolerant for slightly imprecise floating-point values just for SKL, that is, use the `-cl-unsafe-math-optimizations`. For PVC, they would like to enable the multiply and add instruction usage (`-cl-mad-enable`). The source binaries are called host.o and kernel.o and the output should be called out.exe.
+
+`clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu
+ --device-compiler=sycl:spir64_gen-unknown-unknown=-device pvc -options "cl-mad-enable"
+--device-compiler=sycl:spir64_gen-unknown-unknown=-device skl -options "-cl-unsafe-math-optimizations"
+host.o kernel.o -o out.exe`
+
 #### Other Supported Options

 To complete the support needed for the various targets using the `clang-linker-wrapper` as the main interface, a few additional options will

From 45cb8e2fab09eb1f126657043fb6f182cdc0e513 Mon Sep 17 00:00:00 2001
From: "Balaji V. Iyer"
Date: Wed, 27 May 2026 11:47:21 -0700
Subject: [PATCH 2/5] Fixed a small typo

---
 sycl/doc/design/OffloadDesign.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/sycl/doc/design/OffloadDesign.md b/sycl/doc/design/OffloadDesign.md
index a37d3f596d851..704af648a3b55 100644
--- a/sycl/doc/design/OffloadDesign.md
+++ b/sycl/doc/design/OffloadDesign.md
@@ -259,11 +259,11 @@ To support multiple device architectures, a new `--device-compiler` option must
 Device specific optimizations for each of the device architectures should be specified after `-device `.

-Here is an example of a clang-linker-wrapper invocation where ther user wants to create a FAT binary with PVC and SKL architectures to be run on a x86_64 Linux host. In addition, they would like to enable aggressive mathematical optimizations and are tolerant for slightly imprecise floating-point values just for SKL, that is, use the `-cl-unsafe-math-optimizations`. For PVC, they would like to enable the multiply and add instruction usage (`-cl-mad-enable`). The source binaries are called host.o and kernel.o and the output should be called out.exe.
+Here is an example of a clang-linker-wrapper invocation where ther user wants to create a FAT binary with PVC and SKL architectures to be run on a x86_64 Linux host. In addition, they would like to enable aggressive mathematical optimizations and are tolerant for slightly imprecise floating-point values just for SKL, that is, use the `-cl-unsafe-math-optimizations` flag. For PVC, they would like to enable the multiply and add instruction usage (`-cl-mad-enable`). The source binaries are called host.o and kernel.o and the output should be called out.exe.

 `clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu
- --device-compiler=sycl:spir64_gen-unknown-unknown=-device pvc -options "cl-mad-enable"
---device-compiler=sycl:spir64_gen-unknown-unknown=-device skl -options "-cl-unsafe-math-optimizations"
+ --device-compiler=sycl:spir64_gen-unknown-unknown=-device pvc -options "-cl-mad-enable"
+ --device-compiler=sycl:spir64_gen-unknown-unknown=-device skl -options "-cl-unsafe-math-optimizations"
 -- /usr/bin/ld host.o kernel.o -o out.exe`

 #### Other Supported Options

From 803228c4cae9ff66e856627ba658d90116065184 Mon Sep 17 00:00:00 2001
From: "Balaji V. Iyer." <43187390+bviyer@users.noreply.github.com>
Date: Fri, 29 May 2026 13:36:21 -0500
Subject: [PATCH 3/5] Update sycl/doc/design/OffloadDesign.md

Co-authored-by: Nick Sarnie
---
 sycl/doc/design/OffloadDesign.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sycl/doc/design/OffloadDesign.md b/sycl/doc/design/OffloadDesign.md
index 704af648a3b55..05c715fc0292f 100644
--- a/sycl/doc/design/OffloadDesign.md
+++ b/sycl/doc/design/OffloadDesign.md
@@ -251,7 +251,7 @@ The `--device-compiler` option uses the format `--device-compiler=[:][` and `` are matched against the current compilation target. Only arguments that match both the offloading kind and target triple will be passed to the backend compiler. If `` is not specified, the arguments will match any offloading kind; if `` is not specified, the arguments will match any target triple; and if neither is specified, the arguments will be applied to all targets.

-To support multiple device architectures, a new `--device-compiler` option must be specified for each device. For example, to compile for Ponte Vecchio (PVC) and Skylake (SKL) architectures and put them in a FAT binary, the user must add the following two `--device-compiler` options:
+To support multiple device architectures, a new `--device-compiler` option must be specified for each device. For example, to compile for Ponte Vecchio (PVC) and Skylake (SKL) architectures and put them in a fat binary, the user must add the following two `--device-compiler` options:

 `--device-compiler=sycl:spir64_gen-unknown-unknown=-device pvc -options ...`

From 71bd20a531dc4f673dd4f6106d23daa7580a0d1c Mon Sep 17 00:00:00 2001
From: "Balaji V. Iyer." <43187390+bviyer@users.noreply.github.com>
Date: Fri, 29 May 2026 13:36:36 -0500
Subject: [PATCH 4/5] Update sycl/doc/design/OffloadDesign.md

Co-authored-by: Nick Sarnie
---
 sycl/doc/design/OffloadDesign.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sycl/doc/design/OffloadDesign.md b/sycl/doc/design/OffloadDesign.md
index 05c715fc0292f..7c37f769d5c17 100644
--- a/sycl/doc/design/OffloadDesign.md
+++ b/sycl/doc/design/OffloadDesign.md
@@ -257,7 +257,7 @@ To support multiple device architectures, a new `--device-compiler` option must
 `--device-compiler=sycl:spir64_gen-unknown-unknown=-device skl -options ...`

-Device specific optimizations for each of the device architectures should be specified after `-device `.
+Device specific options for each of the device architectures should be specified after `-device `.

 Here is an example of a clang-linker-wrapper invocation where ther user wants to create a FAT binary with PVC and SKL architectures to be run on a x86_64 Linux host. In addition, they would like to enable aggressive mathematical optimizations and are tolerant for slightly imprecise floating-point values just for SKL, that is, use the `-cl-unsafe-math-optimizations` flag. For PVC, they would like to enable the multiply and add instruction usage (`-cl-mad-enable`). The source binaries are called host.o and kernel.o and the output should be called out.exe.

From 4ffbce1bcb795183fbff3564cd241c1b94cd9cdd Mon Sep 17 00:00:00 2001
From: "Balaji V. Iyer." <43187390+bviyer@users.noreply.github.com>
Date: Fri, 29 May 2026 13:36:46 -0500
Subject: [PATCH 5/5] Update sycl/doc/design/OffloadDesign.md

Co-authored-by: Nick Sarnie
---
 sycl/doc/design/OffloadDesign.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sycl/doc/design/OffloadDesign.md b/sycl/doc/design/OffloadDesign.md
index 7c37f769d5c17..35a86d2184ea2 100644
--- a/sycl/doc/design/OffloadDesign.md
+++ b/sycl/doc/design/OffloadDesign.md
@@ -259,7 +259,7 @@ To support multiple device architectures, a new `--device-compiler` option must
 Device specific options for each of the device architectures should be specified after `-device `.

-Here is an example of a clang-linker-wrapper invocation where ther user wants to create a FAT binary with PVC and SKL architectures to be run on a x86_64 Linux host. In addition, they would like to enable aggressive mathematical optimizations and are tolerant for slightly imprecise floating-point values just for SKL, that is, use the `-cl-unsafe-math-optimizations` flag. For PVC, they would like to enable the multiply and add instruction usage (`-cl-mad-enable`). The source binaries are called host.o and kernel.o and the output should be called out.exe.
+Here is an example of a clang-linker-wrapper invocation where ther user wants to create a fat binary with PVC and SKL architectures to be run on a x86_64 Linux host. In addition, they would like to enable aggressive mathematical optimizations and are tolerant for slightly imprecise floating-point values just for SKL, that is, use the `-cl-unsafe-math-optimizations` flag. For PVC, they would like to enable the multiply and add instruction usage (`-cl-mad-enable`). The source binaries are called host.o and kernel.o and the output should be called out.exe.

 `clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu
  --device-compiler=sycl:spir64_gen-unknown-unknown=-device pvc -options "-cl-mad-enable"