From 5971176c393773dca06f9b86c0ba303060f90ee4 Mon Sep 17 00:00:00 2001 From: srividya sundaram Date: Tue, 24 Sep 2024 09:30:54 -0700 Subject: [PATCH 1/5] [SYCL] Update SYCL --offload-arch option. --- sycl/doc/design/OffloadDesign.md | 147 +++++++++++++++++++++++++++++++ 1 file changed, 147 insertions(+) diff --git a/sycl/doc/design/OffloadDesign.md b/sycl/doc/design/OffloadDesign.md index cf3a43713b9cf..7d12e482d13b6 100644 --- a/sycl/doc/design/OffloadDesign.md +++ b/sycl/doc/design/OffloadDesign.md @@ -296,6 +296,125 @@ resemble `--gpu-tool-arg= `. This corresponds to the existing option syntax of `-fsycl-targets=intel_gpu_arch` where `arch` can be a fixed set of targets. +#### --offload-arch + +For SYCL offloading to Intel GPUs, Intel CPUs, NVidia and AMD GPUs, specify the device architecture using ``--offload-arch`` option. For instance + ``--offload-arch=sm_80`` to target an NVidia Tesla A100, + ``--offload-arch=gfx90a`` to target an AMD Instinct MI250X, or + ``--offload-arch=sm_80,gfx90a`` to target both. +For Intel Graphics AOT target, valid values for ``--offload-arch`` are the +valid device names accepted by OCLOC (the Intel GPU AOT compiler) via the ``-device`` option. + +SYCL offloading with ``--offload-arch`` for Intel CPUs and Intel GPUs is currently enabled only with ``--offload-new-driver`` option. + +.. code-block:: console + Example: + $ clang++ -fsycl -offload-arch=bdw --offload-new-driver foo.cpp -c // SYCL AOT for Intel GPU. + $ clang++ -fsycl -offload-arch=broadwell --offload-new-driver foo.cpp -c // SYCL AOT for Intel CPU. + +The following table shows a mapping of the accepted values for `--offload-arch` to enable SYCL offloading to Intel GPUs and the corresponding `-device` value passed to OCLOC. + +| Intel GPU device | ``--offload-arch`` accepted value | OCLOC -device value | +|------------------|-------------------------|------------------------| +| Intel(R) microarchitecture code name Broadwell Intel graphics architecture | bdw | bdw | +| Intel(R) microarchitecture code name Skylake Intel graphics architecture | skl | skl | +| Kaby Lake Intel graphics architecture | kbl | kbl | +| Coffee Lake Intel graphics architecture | cfl | cfl | +| Apollo Lake Intel graphics architecture | apl | apl | +| Broxton Intel graphics architecture | bxt | apl | +| Gemini Lake Intel graphics architecture | glk | glk | +| Whiskey Lake Intel graphics architecture | whl | whl | +| Amber Lake Intel graphics architecture | aml | aml | +| Comet Lake Intel graphics architecture | cml | cml | +| Ice Lake Intel graphics architecture | icl, icllp | icllp | +| Elkhart Lake Intel graphics architecture | ehl | ehl | +| Jasper Lake Intel graphics architecture | jsl | jsl | +| Tiger Lake Intel graphics architecture | tgl, tgllp | tgllp | +| Rocket Lake Intel graphics architecture | rkl | rkl | +| Alder Lake S Intel graphics architecture | adl_s | adl_s | +| Raptor Lake Intel graphics architecture | rpl_s | adl_s | +| Alder Lake P Intel graphics architecture | adl_p | adl_p | +| Alder Lake N Intel graphics architecture | adl_n | adl_n | +| DG1 Intel graphics architecture | dg1 | dg1 | +| Alchemist G10 Intel graphics architecture | acm_g10, dg2_g10 | acm_g10 | +| Alchemist G11 Intel graphics architecture | acm_g11, dg2_g11 | acm_g11 | +| Alchemist G12 Intel graphics architecture | acm_g12, dg2_g12 | acm_g12 | +| Ponte Vecchio Intel graphics architecture | pvc | pvc | +| Ponte Vecchio VG Intel graphics architecture | pvc_vg | pvc_vg | +| Meteor Lake U/S or Arrow Lake U/S Intel graphics architecture | mtl_u, mtl_s, arl_u | mtl_s | +| Meteor Lake H Intel graphics architecture | mtl_h | mtl_h | +| Arrow Lake H Intel graphics architecture | arl_h | arl_h | +| Battlemage G21 Intel graphics architecture | bmg_g21 | bmg_g21 | +| Lunar Lake Intel graphics architecture | lnl_m | lnl_m | + +#### nvptx64-nvidia-cuda support +For SYCL offloading to NVidia GPUs using ``--offload-arch`` option, the following table +lists the accepted values. + +| NVidia GPU device name | ``--offload-arch`` accepted values for NVidia GPUs | +|------------------------|----------------------------------------------------| +| NVIDIA Maxwell architecture (compute capability 5.0) | sm_50 | +| NVIDIA Maxwell architecture (compute capability 5.2) | sm_52 | +| NVIDIA Maxwell architecture (compute capability 5.3) | sm_53 | +| NVIDIA Pascal architecture (compute capability 6.0) | sm_60 | +| NVIDIA Pascal architecture (compute capability 6.1) | sm_61 | +| NVIDIA Pascal architecture (compute capability 6.2) | sm_62 | +| NVIDIA Volta architecture (compute capability 7.0) | sm_70 | +| NVIDIA Volta architecture (compute capability 7.2) | sm_72 | +| NVIDIA Turing architecture (compute capability 7.5) | sm_75 | +| NVIDIA Ampere architecture (compute capability 8.0) | sm_80 | +| NVIDIA Ampere architecture (compute capability 8.6) | sm_86 | +| NVIDIA Jetson/Drive AGX Orin architecture | sm_87 | +| NVIDIA Ada Lovelace architecture | sm_89 | +| NVIDIA Hopper architecture | sm_90 | +| NVIDIA Hopper architecture (with wgmma and setmaxnreg instructions) | sm_90a | + +#### amdgcn-amd-amdhsa support + +For SYCL offloading to AMD GPUs using ``--offload-arch`` option, the following table +lists the accepted values. + +| AMD GPU device name | ``--offload-arch`` accepted values for AMD GPUs | +|------------------------|----------------------------------------------------| +| AMD GCN GFX7 (Sea Islands (CI)) architecture | gfx700 | +| AMD GCN GFX7 (Sea Islands (CI)) architecture | gfx701 | +| AMD GCN GFX7 (Sea Islands (CI)) architecture | gfx702 | +| AMD GCN GFX8 (Volcanic Islands (VI)) architecture | gfx801 | +| AMD GCN GFX8 (Volcanic Islands (VI)) architecture | gfx802 | +| AMD GCN GFX8 (Volcanic Islands (VI)) architecture | gfx803 | +| AMD GCN GFX8 (Volcanic Islands (VI)) architecture | gfx805 | +| AMD GCN GFX8 (Volcanic Islands (VI)) architecture | gfx810 | +| AMD GCN GFX9 (Vega) architecture | gfx900 | +| AMD GCN GFX9 (Vega) architecture | gfx902 | +| AMD GCN GFX9 (Vega) architecture | gfx904 | +| AMD GCN GFX9 (Vega) architecture | gfx906 | +| AMD GCN GFX9 (Vega) architecture | gfx908 | +| AMD GCN GFX9 (Vega) architecture | gfx909 | +| AMD GCN GFX9 (Vega) architecture | gfx90a | +| AMD GCN GFX9 (Vega) architecture | gfx90c | +| AMD GCN GFX9 (Vega) architecture | gfx940 | +| AMD GCN GFX9 (Vega) architecture | gfx941 | +| AMD GCN GFX9 (Vega) architecture | gfx942 | +| AMD GCN GFX10.1 (RDNA 1) architecture | gfx1010 | +| AMD GCN GFX10.1 (RDNA 1) architecture | gfx1011 | +| AMD GCN GFX10.1 (RDNA 1) architecture | gfx1012 | +| AMD GCN GFX10.1 (RDNA 1) architecture | gfx1013 | +| AMD GCN GFX10.3 (RDNA 2) architecture | gfx1030 | +| GCN GFX10.3 (RDNA 2) architecture | gfx1031 | +| GCN GFX10.3 (RDNA 2) architecture | gfx1032 | +| GCN GFX10.3 (RDNA 2) architecture | gfx1033 | +| GCN GFX10.3 (RDNA 2) architecture | gfx1034 | +| GCN GFX10.3 (RDNA 2) architecture | gfx1035 | +| GCN GFX10.3 (RDNA 2) architecture | gfx1036 | +| GCN GFX11 (RDNA 3) architecture | gfx1100 | +| GCN GFX11 (RDNA 3) architecture | gfx1101 | +| GCN GFX11 (RDNA 3) architecture | gfx1102 | +| GCN GFX11 (RDNA 3) architecture | gfx1103 | +| GCN GFX11 (RDNA 3) architecture | gfx1150 | +| GCN GFX11 (RDNA 3) architecture | gfx1151 | +| GCN GFX12 (RDNA 4) architecture | gfx1200 | +| GCN GFX12 (RDNA 4) architecture | gfx1201 | + #### spir64_fpga support Compilation behaviors involving AOT for FPGA involve an additional call to @@ -355,6 +474,34 @@ Additional options passed by the user via the `-Xsycl-target-backend=spir64_x86_64 ` command will be processed by a new option to the wrapper, `--cpu-tool-arg=` +Similar to SYCL offloading to Intel GPUs using `--offload-arch`, SYCL AOT for Intel CPUs +will also leverage the `--offload-arch` option. +The valid CPU device names accepted for `--offload-arch` are CPU names from ``clang -march``. +These names are more verbose, and do not overlap with the Intel GPU names. +These user input CPU names are mapped to the corresponding ``opencl-aot -march`` option. + +The following table shows a mapping of the accepted values for `--offload-arch` to enable SYCL offloading to Intel CPUs and the corresponding `-march` value passed to opencl-aot. + +| Intel CPU device | ``--offload-arch`` accepted value | opencl-aot -march value | +|----------------|-------------------------|----------------------------| +| Intel(R) Advanced Vector Extensions 512 | skylake-avx512 | avx512 | +| Intel(R) Advanced Vector Extensions 2 | core-avx2 | avx2 | +| Intel(R) Advanced Vector Extensions | corei7-avx | avx | +| Intel(R) Streaming SIMD Extensions 4.2 | corei7 | sse4.2 | +| Intel(R) microarchitecture code name Westmere | westmere | wsm | +| Intel(R) microarchitecture code name Sandy Bridge | sandybridge | snb | +| Intel(R) microarchitecture code name Ivy Bridge | ivybridge | ivyb | +| Intel(R) microarchitecture code name Broadwell | broadwell | bdw | +| Intel(R) microarchitecture code name Coffee Lake | coffeelake | cfl | +| Intel(R) microarchitecture code name Alder Lake | alderlake | adl | +| Intel(R) microarchitecture code name Skylake (client) | skylake | skylake | +| Intel(R) microarchitecture code name Skylake (server) | skx | skx | +| Intel(R) microarchitecture code name Cascade Lake | cascadelake | clk | +| Intel(R) microarchitecture code name Ice Lake (client) | icelake-client | icl | +| Intel(R) microarchitecture code name Ice Lake (server) | icelake-server | icx | +| Intel(R) microarchitecture code name Sapphire Rapids | sapphirerapids | spr | +| Intel(R) microarchitecture code name Granite Rapids | graniterapids | gnr | + ### Wrapping of device image Once the device binary is pulled out of the fat binary, the binary must be From 555986be0daae78502fe68175b26ac2f0acc8ce2 Mon Sep 17 00:00:00 2001 From: srividya sundaram Date: Mon, 30 Sep 2024 10:12:51 -0700 Subject: [PATCH 2/5] Address review comments. --- sycl/doc/design/OffloadDesign.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/sycl/doc/design/OffloadDesign.md b/sycl/doc/design/OffloadDesign.md index 7d12e482d13b6..d021d7f946cd7 100644 --- a/sycl/doc/design/OffloadDesign.md +++ b/sycl/doc/design/OffloadDesign.md @@ -283,7 +283,7 @@ list to be passed along. *Example: spir64_gen enabling options* > --gpu-tool-arg="-device pvc -options extraopt_pvc" ---gpu-tool-arg="-device skl -options -extraopt_skl" +--gpu-tool-arg="-options -extraopt_skl" *Example: clang-linker-wrapper options* @@ -302,15 +302,15 @@ For SYCL offloading to Intel GPUs, Intel CPUs, NVidia and AMD GPUs, specify the ``--offload-arch=sm_80`` to target an NVidia Tesla A100, ``--offload-arch=gfx90a`` to target an AMD Instinct MI250X, or ``--offload-arch=sm_80,gfx90a`` to target both. -For Intel Graphics AOT target, valid values for ``--offload-arch`` are the + +For Intel Graphics AOT target, valid values for ``--offload-arch`` are mapped to valid device names accepted by OCLOC (the Intel GPU AOT compiler) via the ``-device`` option. SYCL offloading with ``--offload-arch`` for Intel CPUs and Intel GPUs is currently enabled only with ``--offload-new-driver`` option. -.. code-block:: console - Example: - $ clang++ -fsycl -offload-arch=bdw --offload-new-driver foo.cpp -c // SYCL AOT for Intel GPU. - $ clang++ -fsycl -offload-arch=broadwell --offload-new-driver foo.cpp -c // SYCL AOT for Intel CPU. +> Example: +> $ clang++ -fsycl -offload-arch=bdw --offload-new-driver foo.cpp -c // SYCL AOT for Intel GPU. +> $ clang++ -fsycl -offload-arch=broadwell --offload-new-driver foo.cpp -c // SYCL AOT for Intel CPU. The following table shows a mapping of the accepted values for `--offload-arch` to enable SYCL offloading to Intel GPUs and the corresponding `-device` value passed to OCLOC. From eae60d2ce794e9012946fe1d236482252486c522 Mon Sep 17 00:00:00 2001 From: srividya sundaram Date: Mon, 30 Sep 2024 10:16:23 -0700 Subject: [PATCH 3/5] Fix formatting. --- sycl/doc/design/OffloadDesign.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/sycl/doc/design/OffloadDesign.md b/sycl/doc/design/OffloadDesign.md index d021d7f946cd7..3c00cb881fdc0 100644 --- a/sycl/doc/design/OffloadDesign.md +++ b/sycl/doc/design/OffloadDesign.md @@ -308,9 +308,9 @@ valid device names accepted by OCLOC (the Intel GPU AOT compiler) via the ``-dev SYCL offloading with ``--offload-arch`` for Intel CPUs and Intel GPUs is currently enabled only with ``--offload-new-driver`` option. -> Example: -> $ clang++ -fsycl -offload-arch=bdw --offload-new-driver foo.cpp -c // SYCL AOT for Intel GPU. -> $ clang++ -fsycl -offload-arch=broadwell --offload-new-driver foo.cpp -c // SYCL AOT for Intel CPU. +`` Example:`` +`` $ clang++ -fsycl -offload-arch=bdw --offload-new-driver foo.cpp -c // SYCL AOT for Intel GPU.`` +`` $ clang++ -fsycl -offload-arch=broadwell --offload-new-driver foo.cpp -c // SYCL AOT for Intel CPU.`` The following table shows a mapping of the accepted values for `--offload-arch` to enable SYCL offloading to Intel GPUs and the corresponding `-device` value passed to OCLOC. From b22651504269091a01eca5d53da0ce25ec286418 Mon Sep 17 00:00:00 2001 From: srividya sundaram Date: Mon, 30 Sep 2024 10:20:13 -0700 Subject: [PATCH 4/5] Add new line. --- sycl/doc/design/OffloadDesign.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/sycl/doc/design/OffloadDesign.md b/sycl/doc/design/OffloadDesign.md index 3c00cb881fdc0..e3f9677cc17a5 100644 --- a/sycl/doc/design/OffloadDesign.md +++ b/sycl/doc/design/OffloadDesign.md @@ -308,9 +308,9 @@ valid device names accepted by OCLOC (the Intel GPU AOT compiler) via the ``-dev SYCL offloading with ``--offload-arch`` for Intel CPUs and Intel GPUs is currently enabled only with ``--offload-new-driver`` option. -`` Example:`` -`` $ clang++ -fsycl -offload-arch=bdw --offload-new-driver foo.cpp -c // SYCL AOT for Intel GPU.`` -`` $ clang++ -fsycl -offload-arch=broadwell --offload-new-driver foo.cpp -c // SYCL AOT for Intel CPU.`` +`` Example:`` +`` $ clang++ -fsycl -offload-arch=bdw --offload-new-driver foo.cpp -c // SYCL AOT for Intel GPU.`` +`` $ clang++ -fsycl -offload-arch=broadwell --offload-new-driver foo.cpp -c // SYCL AOT for Intel CPU.`` The following table shows a mapping of the accepted values for `--offload-arch` to enable SYCL offloading to Intel GPUs and the corresponding `-device` value passed to OCLOC. From cb9f130a6b5ccff8e39bfc1f8a3adec17af0aea5 Mon Sep 17 00:00:00 2001 From: srividya sundaram Date: Mon, 30 Sep 2024 13:20:39 -0700 Subject: [PATCH 5/5] Add code block. --- sycl/doc/design/OffloadDesign.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/sycl/doc/design/OffloadDesign.md b/sycl/doc/design/OffloadDesign.md index e3f9677cc17a5..8db5748217d0d 100644 --- a/sycl/doc/design/OffloadDesign.md +++ b/sycl/doc/design/OffloadDesign.md @@ -308,9 +308,12 @@ valid device names accepted by OCLOC (the Intel GPU AOT compiler) via the ``-dev SYCL offloading with ``--offload-arch`` for Intel CPUs and Intel GPUs is currently enabled only with ``--offload-new-driver`` option. -`` Example:`` -`` $ clang++ -fsycl -offload-arch=bdw --offload-new-driver foo.cpp -c // SYCL AOT for Intel GPU.`` -`` $ clang++ -fsycl -offload-arch=broadwell --offload-new-driver foo.cpp -c // SYCL AOT for Intel CPU.`` +``` +Example: + +$ clang++ -fsycl -offload-arch=bdw --offload-new-driver -c foo.cpp // SYCL AOT for Intel GPU. +$ clang++ -fsycl -offload-arch=broadwell --offload-new-driver -c foo.cpp // SYCL AOT for Intel CPU. +``` The following table shows a mapping of the accepted values for `--offload-arch` to enable SYCL offloading to Intel GPUs and the corresponding `-device` value passed to OCLOC.