From 5971176c393773dca06f9b86c0ba303060f90ee4 Mon Sep 17 00:00:00 2001
From: srividya sundaram <srividya.sundaram@intel.com>
Date: Tue, 24 Sep 2024 09:30:54 -0700
Subject: [PATCH 1/5] [SYCL] Update SYCL --offload-arch option.

---
 sycl/doc/design/OffloadDesign.md | 147 +++++++++++++++++++++++++++++++
 1 file changed, 147 insertions(+)
diff --git a/sycl/doc/design/OffloadDesign.md b/sycl/doc/design/OffloadDesign.md
index cf3a43713b9cf..7d12e482d13b6 100644
--- a/sycl/doc/design/OffloadDesign.md
+++ b/sycl/doc/design/OffloadDesign.md
@@ -296,6 +296,125 @@ resemble `--gpu-tool-arg=<arch> <arg>`.  This corresponds to the existing
 option syntax of `-fsycl-targets=intel_gpu_arch` where `arch` can be a fixed
 set of targets.
 
+#### --offload-arch
+
+For SYCL offloading to Intel GPUs, Intel CPUs, NVidia and AMD GPUs, specify the device architecture using ``--offload-arch`` option. For instance 
+  ``--offload-arch=sm_80`` to target an NVidia Tesla A100, 
+  ``--offload-arch=gfx90a`` to target an AMD Instinct MI250X, or 
+  ``--offload-arch=sm_80,gfx90a`` to target both.
+For Intel Graphics AOT target, valid values for ``--offload-arch`` are the
+valid device names accepted by OCLOC (the Intel GPU AOT compiler) via the ``-device`` option.
+
+SYCL offloading with ``--offload-arch`` for Intel CPUs and Intel GPUs is currently enabled only with ``--offload-new-driver`` option.
+
+.. code-block:: console
+    Example:
+    $ clang++ -fsycl -offload-arch=bdw --offload-new-driver foo.cpp -c // SYCL AOT for Intel GPU.
+    $ clang++ -fsycl -offload-arch=broadwell --offload-new-driver foo.cpp -c // SYCL AOT for Intel CPU.
+
+The following table shows a mapping of the accepted values for `--offload-arch` to enable SYCL offloading to Intel GPUs and the corresponding `-device` value passed to OCLOC.
+
+| Intel GPU device | ``--offload-arch`` accepted value | OCLOC -device value |
+|------------------|-------------------------|------------------------|
+| Intel(R) microarchitecture code name Broadwell Intel graphics architecture | bdw | bdw |
+| Intel(R) microarchitecture code name Skylake Intel graphics architecture | skl | skl |
+| Kaby Lake Intel graphics architecture | kbl | kbl                   |
+| Coffee Lake Intel graphics architecture | cfl | cfl                 |
+| Apollo Lake Intel graphics architecture | apl | apl                 |
+| Broxton Intel graphics architecture | bxt  | apl                    |
+| Gemini Lake Intel graphics architecture | glk | glk                 |
+| Whiskey Lake Intel graphics architecture | whl | whl                |
+| Amber Lake Intel graphics architecture | aml | aml                  |
+| Comet Lake Intel graphics architecture | cml | cml                  |
+| Ice Lake Intel graphics architecture | icl, icllp | icllp           |
+| Elkhart Lake Intel graphics architecture | ehl | ehl                |
+| Jasper Lake Intel graphics architecture | jsl | jsl                 |
+| Tiger Lake Intel graphics architecture | tgl, tgllp | tgllp         |
+| Rocket Lake Intel graphics architecture | rkl | rkl                 |
+| Alder Lake S Intel graphics architecture | adl_s | adl_s            |
+| Raptor Lake Intel graphics architecture | rpl_s | adl_s             |
+| Alder Lake P Intel graphics architecture | adl_p | adl_p            |
+| Alder Lake N Intel graphics architecture | adl_n | adl_n            |
+| DG1 Intel graphics architecture | dg1      | dg1                    |
+| Alchemist G10 Intel graphics architecture | acm_g10, dg2_g10 | acm_g10 |
+| Alchemist G11 Intel graphics architecture | acm_g11, dg2_g11 | acm_g11 |
+| Alchemist G12 Intel graphics architecture | acm_g12, dg2_g12 | acm_g12 |
+| Ponte Vecchio Intel graphics architecture | pvc | pvc               |
+| Ponte Vecchio VG Intel graphics architecture | pvc_vg | pvc_vg      |
+| Meteor Lake U/S or Arrow Lake U/S Intel graphics architecture | mtl_u, mtl_s, arl_u | mtl_s |
+| Meteor Lake H Intel graphics architecture | mtl_h | mtl_h           |
+| Arrow Lake H Intel graphics architecture | arl_h | arl_h            |
+| Battlemage G21 Intel graphics architecture | bmg_g21 | bmg_g21      |
+| Lunar Lake Intel graphics architecture | lnl_m | lnl_m              |
+
+#### nvptx64-nvidia-cuda support
+For SYCL offloading to NVidia GPUs using ``--offload-arch`` option, the following table
+lists the accepted values.
+
+| NVidia GPU device name | ``--offload-arch`` accepted values for NVidia GPUs |
+|------------------------|----------------------------------------------------|
+| NVIDIA Maxwell architecture (compute capability 5.0) | sm_50 |
+| NVIDIA Maxwell architecture (compute capability 5.2) | sm_52 |
+| NVIDIA Maxwell architecture (compute capability 5.3) | sm_53 |
+| NVIDIA Pascal architecture (compute capability 6.0)  | sm_60 |
+| NVIDIA Pascal architecture (compute capability 6.1)  | sm_61 |
+| NVIDIA Pascal architecture (compute capability 6.2)  | sm_62 |
+| NVIDIA Volta architecture (compute capability 7.0)   | sm_70 |
+| NVIDIA Volta architecture (compute capability 7.2)   | sm_72 |
+| NVIDIA Turing architecture (compute capability 7.5)  | sm_75 |
+| NVIDIA Ampere architecture (compute capability 8.0)  | sm_80 |
+| NVIDIA Ampere architecture (compute capability 8.6)  | sm_86 |
+| NVIDIA Jetson/Drive AGX Orin architecture            | sm_87 |
+| NVIDIA Ada Lovelace architecture                     | sm_89 |
+| NVIDIA Hopper architecture                           | sm_90 |
+| NVIDIA Hopper architecture (with wgmma and setmaxnreg instructions)  | sm_90a |
+
+#### amdgcn-amd-amdhsa support
+
+For SYCL offloading to AMD GPUs using ``--offload-arch`` option, the following table
+lists the accepted values.
+
+| AMD GPU device name | ``--offload-arch`` accepted values for AMD GPUs |
+|------------------------|----------------------------------------------------|
+| AMD GCN GFX7 (Sea Islands (CI)) architecture | gfx700 |
+| AMD GCN GFX7 (Sea Islands (CI)) architecture | gfx701 |
+| AMD GCN GFX7 (Sea Islands (CI)) architecture | gfx702 |
+| AMD GCN GFX8 (Volcanic Islands (VI)) architecture | gfx801 |
+| AMD GCN GFX8 (Volcanic Islands (VI)) architecture | gfx802 |
+| AMD GCN GFX8 (Volcanic Islands (VI)) architecture | gfx803 |
+| AMD GCN GFX8 (Volcanic Islands (VI)) architecture | gfx805 |
+| AMD GCN GFX8 (Volcanic Islands (VI)) architecture | gfx810 |
+| AMD GCN GFX9 (Vega) architecture | gfx900 |
+| AMD GCN GFX9 (Vega) architecture | gfx902 |
+| AMD GCN GFX9 (Vega) architecture | gfx904 |
+| AMD GCN GFX9 (Vega) architecture | gfx906 |
+| AMD GCN GFX9 (Vega) architecture | gfx908 |
+| AMD GCN GFX9 (Vega) architecture | gfx909 |
+| AMD GCN GFX9 (Vega) architecture | gfx90a |
+| AMD GCN GFX9 (Vega) architecture | gfx90c |
+| AMD GCN GFX9 (Vega) architecture | gfx940 |
+| AMD GCN GFX9 (Vega) architecture | gfx941 |
+| AMD GCN GFX9 (Vega) architecture | gfx942 |
+| AMD GCN GFX10.1 (RDNA 1) architecture | gfx1010 |
+| AMD GCN GFX10.1 (RDNA 1) architecture | gfx1011 |
+| AMD GCN GFX10.1 (RDNA 1) architecture | gfx1012 |
+| AMD GCN GFX10.1 (RDNA 1) architecture | gfx1013 |
+| AMD GCN GFX10.3 (RDNA 2) architecture | gfx1030 |
+| GCN GFX10.3 (RDNA 2) architecture | gfx1031 |
+| GCN GFX10.3 (RDNA 2) architecture | gfx1032 |
+| GCN GFX10.3 (RDNA 2) architecture | gfx1033 |
+| GCN GFX10.3 (RDNA 2) architecture | gfx1034 |
+| GCN GFX10.3 (RDNA 2) architecture | gfx1035 |
+| GCN GFX10.3 (RDNA 2) architecture | gfx1036 |
+| GCN GFX11 (RDNA 3) architecture | gfx1100 |
+| GCN GFX11 (RDNA 3) architecture | gfx1101 |
+| GCN GFX11 (RDNA 3) architecture | gfx1102 |
+| GCN GFX11 (RDNA 3) architecture | gfx1103 |
+| GCN GFX11 (RDNA 3) architecture | gfx1150 |
+| GCN GFX11 (RDNA 3) architecture | gfx1151 |
+| GCN GFX12 (RDNA 4) architecture | gfx1200 |
+| GCN GFX12 (RDNA 4) architecture | gfx1201 |
+
 #### spir64_fpga support
 
 Compilation behaviors involving AOT for FPGA involve an additional call to
@@ -355,6 +474,34 @@ Additional options passed by the user via the
 `-Xsycl-target-backend=spir64_x86_64 <opts>` command will be processed by a new
 option to the wrapper, `--cpu-tool-arg=<arg>`
 
+Similar to SYCL offloading to Intel GPUs using `--offload-arch`, SYCL AOT for Intel CPUs
+will also leverage the `--offload-arch` option.
+The valid CPU device names accepted for `--offload-arch` are CPU names from ``clang -march``.
+These names are more verbose, and do not overlap with the Intel GPU names.
+These user input CPU names are mapped to the corresponding ``opencl-aot -march`` option.
+
+The following table shows a mapping of the accepted values for `--offload-arch` to enable SYCL offloading to Intel CPUs and the corresponding `-march` value passed to opencl-aot.
+
+| Intel CPU device | ``--offload-arch`` accepted value | opencl-aot -march value |
+|----------------|-------------------------|----------------------------|
+| Intel(R) Advanced Vector Extensions 512 | skylake-avx512 | avx512     |
+| Intel(R) Advanced Vector Extensions 2 | core-avx2 | avx2              |
+| Intel(R) Advanced Vector Extensions | corei7-avx | avx                |
+| Intel(R) Streaming SIMD Extensions 4.2 | corei7 | sse4.2              |
+| Intel(R) microarchitecture code name Westmere | westmere | wsm        |
+| Intel(R) microarchitecture code name Sandy Bridge | sandybridge | snb |
+| Intel(R) microarchitecture code name Ivy Bridge | ivybridge | ivyb    |
+| Intel(R) microarchitecture code name Broadwell | broadwell | bdw      |
+| Intel(R) microarchitecture code name Coffee Lake | coffeelake | cfl   |
+| Intel(R) microarchitecture code name Alder Lake | alderlake | adl     |
+| Intel(R) microarchitecture code name Skylake (client) | skylake | skylake     |
+| Intel(R) microarchitecture code name Skylake (server) | skx | skx     |
+| Intel(R) microarchitecture code name Cascade Lake | cascadelake | clk |
+| Intel(R) microarchitecture code name Ice Lake (client) | icelake-client | icl |
+| Intel(R) microarchitecture code name Ice Lake (server) | icelake-server | icx |
+| Intel(R) microarchitecture code name Sapphire Rapids | sapphirerapids | spr |
+| Intel(R) microarchitecture code name Granite Rapids | graniterapids | gnr |
+
 ### Wrapping of device image
 
 Once the device binary is pulled out of the fat binary, the binary must be

From 555986be0daae78502fe68175b26ac2f0acc8ce2 Mon Sep 17 00:00:00 2001
From: srividya sundaram <srividya.sundaram@intel.com>
Date: Mon, 30 Sep 2024 10:12:51 -0700
Subject: [PATCH 2/5] Address review comments.

---
 sycl/doc/design/OffloadDesign.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/sycl/doc/design/OffloadDesign.md b/sycl/doc/design/OffloadDesign.md
index 7d12e482d13b6..d021d7f946cd7 100644
--- a/sycl/doc/design/OffloadDesign.md
+++ b/sycl/doc/design/OffloadDesign.md
@@ -283,7 +283,7 @@ list to be passed along.
 *Example: spir64_gen enabling options*
 
 > --gpu-tool-arg="-device pvc -options extraopt_pvc"
---gpu-tool-arg="-device skl -options -extraopt_skl"
+--gpu-tool-arg="-options -extraopt_skl"
 
 *Example: clang-linker-wrapper options*
 
@@ -302,15 +302,15 @@ For SYCL offloading to Intel GPUs, Intel CPUs, NVidia and AMD GPUs, specify the
   ``--offload-arch=sm_80`` to target an NVidia Tesla A100, 
   ``--offload-arch=gfx90a`` to target an AMD Instinct MI250X, or 
   ``--offload-arch=sm_80,gfx90a`` to target both.
-For Intel Graphics AOT target, valid values for ``--offload-arch`` are the
+
+For Intel Graphics AOT target, valid values for ``--offload-arch`` are mapped to
 valid device names accepted by OCLOC (the Intel GPU AOT compiler) via the ``-device`` option.
 
 SYCL offloading with ``--offload-arch`` for Intel CPUs and Intel GPUs is currently enabled only with ``--offload-new-driver`` option.
 
-.. code-block:: console
-    Example:
-    $ clang++ -fsycl -offload-arch=bdw --offload-new-driver foo.cpp -c // SYCL AOT for Intel GPU.
-    $ clang++ -fsycl -offload-arch=broadwell --offload-new-driver foo.cpp -c // SYCL AOT for Intel CPU.
+>    Example:
+>    $ clang++ -fsycl -offload-arch=bdw --offload-new-driver foo.cpp -c // SYCL AOT for Intel GPU.
+>    $ clang++ -fsycl -offload-arch=broadwell --offload-new-driver foo.cpp -c // SYCL AOT for Intel CPU.
 
 The following table shows a mapping of the accepted values for `--offload-arch` to enable SYCL offloading to Intel GPUs and the corresponding `-device` value passed to OCLOC.
 

From eae60d2ce794e9012946fe1d236482252486c522 Mon Sep 17 00:00:00 2001
From: srividya sundaram <srividya.sundaram@intel.com>
Date: Mon, 30 Sep 2024 10:16:23 -0700
Subject: [PATCH 3/5] Fix formatting.

---
 sycl/doc/design/OffloadDesign.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/sycl/doc/design/OffloadDesign.md b/sycl/doc/design/OffloadDesign.md
index d021d7f946cd7..3c00cb881fdc0 100644
--- a/sycl/doc/design/OffloadDesign.md
+++ b/sycl/doc/design/OffloadDesign.md
@@ -308,9 +308,9 @@ valid device names accepted by OCLOC (the Intel GPU AOT compiler) via the ``-dev
 
 SYCL offloading with ``--offload-arch`` for Intel CPUs and Intel GPUs is currently enabled only with ``--offload-new-driver`` option.
 
->    Example:
->    $ clang++ -fsycl -offload-arch=bdw --offload-new-driver foo.cpp -c // SYCL AOT for Intel GPU.
->    $ clang++ -fsycl -offload-arch=broadwell --offload-new-driver foo.cpp -c // SYCL AOT for Intel CPU.
+``  Example:``
+``    $ clang++ -fsycl -offload-arch=bdw --offload-new-driver foo.cpp -c // SYCL AOT for Intel GPU.``
+``    $ clang++ -fsycl -offload-arch=broadwell --offload-new-driver foo.cpp -c // SYCL AOT for Intel CPU.``
 
 The following table shows a mapping of the accepted values for `--offload-arch` to enable SYCL offloading to Intel GPUs and the corresponding `-device` value passed to OCLOC.
 

From b22651504269091a01eca5d53da0ce25ec286418 Mon Sep 17 00:00:00 2001
From: srividya sundaram <srividya.sundaram@intel.com>
Date: Mon, 30 Sep 2024 10:20:13 -0700
Subject: [PATCH 4/5] Add new line.

---
 sycl/doc/design/OffloadDesign.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/sycl/doc/design/OffloadDesign.md b/sycl/doc/design/OffloadDesign.md
index 3c00cb881fdc0..e3f9677cc17a5 100644
--- a/sycl/doc/design/OffloadDesign.md
+++ b/sycl/doc/design/OffloadDesign.md
@@ -308,9 +308,9 @@ valid device names accepted by OCLOC (the Intel GPU AOT compiler) via the ``-dev
 
 SYCL offloading with ``--offload-arch`` for Intel CPUs and Intel GPUs is currently enabled only with ``--offload-new-driver`` option.
 
-``  Example:``
-``    $ clang++ -fsycl -offload-arch=bdw --offload-new-driver foo.cpp -c // SYCL AOT for Intel GPU.``
-``    $ clang++ -fsycl -offload-arch=broadwell --offload-new-driver foo.cpp -c // SYCL AOT for Intel CPU.``
+``  Example:``    
+``  $ clang++ -fsycl -offload-arch=bdw --offload-new-driver foo.cpp -c // SYCL AOT for Intel GPU.``     
+``  $ clang++ -fsycl -offload-arch=broadwell --offload-new-driver foo.cpp -c // SYCL AOT for Intel CPU.``   
 
 The following table shows a mapping of the accepted values for `--offload-arch` to enable SYCL offloading to Intel GPUs and the corresponding `-device` value passed to OCLOC.
 

From cb9f130a6b5ccff8e39bfc1f8a3adec17af0aea5 Mon Sep 17 00:00:00 2001
From: srividya sundaram <srividya.sundaram@intel.com>
Date: Mon, 30 Sep 2024 13:20:39 -0700
Subject: [PATCH 5/5] Add code block.

---
 sycl/doc/design/OffloadDesign.md | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/sycl/doc/design/OffloadDesign.md b/sycl/doc/design/OffloadDesign.md
index e3f9677cc17a5..8db5748217d0d 100644
--- a/sycl/doc/design/OffloadDesign.md
+++ b/sycl/doc/design/OffloadDesign.md
@@ -308,9 +308,12 @@ valid device names accepted by OCLOC (the Intel GPU AOT compiler) via the ``-dev
 
 SYCL offloading with ``--offload-arch`` for Intel CPUs and Intel GPUs is currently enabled only with ``--offload-new-driver`` option.
 
-``  Example:``    
-``  $ clang++ -fsycl -offload-arch=bdw --offload-new-driver foo.cpp -c // SYCL AOT for Intel GPU.``     
-``  $ clang++ -fsycl -offload-arch=broadwell --offload-new-driver foo.cpp -c // SYCL AOT for Intel CPU.``   
+```
+Example:
+
+$ clang++ -fsycl -offload-arch=bdw --offload-new-driver -c foo.cpp // SYCL AOT for Intel GPU.
+$ clang++ -fsycl -offload-arch=broadwell --offload-new-driver -c foo.cpp // SYCL AOT for Intel CPU.
+```
 
 The following table shows a mapping of the accepted values for `--offload-arch` to enable SYCL offloading to Intel GPUs and the corresponding `-device` value passed to OCLOC.