@@ -283,7 +283,7 @@ list to be passed along.
283283* Example: spir64_gen enabling options*
284284
285285> --gpu-tool-arg="-device pvc -options extraopt_pvc"
286- --gpu-tool-arg="-device skl - options -extraopt_skl"
286+ --gpu-tool-arg="-options -extraopt_skl"
287287
288288* Example: clang-linker-wrapper options*
289289
@@ -296,6 +296,128 @@ resemble `--gpu-tool-arg=<arch> <arg>`. This corresponds to the existing
296296option syntax of ` -fsycl-targets=intel_gpu_arch ` where ` arch ` can be a fixed
297297set of targets.
298298
299+ #### --offload-arch
300+
301+ For SYCL offloading to Intel GPUs, Intel CPUs, NVIDIA GPUs, and AMD GPUs, specify the device architecture using the `` --offload-arch `` option. For instance, use
302+ `` --offload-arch=sm_80 `` to target an NVIDIA A100,
303+ `` --offload-arch=gfx90a `` to target an AMD Instinct MI250X, or
304+ `` --offload-arch=sm_80,gfx90a `` to target both.
305+
306+ For the Intel Graphics AOT target, valid values for `` --offload-arch `` are mapped to
307+ valid device names accepted by OCLOC (the Intel GPU AOT compiler) via its `` -device `` option.
308+
309+ SYCL offloading with `` --offload-arch `` for Intel CPUs and Intel GPUs is currently enabled only with the `` --offload-new-driver `` option.
310+
311+ ```
312+ Example:
313+
314+ $ clang++ -fsycl --offload-arch=bdw --offload-new-driver -c foo.cpp        # SYCL AOT for Intel GPU.
315+ $ clang++ -fsycl --offload-arch=broadwell --offload-new-driver -c foo.cpp  # SYCL AOT for Intel CPU.
316+ ```
317+
318+ The following table maps each `` --offload-arch `` value accepted for SYCL offloading to Intel GPUs to the corresponding ` -device ` value passed to OCLOC.
319+
320+ | Intel GPU device | `` --offload-arch `` accepted value | OCLOC -device value |
321+ | ------------------| -------------------------| ------------------------|
322+ | Intel(R) microarchitecture code name Broadwell Intel graphics architecture | bdw | bdw |
323+ | Intel(R) microarchitecture code name Skylake Intel graphics architecture | skl | skl |
324+ | Kaby Lake Intel graphics architecture | kbl | kbl |
325+ | Coffee Lake Intel graphics architecture | cfl | cfl |
326+ | Apollo Lake Intel graphics architecture | apl | apl |
327+ | Broxton Intel graphics architecture | bxt | apl |
328+ | Gemini Lake Intel graphics architecture | glk | glk |
329+ | Whiskey Lake Intel graphics architecture | whl | whl |
330+ | Amber Lake Intel graphics architecture | aml | aml |
331+ | Comet Lake Intel graphics architecture | cml | cml |
332+ | Ice Lake Intel graphics architecture | icl, icllp | icllp |
333+ | Elkhart Lake Intel graphics architecture | ehl | ehl |
334+ | Jasper Lake Intel graphics architecture | jsl | jsl |
335+ | Tiger Lake Intel graphics architecture | tgl, tgllp | tgllp |
336+ | Rocket Lake Intel graphics architecture | rkl | rkl |
337+ | Alder Lake S Intel graphics architecture | adl_s | adl_s |
338+ | Raptor Lake Intel graphics architecture | rpl_s | adl_s |
339+ | Alder Lake P Intel graphics architecture | adl_p | adl_p |
340+ | Alder Lake N Intel graphics architecture | adl_n | adl_n |
341+ | DG1 Intel graphics architecture | dg1 | dg1 |
342+ | Alchemist G10 Intel graphics architecture | acm_g10, dg2_g10 | acm_g10 |
343+ | Alchemist G11 Intel graphics architecture | acm_g11, dg2_g11 | acm_g11 |
344+ | Alchemist G12 Intel graphics architecture | acm_g12, dg2_g12 | acm_g12 |
345+ | Ponte Vecchio Intel graphics architecture | pvc | pvc |
346+ | Ponte Vecchio VG Intel graphics architecture | pvc_vg | pvc_vg |
347+ | Meteor Lake U/S or Arrow Lake U/S Intel graphics architecture | mtl_u, mtl_s, arl_u | mtl_s |
348+ | Meteor Lake H Intel graphics architecture | mtl_h | mtl_h |
349+ | Arrow Lake H Intel graphics architecture | arl_h | arl_h |
350+ | Battlemage G21 Intel graphics architecture | bmg_g21 | bmg_g21 |
351+ | Lunar Lake Intel graphics architecture | lnl_m | lnl_m |
352+
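The table above amounts to a many-to-one lookup: several `--offload-arch` spellings can resolve to the same OCLOC `-device` value. The following is a minimal sketch of that lookup, covering only a subset of the rows. The names `OCLOC_DEVICE_FOR_ARCH` and `ocloc_device` are hypothetical and are not taken from the clang driver sources.

```python
# Illustrative subset of the Intel GPU table: --offload-arch value -> OCLOC -device value.
# Both names below are hypothetical; the real driver keeps its own mapping.
OCLOC_DEVICE_FOR_ARCH = {
    "bdw": "bdw",
    "skl": "skl",
    "bxt": "apl",          # Broxton maps to the Apollo Lake device name
    "icl": "icllp",
    "icllp": "icllp",
    "tgl": "tgllp",
    "tgllp": "tgllp",
    "rpl_s": "adl_s",      # Raptor Lake maps to the Alder Lake S device name
    "acm_g10": "acm_g10",
    "dg2_g10": "acm_g10",  # dg2_* spellings are aliases of acm_*
    "pvc": "pvc",
    "mtl_u": "mtl_s",
    "mtl_s": "mtl_s",
    "arl_u": "mtl_s",
}


def ocloc_device(offload_arch: str) -> str:
    """Return the OCLOC -device value for an Intel GPU --offload-arch value."""
    try:
        return OCLOC_DEVICE_FOR_ARCH[offload_arch]
    except KeyError:
        raise ValueError(
            f"'{offload_arch}' is not an Intel GPU value in this illustrative subset"
        ) from None


if __name__ == "__main__":
    for arch in ("bdw", "dg2_g10", "arl_u"):
        print(arch, "->", ocloc_device(arch))
```

A flat dictionary keeps the aliases (`icl`/`icllp`, `dg2_g10`/`acm_g10`) explicit, which mirrors how the table lists them in a single row.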
353+ #### nvptx64-nvidia-cuda support
354+ For SYCL offloading to NVIDIA GPUs using the `` --offload-arch `` option, the following table
355+ lists the accepted values.
356+
357+ | NVIDIA GPU device name | `` --offload-arch `` accepted values for NVIDIA GPUs |
358+ | ------------------------| ----------------------------------------------------|
359+ | NVIDIA Maxwell architecture (compute capability 5.0) | sm_50 |
360+ | NVIDIA Maxwell architecture (compute capability 5.2) | sm_52 |
361+ | NVIDIA Maxwell architecture (compute capability 5.3) | sm_53 |
362+ | NVIDIA Pascal architecture (compute capability 6.0) | sm_60 |
363+ | NVIDIA Pascal architecture (compute capability 6.1) | sm_61 |
364+ | NVIDIA Pascal architecture (compute capability 6.2) | sm_62 |
365+ | NVIDIA Volta architecture (compute capability 7.0) | sm_70 |
366+ | NVIDIA Volta architecture (compute capability 7.2) | sm_72 |
367+ | NVIDIA Turing architecture (compute capability 7.5) | sm_75 |
368+ | NVIDIA Ampere architecture (compute capability 8.0) | sm_80 |
369+ | NVIDIA Ampere architecture (compute capability 8.6) | sm_86 |
370+ | NVIDIA Jetson/Drive AGX Orin architecture | sm_87 |
371+ | NVIDIA Ada Lovelace architecture | sm_89 |
372+ | NVIDIA Hopper architecture | sm_90 |
373+ | NVIDIA Hopper architecture (with wgmma and setmaxnreg instructions) | sm_90a |
374+
375+ #### amdgcn-amd-amdhsa support
376+
377+ For SYCL offloading to AMD GPUs using the `` --offload-arch `` option, the following table
378+ lists the accepted values.
379+
380+ | AMD GPU device name | `` --offload-arch `` accepted values for AMD GPUs |
381+ | ------------------------| ----------------------------------------------------|
382+ | AMD GCN GFX7 (Sea Islands (CI)) architecture | gfx700 |
383+ | AMD GCN GFX7 (Sea Islands (CI)) architecture | gfx701 |
384+ | AMD GCN GFX7 (Sea Islands (CI)) architecture | gfx702 |
385+ | AMD GCN GFX8 (Volcanic Islands (VI)) architecture | gfx801 |
386+ | AMD GCN GFX8 (Volcanic Islands (VI)) architecture | gfx802 |
387+ | AMD GCN GFX8 (Volcanic Islands (VI)) architecture | gfx803 |
388+ | AMD GCN GFX8 (Volcanic Islands (VI)) architecture | gfx805 |
389+ | AMD GCN GFX8 (Volcanic Islands (VI)) architecture | gfx810 |
390+ | AMD GCN GFX9 (Vega) architecture | gfx900 |
391+ | AMD GCN GFX9 (Vega) architecture | gfx902 |
392+ | AMD GCN GFX9 (Vega) architecture | gfx904 |
393+ | AMD GCN GFX9 (Vega) architecture | gfx906 |
394+ | AMD GCN GFX9 (Vega) architecture | gfx908 |
395+ | AMD GCN GFX9 (Vega) architecture | gfx909 |
396+ | AMD GCN GFX9 (Vega) architecture | gfx90a |
397+ | AMD GCN GFX9 (Vega) architecture | gfx90c |
398+ | AMD GCN GFX9 (Vega) architecture | gfx940 |
399+ | AMD GCN GFX9 (Vega) architecture | gfx941 |
400+ | AMD GCN GFX9 (Vega) architecture | gfx942 |
401+ | AMD GCN GFX10.1 (RDNA 1) architecture | gfx1010 |
402+ | AMD GCN GFX10.1 (RDNA 1) architecture | gfx1011 |
403+ | AMD GCN GFX10.1 (RDNA 1) architecture | gfx1012 |
404+ | AMD GCN GFX10.1 (RDNA 1) architecture | gfx1013 |
405+ | AMD GCN GFX10.3 (RDNA 2) architecture | gfx1030 |
406+ | AMD GCN GFX10.3 (RDNA 2) architecture | gfx1031 |
407+ | AMD GCN GFX10.3 (RDNA 2) architecture | gfx1032 |
408+ | AMD GCN GFX10.3 (RDNA 2) architecture | gfx1033 |
409+ | AMD GCN GFX10.3 (RDNA 2) architecture | gfx1034 |
410+ | AMD GCN GFX10.3 (RDNA 2) architecture | gfx1035 |
411+ | AMD GCN GFX10.3 (RDNA 2) architecture | gfx1036 |
412+ | AMD GCN GFX11 (RDNA 3) architecture | gfx1100 |
413+ | AMD GCN GFX11 (RDNA 3) architecture | gfx1101 |
414+ | AMD GCN GFX11 (RDNA 3) architecture | gfx1102 |
415+ | AMD GCN GFX11 (RDNA 3) architecture | gfx1103 |
416+ | AMD GCN GFX11 (RDNA 3) architecture | gfx1150 |
417+ | AMD GCN GFX11 (RDNA 3) architecture | gfx1151 |
418+ | AMD GCN GFX12 (RDNA 4) architecture | gfx1200 |
419+ | AMD GCN GFX12 (RDNA 4) architecture | gfx1201 |
420+
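Because `--offload-arch` accepts a comma-separated list such as `sm_80,gfx90a`, each entry has to be associated with a target. The sketch below illustrates one way to do that for the NVIDIA and AMD values listed above. It relies on a simplifying assumption that is not stated in this document: that the `sm_` and `gfx` prefixes alone identify the vendor. A real implementation would validate each value against the full tables. The function names are hypothetical.

```python
# Hypothetical helpers; the prefix check is a simplification for illustration only.
def target_triple_for_arch(arch: str) -> str:
    """Guess the offload target triple for a single NVIDIA or AMD arch value."""
    if arch.startswith("sm_"):
        return "nvptx64-nvidia-cuda"
    if arch.startswith("gfx"):
        return "amdgcn-amd-amdhsa"
    raise ValueError(f"'{arch}' is neither an sm_* nor a gfx* value")


def split_offload_arch(value: str) -> list:
    """Split a comma-separated --offload-arch value into (arch, triple) pairs."""
    return [(arch, target_triple_for_arch(arch)) for arch in value.split(",") if arch]


if __name__ == "__main__":
    for arch, triple in split_offload_arch("sm_80,gfx90a"):
        print(f"{arch}: {triple}")
```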
299421#### spir64_fpga support
300422
301423Compilation behaviors involving AOT for FPGA involve an additional call to
@@ -355,6 +477,34 @@ Additional options passed by the user via the
355477` -Xsycl-target-backend=spir64_x86_64 <opts> ` command will be processed by a new
356478option to the wrapper, ` --cpu-tool-arg=<arg> `
357479
480+ As with SYCL offloading to Intel GPUs, SYCL AOT for Intel CPUs
481+ uses the ` --offload-arch ` option.
482+ The valid CPU device names accepted for ` --offload-arch ` are the CPU names accepted by `` clang -march `` .
483+ These names are more verbose than the Intel GPU names and do not overlap with them.
484+ Each CPU name supplied by the user is mapped to the corresponding `` opencl-aot -march `` value.
485+
486+ The following table maps each ` --offload-arch ` value accepted for SYCL offloading to Intel CPUs to the corresponding ` -march ` value passed to opencl-aot.
487+
488+ | Intel CPU device | `` --offload-arch `` accepted value | opencl-aot -march value |
489+ | ----------------| -------------------------| ----------------------------|
490+ | Intel(R) Advanced Vector Extensions 512 | skylake-avx512 | avx512 |
491+ | Intel(R) Advanced Vector Extensions 2 | core-avx2 | avx2 |
492+ | Intel(R) Advanced Vector Extensions | corei7-avx | avx |
493+ | Intel(R) Streaming SIMD Extensions 4.2 | corei7 | sse4.2 |
494+ | Intel(R) microarchitecture code name Westmere | westmere | wsm |
495+ | Intel(R) microarchitecture code name Sandy Bridge | sandybridge | snb |
496+ | Intel(R) microarchitecture code name Ivy Bridge | ivybridge | ivyb |
497+ | Intel(R) microarchitecture code name Broadwell | broadwell | bdw |
498+ | Intel(R) microarchitecture code name Coffee Lake | coffeelake | cfl |
499+ | Intel(R) microarchitecture code name Alder Lake | alderlake | adl |
500+ | Intel(R) microarchitecture code name Skylake (client) | skylake | skylake |
501+ | Intel(R) microarchitecture code name Skylake (server) | skx | skx |
502+ | Intel(R) microarchitecture code name Cascade Lake | cascadelake | clk |
503+ | Intel(R) microarchitecture code name Ice Lake (client) | icelake-client | icl |
504+ | Intel(R) microarchitecture code name Ice Lake (server) | icelake-server | icx |
505+ | Intel(R) microarchitecture code name Sapphire Rapids | sapphirerapids | spr |
506+ | Intel(R) microarchitecture code name Granite Rapids | graniterapids | gnr |
507+
358508### Wrapping of device image
359509
360510Once the device binary is pulled out of the fat binary, the binary must be