 ### Llama.cpp + SYCL

-The llama.cpp SYCL backend is designed to support **Intel GPU** firstly. Based on the cross-platform feature of SYCL, it could support other vendor GPUs: Nvidia GPU (*AMD GPU coming*).
+The llama.cpp SYCL backend is primarily designed for **Intel GPUs**. Thanks to the cross-platform nature of SYCL, it also supports GPUs from other vendors: Nvidia and, experimentally, AMD.

 ## Recommended Release
@@ -111,10 +111,18 @@ SYCL backend supports Intel GPU Family:

 **Verified devices**

-| Nvidia GPU               | Status  | Verified Model |
-|--------------------------|---------|----------------|
-| Ampere Series            | Support | A100, A4000    |
-| Ampere Series *(Mobile)* | Support | RTX 40 Series  |
+| Nvidia GPU               | Status    | Verified Model |
+|--------------------------|-----------|----------------|
+| Ampere Series            | Supported | A100, A4000    |
+| Ampere Series *(Mobile)* | Supported | RTX 40 Series  |
+
+| AMD GPU    | Status       | Verified Model |
+|------------|--------------|----------------|
+| Radeon Pro | Experimental | W6800          |
+| Radeon RX  | Experimental | 6700 XT        |
+
+Note: AMD GPU support is highly experimental and is incompatible with FP16 (`GGML_SYCL_F16`).
+Additionally, it only supports GPUs with a `sub_group_size` (warp size) of 32.

 ## Docker
 The docker build option is currently limited to *Intel GPU* targets.
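You can check the sub-group size requirement above before building. The sketch below makes two assumptions. First, it assumes the usual `rocminfo` layout, where each GPU agent prints a `Name:` line such as `gfx1030` and, further down, a `Wavefront Size:` line such as `32(0x20)`. Second, the helper name `check_wavefront_sizes` is hypothetical and not part of llama.cpp or ROCm.

```sh
# Hypothetical helper (not part of llama.cpp or ROCm): read `rocminfo` output
# on stdin and print one line per GPU agent:
#   "<gfx target> <wavefront size> <ok|unsupported>"
# Assumption: each GPU agent has a "Name: gfx..." line followed later by a
# "Wavefront Size: NN(0xNN)" line.
check_wavefront_sizes() {
  awk '
    # GPU agents report a "gfx..." name; CPU agents and ISA entries do not.
    /^[ \t]*Name:[ \t]*gfx/ { name = $2 }
    /^[ \t]*Wavefront Size:/ && name != "" {
      size = $3
      sub(/\(.*$/, "", size)              # "32(0x20)" -> "32"
      print name, size, (size + 0 == 32 ? "ok" : "unsupported")
      name = ""
    }
  '
}
```

Usage: `rocminfo | check_wavefront_sizes`. A GPU reported as `unsupported` falls outside the warp-size-32 requirement stated in the note.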
@@ -186,6 +194,10 @@ Platform #0: Intel(R) OpenCL HD Graphics

 In order to target Nvidia GPUs through SYCL, please make sure the native CUDA/cuBLAS requirements (listed [here](README.md#cuda)) are installed.

+- **AMD GPU**
+
+To target AMD GPUs with SYCL, the ROCm stack must be installed first.
+
 2. **Install Intel® oneAPI Base toolkit**

 - **For Intel GPU**
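The AMD path depends on both the ROCm stack and the oneAPI toolchain. A quick check that the tools this guide uses are on `PATH` can catch a missing install early. This is a minimal sketch: `require_tools` is a hypothetical helper. The tool names in the usage line (`rocminfo`, `sycl-ls`, `icx`, `icpx`) are simply the commands this guide invokes.

```sh
# Hypothetical helper: report which of the given commands are not on PATH.
# Prints each missing command to stderr; returns 0 only if all are found.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `require_tools rocminfo sycl-ls icx icpx`.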
@@ -212,6 +224,19 @@ cmake -B buildWithCublas -DCMAKE_CXX_COMPILER=icpx -DCMAKE_C_COMPILER=icx -DENAB
 cmake --build buildWithCublas --config Release
 ```

+- **Adding support for AMD GPUs**
+
+**oneAPI Plugin**: To enable SYCL support on AMD GPUs, please install the [Codeplay oneAPI Plugin for AMD GPUs](https://developer.codeplay.com/products/oneapi/amd/download). As with Nvidia GPUs, make sure the plugin version matches the installed base toolkit.
+
+**oneMKL for rocBLAS**: The current oneMKL releases *(shipped with the oneAPI base toolkit)* don't contain the rocBLAS backend. Running on AMD GPUs therefore requires a source build of upstream [oneMKL](https://github.com/oneapi-src/oneMKL) with the *rocBLAS* backend enabled.
+
+```sh
+git clone https://github.com/oneapi-src/oneMKL
+cd oneMKL
+# Find your HIPTARGET (e.g. gfx1030) with rocminfo, under the key 'Name:'
+cmake -B buildWithrocBLAS -DCMAKE_CXX_COMPILER=icpx -DCMAKE_C_COMPILER=icx -DENABLE_MKLGPU_BACKEND=OFF -DENABLE_MKLCPU_BACKEND=OFF -DENABLE_ROCBLAS_BACKEND=ON -DHIPTARGETS=${HIPTARGET} -DTARGET_DOMAINS=blas
+cmake --build buildWithrocBLAS --config Release
+```

 3. **Verify installation and environment**

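You can script the `HIPTARGET` lookup described in the oneMKL build comment. The sketch assumes that GPU agents in `rocminfo` output carry a `Name:` value starting with `gfx`, as `gfx1030` does for the W6800 in the sample `sycl-ls` output. `first_hip_target` and `configure_onemkl_rocblas` are hypothetical helpers. The cmake flags are copied unchanged from the oneMKL build command.

```sh
# Hypothetical helper: print the first GPU target (e.g. gfx1030) found in
# `rocminfo` output read from stdin; prints nothing if none is found.
first_hip_target() {
  awk '/^[ \t]*Name:[ \t]*gfx/ { print $2; exit }'
}

# Hypothetical wrapper: configure oneMKL with the rocBLAS backend for the
# detected target, using the same flags as the oneMKL build command.
configure_onemkl_rocblas() {
  HIPTARGET=$(rocminfo | first_hip_target)
  if [ -z "$HIPTARGET" ]; then
    echo "no gfx target found in rocminfo output" >&2
    return 1
  fi
  cmake -B buildWithrocBLAS -DCMAKE_CXX_COMPILER=icpx -DCMAKE_C_COMPILER=icx \
    -DENABLE_MKLGPU_BACKEND=OFF -DENABLE_MKLCPU_BACKEND=OFF \
    -DENABLE_ROCBLAS_BACKEND=ON -DHIPTARGETS="$HIPTARGET" -DTARGET_DOMAINS=blas
}
```

Only the first GPU target is used. On a machine with several different AMD GPUs, set `HIPTARGET` by hand instead.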
@@ -223,22 +248,32 @@ sycl-ls

 - **Intel GPU**

-When targeting an intel GPU, the user should expect one or more level-zero devices among the available SYCL devices. Please make sure that at least one GPU is present, for instance [`ext_oneapi_level_zero:gpu:0`] in the sample output below:
+When targeting an Intel GPU, the user should expect one or more Level Zero devices among the available SYCL devices. Please make sure that at least one GPU is present, for instance [`level_zero:gpu`] in the sample output below:

 ```
-[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.10.0.17_160000]
-[opencl:cpu:1] Intel(R) OpenCL, 13th Gen Intel(R) Core(TM) i7-13700K OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000]
-[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics OpenCL 3.0 NEO [23.30.26918.50]
-[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26918]
+[opencl:acc][opencl:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.10.0.17_160000]
+[opencl:cpu][opencl:1] Intel(R) OpenCL, 13th Gen Intel(R) Core(TM) i7-13700K OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000]
+[opencl:gpu][opencl:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics OpenCL 3.0 NEO [23.30.26918.50]
+[level_zero:gpu][level_zero:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26918]
 ```

 - **Nvidia GPU**

-Similarly, user targeting Nvidia GPUs should expect at least one SYCL-CUDA device [`ext_oneapi_cuda:gpu`] as bellow:
+Similarly, users targeting Nvidia GPUs should expect at least one SYCL-CUDA device [`cuda:gpu`], as shown below:
+
+```
+[opencl:acc][opencl:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.12.0.12_195853.xmain-hotfix]
+[opencl:cpu][opencl:1] Intel(R) OpenCL, Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz OpenCL 3.0 (Build 0) [2023.16.12.0.12_195853.xmain-hotfix]
+[cuda:gpu][cuda:0] NVIDIA CUDA BACKEND, NVIDIA A100-PCIE-40GB 8.0 [CUDA 12.5]
+```
+
+- **AMD GPU**
+
+For AMD GPUs, expect at least one SYCL-HIP device [`hip:gpu`]:
+
 ```
-[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.12.0.12_195853.xmain-hotfix]
-[opencl:cpu:1] Intel(R) OpenCL, Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz OpenCL 3.0 (Build 0) [2023.16.12.0.12_195853.xmain-hotfix]
-[ext_oneapi_cuda:gpu:0] NVIDIA CUDA BACKEND, NVIDIA A100-PCIE-40GB 8.0 [CUDA 12.2]
+[opencl:cpu][opencl:0] Intel(R) OpenCL, 12th Gen Intel(R) Core(TM) i9-12900K OpenCL 3.0 (Build 0) [2024.18.6.0.02_160000]
+[hip:gpu][hip:0] AMD HIP BACKEND, AMD Radeon PRO W6800 gfx1030 [HIP 60140.9]
 ```

 ### II. Build llama.cpp
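The visual `sycl-ls` check above can also be automated, for example in a CI job. This is a minimal sketch: `has_sycl_gpu` is a hypothetical helper. It matches both the current `[<backend>:gpu]` naming and the older `ext_oneapi_`-prefixed naming that appears in the removed sample output.

```sh
# Hypothetical helper: succeed if `sycl-ls` output on stdin lists at least one
# GPU for the given backend (e.g. level_zero, cuda, hip). Accepts both
# "[hip:gpu]..." and the older "[ext_oneapi_hip:gpu:0]..." line prefixes.
has_sycl_gpu() {
  grep -q -E "^\[(ext_oneapi_)?$1:gpu"
}
```

Usage: `sycl-ls | has_sycl_gpu hip` succeeds only if a HIP GPU is listed. Use `level_zero` for Intel GPUs or `cuda` for Nvidia GPUs.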
@@ -266,6 +301,7 @@ cmake --build build --config Release -j -v
 ```

 #### Nvidia GPU
+
 ```sh
 # Export relevant ENV variables
 export LD_LIBRARY_PATH=/path/to/oneMKL/buildWithCublas/lib:$LD_LIBRARY_PATH
@@ -283,7 +319,25 @@ cmake -B build -DGGML_SYCL=ON -DGGML_SYCL_TARGET=NVIDIA -DCMAKE_C_COMPILER=icx -

 # build all binaries
 cmake --build build --config Release -j -v
+```
+
+#### AMD GPU

+```sh
+# Export relevant ENV variables
+export LD_LIBRARY_PATH=/path/to/oneMKL/buildWithrocBLAS/lib:$LD_LIBRARY_PATH
+export LIBRARY_PATH=/path/to/oneMKL/buildWithrocBLAS/lib:$LIBRARY_PATH
+export CPLUS_INCLUDE_PATH=/path/to/oneMKL/buildWithrocBLAS/include:$CPLUS_INCLUDE_PATH
+
+# Build LLAMA with rocBLAS acceleration through SYCL
+
+## AMD
+# Use FP32; FP16 is not supported
+# Find your GGML_SYCL_HIP_TARGET with rocminfo, under the key 'Name:'
+cmake -B build -DGGML_SYCL=ON -DGGML_SYCL_TARGET=AMD -DGGML_SYCL_HIP_TARGET=${GGML_SYCL_HIP_TARGET} -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
+
+# build all binaries
+cmake --build build --config Release -j -v
 ```

 ### III. Run the inference
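The `export` lines in the AMD build block can be wrapped so that a wrong path fails loudly. Otherwise, a typo silently leaves the compiler without oneMKL. This is a minimal sketch: `use_onemkl_build` is a hypothetical helper. It exports `CPLUS_INCLUDE_PATH`, the include search variable that GCC/Clang-based compilers such as `icpx` read.

```sh
# Hypothetical helper: point the compiler, linker and loader at a oneMKL build
# directory such as /path/to/oneMKL/buildWithrocBLAS.
use_onemkl_build() {
  dir=$1
  if [ ! -d "$dir/lib" ] || [ ! -d "$dir/include" ]; then
    echo "not a oneMKL build directory: $dir" >&2
    return 1
  fi
  # ${VAR:+:$VAR} appends the old value only when it is non-empty, avoiding a
  # trailing ':' (an empty entry is treated as the current directory).
  export LD_LIBRARY_PATH="$dir/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
  export LIBRARY_PATH="$dir/lib${LIBRARY_PATH:+:$LIBRARY_PATH}"
  export CPLUS_INCLUDE_PATH="$dir/include${CPLUS_INCLUDE_PATH:+:$CPLUS_INCLUDE_PATH}"
}
```

Usage: run `use_onemkl_build /path/to/oneMKL/buildWithrocBLAS` before the `cmake` commands for the AMD build.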
@@ -586,11 +640,11 @@ use 1 SYCL GPUs: [0] with Max compute units:512

 #### Build

-| Name               | Value                             | Function                                    |
-|--------------------|-----------------------------------|---------------------------------------------|
-| GGML_SYCL          | ON (mandatory)                    | Enable build with SYCL code path.<br>FP32 path - recommended for better perforemance than FP16 on quantized model|
-| GGML_SYCL_TARGET   | INTEL *(default)* \| NVIDIA       | Set the SYCL target device type.            |
-| GGML_SYCL_F16      | OFF *(default)* \| ON *(optional)* | Enable FP16 build with SYCL code path.      |
+| Name               | Value                                 | Function                                    |
+|--------------------|---------------------------------------|---------------------------------------------|
+| GGML_SYCL          | ON (mandatory)                        | Enable build with SYCL code path.<br>FP32 path - recommended for better performance than FP16 on quantized models|
+| GGML_SYCL_TARGET   | INTEL *(default)* \| NVIDIA \| AMD    | Set the SYCL target device type.            |
+| GGML_SYCL_F16      | OFF *(default)* \| ON *(optional)*    | Enable FP16 build with SYCL code path (not supported with `GGML_SYCL_TARGET=AMD`). |
 | CMAKE_C_COMPILER   | `icx` *(Linux)*, `icx/cl` *(Windows)* | Set `icx` compiler for SYCL code path.      |
 | CMAKE_CXX_COMPILER | `icpx` *(Linux)*, `icx` *(Windows)*   | Set `icpx/icx` compiler for SYCL code path. |
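The build options table can be encoded in a small helper that refuses invalid combinations, together with the AMD constraints noted earlier (FP32 only, explicit HIP target). This is a minimal sketch for Linux (`icx`/`icpx`). `sycl_cmake_args` is hypothetical. `GGML_SYCL_HIP_TARGET` is the option used in the AMD build command.

```sh
# Hypothetical helper: print cmake flags for a llama.cpp SYCL build on Linux.
# Usage: sycl_cmake_args TARGET [F16] [HIP_TARGET]
#   TARGET:     INTEL | NVIDIA | AMD      (GGML_SYCL_TARGET)
#   F16:        ON | OFF, default OFF     (GGML_SYCL_F16)
#   HIP_TARGET: e.g. gfx1030, AMD only    (GGML_SYCL_HIP_TARGET)
sycl_cmake_args() {
  target=$1
  f16=${2:-OFF}
  hip_target=${3:-}
  case $target in
    INTEL|NVIDIA|AMD) ;;
    *) echo "unsupported GGML_SYCL_TARGET: $target" >&2; return 1 ;;
  esac
  if [ "$target" = AMD ]; then
    # AMD constraints from the notes: FP32 only and an explicit gfx target.
    if [ "$f16" = ON ]; then
      echo "GGML_SYCL_F16=ON is not supported with GGML_SYCL_TARGET=AMD" >&2
      return 1
    fi
    if [ -z "$hip_target" ]; then
      echo "GGML_SYCL_TARGET=AMD requires a HIP target such as gfx1030" >&2
      return 1
    fi
  fi
  args="-DGGML_SYCL=ON -DGGML_SYCL_TARGET=$target -DGGML_SYCL_F16=$f16"
  if [ "$target" = AMD ]; then
    args="$args -DGGML_SYCL_HIP_TARGET=$hip_target"
  fi
  # Linux compilers from the table (Windows uses icx/cl and icx instead).
  printf '%s\n' "$args -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx"
}
```

Usage: `args=$(sycl_cmake_args AMD OFF gfx1030) && cmake -B build $args`. The `$args` expansion is left unquoted on purpose so the flags split into separate arguments; none of the values contain spaces. Because the helper is plain POSIX `sh`, it leaves `target`, `f16`, `hip_target` and `args` set as global shell variables.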