feature: support BLIS and other BLAS implementations (ggml-org#1536)

* feature: add BLIS support
* feature: allow any BLA_VENDOR to be assigned via CMake arguments, aligning with whisper.cpp PR 927
* fix: version detection for BLA_SIZEOF_INTEGER; restore minimum CMake version
* fix: typo in INTEGER

Co-authored-by: Georgi Gerganov <[email protected]>

* fix: BLAS changes in CI

---------

Co-authored-by: Georgi Gerganov <[email protected]>
BLIS is a portable software framework for high-performance BLAS-like dense linear algebra libraries. It has received awards and recognition, including the 2023 James H. Wilkinson Prize for Numerical Software and the 2020 SIAM Activity Group on Supercomputing Best Paper Prize. Alongside compatibility layers for traditional BLAS and CBLAS routine calls, BLIS provides its own object-based and typed APIs.
Project URL: https://github.com/flame/blis
### Prepare:
Compile BLIS:
```bash
git clone https://github.com/flame/blis
cd blis
./configure --enable-cblas -t openmp,pthreads auto
# will install to /usr/local/ by default.
make -j
```
Install BLIS:
```bash
sudo make install
```
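If the default `/usr/local` prefix was used, refreshing the dynamic linker cache avoids load-time lookup failures. This step is a common post-install convention rather than something the commit itself mandates:

```bash
# Refresh the linker cache so libblis.so is found at run time
# (assumes the default /usr/local install prefix).
sudo ldconfig
# Optional sanity check: list the installed BLIS headers.
ls /usr/local/include/blis
```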
We recommend using OpenMP, since it makes it easier to control how many cores are used. According to the BLIS documentation, the following environment variables adjust its OpenMP behavior:
```bash
export GOMP_CPU_AFFINITY="0-19"
export BLIS_NUM_THREADS=14
```
Then run the binaries as normal.
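Settings can also be scoped to a single run instead of exported globally. In the sketch below, `./main` and the model path are hypothetical placeholders for whatever binary and model you actually built, not files this change provides:

```bash
# One-off thread setting scoped to a single invocation;
# "./main" and the model path are hypothetical placeholders.
BLIS_NUM_THREADS=14 ./main -m ./models/7B/ggml-model-q4_0.bin -p "Hello"
```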
### Intel-specific issue
Some users may see an error saying that `libimf.so` cannot be found.
Please follow this [stackoverflow page](https://stackoverflow.com/questions/70687930/intel-oneapi-2022-libimf-so-no-such-file-or-directory-during-openmpi-compila).
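The usual fix described there is to put the oneAPI compiler runtime directory on the loader path. The prefix below is an assumption about a default oneAPI install and may differ on your system:

```bash
# Hypothetical default oneAPI location; adjust to your installation.
ONEAPI_LIB=/opt/intel/oneapi/compiler/latest/linux/compiler/lib/intel64_lin
export LD_LIBRARY_PATH="$ONEAPI_LIB${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "$LD_LIBRARY_PATH"
```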
README.md (+17 −2):

```diff
@@ -56,7 +56,7 @@ The main goal of `llama.cpp` is to run the LLaMA model using 4-bit integer quant
 - Mixed F16 / F32 precision
 - 4-bit, 5-bit and 8-bit integer quantization support
 - Runs on the CPU
-- OpenBLAS support
+- Supports OpenBLAS/Apple BLAS/ARM Performance Lib/ATLAS/BLIS/Intel MKL/NVHPC/ACML/SCSL/SGIMATH and [more](https://cmake.org/cmake/help/latest/module/FindBLAS.html#blas-lapack-vendors) in BLAS
 - cuBLAS and CLBlast support

 The original implementation of `llama.cpp` was [hacked in an evening](https://github.com/ggerganov/llama.cpp/issues/33#issuecomment-1465108022).
```
@@ -274,10 +274,25 @@ Building the program with BLAS support may lead to some performance improvements
By default, `LLAMA_BLAS_VENDOR` is set to `Generic`, so if you have already sourced the Intel environment script and pass `-DLLAMA_BLAS=ON` to CMake, the MKL version of BLAS will be selected automatically. You may also specify it by:
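The README's exact command is not reproduced in this excerpt; as an illustrative sketch, `LLAMA_BLAS_VENDOR` accepts the `BLA_VENDOR` names understood by CMake's FindBLAS module, where `FLAME` selects BLIS:

```bash
mkdir -p build && cd build
# LLAMA_BLAS_VENDOR takes any BLA_VENDOR value understood by CMake's
# FindBLAS module; FLAME selects BLIS.
cmake .. -DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=FLAME
cmake --build . --config Release
```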
This provides BLAS acceleration using the CUDA cores of your Nvidia GPU. Make sure to have the CUDA toolkit installed. You can download it from your Linux distro's package manager or from here: [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads).