
Commit 98b204c

Merge branch 'ggerganov:master' into master
2 parents dbe9ef7 + 6374743 commit 98b204c

File tree

10 files changed: +361 additions, −136 deletions


docs/android.md

Lines changed: 55 additions & 28 deletions
@@ -2,55 +2,82 @@
 # Android
 
 ## Build on Android using Termux
-[Termux](https://github.com/termux/termux-app#installation) is a method to execute `llama.cpp` on an Android device (no root required).
+
+[Termux](https://termux.dev/en/) is an Android terminal emulator and Linux environment app (no root required). As of writing, Termux is available experimentally in the Google Play Store; otherwise, it may be obtained directly from the project repo or on F-Droid.
+
+With Termux, you can install and run `llama.cpp` as if the environment were Linux. Once in the Termux shell:
+
+```
+$ apt update && apt upgrade -y
+$ apt install git cmake
+```
+
+Then, follow the [build instructions](https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md), specifically for CMake.
+
+Once the binaries are built, download your model of choice (e.g., from Hugging Face). It's recommended to place it in the `~/` directory for best performance:
+
 ```
-apt update && apt upgrade -y
-apt install git make cmake
+$ curl -L {model-url} -o ~/{model}.gguf
 ```
 
-It's recommended to move your model inside the `~/` directory for best performance:
+Then, if you are not already in the repo directory, `cd` into `llama.cpp` and:
+
 ```
-cd storage/downloads
-mv model.gguf ~/
+$ ./build/bin/llama-simple -m ~/{model}.gguf -c {context-size} -p "{your-prompt}"
 ```
 
-[Get the code](https://github.com/ggerganov/llama.cpp#get-the-code) & [follow the Linux build instructions](https://github.com/ggerganov/llama.cpp#build) to build `llama.cpp`.
+Here, we show `llama-simple`, but any of the executables under `examples` should work, in theory. Be sure to set `context-size` to a reasonable number (say, 4096) to start with; otherwise, memory could spike and kill your terminal.
+
+To see what it might look like visually, here's an old demo of an interactive session running on a Pixel 5 phone:
+
+https://user-images.githubusercontent.com/271616/225014776-1d567049-ad71-4ef2-b050-55b0b3b9274c.mp4
+
+## Cross-compile using Android NDK
+It's possible to build `llama.cpp` for Android on your host system via CMake and the Android NDK. If you are interested in this path, ensure you already have an environment prepared to cross-compile programs for Android (i.e., install the Android SDK). Note that, unlike desktop environments, the Android environment ships with a limited set of native libraries, and so only those libraries are available to CMake when building with the Android NDK (see: https://developer.android.com/ndk/guides/stable_apis).
 
-## Building the Project using Android NDK
-Obtain the [Android NDK](https://developer.android.com/ndk) and then build with CMake.
+Once you're ready and have cloned `llama.cpp`, invoke the following in the project directory:
 
-Execute the following commands on your computer to avoid downloading the NDK to your mobile. Alternatively, you can also do this in Termux:
 ```
-$ mkdir build-android
-$ cd build-android
-$ export NDK=<your_ndk_directory>
-$ cmake -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=android-23 -DCMAKE_C_FLAGS=-march=armv8.4a+dotprod ..
-$ make
+$ cmake \
+  -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
+  -DANDROID_ABI=arm64-v8a \
+  -DANDROID_PLATFORM=android-28 \
+  -DCMAKE_C_FLAGS="-march=armv8.7a" \
+  -DCMAKE_CXX_FLAGS="-march=armv8.7a" \
+  -DGGML_OPENMP=OFF \
+  -DGGML_LLAMAFILE=OFF \
+  -B build-android
 ```
 
-Install [termux](https://github.com/termux/termux-app#installation) on your device and run `termux-setup-storage` to get access to your SD card (if Android 11+ then run the command twice).
+Notes:
+- While later versions of Android NDK ship with OpenMP, it must still be installed by CMake as a dependency, which is not supported at this time
+- `llamafile` does not appear to support Android devices (see: https://github.com/Mozilla-Ocho/llamafile/issues/325)
+
+The above command should configure `llama.cpp` with the most performant options for modern devices. Even if your device is not running `armv8.7a`, `llama.cpp` includes runtime checks for available CPU features it can use.
 
-Finally, copy these built `llama` binaries and the model file to your device storage. Because the file permissions in the Android sdcard cannot be changed, you can copy the executable files to the `/data/data/com.termux/files/home/bin` path, and then execute the following commands in Termux to add executable permission:
+Feel free to adjust the Android ABI for your target. Once the project is configured:
 
-(Assumed that you have pushed the built executable files to the /sdcard/llama.cpp/bin path using `adb push`)
 ```
-$cp -r /sdcard/llama.cpp/bin /data/data/com.termux/files/home/
-$cd /data/data/com.termux/files/home/bin
-$chmod +x ./*
+$ cmake --build build-android --config Release -j{n}
+$ cmake --install build-android --prefix {install-dir} --config Release
 ```
 
-Download model [llama-2-7b-chat.Q4_K_M.gguf](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/blob/main/llama-2-7b-chat.Q4_K_M.gguf), and push it to `/sdcard/llama.cpp/`, then move it to `/data/data/com.termux/files/home/model/`
+After installing, go ahead and download the model of your choice to your host system. Then:
 
 ```
-$mv /sdcard/llama.cpp/llama-2-7b-chat.Q4_K_M.gguf /data/data/com.termux/files/home/model/
+$ adb shell "mkdir /data/local/tmp/llama.cpp"
+$ adb push {install-dir} /data/local/tmp/llama.cpp/
+$ adb push {model}.gguf /data/local/tmp/llama.cpp/
+$ adb shell
 ```
 
-Now, you can start chatting:
+In the `adb shell`:
+
 ```
-$cd /data/data/com.termux/files/home/bin
-$./llama-cli -m ../model/llama-2-7b-chat.Q4_K_M.gguf -n 128 -cml
+$ cd /data/local/tmp/llama.cpp
+$ LD_LIBRARY_PATH=lib ./bin/llama-simple -m {model}.gguf -c {context-size} -p "{your-prompt}"
 ```
 
-Here's a demo of an interactive session running on Pixel 5 phone:
+That's it!
 
-https://user-images.githubusercontent.com/271616/225014776-1d567049-ad71-4ef2-b050-55b0b3b9274c.mp4
+Be aware that Android will not find the library path `lib` on its own, so we must specify `LD_LIBRARY_PATH` in order to run the installed executables. Android does support `RPATH` in later API levels, so this could change in the future. Refer to the previous section for information about `context-size` (very important!) and running other `examples`.

flake.lock

Lines changed: 10 additions & 10 deletions
Some generated files are not rendered by default.

ggml/include/ggml-backend.h

Lines changed: 1 addition & 0 deletions
@@ -170,6 +170,7 @@ extern "C" {
 
     // Functions that may be obtained using ggml_backend_reg_get_proc_address
    typedef ggml_backend_buffer_type_t (*ggml_backend_split_buffer_type_t)(const float *);
+    typedef void (*ggml_backend_set_n_threads_t)(ggml_backend_t, int);
 
     //
     // Backend registry
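
For context, here's a minimal consumer-side sketch (not part of this commit) of how the new typedef pairs with `ggml_backend_reg_get_proc_address`, using the CPU backend's registration added in `ggml-backend.cpp` below:

```c
// Sketch: query the generic "set n threads" entry point from a backend
// registry and call it through the new typedef. The lookup returns NULL
// if the backend does not expose this function.
#include "ggml-backend.h"

int main(void) {
    ggml_backend_reg_t reg = ggml_backend_cpu_reg();

    ggml_backend_set_n_threads_t set_n_threads = (ggml_backend_set_n_threads_t)
        ggml_backend_reg_get_proc_address(reg, "ggml_backend_set_n_threads");

    ggml_backend_t backend = ggml_backend_cpu_init();
    if (set_n_threads != NULL) {
        set_n_threads(backend, 4); // equivalent to ggml_backend_cpu_set_n_threads(backend, 4)
    }
    ggml_backend_free(backend);
    return 0;
}
```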

ggml/include/ggml-blas.h

Lines changed: 2 additions & 0 deletions
@@ -17,6 +17,8 @@ GGML_API bool ggml_backend_is_blas(ggml_backend_t backend);
 // for openblas and blis, this will also set the number of threads used for blas operations
 GGML_API void ggml_backend_blas_set_n_threads(ggml_backend_t backend_blas, int n_threads);
 
+GGML_API ggml_backend_reg_t ggml_backend_blas_reg(void);
+
 
 #ifdef __cplusplus
 }
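
The new `ggml_backend_blas_reg` lets the registry pick up the BLAS backend automatically (see `ggml-backend.cpp` below). For direct, non-registry use, a hedged sketch assuming the `ggml_backend_blas_init` entry point that this header already declares:

```c
// Sketch: initialize the BLAS backend and set its thread count directly.
#include "ggml-blas.h"

void blas_threads_example(void) {
    ggml_backend_t blas = ggml_backend_blas_init();
    if (blas != NULL && ggml_backend_is_blas(blas)) {
        // for OpenBLAS and BLIS this also sets the BLAS operation thread count
        ggml_backend_blas_set_n_threads(blas, 8);
        ggml_backend_free(blas);
    }
}
```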

ggml/src/CMakeLists.txt

Lines changed: 12 additions & 6 deletions
@@ -190,22 +190,24 @@ if (GGML_BLAS)
         # see https://gitlab.kitware.com/cmake/cmake/-/issues/20268
         find_package(PkgConfig REQUIRED)
         if (${GGML_BLAS_VENDOR} MATCHES "Generic")
-            pkg_check_modules(DepBLAS REQUIRED blas)
+            pkg_check_modules(DepBLAS blas)
         elseif (${GGML_BLAS_VENDOR} MATCHES "OpenBLAS")
             # As of openblas v0.3.22, the 64-bit is named openblas64.pc
             pkg_check_modules(DepBLAS openblas64)
             if (NOT DepBLAS_FOUND)
-                pkg_check_modules(DepBLAS REQUIRED openblas)
+                pkg_check_modules(DepBLAS openblas)
             endif()
         elseif (${GGML_BLAS_VENDOR} MATCHES "FLAME")
-            pkg_check_modules(DepBLAS REQUIRED blis)
+            add_compile_definitions(GGML_BLAS_USE_BLIS)
+            pkg_check_modules(DepBLAS blis)
         elseif (${GGML_BLAS_VENDOR} MATCHES "ATLAS")
-            pkg_check_modules(DepBLAS REQUIRED blas-atlas)
+            pkg_check_modules(DepBLAS blas-atlas)
         elseif (${GGML_BLAS_VENDOR} MATCHES "FlexiBLAS")
-            pkg_check_modules(DepBLAS REQUIRED flexiblas_api)
+            pkg_check_modules(DepBLAS flexiblas_api)
         elseif (${GGML_BLAS_VENDOR} MATCHES "Intel")
+            add_compile_definitions(GGML_BLAS_USE_MKL)
             # all Intel* libraries share the same include path
-            pkg_check_modules(DepBLAS REQUIRED mkl-sdl)
+            pkg_check_modules(DepBLAS mkl-sdl)
         elseif (${GGML_BLAS_VENDOR} MATCHES "NVHPC")
             # this doesn't provide pkg-config
             # suggest to assign BLAS_INCLUDE_DIRS on your own

@@ -1361,6 +1363,10 @@ if (MATH_LIBRARY)
     endif()
 endif()
 
+if (CMAKE_SYSTEM_NAME MATCHES "Android")
+    list(APPEND GGML_EXTRA_LIBS_PRIVATE dl) # Must be linked explicitly
+endif()
+
 list(REMOVE_DUPLICATES GGML_EXTRA_LIBS_PRIVATE)
 list(REMOVE_DUPLICATES GGML_EXTRA_LIBS_PUBLIC)
 target_link_libraries(ggml PRIVATE ${GGML_EXTRA_LIBS_PRIVATE} PUBLIC ${GGML_EXTRA_LIBS_PUBLIC})

ggml/src/ggml-backend-impl.h

Lines changed: 3 additions & 11 deletions
@@ -88,6 +88,7 @@ extern "C" {
 
         void (*free)(ggml_backend_t backend);
 
+        // Will be moved to the device interface
         // buffer allocation
         ggml_backend_buffer_type_t (*get_default_buffer_type)(ggml_backend_t backend);
 
@@ -112,17 +113,9 @@
 
         // IMPORTANT: these functions have been moved to the device interface and will be removed from the backend interface
         // new backends should implement the device interface instead
-
         // These functions are being moved to the device interface
-        // check if the backend can compute an operation
         bool (*supports_op) (ggml_backend_t backend, const struct ggml_tensor * op);
-
-        // check if the backend can use tensors allocated in a buffer type
         bool (*supports_buft)(ggml_backend_t backend, ggml_backend_buffer_type_t buft);
-
-        // check if the backend wants to run an operation, even if the weights are allocated in a CPU buffer
-        // these should be expensive operations with large batch sizes that may benefit from running on this backend
-        // even if the weight has to be copied from the CPU temporarily
         bool (*offload_op) (ggml_backend_t backend, const struct ggml_tensor * op);
 
         // (optional) event synchronization

@@ -184,9 +177,8 @@
         // check if the backend can use tensors allocated in a buffer type
         bool (*supports_buft)(ggml_backend_dev_t dev, ggml_backend_buffer_type_t buft);
 
-        // check if the backend wants to run an operation, even if the weights are allocated in a CPU buffer
-        // these should be expensive operations with large batch sizes that may benefit from running on this backend
-        // even if the weight has to be copied from the CPU temporarily
+        // (optional) check if the backend wants to run an operation, even if the weights are allocated in an incompatible buffer
+        // these should be expensive operations that may benefit from running on this backend instead of the CPU backend
         bool (*offload_op)(ggml_backend_dev_t dev, const struct ggml_tensor * op);
 
         // (optional) event synchronization
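
Since `offload_op` is now optional in the device interface (note the NULL guard added in `ggml-backend.cpp` below), a hypothetical device implementation following the updated comment might look like this sketch (names and threshold are illustrative, not from this commit):

```c
// Sketch: request offload only for expensive ops, e.g. matrix
// multiplications with a reasonably large batch dimension.
#include "ggml-backend-impl.h"

static bool my_device_offload_op(ggml_backend_dev_t dev, const struct ggml_tensor * op) {
    const int64_t min_batch_size = 32; // illustrative threshold
    return op->op == GGML_OP_MUL_MAT && op->ne[1] >= min_batch_size;

    GGML_UNUSED(dev);
}
```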

ggml/src/ggml-backend.cpp

Lines changed: 22 additions & 5 deletions
@@ -500,7 +500,11 @@ bool ggml_backend_dev_supports_buft(ggml_backend_dev_t device, ggml_backend_buff
 }
 
 bool ggml_backend_dev_offload_op(ggml_backend_dev_t device, const struct ggml_tensor * op) {
-    return device->iface.offload_op(device, op);
+    if (device->iface.offload_op != NULL) {
+        return device->iface.offload_op(device, op);
+    }
+
+    return false;
 }
 
 // Backend (reg)

@@ -534,6 +538,10 @@ void * ggml_backend_reg_get_proc_address(ggml_backend_reg_t reg, const char * na
 #include "ggml-metal.h"
 #endif
 
+#ifdef GGML_USE_BLAS
+#include "ggml-blas.h"
+#endif
+
 struct ggml_backend_registry {
     std::vector<ggml_backend_reg_t> backends;
     std::vector<ggml_backend_dev_t> devices;

@@ -545,10 +553,13 @@
 #ifdef GGML_USE_METAL
         register_backend(ggml_backend_metal_reg());
 #endif
-
-        register_backend(ggml_backend_cpu_reg());
+#ifdef GGML_USE_BLAS
+        register_backend(ggml_backend_blas_reg());
+#endif
 
         // TODO: sycl, vulkan, kompute, cann
+
+        register_backend(ggml_backend_cpu_reg());
     }
 
     void register_backend(ggml_backend_reg_t reg) {

@@ -1229,16 +1240,22 @@ static ggml_backend_dev_t ggml_backend_cpu_reg_get_device(ggml_backend_reg_t reg
     };
 
     return &ggml_backend_cpu_device;
+}
+
+static void * ggml_backend_cpu_get_proc_address(ggml_backend_reg_t reg, const char * name) {
+    if (strcmp(name, "ggml_backend_set_n_threads") == 0) {
+        return (void *)ggml_backend_cpu_set_n_threads;
+    }
+    return NULL;
 
     GGML_UNUSED(reg);
-    GGML_UNUSED(index);
 }
 
 static const struct ggml_backend_reg_i ggml_backend_cpu_reg_i = {
     /* .get_name         = */ ggml_backend_cpu_reg_get_name,
     /* .get_device_count = */ ggml_backend_cpu_reg_get_device_count,
     /* .get_device       = */ ggml_backend_cpu_reg_get_device,
-    /* .get_proc_address = */ NULL,
+    /* .get_proc_address = */ ggml_backend_cpu_get_proc_address,
 };
 
 ggml_backend_reg_t ggml_backend_cpu_reg(void) {
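
The CPU backend's `get_proc_address` above doubles as the pattern other backends can follow. A hypothetical third-party backend (illustrative names, not from this commit) would resolve the same string key so the `ggml_backend_set_n_threads_t` typedef from `ggml-backend.h` keeps working:

```c
// Sketch: a custom backend reg resolving the shared
// "ggml_backend_set_n_threads" key to its own setter.
#include <string.h>
#include "ggml-backend-impl.h"

static void my_backend_set_n_threads(ggml_backend_t backend, int n_threads) {
    // configure the backend's worker pool here (omitted in this sketch)
    GGML_UNUSED(backend);
    GGML_UNUSED(n_threads);
}

static void * my_backend_get_proc_address(ggml_backend_reg_t reg, const char * name) {
    if (strcmp(name, "ggml_backend_set_n_threads") == 0) {
        return (void *) my_backend_set_n_threads;
    }
    return NULL;

    GGML_UNUSED(reg);
}
```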
