@@ -24,7 +24,7 @@ Some of the development is currently happening in the [llama.cpp](https://github
 
 - [X] Example of GPT-2 inference [examples/gpt-2](https://github.com/ggerganov/ggml/tree/master/examples/gpt-2)
 - [X] Example of GPT-J inference [examples/gpt-j](https://github.com/ggerganov/ggml/tree/master/examples/gpt-j)
-- [X] Example of Whisper inference [examples/whisper](https://github.com/ggerganov/ggml/tree/master/examples/whisper)
+- [X] Example of Whisper inference [ggerganov/whisper.cpp](https://github.com/ggerganov/whisper.cpp)
 - [X] Example of LLaMA inference [ggerganov/llama.cpp](https://github.com/ggerganov/llama.cpp)
 - [X] Example of LLaMA training [ggerganov/llama.cpp/examples/baby-llama](https://github.com/ggerganov/llama.cpp/tree/master/examples/baby-llama)
 - [X] Example of Falcon inference [cmp-nct/ggllm.cpp](https://github.com/cmp-nct/ggllm.cpp)
@@ -44,20 +44,6 @@ Some of the development is currently happening in the [llama.cpp](https://github
 - [X] Example of multiple LLMs inference [foldl/chatllm.cpp](https://github.com/foldl/chatllm.cpp)
 - [X] SeamlessM4T inference *(in development)* https://github.com/facebookresearch/seamless_communication/tree/main/ggml
 
-## Whisper inference (example)
-
-With ggml you can efficiently run [Whisper](examples/whisper) inference on the CPU.
-
-Memory requirements:
-
-| Model  | Disk   | Mem     |
-| ---    | ---    | ---     |
-| tiny   |  75 MB | ~280 MB |
-| base   | 142 MB | ~430 MB |
-| small  | 466 MB | ~1.0 GB |
-| medium | 1.5 GB | ~2.6 GB |
-| large  | 2.9 GB | ~4.7 GB |
-
 ## GPT inference (example)
 
 With ggml you can efficiently run [GPT-2](examples/gpt-2) and [GPT-J](examples/gpt-j) inference on the CPU.
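
For reference, the GPT-2 example named in the context line above is built and run with the usual ggml CMake flow. A minimal sketch, assuming the `gpt-2-backend` target that appears later in this diff and a `download-ggml-model.sh` helper shipped with the example (the exact target name, script usage, and model path may differ across revisions):

```bash
# Build ggml and the GPT-2 example (assumes CMake and a C/C++ toolchain)
git clone https://github.com/ggerganov/ggml
cd ggml
mkdir build && cd build
cmake ..
make -j4 gpt-2-backend

# Download a converted GPT-2 model and run CPU inference
# (hypothetical invocation; check the script for supported model sizes)
../examples/gpt-2/download-ggml-model.sh 117M
./bin/gpt-2-backend -m models/gpt-2-117M/ggml-model.bin -p "This is an example"
```
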
@@ -128,11 +114,6 @@ cmake -DGGML_CUBLAS=ON -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.1/bin/nvcc ..
 cmake -DCMAKE_C_COMPILER="$(hipconfig -l)/clang" -DCMAKE_CXX_COMPILER="$(hipconfig -l)/clang++" -DGGML_HIPBLAS=ON
 ```
 
-## Using clBLAST
-
-```bash
-cmake -DGGML_CLBLAST=ON ..
-```
 ## Compiling for Android
 
 Download and unzip the NDK from this download [page](https://developer.android.com/ndk/downloads). Set the NDK_ROOT_PATH environment variable or provide the absolute path to the CMAKE_ANDROID_NDK in the command below.
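
The cross-compilation command that this paragraph refers to falls outside the hunk. A sketch of what it looks like, assuming the standard CMake Android toolchain variables (the same flags appear in the CLBlast section removed below); `NDK_ROOT_PATH` is the variable named in the paragraph:

```bash
# In ggml/build: cross-compile the examples for a 64-bit Android device
cmake .. \
    -DCMAKE_SYSTEM_NAME=Android \
    -DCMAKE_SYSTEM_VERSION=33 \
    -DCMAKE_ANDROID_ARCH_ABI=arm64-v8a \
    -DCMAKE_ANDROID_NDK=$NDK_ROOT_PATH \
    -DCMAKE_ANDROID_STL_TYPE=c++_shared
make -j4 gpt-2-backend
```
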
@@ -170,64 +151,6 @@ export LD_LIBRARY_PATH=/data/local/tmp
 ./bin/gpt-2-backend -m models/ggml-model.bin -p "this is an example"
 ```
 
-### CLBlast for Android
-
-Build CLBlast.
-
-```bash
-# In CLBlast/build
-$ANDROID_SDK_PATH/cmake/3.22.1/bin/cmake .. \
-    -DCMAKE_SYSTEM_NAME=Android \
-    -DCMAKE_SYSTEM_VERSION=33 \
-    -DCMAKE_ANDROID_ARCH_ABI=arm64-v8a \
-    -DCMAKE_ANDROID_NDK=$ANDROID_NDK_PATH \
-    -DCMAKE_ANDROID_STL_TYPE=c++_static \
-    -DOPENCL_ROOT=$(readlink -f ../../OpenCL-Headers) \
-    -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=BOTH \
-    -DCMAKE_FIND_ROOT_PATH_MODE_INCLUDE=BOTH
-
-# Build libclblast.so
-make -j4
-```
-
-Pull `libGLES_mali.so` to `libOpenCL.so`.
-
-```bash
-# In ggml project root.
-mkdir arm64-v8a
-adb pull /system/vendor/lib64/egl/libGLES_mali.so arm64-v8a/libOpenCL.so
-```
-
-Build ggml with CLBlast.
-
-```bash
-# In ggml/build
-cd build
-$ANDROID_SDK_PATH/cmake/3.22.1/bin/cmake .. \
-    -DGGML_CLBLAST=ON \
-    -DCMAKE_SYSTEM_NAME=Android \
-    -DCMAKE_SYSTEM_VERSION=33 \
-    -DCMAKE_ANDROID_ARCH_ABI=arm64-v8a \
-    -DCMAKE_ANDROID_NDK=$ANDROID_NDK_PATH \
-    -DCMAKE_ANDROID_STL_TYPE=c++_shared \
-    -DCMAKE_FIND_ROOT_PATH_MODE_INCLUDE=BOTH \
-    -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=BOTH \
-    -DCLBLAST_HOME=$(readlink -f ../../CLBlast) \
-    -DOPENCL_LIB=$(readlink -f ../arm64-v8a/libOpenCL.so)
-
-# Run make, adb push, etc.
-```
-
-Then in `adb shell`...
-
-```bash
-cd /data/local/tmp
-export LD_LIBRARY_PATH=/system/vendor/lib64/egl:/data/local/tmp
-./bin/gpt-2-backend -m models/ggml-model.bin -n 64 -p "Pepperoni pizza"
-```
-
-OpenCL does not have the same level of support in `ggml-backend` as CUDA or Metal. In the `gpt-2-backend` example, OpenCL will only be used for the matrix multiplications when evaluating large prompts.
-
 ## Resources
 
 - [GGML - Large Language Models for Everyone](https://github.com/rustformers/llm/blob/main/crates/ggml/README.md): a description of the GGML format provided by the maintainers of the `llm` Rust crate, which provides Rust bindings for GGML