1 file changed: 0 additions, 35 deletions

@@ -31,42 +31,7 @@ Add proguard rule if it's enabled in project (android/app/proguard-rules.pro):

You can search HuggingFace for available models (Keyword: [`GGUF`](https://huggingface.co/search/full-text?q=GGUF&type=model)).

- <<<<<<< Updated upstream
- To create a GGUF model manually, for example with Llama 2:
-
- Download the Llama 2 model:
-
- 1. Request access from [here](https://ai.meta.com/llama)
- 2. Download the model from HuggingFace [here](https://huggingface.co/meta-llama/Llama-2-7b-chat) (`Llama-2-7b-chat`)
-
- Convert the model to ggml format:
-
- ```bash
- # Start with submodule in this repo (or you can clone the repo https://github.com/ggerganov/llama.cpp.git)
- yarn && yarn bootstrap
- cd llama.cpp
-
- # install Python dependencies
- python3 -m pip install -r requirements.txt
-
- # Move the Llama model weights to the models folder
- mv <path to Llama-2-7b-chat> ./models/7B
-
- # convert the 7B model to ggml FP16 format
- python3 convert.py models/7B/ --outtype f16
-
- # Build the quantize tool
- make quantize
-
- # quantize the model to 2-bits (using q2_k method)
- ./quantize ./models/7B/ggml-model-f16.gguf ./models/7B/ggml-model-q2_k.gguf q2_k
-
- # quantize the model to 4-bits (using q4_0 method)
- ./quantize ./models/7B/ggml-model-f16.gguf ./models/7B/ggml-model-q4_0.gguf q4_0
- ```
- =======
To get a GGUF model or quantize one manually, see the [`Prepare and Quantize`](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#prepare-and-quantize) section in llama.cpp.
- >>>>>>> Stashed changes

## Usage

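For reference, here is a minimal sketch of the download/convert/quantize workflow that the linked `Prepare and Quantize` section covers, assuming llama.cpp's CMake build and the `huggingface-cli` tool. The repository, model, and file names are placeholders, and the tool names (`convert_hf_to_gguf.py`, `llama-quantize`) have changed across llama.cpp versions, so defer to the linked docs for the exact current commands.

```bash
# Option A: download a ready-made GGUF file from HuggingFace
# (the repo and file names below are placeholders; pick any result from the GGUF search above)
pip install -U "huggingface_hub[cli]"
huggingface-cli download <user>/<model>-GGUF <model>.Q4_K_M.gguf --local-dir ./models

# Option B: convert and quantize a HuggingFace checkpoint yourself with llama.cpp
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
python3 -m pip install -r requirements.txt

# convert the HF checkpoint to an FP16 GGUF file
# (convert_hf_to_gguf.py is the successor of the older convert.py shown in the removed block)
python3 convert_hf_to_gguf.py <path-to-hf-model> --outtype f16 --outfile ./models/model-f16.gguf

# build the quantization tool (CMake build; the binary is named llama-quantize in recent versions)
cmake -B build && cmake --build build --config Release

# quantize to 4-bit (Q4_K_M is shown only as a common middle-ground type)
./build/bin/llama-quantize ./models/model-f16.gguf ./models/model-q4_k_m.gguf Q4_K_M
```

Running `llama-quantize` with no arguments should print the full list of supported quantization types, so the `Q4_K_M` choice above is only an example.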