SECURITY.md (2 additions, 1 deletion)
@@ -40,7 +40,8 @@ To protect sensitive data from potential leaks or unauthorized access, it is cru
 ### Untrusted environments or networks
 
 If you can't run your models in a secure and isolated environment or if it must be exposed to an untrusted network, make sure to take the following security precautions:
-* Confirm the hash of any downloaded artifact (e.g. pre-trained model weights) matches a known-good value
+* Do not use the RPC backend, [rpc-server](https://github.com/ggml-org/llama.cpp/tree/master/examples/rpc) and [llama-server](https://github.com/ggml-org/llama.cpp/tree/master/examples/server) functionality (see https://github.com/ggml-org/llama.cpp/pull/13061).
+* Confirm the hash of any downloaded artifact (e.g. pre-trained model weights) matches a known-good value.
 * Encrypt your data if sending it over the network.
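As a quick illustration of the checksum precaution above (a minimal sketch, not part of this change; it assumes the model publisher provides a SHA-256 digest, and the filenames are placeholders):

```sh
# Print the SHA-256 digest of a downloaded artifact and compare it by hand
# against the known-good value published by the model provider.
sha256sum ggml-model-q4_k.gguf

# Or, if a checksum file (e.g. SHA256SUMS) is provided, verify automatically.
sha256sum --check SHA256SUMS
```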

docs/multimodal/MobileVLM.md (13 additions, 13 deletions)
@@ -9,15 +9,15 @@ The implementation is based on llava, and is compatible with llava and mobileVLM
 Notice: The overall process of model inference for both **MobileVLM** and **MobileVLM_V2** models is the same, but the process of model conversion is a little different. Therefore, using **MobileVLM-1.7B** as an example, the different conversion step will be shown.
 
 ## Usage
-Build with cmake or run `make llama-llava-cli` to build it.
 
-After building, run: `./llama-llava-cli` to see the usage. For example:
+Build the `llama-mtmd-cli` binary.
+
+After building, run: `./llama-mtmd-cli` to see the usage. For example:
     -p "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: <image>\nWho is the author of this book? Answer the question using a single word or phrase. ASSISTANT:"
+    --chat-template deepseek
 ```
 
 ## Model conversion
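For reference, a complete invocation matching the updated Usage text might look like the following sketch. It is assembled only from flags shown elsewhere in this diff (`-m`, `--mmproj`, `--image`, `-p`, `--chat-template`); the model, projector, and image paths are placeholders.

```sh
# Hypothetical full MobileVLM invocation; replace the paths with your own files.
./llama-mtmd-cli \
    -m path/to/ggml-model-q4_k.gguf \
    --mmproj path/to/mmproj-model-f16.gguf \
    --image path/to/an/image.jpg \
    -p "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: <image>\nWho is the author of this book? Answer the question using a single word or phrase. ASSISTANT:" \
    --chat-template deepseek
```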
@@ -82,7 +82,7 @@ refer to `android/adb_run.sh`, modify resources' `name` and `path`
 ### case 1
 **input**
 ```sh
-/data/local/tmp/llama-llava-cli \
+/data/local/tmp/llama-mtmd-cli \
     -m /data/local/tmp/ggml-model-q4_k.gguf \
     --mmproj /data/local/tmp/mmproj-model-f16.gguf \
     -t 4 \
@@ -102,7 +102,7 @@ llama_print_timings: total time = 34731.93 ms
 ### case 2
 **input**
 ```sh
-/data/local/tmp/llama-llava-cli \
+/data/local/tmp/llama-mtmd-cli \
     -m /data/local/tmp/ggml-model-q4_k.gguf \
     --mmproj /data/local/tmp/mmproj-model-f16.gguf \
     -t 4 \
@@ -123,10 +123,10 @@ llama_print_timings: total time = 34570.79 ms
 
 ## Some result on Android with `Snapdragon 778G` chip
 ### MobileVLM-1.7B case
-#### llava-cli release-b2005
+#### mtmd-cli release-b2005
 **input**
 ```sh
-/data/local/tmp/llama-llava-cli \
+/data/local/tmp/llama-mtmd-cli \
     -m /data/local/tmp/ggml-model-q4_k.gguf \
     --mmproj /data/local/tmp/mmproj-model-f16.gguf \
     -t 4 \
@@ -147,7 +147,7 @@ llama_print_timings: prompt eval time = 8119.49 ms / 191 tokens ( 42.51 m
 llama_print_timings: eval time = 1005.75 ms / 14 runs ( 71.84 ms per token, 13.92 tokens per second)
 llama_print_timings: total time = 28038.34 ms / 205 tokens
 ```
-#### llava-cli latest-version
+#### mtmd-cli latest-version
 **input**
 
 Just the same as above.
@@ -169,7 +169,7 @@ llama_print_timings: eval time = 43894.02 ms / 13 runs ( 3376.46 m
 llama_print_timings: total time = 865441.76 ms / 204 tokens
 ```
 ### MobileVLM_V2-1.7B case
-#### llava-cli release-2005b
+#### mtmd-cli release-2005b
 **input**
 
 Just the same as above.
@@ -200,7 +200,7 @@ make GGML_CUDA=1 CUDA_DOCKER_ARCH=sm_87 GGML_CUDA_F16=1 -j 32
 ### case 1
 **input**
 ```sh
-./llama-llava-cli \
+./llama-mtmd-cli \
     -m /data/local/tmp/ggml-model-q4_k.gguf \
     --mmproj /data/local/tmp/mmproj-model-f16.gguf \
     --image /data/local/tmp/demo.jpeg \
@@ -224,7 +224,7 @@ llama_print_timings: total time = 1352.63 ms / 252 tokens
 ### case 2
 **input**
 ```sh
-./llama-llava-cli \
+./llama-mtmd-cli \
     -m /data/local/tmp/ggml-model-q4_k.gguf \
     --mmproj /data/local/tmp/mmproj-model-f16.gguf \
     -p "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: <image>\nWhat is in the image? ASSISTANT:" \

docs/multimodal/glmedge.md (3 additions, 3 deletions)
@@ -3,12 +3,12 @@
 Currently this implementation supports [glm-edge-v-2b](https://huggingface.co/THUDM/glm-edge-v-2b) and [glm-edge-v-5b](https://huggingface.co/THUDM/glm-edge-v-5b).
 
 ## Usage
-Build with cmake or run `make llama-llava-cli` to build it.
+Build the `llama-mtmd-cli` binary.
 
-After building, run: `./llama-llava-cli` to see the usage. For example:
+After building, run: `./llama-mtmd-cli` to see the usage. For example:

docs/multimodal/granitevision.md (2 additions, 6 deletions)
@@ -176,15 +176,11 @@ Note that currently you cannot quantize the visual encoder because granite visio
 
 
 ### 5. Running the Model in Llama cpp
-Build llama cpp normally; you should have a target binary named `llama-llava-cli`, which you can pass two binaries to. As an example, we pass the llama.cpp banner.
+Build llama cpp normally; you should have a target binary named `llama-mtmd-cli`, which you can pass two binaries to. As an example, we pass the llama.cpp banner.
 
 ```bash
-$ ./build/bin/llama-llava-cli -m $LLM_GGUF_PATH \
+$ ./build/bin/llama-mtmd-cli -m $LLM_GGUF_PATH \
     --mmproj $VISUAL_GGUF_PATH \
-    --image ./media/llama0-banner.png \
     -c 16384 \
-    -p "<|system|>\nA chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.\n<|user|>\n\<image>\nWhat does the text in this image say?\n<|assistant|>\n" \
     --temp 0
 ```
-
-
 Sample output: `The text in the image reads "LLAMA C++ Can it run DOOM Llama?"`