
Commit 2371a7c

Update README
1 parent 3b4bb62 commit 2371a7c

1 file changed (+14, -7 lines changed)

README.md

Lines changed: 14 additions & 7 deletions
@@ -164,11 +164,14 @@ Check models below.
 
 ## Download Model Files
 
-Download `FP16` quantized .gguf files from:
+Download `FP16` quantized `Llama-3` .gguf files from:
 - https://huggingface.co/beehive-lab/Llama-3.2-1B-Instruct-GGUF-FP16
 - https://huggingface.co/beehive-lab/Llama-3.2-3B-Instruct-GGUF-FP16
 - https://huggingface.co/beehive-lab/Llama-3.2-8B-Instruct-GGUF-FP16
 
+Download `FP16` quantized `Mistral` .gguf files from:
+- https://huggingface.co/collections/beehive-lab/mistral-gpullama3java-684afabb206136d2e9cd47e0
+
 Please be gentle with [huggingface.co](https://huggingface.co) servers:
 
 **Note** FP16 models are first-class citizens for the current version.
@@ -181,6 +184,9 @@ wget https://huggingface.co/beehive-lab/Llama-3.2-3B-Instruct-GGUF-FP16/resolve/
 
 # Llama 3 (8B) - FP16
 wget https://huggingface.co/beehive-lab/Llama-3.2-8B-Instruct-GGUF-FP16/resolve/main/beehive-llama-3.2-8b-instruct-fp16.gguf
+
+# Mistral (7B) - FP16
+wget https://huggingface.co/MaziyarPanahi/Mistral-7B-Instruct-v0.3-GGUF/resolve/main/Mistral-7B-Instruct-v0.3.fp16.gguf
 ```
 
 **[Experimental]** You can download the Q8 and Q4 models used in the original implementation of Llama3.java, but for now they are dequantized to FP16 for TornadoVM support:
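
These FP16 `.gguf` files run to several gigabytes, so restarting an interrupted transfer from scratch is the easiest way to be unkind to the Hugging Face servers. A minimal sketch of a resumable download using wget's standard `-c` (continue) flag; the 1B file name is an assumption, by analogy with the 8B file named above:

```bash
# Resume a partially downloaded model instead of re-fetching it (-c = continue).
# The 1B file name is assumed to follow the same pattern as the 8B file above.
wget -c https://huggingface.co/beehive-lab/Llama-3.2-1B-Instruct-GGUF-FP16/resolve/main/beehive-llama-3.2-1b-instruct-fp16.gguf
```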
@@ -201,7 +207,7 @@ curl -L -O https://huggingface.co/mukel/Meta-Llama-3.1-8B-Instruct-GGUF/resolve/
 
 ## Running `llama-tornado`
 
-To execute Llama3 models with TornadoVM on GPUs use the `llama-tornado` script with the `--gpu` flag.
+To execute Llama3 or Mistral models with TornadoVM on GPUs, use the `llama-tornado` script with the `--gpu` flag.
 
 ### Usage Examples
 
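The usage examples themselves fall outside this hunk. Purely as orientation, a minimal GPU invocation might look like the sketch below; only the `--gpu` flag is confirmed by the text above, while `--model` and `--prompt` are assumed option names to be checked against the script's help output:

```bash
# Minimal sketch of a GPU run. --gpu is documented above; --model and
# --prompt are assumed option names -- verify with ./llama-tornado --help.
./llama-tornado --gpu \
    --model beehive-llama-3.2-1b-instruct-fp16.gguf \
    --prompt "Explain GPU offloading in one sentence."
```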
@@ -246,11 +252,11 @@ First, check your GPU specifications. If your GPU has high memory capacity, you
 
 ### GPU Memory Requirements by Model Size
 
-| Model Size | Recommended GPU Memory |
-|------------|------------------------|
-| 1B models  | 7GB (default)          |
-| 3B models  | 15GB                   |
-| 8B models  | 20GB+                  |
+| Model Size  | Recommended GPU Memory |
+|-------------|------------------------|
+| 1B models   | 7GB (default)          |
+| 3-7B models | 15GB                   |
+| 8B models   | 20GB+                  |
 
 **Note**: If you still encounter memory issues, try:
 
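The remedies listed under that note fall outside this hunk. To size a run against the table, compare its recommendation to the VRAM actually present on the card. The first command below uses standard `nvidia-smi` query options; the `--gpu-memory` flag in the second is a hypothetical name for the allocation option implied by the "7GB (default)" entry, so check the script's help output for the real spelling:

```bash
# Report total VRAM per GPU (NVIDIA cards; standard nvidia-smi query options).
nvidia-smi --query-gpu=memory.total --format=csv,noheader

# Hypothetical flag name: raise the default 7GB allocation to the 15GB the
# table recommends for 3-7B models. Verify the real option via --help.
./llama-tornado --gpu --gpu-memory 15GB \
    --model Mistral-7B-Instruct-v0.3.fp16.gguf \
    --prompt "Hello"
```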
@@ -288,6 +294,7 @@ LLaMA Configuration:
                         Maximum number of tokens to generate (default: 512)
   --stream STREAM       Enable streaming output (default: True)
   --echo ECHO           Echo the input prompt (default: False)
+  --suffix SUFFIX       Suffix for fill-in-the-middle request (Codestral) (default: None)
 
 Mode Selection:
   -i, --interactive     Run in interactive/chat mode (default: False)
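
The new `--suffix` option is what enables fill-in-the-middle requests: the prompt carries the code before the gap, the suffix carries the code after it, and the model generates what belongs in between. A sketch of a Codestral-style completion; the model file name and the `--model`/`--prompt` option names are illustrative assumptions:

```bash
# Fill-in-the-middle: the model completes the region between the prompt
# (prefix) and the suffix. File name and --model/--prompt are assumptions.
./llama-tornado --gpu \
    --model codestral-22b-v0.1-fp16.gguf \
    --prompt 'def fibonacci(n: int) -> int:' \
    --suffix 'print(fibonacci(10))'
```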
