@@ -164,11 +164,14 @@ Check models below.
## Download Model Files
- Download `FP16` quantized .gguf files from:
+ Download `FP16` quantized `Llama-3` .gguf files from:
- https://huggingface.co/beehive-lab/Llama-3.2-1B-Instruct-GGUF-FP16
- https://huggingface.co/beehive-lab/Llama-3.2-3B-Instruct-GGUF-FP16
- https://huggingface.co/beehive-lab/Llama-3.2-8B-Instruct-GGUF-FP16
+ Download `FP16` quantized `Mistral` .gguf files from:
+ - https://huggingface.co/collections/beehive-lab/mistral-gpullama3java-684afabb206136d2e9cd47e0
+
Please be gentle with [huggingface.co](https://huggingface.co) servers:
**Note**: FP16 models are first-class citizens in the current version.
@@ -181,6 +184,9 @@ wget https://huggingface.co/beehive-lab/Llama-3.2-3B-Instruct-GGUF-FP16/resolve/
# Llama 3 (8B) - FP16
wget https://huggingface.co/beehive-lab/Llama-3.2-8B-Instruct-GGUF-FP16/resolve/main/beehive-llama-3.2-8b-instruct-fp16.gguf
+
+ # Mistral (7B) - FP16
+ wget https://huggingface.co/MaziyarPanahi/Mistral-7B-Instruct-v0.3-GGUF/resolve/main/Mistral-7B-Instruct-v0.3.fp16.gguf
```
**[Experimental]** You can download the Q8 and Q4 models used in the original implementation of Llama3.java, but for now they are dequantized to FP16 for TornadoVM support:
@@ -201,7 +207,7 @@ curl -L -O https://huggingface.co/mukel/Meta-Llama-3.1-8B-Instruct-GGUF/resolve/
## Running `llama-tornado`
- To execute Llama3 models with TornadoVM on GPUs use the `llama-tornado` script with the `--gpu` flag.
+ To execute Llama3 or Mistral models with TornadoVM on GPUs, use the `llama-tornado` script with the `--gpu` flag.
### Usage Examples
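For instance, a minimal single-prompt run with one of the models downloaded above might look like the following. This is a sketch: the `--model` and `--prompt` flag names are assumed from the configuration listing further down, and the file names are whichever .gguf files you downloaded.

```bash
# Run the Llama 3.2 1B FP16 model on the GPU with a one-shot prompt
./llama-tornado --gpu --model beehive-llama-3.2-1b-instruct-fp16.gguf --prompt "Tell me a joke"

# The same invocation works for the Mistral FP16 model downloaded above
./llama-tornado --gpu --model Mistral-7B-Instruct-v0.3.fp16.gguf --prompt "Tell me a joke"
```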
@@ -246,11 +252,11 @@ First, check your GPU specifications. If your GPU has high memory capacity, you
### GPU Memory Requirements by Model Size
- | Model Size | Recommended GPU Memory |
- | ------------| ------------------------|
- | 1B models | 7GB (default) |
- | 3B models | 15GB |
- | 8B models | 20GB+ |
+ | Model Size | Recommended GPU Memory |
+ | ------------- | ------------------------|
+ | 1B models | 7GB (default) |
+ | 3-7B models | 15GB |
+ | 8B models | 20GB+ |

**Note**: If you still encounter memory issues, try:
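For example, if your GPU has headroom beyond the defaults, you can raise the allocation explicitly for a larger model. A sketch, assuming a `--gpu-memory` flag (the flag name is an assumption not visible in this diff hunk; check `llama-tornado --help` for the exact spelling):

```bash
# Give an 8B model the 20GB+ allocation recommended in the table above
./llama-tornado --gpu --gpu-memory 20GB \
  --model beehive-llama-3.2-8b-instruct-fp16.gguf \
  --prompt "Explain TornadoVM in one sentence"
```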
@@ -288,6 +294,7 @@ LLaMA Configuration:
Maximum number of tokens to generate (default: 512)
--stream STREAM Enable streaming output (default: True)
--echo ECHO Echo the input prompt (default: False)
+ --suffix SUFFIX Suffix for fill-in-the-middle request (Codestral) (default: None)

Mode Selection:
-i, --interactive Run in interactive/chat mode (default: False)
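The newly added `--suffix` option pairs with `--prompt` for fill-in-the-middle requests against Codestral-style models: the model generates the code that belongs between the two strings. A hedged sketch (the model file name here is hypothetical):

```bash
# Fill-in-the-middle: generate the body between the prompt and the suffix
./llama-tornado --gpu --model codestral-22b-instruct-fp16.gguf \
  --prompt "public static int fib(int n) {" \
  --suffix "}"
```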