
Commit 2fd98ef

Update README.md
1 parent 16f5114 · commit 2fd98ef

File tree

1 file changed: +20 −1 lines changed

README.md

Lines changed: 20 additions & 1 deletion
@@ -17,7 +17,7 @@
 <strong>Llama3</strong> models written in <strong>native Java</strong> automatically accelerated on GPUs with <a href="https://github.com/beehive-lab/TornadoVM" target="_blank"><strong>TornadoVM</strong></a>.
 Runs Llama3 inference efficiently using TornadoVM's GPU acceleration.
 <br><br>
-Currently, supports <strong>Llama3</strong> and <strong>Mistral</strong> models in the GGUF format.
+Currently supports <strong>Llama3</strong>, <strong>Mistral</strong>, and <strong>Qwen3</strong> models in the GGUF format.
 <br><br>
 Builds on <a href="https://github.com/mukel/llama3.java">Llama3.java</a> by <a href="https://github.com/mukel">Alfonso² Peterssen</a>.
 The previous integration of TornadoVM and Llama2 can be found in <a href="https://github.com/mikepapadim/llama2.tornadovm.java">llama2.tornadovm</a>.
@@ -187,6 +187,7 @@ llama-tornado --gpu --model beehive-llama-3.2-1b-instruct-fp16.gguf --prompt "te
 -Dtornado.load.tornado.implementation=uk.ac.manchester.tornado.runtime.common.Tornado \
 -Dtornado.load.annotation.implementation=uk.ac.manchester.tornado.annotation.ASMClassVisitor \
 -Dtornado.load.annotation.parallel=uk.ac.manchester.tornado.api.annotations.Parallel \
+-Dtornado.tvm.maxbytecodesize=65536 \
 -Duse.tornadovm=true \
 -Dtornado.threadInfo=false \
 -Dtornado.debug=false \
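For context, a minimal sketch of how the newly added flag fits into a hand-rolled launch; the classpath and main class below are hypothetical placeholders, and only the `-D` properties are taken from the diff above:

```bash
# Sketch only, not the project's verbatim launch command.
# <project-classpath>, <main-class>, and <your prompt> are hypothetical
# placeholders; the -D system properties come from the diff above.
java \
  -Dtornado.load.tornado.implementation=uk.ac.manchester.tornado.runtime.common.Tornado \
  -Dtornado.load.annotation.implementation=uk.ac.manchester.tornado.annotation.ASMClassVisitor \
  -Dtornado.load.annotation.parallel=uk.ac.manchester.tornado.api.annotations.Parallel \
  -Dtornado.tvm.maxbytecodesize=65536 \
  -Duse.tornadovm=true \
  -Dtornado.threadInfo=false \
  -Dtornado.debug=false \
  -cp "<project-classpath>" "<main-class>" \
  --model beehive-llama-3.2-1b-instruct-fp16.gguf --prompt "<your prompt>"
```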
@@ -237,6 +238,12 @@ Download `FP16` quantized `Llama-3` .gguf files from:
 Download `FP16` quantized `Mistral` .gguf files from:
 - https://huggingface.co/collections/beehive-lab/mistral-gpullama3java-684afabb206136d2e9cd47e0
 
+Download `FP16` quantized `Qwen3` .gguf files from:
+- https://huggingface.co/ggml-org/Qwen3-0.6B-GGUF
+- https://huggingface.co/ggml-org/Qwen3-1.7B-GGUF
+- https://huggingface.co/ggml-org/Qwen3-4B-GGUF
+- https://huggingface.co/ggml-org/Qwen3-8B-GGUF
+
 Please be gentle with [huggingface.co](https://huggingface.co) servers:
 
 **Note** FP16 models are first-class citizens for the current version.
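To spare the servers on repeated setups, the Hugging Face CLI is one alternative that caches downloads locally; a sketch, assuming `huggingface_hub` is installed (`pip install huggingface_hub`):

```bash
# Sketch: fetch one Qwen3 GGUF via huggingface-cli instead of wget.
# Files are cached locally, so re-running does not hit the server again.
huggingface-cli download ggml-org/Qwen3-0.6B-GGUF Qwen3-0.6B-f16.gguf --local-dir .
```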
@@ -252,6 +259,18 @@ wget https://huggingface.co/beehive-lab/Llama-3.2-8B-Instruct-GGUF-FP16/resolve/
 
 # Mistral (7B) - FP16
 wget https://huggingface.co/MaziyarPanahi/Mistral-7B-Instruct-v0.3-GGUF/resolve/main/Mistral-7B-Instruct-v0.3.fp16.gguf
+
+# Qwen3 (0.6B) - FP16
+wget https://huggingface.co/ggml-org/Qwen3-0.6B-GGUF/resolve/main/Qwen3-0.6B-f16.gguf
+
+# Qwen3 (1.7B) - FP16
+wget https://huggingface.co/ggml-org/Qwen3-1.7B-GGUF/resolve/main/Qwen3-1.7B-f16.gguf
+
+# Qwen3 (4B) - FP16
+wget https://huggingface.co/ggml-org/Qwen3-4B-GGUF/resolve/main/Qwen3-4B-f16.gguf
+
+# Qwen3 (8B) - FP16
+wget https://huggingface.co/ggml-org/Qwen3-8B-GGUF/resolve/main/Qwen3-8B-f16.gguf
 ```
 
 **[Experimental]** You can download the Q8 and Q4 models used in the original implementation of Llama3.java, but for now they will be dequantized to FP16 for TornadoVM support:
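Separately from the experimental quantized models, a quick sanity check for a freshly downloaded FP16 file; a sketch that mirrors the `llama-tornado` example shown earlier (the prompt is a placeholder):

```bash
# Sketch: run a downloaded Qwen3 FP16 model on the GPU, mirroring the
# llama-tornado invocation shown earlier in this README.
llama-tornado --gpu --model Qwen3-0.6B-f16.gguf --prompt "<your prompt>"
```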
