Commit b4985fc

authored
Merge pull request #36 from mikepapadim/feat/readme_upd
Update README for clarity
2 parents 78ae1ae + 2d7756d commit b4985fc

File tree

1 file changed, +69 -73 lines changed


README.md

Lines changed: 69 additions & 73 deletions
Original file line number | Diff line number | Diff line change
@@ -15,14 +15,12 @@
1515
</td>
1616
<td style="vertical-align: middle; padding-left: 20px; border: none;">
1717
<strong>Llama3</strong> models written in <strong>native Java</strong> automatically accelerated on GPUs with <a href="https://github.com/beehive-lab/TornadoVM" target="_blank"><strong>TornadoVM</strong></a>.
18-
This project allows you to run Llama3 inference efficiently, leveraging TornadoVM's parallel computing features for enhanced performance.
19-
20-
<br><br>
21-
Builds on <a href="https://github.com/mukel/llama3.java">Llama3.java</a>, based on the original <a href="https://github.com/meta-llama/llama3">Llama 3</a>, <a href="https://llama.meta.com/docs/model-cards-and-prompt-formats/llama3_1">3.1</a>, and <a href="https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/">3.2</a> models, with TornadoVM support for parallelism and hardware acceleration.
18+
Runs Llama3 inference efficiently using TornadoVM's GPU acceleration.
2219
<br><br>
23-
Thanks to <a href="https://github.com/mukel">Alfonso² Peterssen</a> for the original implementation of Llama3.java.
20+
Currently supports <strong>Llama3</strong> and <strong>Mistral</strong> models in the GGUF format.
2421
<br><br>
25-
Previous intergration of TornadoVM and Llama2 it can be found in <a href="https://github.com/mikepapadim/llama2.tornadovm.java">llama2.tornadovm</a>.
22+
Builds on <a href="https://github.com/mukel/llama3.java">Llama3.java</a> by <a href="https://github.com/mukel">Alfonso² Peterssen</a>.
23+
A previous integration of TornadoVM and Llama2 can be found in <a href="https://github.com/mikepapadim/llama2.tornadovm.java">llama2.tornadovm</a>.
2624
</td>
2725
</tr>
2826
</table>
@@ -161,6 +159,71 @@ python llama-tornado --gpu --verbose-init --opencl --model beehive-llama-3.2-1b
161159
```
162160
-----------
163161

162+
## ☕ Integration with Your Java Codebase or Tools
163+
164+
To integrate it into your codebase, IDE (e.g., IntelliJ), or build system (such as Maven or Gradle), use the `--show-command` flag.
165+
This flag prints the exact Java command, including all JVM flags, that is invoked under the hood to enable execution on GPUs with TornadoVM.
166+
This makes it simple to replicate or embed the invocation in any external tool or codebase.
167+
168+
```bash
169+
llama-tornado --gpu --model beehive-llama-3.2-1b-instruct-fp16.gguf --prompt "tell me a joke" --show-command
170+
```
171+
172+
<details>
173+
<summary>📋 Click to see the JVM configuration</summary>
174+
175+
```java
176+
/home/mikepapadim/.sdkman/candidates/java/current/bin/java \
177+
-server \
178+
-XX:+UnlockExperimentalVMOptions \
179+
-XX:+EnableJVMCI \
180+
-Xms20g -Xmx20g \
181+
--enable-preview \
182+
-Djava.library.path=/home/mikepapadim/manchester/TornadoVM/bin/sdk/lib \
183+
-Djdk.module.showModuleResolution=false \
184+
--module-path .:/home/mikepapadim/manchester/TornadoVM/bin/sdk/share/java/tornado \
185+
-Dtornado.load.api.implementation=uk.ac.manchester.tornado.runtime.tasks.TornadoTaskGraph \
186+
-Dtornado.load.runtime.implementation=uk.ac.manchester.tornado.runtime.TornadoCoreRuntime \
187+
-Dtornado.load.tornado.implementation=uk.ac.manchester.tornado.runtime.common.Tornado \
188+
-Dtornado.load.annotation.implementation=uk.ac.manchester.tornado.annotation.ASMClassVisitor \
189+
-Dtornado.load.annotation.parallel=uk.ac.manchester.tornado.api.annotations.Parallel \
190+
-Duse.tornadovm=true \
191+
-Dtornado.threadInfo=false \
192+
-Dtornado.debug=false \
193+
-Dtornado.fullDebug=false \
194+
-Dtornado.printKernel=false \
195+
-Dtornado.print.bytecodes=false \
196+
-Dtornado.device.memory=7GB \
197+
-Dtornado.profiler=false \
198+
-Dtornado.log.profiler=false \
199+
-Dtornado.profiler.dump.dir=/home/mikepapadim/repos/gpu-llama3.java/prof.json \
200+
-Dtornado.enable.fastMathOptimizations=true \
201+
-Dtornado.enable.mathOptimizations=false \
202+
-Dtornado.enable.nativeFunctions=fast \
203+
-Dtornado.loop.interchange=true \
204+
-Dtornado.eventpool.maxwaitevents=32000 \
205+
"-Dtornado.opencl.compiler.flags=-cl-denorms-are-zero -cl-no-signed-zeros -cl-finite-math-only" \
206+
--upgrade-module-path /home/mikepapadim/manchester/TornadoVM/bin/sdk/share/java/graalJars \
207+
@/home/mikepapadim/manchester/TornadoVM/bin/sdk/etc/exportLists/common-exports \
208+
@/home/mikepapadim/manchester/TornadoVM/bin/sdk/etc/exportLists/opencl-exports \
209+
--add-modules ALL-SYSTEM,tornado.runtime,tornado.annotation,tornado.drivers.common,tornado.drivers.opencl \
210+
-cp /home/mikepapadim/repos/gpu-llama3.java/target/gpu-llama3-1.0-SNAPSHOT.jar \
211+
com.example.LlamaApp \
212+
-m beehive-llama-3.2-1b-instruct-fp16.gguf \
213+
--temperature 0.1 \
214+
--top-p 0.95 \
215+
--seed 1746903566 \
216+
--max-tokens 512 \
217+
--stream true \
218+
--echo false \
219+
-p "tell me a joke" \
220+
--instruct
221+
```
222+
223+
</details>
224+
225+
-----------
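The printed command can also be embedded directly in Java code via `ProcessBuilder`. The sketch below is illustrative only: the class name `LaunchLlama`, the relative jar path, and the trimmed flag set are assumptions — a real launcher would carry over all of the JVM flags from the dump above.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: rebuild a trimmed version of the command printed by
// `--show-command` and launch it with ProcessBuilder. Only a few of the JVM
// flags from the full dump are shown; copy the rest from your own output.
public class LaunchLlama {

    // Assemble the java invocation for a given model file and prompt.
    static List<String> buildCommand(String model, String prompt) {
        List<String> cmd = new ArrayList<>(List.of(
                "java",
                "--enable-preview",
                "-Duse.tornadovm=true",
                "-cp", "target/gpu-llama3-1.0-SNAPSHOT.jar",
                "com.example.LlamaApp"));
        cmd.addAll(List.of("-m", model, "-p", prompt, "--instruct"));
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        Process p = new ProcessBuilder(
                buildCommand("beehive-llama-3.2-1b-instruct-fp16.gguf", "tell me a joke"))
                .inheritIO()   // stream model output straight to this terminal
                .start();
        System.exit(p.waitFor());
    }
}
```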
226+
164227
The above model can be swapped with one of the other models, such as `beehive-llama-3.2-3b-instruct-fp16.gguf` or `beehive-llama-3.2-8b-instruct-fp16.gguf`, depending on your needs.
165228
Check models below.
166229

@@ -382,73 +445,6 @@ View TornadoVM's internal behavior:
382445
./llama-tornado --gpu --model model.gguf --prompt "..." --print-threads --print-bytecodes --print-kernel
383446
```
384447

385-
-----------
386-
387-
## Easy Integration with Your Codebase or Tools
388-
389-
To integrate `llama-tornado` into your codebase or IDE (e.g., IntelliJ) or custom build system (like IntelliJ, Maven, or Gradle), use the `--show-command` flag.
390-
This flag shows the exact Java command with all JVM flags that are being invoked under the hood in order to enable seamless execution on GPUs with TornadoVM.
391-
Hence, it makes it simple to replicate or embed the invoked flags in any external tool or codebase.
392-
393-
```bash
394-
llama-tornado --gpu --model beehive-llama-3.2-1b-instruct-fp16.gguf --prompt "tell me a joke" --show-command
395-
```
396-
397-
<details>
398-
<summary>📋 Click to see the full Java command</summary>
399-
400-
```java
401-
/home/mikepapadim/.sdkman/candidates/java/current/bin/java \
402-
-server \
403-
-XX:+UnlockExperimentalVMOptions \
404-
-XX:+EnableJVMCI \
405-
-Xms20g -Xmx20g \
406-
--enable-preview \
407-
-Djava.library.path=/home/mikepapadim/manchester/TornadoVM/bin/sdk/lib \
408-
-Djdk.module.showModuleResolution=false \
409-
--module-path .:/home/mikepapadim/manchester/TornadoVM/bin/sdk/share/java/tornado \
410-
-Dtornado.load.api.implementation=uk.ac.manchester.tornado.runtime.tasks.TornadoTaskGraph \
411-
-Dtornado.load.runtime.implementation=uk.ac.manchester.tornado.runtime.TornadoCoreRuntime \
412-
-Dtornado.load.tornado.implementation=uk.ac.manchester.tornado.runtime.common.Tornado \
413-
-Dtornado.load.annotation.implementation=uk.ac.manchester.tornado.annotation.ASMClassVisitor \
414-
-Dtornado.load.annotation.parallel=uk.ac.manchester.tornado.api.annotations.Parallel \
415-
-Duse.tornadovm=true \
416-
-Dtornado.threadInfo=false \
417-
-Dtornado.debug=false \
418-
-Dtornado.fullDebug=false \
419-
-Dtornado.printKernel=false \
420-
-Dtornado.print.bytecodes=false \
421-
-Dtornado.device.memory=7GB \
422-
-Dtornado.profiler=false \
423-
-Dtornado.log.profiler=false \
424-
-Dtornado.profiler.dump.dir=/home/mikepapadim/repos/gpu-llama3.java/prof.json \
425-
-Dtornado.enable.fastMathOptimizations=true \
426-
-Dtornado.enable.mathOptimizations=false \
427-
-Dtornado.enable.nativeFunctions=fast \
428-
-Dtornado.loop.interchange=true \
429-
-Dtornado.eventpool.maxwaitevents=32000 \
430-
"-Dtornado.opencl.compiler.flags=-cl-denorms-are-zero -cl-no-signed-zeros -cl-finite-math-only" \
431-
--upgrade-module-path /home/mikepapadim/manchester/TornadoVM/bin/sdk/share/java/graalJars \
432-
@/home/mikepapadim/manchester/TornadoVM/bin/sdk/etc/exportLists/common-exports \
433-
@/home/mikepapadim/manchester/TornadoVM/bin/sdk/etc/exportLists/opencl-exports \
434-
--add-modules ALL-SYSTEM,tornado.runtime,tornado.annotation,tornado.drivers.common,tornado.drivers.opencl \
435-
-cp /home/mikepapadim/repos/gpu-llama3.java/target/gpu-llama3-1.0-SNAPSHOT.jar \
436-
com.example.LlamaApp \
437-
-m beehive-llama-3.2-1b-instruct-fp16.gguf \
438-
--temperature 0.1 \
439-
--top-p 0.95 \
440-
--seed 1746903566 \
441-
--max-tokens 512 \
442-
--stream true \
443-
--echo false \
444-
-p "tell me a joke" \
445-
--instruct
446-
```
447-
448-
</details>
449-
450-
-----------
451-
452448
## Current Features & Roadmap
453449

454450
- **Support for GGUF format models** with full FP16 and partial support for Q8_0 and Q4_0 quantization.
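
A quick sanity check that a downloaded file really is GGUF before loading it: GGUF files begin with the 4-byte magic `GGUF` (per the public GGUF specification). The helper below is an illustrative sketch, not part of this repository's loader:

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

// Illustrative sketch: verify the 4-byte GGUF magic ('G','G','U','F') at the
// start of a file. Based on the public GGUF spec, not this project's code.
public class GgufCheck {
    static boolean isGguf(String path) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
            byte[] magic = new byte[4];
            if (in.read(magic) != 4) return false;  // file too short
            return magic[0] == 'G' && magic[1] == 'G'
                && magic[2] == 'U' && magic[3] == 'F';
        }
    }
}
```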
