|
15 | 15 | </td>
|
16 | 16 | <td style="vertical-align: middle; padding-left: 20px; border: none;">
|
17 | 17 | <strong>Llama3</strong> models written in <strong>native Java</strong> automatically accelerated on GPUs with <a href="https://github.com/beehive-lab/TornadoVM" target="_blank"><strong>TornadoVM</strong></a>.
|
18 |
| -This project allows you to run Llama3 inference efficiently, leveraging TornadoVM's parallel computing features for enhanced performance. |
19 |
| - |
20 |
| -<br><br> |
21 |
| -Builds on <a href="https://github.com/mukel/llama3.java">Llama3.java</a>, based on the original <a href="https://github.com/meta-llama/llama3">Llama 3</a>, <a href="https://llama.meta.com/docs/model-cards-and-prompt-formats/llama3_1">3.1</a>, and <a href="https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/">3.2</a> models, with TornadoVM support for parallelism and hardware acceleration. |
| 18 | +Runs Llama3 inference efficiently using TornadoVM's GPU acceleration. |
22 | 19 | <br><br>
|
23 |
| -Thanks to <a href="https://github.com/mukel">Alfonso² Peterssen</a> for the original implementation of Llama3.java. |
| 20 | +Currently supports <strong>Llama3</strong> and <strong>Mistral</strong> models in the GGUF format. |
24 | 21 | <br><br>
|
25 |
| -Previous intergration of TornadoVM and Llama2 it can be found in <a href="https://github.com/mikepapadim/llama2.tornadovm.java">llama2.tornadovm</a>. |
| 22 | +Builds on <a href="https://github.com/mukel/llama3.java">Llama3.java</a> by <a href="https://github.com/mukel">Alfonso² Peterssen</a>. |
| 23 | +A previous integration of TornadoVM and Llama2 can be found in <a href="https://github.com/mikepapadim/llama2.tornadovm.java">llama2.tornadovm</a>. |
26 | 24 | </td>
|
27 | 25 | </tr>
|
28 | 26 | </table>
|
@@ -161,6 +159,71 @@ python llama-tornado --gpu --verbose-init --opencl --model beehive-llama-3.2-1b
|
161 | 159 | ```
|
162 | 160 | -----------
|
163 | 161 |
|
| 162 | +## ☕ Integration with Your Java Codebase or Tools |
| 163 | + |
| 164 | +To integrate `llama-tornado` into your codebase, IDE (e.g., IntelliJ), or custom build system (e.g., Maven or Gradle), use the `--show-command` flag. |
| 165 | +This flag shows the exact Java command, including all JVM flags, that is invoked under the hood to run on GPUs with TornadoVM. |
| 166 | +This makes it simple to replicate or embed these flags in any external tool or codebase. |
| 167 | + |
| 168 | +```bash |
| 169 | +llama-tornado --gpu --model beehive-llama-3.2-1b-instruct-fp16.gguf --prompt "tell me a joke" --show-command |
| 170 | +``` |
| 171 | + |
| 172 | +<details> |
| 173 | +<summary>📋 Click to see the JVM configuration </summary> |
| 174 | + |
| 175 | +```bash |
| 176 | +/home/mikepapadim/.sdkman/candidates/java/current/bin/java \ |
| 177 | + -server \ |
| 178 | + -XX:+UnlockExperimentalVMOptions \ |
| 179 | + -XX:+EnableJVMCI \ |
| 180 | + -Xms20g -Xmx20g \ |
| 181 | + --enable-preview \ |
| 182 | + -Djava.library.path=/home/mikepapadim/manchester/TornadoVM/bin/sdk/lib \ |
| 183 | + -Djdk.module.showModuleResolution=false \ |
| 184 | + --module-path .:/home/mikepapadim/manchester/TornadoVM/bin/sdk/share/java/tornado \ |
| 185 | + -Dtornado.load.api.implementation=uk.ac.manchester.tornado.runtime.tasks.TornadoTaskGraph \ |
| 186 | + -Dtornado.load.runtime.implementation=uk.ac.manchester.tornado.runtime.TornadoCoreRuntime \ |
| 187 | + -Dtornado.load.tornado.implementation=uk.ac.manchester.tornado.runtime.common.Tornado \ |
| 188 | + -Dtornado.load.annotation.implementation=uk.ac.manchester.tornado.annotation.ASMClassVisitor \ |
| 189 | + -Dtornado.load.annotation.parallel=uk.ac.manchester.tornado.api.annotations.Parallel \ |
| 190 | + -Duse.tornadovm=true \ |
| 191 | + -Dtornado.threadInfo=false \ |
| 192 | + -Dtornado.debug=false \ |
| 193 | + -Dtornado.fullDebug=false \ |
| 194 | + -Dtornado.printKernel=false \ |
| 195 | + -Dtornado.print.bytecodes=false \ |
| 196 | + -Dtornado.device.memory=7GB \ |
| 197 | + -Dtornado.profiler=false \ |
| 198 | + -Dtornado.log.profiler=false \ |
| 199 | + -Dtornado.profiler.dump.dir=/home/mikepapadim/repos/gpu-llama3.java/prof.json \ |
| 200 | + -Dtornado.enable.fastMathOptimizations=true \ |
| 201 | + -Dtornado.enable.mathOptimizations=false \ |
| 202 | + -Dtornado.enable.nativeFunctions=fast \ |
| 203 | + -Dtornado.loop.interchange=true \ |
| 204 | + -Dtornado.eventpool.maxwaitevents=32000 \ |
| 205 | + "-Dtornado.opencl.compiler.flags=-cl-denorms-are-zero -cl-no-signed-zeros -cl-finite-math-only" \ |
| 206 | + --upgrade-module-path /home/mikepapadim/manchester/TornadoVM/bin/sdk/share/java/graalJars \ |
| 207 | + @/home/mikepapadim/manchester/TornadoVM/bin/sdk/etc/exportLists/common-exports \ |
| 208 | + @/home/mikepapadim/manchester/TornadoVM/bin/sdk/etc/exportLists/opencl-exports \ |
| 209 | + --add-modules ALL-SYSTEM,tornado.runtime,tornado.annotation,tornado.drivers.common,tornado.drivers.opencl \ |
| 210 | + -cp /home/mikepapadim/repos/gpu-llama3.java/target/gpu-llama3-1.0-SNAPSHOT.jar \ |
| 211 | + com.example.LlamaApp \ |
| 212 | + -m beehive-llama-3.2-1b-instruct-fp16.gguf \ |
| 213 | + --temperature 0.1 \ |
| 214 | + --top-p 0.95 \ |
| 215 | + --seed 1746903566 \ |
| 216 | + --max-tokens 512 \ |
| 217 | + --stream true \ |
| 218 | + --echo false \ |
| 219 | + -p "tell me a joke" \ |
| 220 | + --instruct |
| 221 | +``` |
| 222 | + |
| 223 | +</details> |
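To launch this command from Java code (for example, from a test harness), one option is to paste the output of `--show-command` into your code and split it into an argument list for `ProcessBuilder`. The following is a minimal sketch, assuming the printed command follows simple POSIX-shell conventions as in the example above: whitespace-separated tokens, backslash-newline line continuations, and quoted arguments without escapes inside the quotes. `ShowCommandParser` and `toArgv` are hypothetical names, not part of gpu-llama3.java or TornadoVM.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of gpu-llama3.java or TornadoVM): turns the
// multi-line command printed by --show-command into an argv list.
public class ShowCommandParser {

    // Assumes simple shell rules: whitespace separates tokens, a backslash
    // followed by a newline is a line continuation, and single or double quotes
    // group characters (no escape handling inside quotes).
    public static List<String> toArgv(String command) {
        List<String> args = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inToken = false;
        char quote = 0; // 0 when not inside a quoted section
        for (int i = 0; i < command.length(); i++) {
            char c = command.charAt(i);
            if (quote != 0) {
                if (c == quote) {
                    quote = 0; // closing quote ends the quoted section
                } else {
                    current.append(c);
                }
            } else if (c == '"' || c == '\'') {
                quote = c;
                inToken = true; // a quoted section always belongs to a token
            } else if (c == '\\' && i + 1 < command.length() && command.charAt(i + 1) == '\n') {
                i++; // skip the backslash-newline continuation
            } else if (Character.isWhitespace(c)) {
                if (inToken) { // whitespace ends the current token
                    args.add(current.toString());
                    current.setLength(0);
                    inToken = false;
                }
            } else {
                current.append(c);
                inToken = true;
            }
        }
        if (quote != 0) {
            throw new IllegalArgumentException("Unterminated quote in command");
        }
        if (inToken) {
            args.add(current.toString());
        }
        return args;
    }

    public static void main(String[] args) {
        String printed = String.join("\n",
                "java \\",
                "  -Xms20g -Xmx20g \\",
                "  \"-Dtornado.opencl.compiler.flags=-cl-denorms-are-zero -cl-no-signed-zeros\" \\",
                "  -p \"tell me a joke\"");
        List<String> argv = toArgv(printed);
        argv.forEach(System.out::println);
        // Launching it requires a working TornadoVM setup, e.g.:
        // new ProcessBuilder(argv).inheritIO().start().waitFor();
    }
}
```

Passing a `List<String>` to `ProcessBuilder` avoids a second round of shell parsing, so quoted values such as the OpenCL compiler flags reach the JVM as single arguments.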
| 224 | + |
| 225 | +----------- |
| 226 | + |
164 | 227 | The above model can be swapped with one of the other models, such as `beehive-llama-3.2-3b-instruct-fp16.gguf` or `beehive-llama-3.2-8b-instruct-fp16.gguf`, depending on your needs.
|
165 | 228 | See the models listed below.
|
166 | 229 |
|
@@ -382,73 +445,6 @@ View TornadoVM's internal behavior:
|
382 | 445 | ./llama-tornado --gpu --model model.gguf --prompt "..." --print-threads --print-bytecodes --print-kernel
|
383 | 446 | ```
|
384 | 447 |
|
385 |
| ------------ |
386 |
| - |
387 |
| -## Easy Integration with Your Codebase or Tools |
388 |
| - |
389 |
| -To integrate `llama-tornado` into your codebase or IDE (e.g., IntelliJ) or custom build system (like IntelliJ, Maven, or Gradle), use the `--show-command` flag. |
390 |
| -This flag shows the exact Java command with all JVM flags that are being invoked under the hood in order to enable seamless execution on GPUs with TornadoVM. |
391 |
| -Hence, it makes it simple to replicate or embed the invoked flags in any external tool or codebase. |
392 |
| - |
393 |
| -```bash |
394 |
| -llama-tornado --gpu --model beehive-llama-3.2-1b-instruct-fp16.gguf --prompt "tell me a joke" --show-command |
395 |
| -``` |
396 |
| - |
397 |
| -<details> |
398 |
| -<summary>📋 Click to see the full Java command</summary> |
399 |
| - |
400 |
| -```java |
401 |
| -/home/mikepapadim/.sdkman/candidates/java/current/bin/java \ |
402 |
| - -server \ |
403 |
| - -XX:+UnlockExperimentalVMOptions \ |
404 |
| - -XX:+EnableJVMCI \ |
405 |
| - -Xms20g -Xmx20g \ |
406 |
| - --enable-preview \ |
407 |
| - -Djava.library.path=/home/mikepapadim/manchester/TornadoVM/bin/sdk/lib \ |
408 |
| - -Djdk.module.showModuleResolution=false \ |
409 |
| - --module-path .:/home/mikepapadim/manchester/TornadoVM/bin/sdk/share/java/tornado \ |
410 |
| - -Dtornado.load.api.implementation=uk.ac.manchester.tornado.runtime.tasks.TornadoTaskGraph \ |
411 |
| - -Dtornado.load.runtime.implementation=uk.ac.manchester.tornado.runtime.TornadoCoreRuntime \ |
412 |
| - -Dtornado.load.tornado.implementation=uk.ac.manchester.tornado.runtime.common.Tornado \ |
413 |
| - -Dtornado.load.annotation.implementation=uk.ac.manchester.tornado.annotation.ASMClassVisitor \ |
414 |
| - -Dtornado.load.annotation.parallel=uk.ac.manchester.tornado.api.annotations.Parallel \ |
415 |
| - -Duse.tornadovm=true \ |
416 |
| - -Dtornado.threadInfo=false \ |
417 |
| - -Dtornado.debug=false \ |
418 |
| - -Dtornado.fullDebug=false \ |
419 |
| - -Dtornado.printKernel=false \ |
420 |
| - -Dtornado.print.bytecodes=false \ |
421 |
| - -Dtornado.device.memory=7GB \ |
422 |
| - -Dtornado.profiler=false \ |
423 |
| - -Dtornado.log.profiler=false \ |
424 |
| - -Dtornado.profiler.dump.dir=/home/mikepapadim/repos/gpu-llama3.java/prof.json \ |
425 |
| - -Dtornado.enable.fastMathOptimizations=true \ |
426 |
| - -Dtornado.enable.mathOptimizations=false \ |
427 |
| - -Dtornado.enable.nativeFunctions=fast \ |
428 |
| - -Dtornado.loop.interchange=true \ |
429 |
| - -Dtornado.eventpool.maxwaitevents=32000 \ |
430 |
| - "-Dtornado.opencl.compiler.flags=-cl-denorms-are-zero -cl-no-signed-zeros -cl-finite-math-only" \ |
431 |
| - --upgrade-module-path /home/mikepapadim/manchester/TornadoVM/bin/sdk/share/java/graalJars \ |
432 |
| - @/home/mikepapadim/manchester/TornadoVM/bin/sdk/etc/exportLists/common-exports \ |
433 |
| - @/home/mikepapadim/manchester/TornadoVM/bin/sdk/etc/exportLists/opencl-exports \ |
434 |
| - --add-modules ALL-SYSTEM,tornado.runtime,tornado.annotation,tornado.drivers.common,tornado.drivers.opencl \ |
435 |
| - -cp /home/mikepapadim/repos/gpu-llama3.java/target/gpu-llama3-1.0-SNAPSHOT.jar \ |
436 |
| - com.example.LlamaApp \ |
437 |
| - -m beehive-llama-3.2-1b-instruct-fp16.gguf \ |
438 |
| - --temperature 0.1 \ |
439 |
| - --top-p 0.95 \ |
440 |
| - --seed 1746903566 \ |
441 |
| - --max-tokens 512 \ |
442 |
| - --stream true \ |
443 |
| - --echo false \ |
444 |
| - -p "tell me a joke" \ |
445 |
| - --instruct |
446 |
| -``` |
447 |
| - |
448 |
| -</details> |
449 |
| - |
450 |
| ------------ |
451 |
| - |
452 | 448 | ## Current Features & Roadmap
|
453 | 449 |
|
454 | 450 | - **Support for GGUF format models** with full FP16 and partial support for Q8_0 and Q4_0 quantization.
|
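The Q8_0 and Q4_0 formats named above store weights in fixed-size blocks of 32 values that share one scale. The following is a minimal sketch of how one such block expands back to floats, assuming the standard ggml block layouts: a little-endian FP16 scale `d` followed by 32 signed 8-bit values for Q8_0 (`x = d * q`), or by 16 bytes of packed 4-bit values offset by 8 for Q4_0 (`x = d * (q - 8)`). `GgufDequant` and its methods are hypothetical names for illustration only, not this project's implementation.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration of the ggml/GGUF Q8_0 and Q4_0 block layouts;
// not the gpu-llama3.java implementation.
public class GgufDequant {
    static final int BLOCK_SIZE = 32; // weights per block for both Q8_0 and Q4_0

    // Converts IEEE 754 half-precision bits to a float.
    static float halfToFloat(short h) {
        int bits = h & 0xFFFF;
        int exp = (bits >>> 10) & 0x1F;
        int mant = bits & 0x3FF;
        float value;
        if (exp == 0) {
            value = Math.scalb((float) mant, -24);               // zero or subnormal
        } else if (exp == 0x1F) {
            value = mant == 0 ? Float.POSITIVE_INFINITY : Float.NaN;
        } else {
            value = Math.scalb(1.0f + mant / 1024.0f, exp - 15); // normal number
        }
        return (bits & 0x8000) != 0 ? -value : value;
    }

    // Q8_0 block (34 bytes): fp16 scale d, then 32 signed int8 quants; x = d * q.
    static float[] dequantizeQ8_0(ByteBuffer src, int numBlocks) {
        ByteBuffer in = src.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        float[] out = new float[numBlocks * BLOCK_SIZE];
        for (int b = 0; b < numBlocks; b++) {
            float d = halfToFloat(in.getShort());
            for (int j = 0; j < BLOCK_SIZE; j++) {
                out[b * BLOCK_SIZE + j] = d * in.get(); // int8 is sign-extended
            }
        }
        return out;
    }

    // Q4_0 block (18 bytes): fp16 scale d, then 16 bytes of packed 4-bit quants.
    // Low nibbles hold elements 0..15, high nibbles elements 16..31; x = d * (q - 8).
    static float[] dequantizeQ4_0(ByteBuffer src, int numBlocks) {
        ByteBuffer in = src.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        float[] out = new float[numBlocks * BLOCK_SIZE];
        for (int b = 0; b < numBlocks; b++) {
            float d = halfToFloat(in.getShort());
            for (int j = 0; j < BLOCK_SIZE / 2; j++) {
                int packed = in.get() & 0xFF;
                out[b * BLOCK_SIZE + j] = d * ((packed & 0x0F) - 8);
                out[b * BLOCK_SIZE + j + BLOCK_SIZE / 2] = d * ((packed >>> 4) - 8);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer block = ByteBuffer.allocate(34).order(ByteOrder.LITTLE_ENDIAN);
        block.putShort((short) 0x3C00); // scale d = 1.0 in fp16
        for (int j = 0; j < BLOCK_SIZE; j++) {
            block.put((byte) (j - 16));
        }
        block.flip();
        float[] w = dequantizeQ8_0(block, 1);
        System.out.println(w[0] + " " + w[31]);
    }
}
```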
|