Releases: beehive-lab/GPULlama3.java

v0.2.1

15 Sep 16:08

What's Changed

Full Changelog: v0.2.0...v0.2.1

v0.2.0

04 Sep 12:42

Model Support

  • Mistral – support for GGUF-format Mistral models with optimized GPU execution.
  • Qwen2.5 – GGUF-format Qwen2.5 models supported, including performance improvements for attention layers.
  • Qwen3 – support for GGUF-format Qwen3 models with updated integration.
  • DeepSeek-R1-Distill-Qwen-1.5B – GGUF-format DeepSeek distilled models supported for efficient inference.
  • Phi-3 – full support for GGUF-format Microsoft Phi-3 models for high-performance workloads.
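All of the models above ship in GGUF format. As a minimal illustration of what "GGUF-format" means at the file level, the sketch below parses the fixed 24-byte GGUF header (magic, version, tensor count, metadata key/value count). The class and record names are hypothetical, not this project's actual API; the layout follows the standard GGUF specification.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class GgufHeader {
    // Bytes 'G','G','U','F' read as a little-endian uint32
    static final int GGUF_MAGIC = 0x46554747;

    // Hypothetical holder for the fixed-size header fields
    record Header(int version, long tensorCount, long kvCount) {}

    static Header parse(ByteBuffer buf) {
        buf.order(ByteOrder.LITTLE_ENDIAN); // GGUF is little-endian
        if (buf.getInt() != GGUF_MAGIC) {
            throw new IllegalArgumentException("not a GGUF file");
        }
        int version = buf.getInt();      // 2 or 3 for current files
        long tensorCount = buf.getLong();
        long kvCount = buf.getLong();    // metadata key/value pairs follow
        return new Header(version, tensorCount, kvCount);
    }

    public static void main(String[] args) {
        // Synthetic 24-byte header: version 3, 291 tensors, 25 metadata entries
        ByteBuffer buf = ByteBuffer.allocate(24).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(GGUF_MAGIC).putInt(3).putLong(291).putLong(25);
        buf.flip();

        Header h = parse(buf);
        System.out.println("GGUF v" + h.version() + ": " + h.tensorCount()
                + " tensors, " + h.kvCount() + " metadata entries");
    }
}
```

After the header come the metadata key/value pairs (model architecture, tokenizer, hyperparameters) and the tensor descriptors, which is how one loader can serve all five model families listed above.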

What's Changed

New Contributors

Full Changelog: v0.1.0-beta...v0.2.0

v0.1.0-beta

30 May 07:01
0c9a05a
  • Llama 3 model compatibility – full support for Llama 3.0, 3.1, and 3.2 models
  • GGUF format support – native handling of GGUF model files
  • FP16 model support – reduced memory usage and faster computation
  • GPU acceleration – runs on NVIDIA GPUs through both OpenCL and PTX backends
  • [Experimental] Apple Silicon (M1/M2/M3) support via OpenCL (subject to hardware/compiler limitations)
  • [Experimental] Initial support for Q8 and Q4 quantized models, using runtime dequantization to FP16
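The last bullet, runtime dequantization, can be sketched for the Q8_0 case. In GGUF, a Q8_0 block packs a 16-bit float scale followed by 32 signed int8 weights, and each weight dequantizes as scale × q. This is an illustrative sketch of that block layout, not the project's actual kernel code (which runs on the GPU via TornadoVM); the class and method names are hypothetical, and `Float.float16ToFloat` requires Java 20+.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Q8Dequant {
    static final int BLOCK_SIZE = 32;           // weights per Q8_0 block
    static final int BYTES_PER_BLOCK = 2 + 32;  // fp16 scale + 32 int8 quants

    // Dequantize one Q8_0 block starting at `offset` into `out`
    static void dequantizeBlock(ByteBuffer buf, int offset, float[] out, int outOffset) {
        // Per-block fp16 scale, decoded to float (Java 20+)
        float scale = Float.float16ToFloat(buf.getShort(offset));
        for (int i = 0; i < BLOCK_SIZE; i++) {
            out[outOffset + i] = scale * buf.get(offset + 2 + i); // scale * int8 quant
        }
    }

    public static void main(String[] args) {
        // Build one synthetic block: scale = 0.5, quants = -2 .. 29
        ByteBuffer buf = ByteBuffer.allocate(BYTES_PER_BLOCK).order(ByteOrder.LITTLE_ENDIAN);
        buf.putShort(Float.floatToFloat16(0.5f));
        for (int q = -2; q < 30; q++) buf.put((byte) q);

        float[] out = new float[BLOCK_SIZE];
        dequantizeBlock(buf, 0, out, 0);
        System.out.println(out[0] + " " + out[4]); // -1.0 1.0
    }
}
```

The trade-off this bullet describes: Q8/Q4 files are 2–4x smaller on disk and in transfer, at the cost of this per-block decode step before the FP16 compute path.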