
v0.3.17-cu130-Basic-linux-20251207


github-actions released this on 07 Dec 16:54

feat: perf: optimize LlamaModel.metadata reading performance

  • Increase the initial buffer size to 16 KB to avoid reallocations when reading large chat templates.
  • Cache ctypes function references to reduce per-iteration loop overhead.
  • Repeated model loading can see a cumulative speed improvement of 1-3%.
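The two changes above follow a common ctypes pattern for reading strings from a C API that, like `snprintf`, returns the full length it needed so the caller can detect truncation. The sketch below is a minimal illustration under that assumption, not the library's actual implementation: `make_accessor` is a hypothetical pure-Python stand-in for a llama.cpp metadata accessor, and `read_all` is a hypothetical reader that starts with a 16 KB buffer, grows it only when a value is truncated, and binds frequently used callables to local names once before the loop.

```python
import ctypes
from typing import Callable, List, Tuple

# Hypothetical accessor type: (index, buffer, buffer_size) -> full value length.
Accessor = Callable[[int, "ctypes.Array[ctypes.c_char]", int], int]


def make_accessor(values: List[bytes]) -> Accessor:
    """Hypothetical stand-in for a llama.cpp-style string accessor.

    Like snprintf, it writes at most size - 1 bytes plus a NUL terminator
    and returns the full length of the value, so callers can detect truncation.
    """
    def accessor(i: int, buf: "ctypes.Array[ctypes.c_char]", size: int) -> int:
        data = values[i]
        n = min(len(data), size - 1)
        ctypes.memmove(buf, data, n)  # copy the (possibly truncated) value
        buf[n] = b"\0"                # always NUL-terminate
        return len(data)
    return accessor


def read_all(accessor: Accessor, count: int,
             initial_size: int = 16 * 1024) -> Tuple[List[str], int]:
    """Read `count` values; return them plus the number of buffer reallocations."""
    create = ctypes.create_string_buffer  # cached: avoids a module attribute lookup per iteration
    size = initial_size
    buf = create(size)
    out: List[str] = []
    append = out.append  # cached bound method
    reallocs = 0
    for i in range(count):
        n = accessor(i, buf, size)
        if n >= size:  # truncated: the value needs n + 1 bytes including the NUL
            size = n + 1
            buf = create(size)
            reallocs += 1
            accessor(i, buf, size)  # second call now fits
        append(buf.value.decode("utf-8"))
    return out, reallocs
```

With a 16 KB starting buffer, only values of 16 KB or more trigger a second call; a smaller starting size reallocates more often when several long values (such as chat templates) are read in sequence.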

feat: Update submodule vendor/llama.cpp d9e03db..0a540f9
feat: Sync ggml-zendnn: add ZenDNN backend for AMD CPUs
feat: workflow: Add workflows for building with CUDA 13.0.2 on Windows and Linux.
feat: Add the CUDA 13.0+ dynamic-link library search path on Windows ($env:CUDA_PATH\bin\x64).
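A loader has to register the CUDA runtime directory explicitly on Windows, because since Python 3.8 load-time DLL dependencies are resolved only from system paths, the directory of the DLL being loaded, and directories added with `os.add_dll_directory()`. The sketch below shows one way to scan `CUDA_PATH`, checking the `bin\x64` subdirectory named in this release note before the plain `bin` directory used by earlier CUDA releases. `cuda_dll_dirs` and `register_cuda_dll_dirs` are hypothetical names, not the package's actual loader code; the directory check is passed in so the path logic can be exercised on any OS.

```python
import ntpath
import os
from typing import Callable, List, Mapping


def cuda_dll_dirs(env: Mapping[str, str], is_dir: Callable[[str], bool]) -> List[str]:
    """Hypothetical helper: existing CUDA DLL directories under CUDA_PATH."""
    root = env.get("CUDA_PATH")
    if not root:
        return []
    candidates = [
        ntpath.join(root, "bin", "x64"),  # CUDA 13.0+ location (per this release note)
        ntpath.join(root, "bin"),         # location used by earlier CUDA releases
    ]
    return [d for d in candidates if is_dir(d)]


def register_cuda_dll_dirs() -> list:
    """Hypothetical loader hook: add each found directory to the DLL search path."""
    if os.name != "nt":
        return []  # os.add_dll_directory exists only on Windows
    # Keep the returned handles; calling .close() on one removes that directory again.
    return [os.add_dll_directory(d) for d in cuda_dll_dirs(os.environ, os.path.isdir)]
```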