Releases: withcatai/node-llama-cpp
v3.14.0
3.14.0 (2025-10-02)
Features
- Qwen3 Reranker support (#506) (00305f7) (see #506 for prequantized Qwen3 Reranker models you can use)
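For reference, a minimal usage sketch of reranking with this support, based on the documented ranking API (`createRankingContext` and `rankAndSort`); the model file name is a placeholder, see #506 for actual prequantized Qwen3 Reranker models:

```ts
import {getLlama} from "node-llama-cpp";

const llama = await getLlama();

// placeholder path; see #506 for prequantized Qwen3 Reranker GGUF files
const model = await llama.loadModel({modelPath: "qwen3-reranker.gguf"});
const context = await model.createRankingContext();

const query = "Tell me a geology fact";
const documents = [
    "The Earth's crust is made up of tectonic plates",
    "Mount Everest is the tallest mountain in the world",
    "A piano has 88 keys"
];

// scores each document against the query and returns them sorted by relevance
const ranked = await context.rankAndSort(query, documents);
console.log(ranked[0]); // the most relevant document, with its score
```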
Bug Fixes
- handle HuggingFace rate limit responses (#506) (00305f7)
- adapt to `llama.cpp` breaking changes (#506) (00305f7)
Shipped with `llama.cpp` release `b6673`.
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)
v3.13.0
3.13.0 (2025-09-09)
Features
Bug Fixes
- adapt to breaking `llama.cpp` changes (#501) (76b505e)
- Vulkan: read external memory usage (#500) (d33cc31)
Shipped with `llama.cpp` release `b6431`.
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)
v3.12.4
✨ `gpt-oss` is here! ✨
Read about the release in the blog post.
3.12.4 (2025-08-28)
Bug Fixes
Shipped with `llama.cpp` release `b6301`.
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)
v3.12.3
✨ `gpt-oss` is here! ✨
Read about the release in the blog post.
3.12.3 (2025-08-26)
Bug Fixes
- Vulkan: context creation edge cases (#492) (12749c0)
- prebuilt binaries CUDA 13 support (#494) (b10999d)
- don't share loaded shared libraries between backends (#492) (12749c0)
- split prebuilt CUDA binaries into 2 npm modules (#495) (6e59160)
Shipped with `llama.cpp` release `b6294`.
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)
v3.12.1
✨ `gpt-oss` is here! ✨
Read about the release in the blog post.
3.12.1 (2025-08-11)
Features
- `comment` segment budget (#489) (30eaa23) (documentation: API: `LLamaChatPromptOptions["budgets"]["commentTokens"]`) (usage sketch below)
- Electron template: comment segments
- Electron template: improve completions speed when using functions
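A minimal sketch of the new budget option, assuming a placeholder model path; see the linked `LLamaChatPromptOptions["budgets"]["commentTokens"]` documentation for the exact semantics:

```ts
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "model.gguf"}); // placeholder path
const context = await model.createContext();
const session = new LlamaChatSession({contextSequence: context.getSequence()});

const answer = await session.prompt("Explain how quicksort works", {
    budgets: {
        commentTokens: 100 // cap the tokens the model may spend on comment segments
    }
});
console.log(answer);
```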
Bug Fixes
- `gpt-oss` segment budgets (#489) (30eaa23)
- add support for more `gpt-oss` variations (#489) (30eaa23)
- default to using a model message for prompt completion on unsupported models (#489) (30eaa23)
- prompt completion config (#490) (f849cd9)
Shipped with `llama.cpp` release `b6133`.
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)
v3.12.0
✨ `gpt-oss` is here! ✨
Read about the release in the blog post.
3.12.0 (2025-08-09)
Features
Bug Fixes
Shipped with `llama.cpp` release `b6122`.
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)
v3.11.0
3.11.0 (2025-07-29)
Features
- NUMA policy (#482) (a2ddaa2) (documentation: API: `LlamaOptions["numa"]`) (usage sketch below)
- `inspect gpu` command: log prebuilt binaries and cloned source releases (#482) (a2ddaa2)
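A minimal sketch of opting into a NUMA policy when loading the bindings; `"distribute"` is an assumed value mirroring llama.cpp's NUMA strategies, so check the linked `LlamaOptions["numa"]` documentation for the accepted values:

```ts
import {getLlama} from "node-llama-cpp";

// "distribute" is an assumed value mirroring llama.cpp's NUMA strategies;
// see the LlamaOptions["numa"] documentation for the full list
const llama = await getLlama({numa: "distribute"});
```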
Bug Fixes
- add missing GGUF metadata types (#482) (a2ddaa2)
- level of some internal logs (#482) (a2ddaa2)
- JSON schema grammar edge case (#482) (a2ddaa2)
Shipped with `llama.cpp` release `b6018`.
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)
v3.10.0
3.10.0 (2025-06-12)
Features
- JSON Schema Grammar: `$defs` and `$ref` support with full inferred types (#472) (9cdbce9) (usage sketch below)
- `inspect gguf` command: format and print the Jinja chat template with `--key .chatTemplate` (#472) (9cdbce9)
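A minimal sketch of the `$defs`/`$ref` support via the existing `createGrammarForJsonSchema` API; the model path is a placeholder:

```ts
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama();
const grammar = await llama.createGrammarForJsonSchema({
    $defs: {
        person: {
            type: "object",
            properties: {
                name: {type: "string"},
                age: {type: "number"}
            }
        }
    },
    type: "object",
    properties: {
        author: {$ref: "#/$defs/person"} // resolved, with full inferred types
    }
});

const model = await llama.loadModel({modelPath: "model.gguf"}); // placeholder path
const context = await model.createContext();
const session = new LlamaChatSession({contextSequence: context.getSequence()});

const res = await session.prompt("Describe the author of Hamlet", {grammar});
const parsed = grammar.parse(res); // typed according to the schema, $ref included
```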
Bug Fixes
- `JinjaTemplateChatWrapper`: first function call prefix detection (#472) (9cdbce9)
- `QwenChatWrapper`: improve Qwen chat template detection (#472) (9cdbce9)
- apply `maxTokens` on function calling parameters (#472) (9cdbce9)
- adjust default prompt completion length based on SWA size when relevant (#472) (9cdbce9)
- improve thought segmentation syntax extraction (#472) (9cdbce9)
- adapt to `llama.cpp` changes (#472) (9cdbce9)
Shipped with `llama.cpp` release `b5640`.
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)
v3.9.0
3.9.0 (2025-06-04)
Features
- reasoning budget (#468) (ea8d904) (documentation: Set Reasoning Budget) (usage sketch below)
- SWA (Sliding Window Attention) support: greatly reduced context memory consumption on supported models (#468) (ea8d904)
- documentation: LLM-friendly `llms.md` and `llms-full.md` files (#468) (ea8d904)
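A minimal sketch of setting a reasoning budget on a prompt, following the linked "Set Reasoning Budget" documentation; the model path is a placeholder:

```ts
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "model.gguf"}); // placeholder path
const context = await model.createContext();
const session = new LlamaChatSession({contextSequence: context.getSequence()});

const answer = await session.prompt("Solve: 23 * 17", {
    budgets: {
        thoughtTokens: 256 // cap the tokens the model may spend on reasoning
    }
});
console.log(answer);
```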
Bug Fixes
Shipped with `llama.cpp` release `b5590`.
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)
v3.8.1
3.8.1 (2025-05-19)
Bug Fixes
- `getLlamaGpuTypes`: edge case (#463) (1799127) (see the sketch below)
- remove prompt completion from the cached context window (#463) (1799127)
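For context, a sketch of the function the first fix touches; the `"supported"` filter argument is an assumption here, so consult the `getLlamaGpuTypes` API documentation:

```ts
import {getLlamaGpuTypes} from "node-llama-cpp";

// "supported" is an assumed filter value; see the getLlamaGpuTypes API docs
const gpuTypes = await getLlamaGpuTypes("supported");
console.log(gpuTypes); // e.g. ["cuda", "vulkan"] or ["metal"]
```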
Shipped with `llama.cpp` release `b5415`.
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)