Releases: withcatai/node-llama-cpp
v3.8.0
3.8.0 (2025-05-17)
Features
- save and restore a context sequence state (#460) (f2cb873) (documentation: Saving and restoring a context sequence evaluation state) (see the sketch after this list)
- stream function call parameters (#460) (f2cb873) (documentation: API: `LLamaChatPromptOptions["onFunctionCallParamsChunk"]`)
- configure Hugging Face remote endpoint for resolving URIs (#460) (f2cb873) (documentation: API: `ResolveModelFileOptions["endpoints"]`)
- Qwen 3 support (#460) (f2cb873)
- `QwenChatWrapper`: support discouraging the generation of thoughts (#460) (f2cb873) (documentation: API: `QwenChatWrapper` constructor > `thoughts` option)
- `getLlama`: `dryRun` option (#460) (f2cb873) (documentation: API: `LlamaOptions["dryRun"]`)
- `getLlamaGpuTypes` function (#460) (f2cb873) (documentation: API: `getLlamaGpuTypes`)
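A minimal sketch of the new context sequence state saving, using the standard model-loading flow; the `saveStateToFile` / `loadStateFromFile` method names and signatures are assumptions inferred from the linked documentation title, so verify them against the API reference before relying on them:

```ts
import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "model.gguf") // placeholder path
});
const context = await model.createContext();
const sequence = context.getSequence();
const session = new LlamaChatSession({contextSequence: sequence});

await session.prompt("Summarize the rules of chess in one paragraph.");

// Persist the evaluated state of this sequence so a later run can resume
// without re-evaluating the conversation (method names assumed - see the
// "Saving and restoring a context sequence evaluation state" docs).
const statePath = path.join(__dirname, "state.bin");
await sequence.saveStateToFile(statePath);

// ...later, after recreating the model, context, and sequence the same way:
await sequence.loadStateFromFile(statePath);
```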
Bug Fixes
- adapt to breaking `llama.cpp` changes (#460) (f2cb873)
- capture multi-token segment separators (#460) (f2cb873)
- race condition when reading extremely long gguf metadata (#460) (f2cb873)
- adapt memory estimation to newly added model architectures (#460) (f2cb873)
- skip binary testing on certain problematic conditions (#460) (f2cb873)
- improve GPU backend loading error description (#460) (f2cb873)
Shipped with `llama.cpp` release `b5414`
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)
v3.7.0
3.7.0 (2025-03-28)
Features
- extract function calling syntax from a Jinja template (#444) (c070e81)
- full support for Qwen and QwQ via `QwenChatWrapper` (#444) (c070e81) (see the sketch after this list)
- export a `llama` instance getter on a model instance (#444) (c070e81)
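As an illustration of the Qwen/QwQ support and the new `llama` getter, here is a minimal sketch using the usual chat-session flow; the model path is a placeholder and the `model.llama` getter name is assumed from the release note:

```ts
import {getLlama, LlamaChatSession, QwenChatWrapper} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: "path/to/qwq-32b.Q4_K_M.gguf" // placeholder path
});

// New in this release: the Llama instance a model was loaded with is exposed
// on the model itself (getter name assumed from the release note).
console.log(model.llama === llama); // true

const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
    chatWrapper: new QwenChatWrapper() // Qwen / QwQ chat formatting
});

console.log(await session.prompt("What is 17 * 24?"));
```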
Bug Fixes
- better handling for function calling with empty parameters (#444) (c070e81)
- reranking edge case crash (#444) (c070e81)
- limit the context size by default in the node-typescript template (#444) (c070e81)
- adapt to breaking `llama.cpp` changes (#444) (c070e81)
- bump the minimum Node.js version to 20 due to dependencies' requirements (#444) (c070e81)
- `defineChatSessionFunction` type (#444) (c070e81)
Shipped with `llama.cpp` release `b4980`
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)
v3.6.0
✨ DeepSeek R1 is here! ✨
Read about the release in the blog post
3.6.0 (2025-02-21)
Features
- DeepSeek R1 support (#428) (ca6b901) (documentation: DeepSeek R1)
- chain of thought segmentation (#428) (ca6b901) (documentation: Stream Response Segments) (see the sketch after this list)
- pass a model to `resolveChatWrapper` (#428) (ca6b901)
- `defineChatSessionFunction`: improve `params` type (#428) (ca6b901)
- Electron template: show chain of thought (#428) (ca6b901) (documentation: DeepSeek R1)
- Electron template: add functions template (#428) (ca6b901)
- Electron template: new icon for the CI build (#428) (ca6b901)
- Electron template: update model message in a more stable manner (#428) (ca6b901)
- Electron template: more convenient completion (#428) (ca6b901)
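A sketch of consuming the chain of thought segmentation from the list above while streaming a DeepSeek R1 style response; the chunk fields used here (`type`, `segmentType`, `text`) are assumptions to check against the Stream Response Segments documentation:

```ts
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: "path/to/deepseek-r1-distill.Q4_K_M.gguf" // placeholder path
});
const context = await model.createContext();
const session = new LlamaChatSession({contextSequence: context.getSequence()});

const answer = await session.prompt("Why is the sky blue?", {
    onResponseChunk(chunk) {
        // Assumed chunk shape: thought segments arrive with type "segment" and
        // segmentType "thought"; regular answer text has no segment type.
        if (chunk.type === "segment" && chunk.segmentType === "thought")
            process.stdout.write(`[thought] ${chunk.text}`);
        else
            process.stdout.write(chunk.text);
    }
});

console.log("\n\nFinal answer:", answer);
```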
Bug Fixes
- check path existence before reading its content (#428) (ca6b901)
- partial tokens handling (#428) (ca6b901)
- uncaught exception (#430) (599a161)
- Electron template: non-Latin text formatting (#430) (599a161)
Shipped with `llama.cpp` release `b4753`
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)
v3.5.0
3.5.0 (2025-01-31)
Features
- shorter model URIs (#421) (73454d9) (documentation: Model URIs)
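For example, a shorter Hugging Face URI can be resolved to a local file before loading it; the repository below is only a placeholder, and passing a download directory as the second argument of `resolveModelFile` is an assumption to verify against the Model URIs documentation:

```ts
import {resolveModelFile} from "node-llama-cpp";

// Shorter URI form: "hf:<user>/<repo>:<quantization>" instead of a full file URL.
// The repository below is a placeholder - substitute a real GGUF repo.
const modelPath = await resolveModelFile(
    "hf:bartowski/Llama-3.2-3B-Instruct-GGUF:Q4_K_M",
    "./models" // assumed: directory to download/resolve the model into
);

console.log("Model resolved to:", modelPath);
```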
Bug Fixes
Shipped with `llama.cpp` release `b4600`
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)
v3.4.3
v3.4.2
3.4.2 (2025-01-27)
Bug Fixes
- metadata string encoding (#420) (314d7e8)
- Vulkan parallel decoding (#420) (314d7e8)
- try auth token on 401 response (#420) (314d7e8)
Shipped with `llama.cpp` release `b4567`
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)
v3.4.1
v3.4.0
3.4.0 (2025-01-08)
Features
- token prediction (speculative decoding) (#405) (632a7bf) (documentation: Token Prediction) (see the sketch after this list)
- `controlledEvaluate` (#405) (632a7bf) (documentation: Low Level API)
- `evaluateWithMetadata` (#405) (632a7bf) (documentation: Low Level API)
- reranking (#405) (632a7bf) (documentation: Reranking Documents)
- token confidence (#405) (632a7bf) (documentation: Low Level API)
- `experimentalChunkDocument` (#405) (632a7bf)
- build on arm64 using LLVM (#405) (632a7bf)
- try compiling with LLVM on Windows x64 when available (#405) (632a7bf)
- minor: dynamically load `llama.cpp` backends (#405) (632a7bf)
- minor: more token values support in `SpecialToken` (#405) (632a7bf)
- minor: improve memory usage estimation (#405) (632a7bf)
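A sketch of the token prediction (speculative decoding) feature from the list above, pairing a small draft model with a larger target model; `DraftSequenceTokenPredictor` and the `tokenPredictor` option follow the linked Token Prediction documentation, but treat the exact signatures as assumptions:

```ts
import {getLlama, LlamaChatSession, DraftSequenceTokenPredictor} from "node-llama-cpp";

const llama = await getLlama();

// The large model whose output we actually want (paths are placeholders).
const model = await llama.loadModel({modelPath: "path/to/large-model.Q4_K_M.gguf"});
// A small, fast model from the same family, used only to draft candidate tokens.
const draftModel = await llama.loadModel({modelPath: "path/to/small-draft-model.Q4_K_M.gguf"});

const draftContext = await draftModel.createContext();
const context = await model.createContext();

// Attach the draft model's sequence as a token predictor for the main sequence.
const sequence = context.getSequence({
    tokenPredictor: new DraftSequenceTokenPredictor(draftContext.getSequence())
});

const session = new LlamaChatSession({contextSequence: sequence});
console.log(await session.prompt("List three everyday uses for a paperclip."));
```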
Bug Fixes
- check for Rosetta usage on macOS x64 when using the `inspect gpu` command (#405) (632a7bf)
- detect running under Rosetta on Apple Silicon and show an error message instead of crashing (#405) (632a7bf)
- switch from `"nextTick"` to `"nextCycle"` for the default batch dispatcher (#405) (632a7bf)
- remove deprecated CLS token (#405) (632a7bf)
- pipe error logs in the `inspect gpu` command (#405) (632a7bf)
Shipped with `llama.cpp` release `b4435`
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`. (learn more)