v3.0.0-beta.15
Pre-release
3.0.0-beta.15 (2024-04-04)
Bug Fixes
- create a context with no parameters (#188) (6267778) (see the sketch after this list)
- improve chat wrappers tokenization (#182) (35e6f50)
- use the new `llama.cpp` CUDA flag (#182) (35e6f50)
- adapt to breaking `llama.cpp` changes (#183) (6b012a6)
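A minimal sketch of the parameterless context creation, assuming the `getLlama`/`loadModel` flow of the 3.0 beta API; the model path is a placeholder:

```typescript
import {fileURLToPath} from "url";
import path from "path";
import {getLlama} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    // placeholder path - point this at a real GGUF file
    modelPath: path.join(__dirname, "models", "model.gguf")
});

// as of this release, createContext() can be called with no parameters;
// sensible defaults are picked automatically
const context = await model.createContext();
```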
Features
- automatically adapt to current free VRAM state (#182) (35e6f50)
- `inspect gguf` command (#182) (35e6f50) (CLI example after this list)
- `inspect measure` command (#182) (35e6f50)
- `readGgufFileInfo` function (#182) (35e6f50) (example after this list)
- GGUF file metadata info on `LlamaModel` (#182) (35e6f50)
- `JinjaTemplateChatWrapper` (#182) (35e6f50) (example after this list)
- use the `tokenizer.chat_template` header from the `gguf` file when available, to find a better specialized chat wrapper, or use `JinjaTemplateChatWrapper` with it as a fallback (#182) (35e6f50)
- simplify generation CLI commands: `chat`, `complete`, `infill` (#182) (35e6f50) (example after this list)
- Windows on Arm prebuilt binary (#181) (f3b7f81)
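A possible invocation of the two new `inspect` subcommands; the file path is a placeholder, and the exact flags should be confirmed with `inspect --help`:

```bash
# print the metadata of a local GGUF file without loading the model
npx --no node-llama-cpp inspect gguf ./models/model.gguf

# estimate the memory required to load the model on this machine
npx --no node-llama-cpp inspect measure ./models/model.gguf
```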
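A sketch of reading GGUF metadata with the new `readGgufFileInfo` function; the `metadata` property name is an assumption based on the feature description:

```typescript
import {readGgufFileInfo} from "node-llama-cpp";

// parse the GGUF header of a local file (placeholder path);
// only the file header is read, so the model is not loaded into memory
const ggufInfo = await readGgufFileInfo("./models/model.gguf");

// `metadata` is assumed to hold the parsed GGUF key-value metadata
console.log(ggufInfo.metadata);
```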
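A sketch of wiring `JinjaTemplateChatWrapper` into a chat session, assuming a constructor that takes a Jinja `template` option and the beta's `contextSequence`-based `LlamaChatSession` API; the template string is a simplified placeholder:

```typescript
import {getLlama, LlamaChatSession, JinjaTemplateChatWrapper} from "node-llama-cpp";

// a chat wrapper driven by a Jinja template, such as one taken from
// the `tokenizer.chat_template` header of a gguf file
const chatWrapper = new JinjaTemplateChatWrapper({
    template: "{% for message in messages %}{{ message.content }}{% endfor %}"
});

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "./models/model.gguf"});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
    chatWrapper
});

console.log(await session.prompt("Hi there"));
```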
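The simplified generation commands can be run directly through `npx`; the `--model` flag below is an assumption, and each command lists its actual flags via `--help`:

```bash
# interactive chat session with a local model
npx --no node-llama-cpp chat --model ./models/model.gguf

# plain text completion
npx --no node-llama-cpp complete --model ./models/model.gguf

# fill-in-middle (infill) completion
npx --no node-llama-cpp infill --model ./models/model.gguf
```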
Shipped with `llama.cpp` release `b2608`

To use the latest `llama.cpp` release available, run `npx --no node-llama-cpp download --release latest`. (learn more)