Releases: withcatai/node-llama-cpp
v3.3.2
v3.3.1
v3.3.0
3.3.0 (2024-12-02)
Bug Fixes
- improve binary compatibility testing on Electron apps (#386) (97abbca)
- too many abort signal listeners (#386) (97abbca)
- log level of some lower level logs (#386) (97abbca)
- context window missing response during generation on specific extreme conditions (#386) (97abbca)
- adapt to breaking `llama.cpp` changes (#386) (97abbca)
- automatically resolve `compiler is out of heap space` CUDA build error (#386) (97abbca)
Features
- Llama 3.2 3B function calling support (#386) (97abbca)
- use `llama.cpp` backend registry for GPUs instead of custom implementations (#386) (97abbca)
- `getLlama`: `build: "try"` option (#386) (97abbca)
- `init` command: `--model` flag (#386) (97abbca)
- JSON Schema grammar: array `prefixItems`, `minItems`, `maxItems` support (#388) (4d387de)
- JSON Schema grammar: object `additionalProperties`, `minProperties`, `maxProperties` support (#388) (4d387de)
- JSON Schema grammar: string `minLength`, `maxLength`, `format` support (#388) (4d387de)
- JSON Schema grammar: improve inferred types (#388) (4d387de)
- function calling: params `description` support (#388) (4d387de)
- function calling: document JSON Schema type properties on Functionary chat function types (#388) (4d387de)
Shipped with llama.cpp release b4234
To use the latest `llama.cpp` release available, run `npx -n node-llama-cpp source download --release latest`.
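
The array and string keywords listed under the 3.3.0 JSON Schema grammar features follow standard JSON Schema semantics. The sketch below only illustrates what those keywords mean. It does not call node-llama-cpp, and the `conforms` helper, the `Schema` types, and the `coordinateLabel` schema are hypothetical names made up for this example. How the library turns such a schema into a grammar is not shown here.

```typescript
// Illustrative only: a hand-written checker for a few JSON Schema keywords
// named in the 3.3.0 notes. Nothing here is part of node-llama-cpp's API.

type StringSchema = {type: "string", minLength?: number, maxLength?: number};
type NumberSchema = {type: "number"};
type ArraySchema = {
    type: "array",
    prefixItems?: Schema[], // schemas for the leading, positional items
    items?: Schema,         // schema for every item after the prefix
    minItems?: number,
    maxItems?: number
};
type Schema = StringSchema | NumberSchema | ArraySchema;

// Hypothetical helper: returns true when `value` satisfies `schema`.
function conforms(value: unknown, schema: Schema): boolean {
    switch (schema.type) {
        case "string": {
            if (typeof value !== "string") return false;
            // JSON Schema counts string length in Unicode code points
            const length = [...value].length;
            return length >= (schema.minLength ?? 0) &&
                length <= (schema.maxLength ?? Infinity);
        }
        case "number":
            return typeof value === "number";
        case "array": {
            if (!Array.isArray(value)) return false;
            if (value.length < (schema.minItems ?? 0)) return false;
            if (value.length > (schema.maxItems ?? Infinity)) return false;

            const prefix = schema.prefixItems ?? [];
            return value.every((item, index) => {
                const prefixSchema = prefix[index];
                if (prefixSchema != null) return conforms(item, prefixSchema);
                return schema.items == null || conforms(item, schema.items);
            });
        }
    }
}

// Two leading numbers, then up to two short string labels.
const coordinateLabel: ArraySchema = {
    type: "array",
    prefixItems: [{type: "number"}, {type: "number"}],
    items: {type: "string", minLength: 1, maxLength: 8},
    minItems: 2,
    maxItems: 4
};
```

With this schema, `conforms([1, 2, "home"], coordinateLabel)` passes, while `[1]` fails `minItems`, `[1, "2"]` fails the second `prefixItems` entry, and `[1, 2, ""]` fails `minLength`.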
v3.2.0
3.2.0 (2024-10-31)
Bug Fixes
- Electron crash with some models on macOS when not using Metal (#375) (ea12dc5)
- adapt to `llama.cpp` breaking changes (#375) (ea12dc5)
- support `rejectattr` in Jinja templates (#376) (ea12dc5)
- build warning on macOS (#377) (6405ee9)
Features
- chat session response prefix (#375) (ea12dc5)
- improve context shift strategy (#375) (ea12dc5)
- use RAM and swap sizes in memory usage estimations (#375) (ea12dc5)
- faster building from source (#375) (ea12dc5)
- improve CPU compatibility score (#375) (ea12dc5)
- `inspect gguf` command: print a single key flag (#375) (ea12dc5)
Shipped with llama.cpp release b3995
v3.1.1
v3.1.0
3.1.0 (2024-10-05)
Bug Fixes
Features
Shipped with llama.cpp release b3887
v3.0.3
✨ node-llama-cpp 3.0 is here! ✨
Read about the release in the blog post
3.0.3 (2024-09-25)
Bug Fixes
Shipped with llama.cpp release b3825
v3.0.2
3.0.2 (2024-09-25)
Bug Fixes
Shipped with llama.cpp release b3821
v3.0.1
3.0.1 (2024-09-24)
Bug Fixes
Shipped with llama.cpp release b3808
v3.0.0
3.0.0 (2024-09-24)
Features
- function calling (#139) (5fcdf9b)
- get embedding for text (#144) (4cf1fba)
- async model and context loading (#178) (315a3eb)
- token biases (#196) (3ad4494)
- automatic batching (#104) (4757af8)
- prompt completion engine (#225) (95f4645)
- model compatibility warnings (#225) (95f4645)
- Vulkan support (#171) (d161bcd)
- Windows on Arm prebuilt binary (#181) (f3b7f81)
- change the default log level to warn (#191) (b542b53)
- `pull` command (#214) (453c162)
- `inspect gpu` command (#175) (5a70576)
- `inspect gguf` command (#182) (35e6f50)
- `inspect estimate` command (#309) (4b3ad61)
- `inspect measure` command (#182) (35e6f50)
- `init` command to scaffold a new project from a template (with `node-typescript` and `electron-typescript-react` templates) (#217) (d6a0f43)
- move `download`, `build` and `clear` commands to be subcommands of a `source` command (#309) (4b3ad61)
- move `seed` option to the prompt level (#309) (4b3ad61)
- `TemplateChatWrapper`: custom history template for each message role (#309) (4b3ad61)
- Llama 3.1 support (#273) (e3e0994)
- Mistral chat wrapper (#309) (4b3ad61)
- Functionary v3 support (#309) (4b3ad61)
- Phi-3 support (#273) (e3e0994)
- extract all prebuilt binaries to external modules (#309) (4b3ad61)
- parallel function calling (#225) (95f4645)
- preload prompt (#225) (95f4645)
- `onTextChunk` option (#273) (e3e0994)
- flash attention (#264) (c2e322c)
- debug mode (#217) (d6a0f43)
- load LoRA adapters (#217) (d6a0f43)
- split gguf files support (#214) (453c162)
- `stopOnAbortSignal` and `customStopTriggers` on `LlamaChat` and `LlamaChatSession` (#214) (453c162)
- Llama 3 support (#205) (ef501f9)
- `--gpu` flag in generation CLI commands (#205) (ef501f9)
- `specialTokens` parameter on `model.detokenize` (#205) (ef501f9)
- interactively select a model from CLI commands (#191) (b542b53)
- automatically adapt to current free VRAM state (#182) (35e6f50)
- GGUF file metadata info on `LlamaModel` (#182) (35e6f50)
- use the `tokenizer.chat_template` header from the `gguf` file when available - use it to find a better specialized chat wrapper or use `JinjaTemplateChatWrapper` with it as a fallback (#182) (35e6f50)
- simplify generation CLI commands: `chat`, `complete`, `infill` (#182) (35e6f50)
- gguf parser (#168) (bcaab4f)
- use the best compute layer available by default (#175) (5a70576)
- more guardrails to prevent loading an incompatible prebuilt binary (#175) (5a70576)
- completion and infill (#164) (ede69c1)
- support configuring more options for `getLlama` when using `"lastBuild"` (#164) (ede69c1)
- get VRAM state (#161) ([46235a2](https://github.com/withc...

