---
title: node-llama-cpp v3.0
date: 2024-09-23T22:00:00Z
author:
  name: Gilad S.
  github: giladgd
category: Release
description: Learn more about the new features in node-llama-cpp v3.0!
image:
  url: https://github.com/user-attachments/assets/c7ed2eab-fb50-426d-9019-aed40147f30e
  alt: Celebrate
  width: 3072
  height: 1536
---
[`node-llama-cpp`](https://node-llama-cpp.withcat.ai) 3.0 is finally here.

With [`node-llama-cpp`](https://node-llama-cpp.withcat.ai), you can run large language models locally on your machine using the power of [`llama.cpp`](https://github.com/ggerganov/llama.cpp) with a simple and easy-to-use API.

It includes everything you need, from downloading models, to running them in the most optimized way for your hardware, to integrating them into your projects.

---

## Why `node-llama-cpp`?
You might be wondering: why choose `node-llama-cpp` over calling the OpenAI-compatible API of a service running on your machine?

The answer is simple: simplicity, performance, and flexibility.

Let's break it down:

### Simplicity
To use `node-llama-cpp`, you install it like any other npm package, and you're good to go.

To run your project, all you have to do is `npm install` and `npm start`. That's it.

No installing additional software on your machine, no setting up API keys or environment variables, no setup process at all.
Everything is self-contained in your project, giving you complete control over it.

With `node-llama-cpp`, you can run large language models on your machine using Node.js and TypeScript, _without_ any Python at all.
Say goodbye to setup headaches, "it works on my machine" issues, and all other Python-related problems.

While `llama.cpp` is an amazing project, it's also highly technical and can be challenging for beginners.
`node-llama-cpp` bridges that gap, making `llama.cpp` accessible to everyone, regardless of their experience level.

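Getting started really is just an `npm install node-llama-cpp` away. Here's a minimal sketch of prompting a model with a chat session; the model file name is a placeholder for whichever GGUF model you've put in your project:

```typescript
import path from "path";
import {fileURLToPath} from "url";
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

// getLlama() picks the best available compute backend for this machine
const llama = await getLlama();

// "my-model.gguf" is a placeholder - use any GGUF model file you've downloaded
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "my-model.gguf")
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});

const answer = await session.prompt("Hi there, how are you?");
console.log("AI: " + answer);
```
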
### Performance
[`node-llama-cpp`](https://node-llama-cpp.withcat.ai) is built on top of [`llama.cpp`](https://github.com/ggerganov/llama.cpp), a highly optimized C++ library for running large language models.

`llama.cpp` supports many compute backends, including Metal, CUDA, and Vulkan. It also uses [Accelerate](https://developer.apple.com/accelerate/) on Mac.

`node-llama-cpp` automatically adapts to your hardware and adjusts the default settings to give you the best performance,
so you don't _have_ to configure anything to use it.

By using `node-llama-cpp`, you are essentially running models _inside_ your project.
With no overhead of network calls or data serialization,
you can take fuller advantage of the stateful nature of inference operations.

For example, you can prompt a model on top of an existing conversation's inference state,
without re-evaluating the entire history just to process the new prompt.
<br/>
This reduces the time it takes to start generating a response and makes more efficient use of your resources.

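For example, with a chat session set up (a sketch; the model path is a placeholder), each successive `prompt()` call continues from the already-evaluated conversation state instead of re-processing the whole transcript:

```typescript
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: "models/my-model.gguf" // placeholder path
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});

// The first prompt evaluates the system prompt and the user message
const first = await session.prompt("Explain the benefits of local inference.");

// This follow-up only evaluates the new message - the earlier
// conversation state is already in the context and isn't re-evaluated
const followUp = await session.prompt("Now shorten that to one sentence.");
```
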
If you were using an API, you would have to re-evaluate the entire history every time you prompt the model,
or have the API store the state for you, which can consume huge amounts of disk space.

### Flexibility
Since `node-llama-cpp` runs inside your project, you can also deploy it together with your project.
<br/>
You can run models in your [Electron](../guide/electron.md) app without requiring any additional setup on the user's machine.

You can build libraries that use large language models and distribute them as npm packages,
<br/>
or deploy self-contained Docker images and run them on any hardware you want.

You can use [any model you want](../guide/choosing-a-model.md), or even create your own and use it with `node-llama-cpp`.
<br/>
Download models [as part of `npm install`](../guide/downloading-models.md) or [on-demand from your code](../guide/downloading-models.md#programmatic).

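Fetching a model on-demand from your code can look roughly like this sketch (the model URL below is a placeholder; point it at the GGUF model you actually want):

```typescript
import path from "path";
import {fileURLToPath} from "url";
import {createModelDownloader} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

// The URL below is a placeholder - replace it with a real GGUF model URL
const downloader = await createModelDownloader({
    modelUrl: "https://example.com/models/my-model.Q4_K_M.gguf",
    dirPath: path.join(__dirname, "models")
});

// Downloads the file and resolves with the local path to the model
const modelPath = await downloader.download();
console.log("Model saved to", modelPath);
```
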
[Tweak inference settings](../guide/chat-session.md#repeat-penalty) to get better results for your particular use case.

`node-llama-cpp` is regularly updated with the latest `llama.cpp` release,
but you can also [download and build the latest release](../guide/building-from-source.md#download-new-release) at any time with a single command.

The possibilities are endless.
You have full control over the models you use, how you use them, and where you use them.
You can tailor `node-llama-cpp` to your needs in ways that aren't possible with an OpenAI API (at least not efficiently or easily).

## Powerful Features
`node-llama-cpp` includes a complete suite of everything you need to use large language models in your projects,
with convenient wrappers for popular tasks, such as:
* [Enforcing a JSON schema](../guide/chat-session.md#response-json-schema) on the output the model generates
* Providing the model with [functions it can call on demand](../guide/chat-session.md#function-calling) to retrieve information or perform actions, even with some models that don't officially support it
* [Generating completion](../guide/text-completion.md) for a given text
* [Embedding text](../guide/embedding.md) for similarity searches or other tasks
* Much more

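As a taste of the JSON schema enforcement, here's a rough sketch (the model path and the schema are placeholders for your own):

```typescript
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: "models/my-model.gguf" // placeholder path
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});

// The grammar constrains generation so the output always matches the schema
const grammar = await llama.createGrammarForJsonSchema({
    type: "object",
    properties: {
        title: {type: "string"},
        positivityScoreFromOneToTen: {type: "number"}
    }
} as const);

const response = await session.prompt("Summarize this review: 'Great product!'", {
    grammar
});
const result = grammar.parse(response); // an object matching the schema
console.log(result.title, result.positivityScoreFromOneToTen);
```
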
## Why Node.js?
JavaScript is the most popular programming language in the world, and Node.js is the most popular runtime for JavaScript server-side applications.
Developers choose Node.js for its versatility, reliability, ease of use, forward compatibility, and the vast ecosystem of npm packages.

While Python is currently the go-to language for data science and machine learning,
the needs of data scientists differ from those of developers building services and applications.

`node-llama-cpp` bridges this gap, making it easier to integrate large language models into Node.js and Electron projects,
while focusing on the needs of developers building services and applications.

## Try It Out
`node-llama-cpp` comes with comprehensive documentation, covering everything from installation to advanced usage.
It's beginner-friendly, with explanations for every step of the way for those who are new to the world of large language models,
while still being flexible enough to allow advanced usage for those who are more experienced and knowledgeable.

Experience the ease of running models on your machine with this single command:
```shell
npx -y node-llama-cpp chat
```

Check out the [getting started guide](../guide/index.md) to learn how to use `node-llama-cpp`.

## Thank You
`node-llama-cpp` is only possible thanks to the amazing work done on [`llama.cpp`](https://github.com/ggerganov/llama.cpp) by [Georgi Gerganov](https://github.com/ggerganov), [Slaren](https://github.com/slaren), and all the contributors from the community.

## What's Next?
Version 3.0 is a major milestone, but there's plenty more planned for the future.

Check out the [roadmap](https://github.com/orgs/withcatai/projects/1) to see what's coming next,
<br />
and [give `node-llama-cpp` a star on GitHub](https://github.com/withcatai/node-llama-cpp) to support the project.