[FEEDBACK] Better packaging for llama.cpp to support downstream consumers 🤗 #15313
Replies: 23 comments 24 replies
-
From this, IMO the only thing missing is a Linux+CUDA bundle to make it download-and-use. If we want better packaging on Linux, we could also work on a snap or a bash installer for the pre-built packages.
-
It’s high time for HuggingFace to copy Ollama’s packaging and GTM strategy, but this time give credit to llama.cpp. Ideally, llama.cpp should remain the core component.
-
Is the barrier the installation process, or the need to use a complex command line to launch llama.cpp?
-
For me the biggest thing is I'd love to see more emphasis placed on
My ideal would be for the
Maybe include systray integration and a simple UI for selecting and downloading models too. At that point
-
It would be cool if llama-server had an auto-configuration option that adapts to the machine and model, like ollama does.
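As a rough illustration of what such auto-configuration could do, here is a minimal sketch of a wrapper that picks the GPU offload level from detected VRAM before launching llama-server (the heuristic and thresholds are invented for the example; only the llama-server flags are real):

```sh
#!/usr/bin/env sh
# Hypothetical auto-config wrapper; the VRAM heuristic is made up for illustration.
MODEL="$1"

# Query free VRAM in MiB; empty if no NVIDIA GPU / nvidia-smi is available.
VRAM_MB=$(nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits 2>/dev/null | head -n1)

if [ -n "$VRAM_MB" ] && [ "$VRAM_MB" -gt 8000 ]; then
  NGL=999   # plenty of VRAM: offload all layers (example threshold)
elif [ -n "$VRAM_MB" ]; then
  NGL=20    # partial offload on smaller GPUs (arbitrary example value)
else
  NGL=0     # no GPU detected: run on CPU
fi

exec llama-server -m "$MODEL" -ngl "$NGL" -c 4096
```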
-
For Windows, maybe choco and the Windows Store would be a good idea? 🤔
-
I created an RPM spec to manage installation, though I think Flatpaks might be more user-friendly and distribution-agnostic.
-
The released Windows builds are available via Scoop. Updates happen automatically. Old installed versions are kept, and the current one is symlinked into a "current" folder, which provides the executables on the PATH.
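A minimal sketch of that flow (the package name is assumed here; check your Scoop bucket for the exact name):

```sh
# Assumed package name; Scoop keeps old versions and re-points the "current" symlink on update.
scoop install llama.cpp
scoop update llama.cpp
```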
-
Is it feasible to have a single release per OS that includes all the backends?
-
For Linux I just install the Vulkan binaries and run the server from there. Maybe we can have an install script like ollama's that detects the system and launches the server, which can be controlled from an app as well as the CLI? The user then gets basic command-line utilities like run, start, stop, load, list, etc.
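Purely as an illustration of that idea, a minimal sketch of such a wrapper CLI (the llama command name, paths, and subcommands are all hypothetical; only llama-server and its flags come from llama.cpp):

```sh
#!/usr/bin/env sh
# Hypothetical "llama" wrapper: a thin process manager around llama-server.
MODELS_DIR="${HOME}/.llama/models"
PID_FILE="${HOME}/.llama/server.pid"

case "$1" in
  run)
    # Start llama-server in the background with the requested model.
    mkdir -p "$(dirname "$PID_FILE")"
    llama-server -m "${MODELS_DIR}/$2.gguf" --port 8080 &
    echo $! > "$PID_FILE"
    echo "serving $2 on http://localhost:8080"
    ;;
  stop)
    # Stop the background server if it is running.
    [ -f "$PID_FILE" ] && kill "$(cat "$PID_FILE")" && rm -f "$PID_FILE"
    ;;
  list)
    # List locally downloaded models.
    ls "$MODELS_DIR" 2>/dev/null
    ;;
  *)
    echo "usage: llama {run <model>|stop|list}" >&2
    exit 1
    ;;
esac
```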
-
On Mac, the easiest (and arguably safest) way from a user's perspective is to find it in the App Store and install it from there. Apps from the App Store run in a sandbox, so from a user's point of view, installing or uninstalling is simple and clean. Creating a build and passing the App Store review might take some effort (due to the sandbox constraints), but it should be a one-time thing.
-
It's my understanding that none of the automated installs support GPU acceleration. I might be wrong, but it's definitely the case on Windows, which makes it useless to install via winget.
-
To me, the biggest advantage ollama currently has is that the optimal settings for a model are bundled. The GGUF spec would allow for this too, since it's versatile enough to carry this as metadata fields inside the model. It would let people load the settings from a GGUF, and frontends could extract them and adapt them as they see fit. I think that part is going to be more valuable than obtaining the binary, since downloading the binary from GitHub is not that hard.
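To make the idea concrete, here is a sketch of how bundled defaults could flow from model metadata to the server (the metadata keys below are hypothetical and not part of the GGUF spec today; only the llama-server flags are real):

```sh
# Suppose a GGUF shipped recommended defaults as KV metadata, e.g.:
#   example.recommended.temperature = 0.7
#   example.recommended.top_p       = 0.9
#   example.recommended.ctx_size    = 8192
# A frontend or launcher could read those fields (for instance with the gguf
# Python package) and translate them into the corresponding server flags:
llama-server -m model.gguf --temp 0.7 --top-p 0.9 --ctx-size 8192
```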
-
My personal wishlist
-
The problem there is that our launcher is completely incompatible with llama.cpp's project. The launcher UI is Python-based, and for llama.cpp you'd want something C++-based.
-
Today PyTorch 2.8 was announced, along with the possibility of using a single command (uv pip install torch) to install PyTorch with optimal compatibility for your hardware and OS (Windows and Linux): https://astral.sh/blog/wheel-variants It is the result of a joint open-source project called WheelNext. In particular, for Nvidia hardware it uses this to detect the best compatibility: https://github.com/wheelnext/nvidia-variant-provider This might be useful to adopt, or to partially reuse just the detection of the user's hardware.
-
I believe a package for llama-vulkan would be the killer app.
-
brew install is great, but we also need a single curl one-liner like https://bun.sh/ and https://docs.astral.sh/uv/getting-started/installation/ have (see the sketch below). Also, per https://x.com/zereraz/status/1956081559695745528, the DX here is not the best.
And even though the logs are cool, verbose mode could be separate, with a better TUI inspired by https://github.com/vadimdemedes/ink or https://github.com/sst/opentui.
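Purely as an illustration of the curl-script idea mentioned above, the bun/uv-style flow would look something like this (the URL and install script are hypothetical; llama.cpp does not currently ship one):

```sh
# Hypothetical one-liner in the style of bun.sh / uv; this script does not exist today.
curl -fsSL https://example.com/llama.cpp/install.sh | sh
```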
-
Docker works great and is simpler than this.
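For reference, a minimal sketch of that Docker route, using the server image published on GHCR (image tag taken from the project's Docker docs; the model path and port are placeholders):

```sh
# Serve a local GGUF model with the llama.cpp server container image.
docker run --rm -p 8080:8080 -v "$PWD/models:/models" \
  ghcr.io/ggml-org/llama.cpp:server \
  -m /models/your-model.gguf --host 0.0.0.0 --port 8080
```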
-
TL;DR: Within the next few weeks, you'll be able to just
Longer version: With regards to installation usability, within the next few weeks, llama.cpp will be made available in Debian via the official
There are already packages in the
In the releases we prepared, we ship the CPU, BLAS, HIP, CUDA and Vulkan backends. More backends will be added on request. However, this is also a matter of test infrastructure, as I do test all of the backends on actual hardware before a release. I'm currently setting up a CI to automate this.
The CPU backend is built with
I would consider it a bug of the Debian package if its performance is not on par with upstream, hence the CI I'm setting up.
This is just some initial info; I'm going to submit more information once I've got the CI finished. Most importantly, I want upstream here to see this as a benefit, not a burden, so I need to work out some user documentation and also see if upstream can benefit from our CI in some way.
Edit: Forgot to say, upstream has been very accommodating in accepting changes that make shipping universally usable packages easier for us downstreams. Also, @mbaudier's input on and testing of the Debian packages mentioned above was really helpful in finalizing this.
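The resulting flow would presumably look something like this (the Debian package name is an assumption on my part, not confirmed above):

```sh
# Assumed package name; check the Debian package tracker for the actual binary packages.
sudo apt update
sudo apt install llama.cpp
```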
-
I was about to open a similar issue and ask when CUDA Linux builds, like
Another thing is to address the
-
llama.cpp as a project has made LLMs accessible to countless developers and consumers, including me. The project has also consistently become faster over time, and its coverage has grown beyond LLMs to VLMs, AudioLMs, and more.
One piece of feedback we keep getting from the community is how difficult it is to use llama.cpp directly. Oftentimes users end up using Ollama or GUIs like LMStudio or Jan (there are many more that I'm missing). However, it'd be great to offer end consumers a friendlier, easier path to use llama.cpp too.
Currently, if someone wants to use llama.cpp directly:
brew install llama.cpp works.
This adds a barrier for non-technically-inclined people, especially since in all of the above methods users would have to reinstall llama.cpp to get upgrades (and llama.cpp makes releases per commit - not a bad thing, but it becomes an issue since you need to upgrade more frequently).
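For illustration, the Homebrew path today looks like this (standard Homebrew commands, shown only to make the upgrade friction concrete):

```sh
brew install llama.cpp      # initial install
brew upgrade llama.cpp      # must be re-run for every release to stay current
llama-server -m model.gguf  # then launch manually with the right flags
```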
Opening this issue to discuss what could be done to package llama.cpp better and allow users to maybe download an executable and be on their way.
Moreover, are there people in the community interested in taking this up?