Description
In ggml-org/llama.cpp#17824, a new llama.cpp CLI experience reusing the llama-server infrastructure was created, deprecating the previous implementation.
I created a Rust implementation here: https://github.com/galo/llama-cpp-rs/tree/main/examples/cli. This is interesting because it gives some types of applications the ability to directly reuse llama-server features (e.g., speculative decoding), along with parity with the rest of llama.cpp's features. To do this, I simply exported new bindings and a safe Rust implementation of the server components; the CLI is an example of how to use this infrastructure.
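For readers who want a feel for the shape of this before opening the repo, here is a minimal sketch of a CLI built on top of safe server bindings. All type and method names here (`ServerContext`, `generate`) are hypothetical placeholders, not the actual llama-cpp-rs API; see the linked example for the real code.

```rust
use std::io::{self, BufRead, Write};

// Hypothetical safe wrapper around the llama-server components.
// The real bindings live in llama-cpp-rs; this is only illustrative.
struct ServerContext;

impl ServerContext {
    // Load a model and initialize the server-side state (placeholder).
    fn new(model_path: &str) -> Result<Self, String> {
        let _ = model_path;
        Ok(ServerContext)
    }

    // Run a single completion through the server pipeline (placeholder).
    fn generate(&mut self, prompt: &str) -> Result<String, String> {
        Ok(format!("<completion for: {prompt}>"))
    }
}

fn main() -> Result<(), String> {
    let model = std::env::args().nth(1).ok_or("usage: cli <model.gguf>")?;
    let mut ctx = ServerContext::new(&model)?;

    // Simple interactive loop, mirroring the chat-style CLI experience.
    let stdin = io::stdin();
    print!("> ");
    io::stdout().flush().map_err(|e| e.to_string())?;
    for line in stdin.lock().lines() {
        let prompt = line.map_err(|e| e.to_string())?;
        let reply = ctx.generate(&prompt)?;
        println!("{reply}");
        print!("> ");
        io::stdout().flush().map_err(|e| e.to_string())?;
    }
    Ok(())
}
```

The point of this structure is that the CLI stays thin: everything that llama-server already does (batching, sampling, features like speculative decoding) is reused through the safe wrapper rather than reimplemented.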
Take a look and let me know if this is interesting. I have not done extensive testing, i.e., I did not test the Vulkan/CUDA/etc. backends.