llama.cpp long term strategy #16288
I am one of the most affected in terms of time spent reacting to changes, as I maintain a fork. We don't need planning friction now. Trial and error is fine: it's getting new stuff in and broken stuff fixed fast. I type this even as I'm feeling kind of frustrated with the new webui in the server tool. That problem will pass. My suggestion to the original poster: clone the repo locally and work with that particular vintage until you run into trouble with something. Then do it again, adjust your code, and work until the next trouble. It's workable. You'll soon find it quite stable for the functionality you care about now. My fork is here: https://github.com/BradHutchings/Mmojo-Server
The llama.cpp project is drifting through murky seas in a trial-and-error loop. Concentrating on the right course is very important for the project's success. I hope that's obvious.
If we look at the LLM usage stack, we can see several layers of expertise. First comes hardware, next low-level software, then math-heavy algorithmic software; that is followed by frameworks, and finally there are myriads of application-level solutions.
llama.cpp has expertise in the low-level and algorithmic software layers. Hardware is obviously too much for the project, while frameworks and applications are business ventures that require heavy investment to flourish.
Concentrating effort, then, would be most beneficial at the level where project growth is limited by missing funding. But how can the project attract funds while leveraging its expertise?
The answer is the way of back-end software. Linux, for example, is the backbone of a myriad of services. In the same way, llama.cpp could be the backbone of a myriad of LLM-centered applications.
If such a path were accepted, we would need to attract framework and application developers with an efficient and easy-to-use back end. The performance side of the solution is already at a good level, even if that level varies across hardware. But on the "easy to use" side, even C developers have to put in substantial effort just to adapt the solution to their needs, and developers with expertise in other languages have to sit and wait until somebody else builds a useful wrapper. That is why developers go to Python and its related frameworks, where ease of use is the main concern.
What could be done?
First, obviously, there is a need for a flexible and stable API around which wrappers in other languages can be developed. In its current state the API is not as stable as expected, offers limited flexibility, and, most importantly, is poorly documented.
Second, there should be base frameworks for different languages covering in-demand functionality. They could start from a simple chat bot (as is already the case with the llama.cpp web server) but evolve toward an agentic environment with rich pipelining and tooling available (which is not even close to the current level of the examples and tools on offer).
Only such a business-oriented approach can attract substantial investment and lead to the project's long-term success.
From my side, I could offer a Java-based framework with agentic capabilities, but because it depends heavily on the back end, my expertise alone is not enough to make it efficient and easy to use without substantial effort from the llama.cpp side. And mine is not the only such case, of course.
So, with this overview in hand, you can decide on the future of the project in a somewhat more informed way. I hope for the best and wish you luck with your choice.