Introduce llama-run #10291
Conversation
(force-pushed from b2a336e to 17f086b)
Some of these builds seem hardcoded to C++11, while we use a feature from C++14. Any reason we aren't using, say, C++17? Any reasonable platform should be up to date with C++17, I think.
(force-pushed from bf26504 to 0d016a4)
Converted to C++11 only.
(force-pushed from 0af3f55 to 33eb456)
(force-pushed from d2711eb to ca45737)
@slaren @ggerganov PTAL. I'm hoping to add other features to this example, such as reading the prompt from a '-p' arg and reading the prompt from stdin.
It would be good to have a more elaborate chat example, but the goal of this example is to show in the simplest way possible how to use the llama.cpp API. I don't think these changes achieve that; I think users will have a harder time understanding the llama.cpp API with all the extra boilerplate being added here. If you want to use this as the base of a new example that adds more features, that would be great, but I think we should keep this example as simple as possible.
(force-pushed from ca45737 to 0fbc0ae)
Cool, sounds good @slaren. Mind if I call the new example ramalama-core?
(force-pushed from 0fbc0ae to 83988df)
Maybe something like …
SGTM
It's also tempting to call this something like run and use a RamaLama/LocalAI/Ollama-type CLI interface to interact with models. Kinda like a daemonless Ollama: …
We could even possibly add https://, http://, ollama://, hf:// as valid syntaxes to pull models, since they are all just an HTTP pull in the end.
That might be implemented as a llama-pull that llama-run can fork/exec (or they share a common library).
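As a rough sketch of how such a dispatcher could look, treating every remote scheme as an HTTP-style pull in the end. None of these helper names come from the PR, and pull_model is only a placeholder for whatever llama-pull would eventually do:

```cpp
#include <cstdio>
#include <string>

// Placeholder for the hypothetical llama-pull step; here it only reports intent.
static std::string pull_model(const std::string & scheme, const std::string & ref) {
    std::printf("would pull '%s' via %s\n", ref.c_str(), scheme.c_str());
    return "/tmp/model.gguf"; // imaginary local cache path
}

// Map a model reference to a local file path, pulling it first if it is remote.
static std::string resolve_model(const std::string & ref) {
    const auto pos = ref.find("://");
    if (pos == std::string::npos) {
        return ref;                 // plain path, no prefix given
    }
    const std::string scheme = ref.substr(0, pos);
    if (scheme == "file") {
        return ref.substr(pos + 3); // strip the optional file:// prefix
    }
    // hf://, ollama://, http://, https:// all reduce to a download
    return pull_model(scheme, ref);
}

int main(int argc, char ** argv) {
    if (argc > 1) {
        std::printf("resolved: %s\n", resolve_model(argv[1]).c_str());
    }
    return 0;
}
```

Whether something like this would live inside llama-run or in a separate llama-pull binary that llama-run fork/execs is exactly the design question raised above.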
(force-pushed from 1d7f97c to 4defcf2)
I will be afk for 3 weeks, so expect inactivity in this PR. I did the rename in case we want to merge as is and not let this go stale. The syntax will completely change, though, to: llama-run [file://]somedir/somefile.gguf [prompt] [flags]. The file:// prefix will be optional, but it sets up the possibility of adding the pullers discussed above.
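A minimal sketch of that invocation, assuming only the positional layout described above (model reference first, then an optional prompt); real flag handling in llama-run is not shown:

```cpp
#include <cstdio>
#include <string>

int main(int argc, char ** argv) {
    if (argc < 2) {
        std::fprintf(stderr, "usage: llama-run [file://]somedir/somefile.gguf [prompt] [flags]\n");
        return 1;
    }
    std::string model = argv[1];
    if (model.rfind("file://", 0) == 0) {
        model = model.substr(7);    // the file:// prefix is optional
    }
    const std::string prompt = argc > 2 ? argv[2] : "";
    std::printf("model: %s\nprompt: %s\n", model.c_str(), prompt.c_str());
    return 0;
}
```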
This drives the compiler crazy, FWIW: it seems like it only wants to build C code in that file or something.
You are probably doing something wrong. Put it at the end of the file, include …
(force-pushed from 4defcf2 to 3760e6e)
Added the smart pointer typedef stuff, @slaren

Review comment on include/llama-cpp.h (outdated):
```cpp
typedef std::unique_ptr<llama_model, llama_model_deleter> llama_model_ptr;
typedef std::unique_ptr<llama_context, llama_context_deleter> llama_context_ptr;
typedef std::unique_ptr<llama_sampler, llama_sampler_deleter> llama_sampler_ptr;
typedef std::unique_ptr<char[]> char_array_ptr;
```
The rest look good, but this one (char_array_ptr) does not belong here.
I also need to pass more parameters as references; I notice I still pass raw pointers in places where a reference to a smart pointer would do.
I removed char_array_ptr
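For context, a sketch of how typedefs like these are commonly backed by deleters that wrap the C API's free functions. This assumes llama_free_model, llama_free and llama_sampler_free, and is an illustration rather than the exact contents of the PR's llama-cpp.h:

```cpp
#include <memory>

#include "llama.h"

struct llama_model_deleter {
    void operator()(llama_model * model) { llama_free_model(model); }
};

struct llama_context_deleter {
    void operator()(llama_context * ctx) { llama_free(ctx); }
};

struct llama_sampler_deleter {
    void operator()(llama_sampler * smpl) { llama_sampler_free(smpl); }
};

typedef std::unique_ptr<llama_model,   llama_model_deleter>   llama_model_ptr;
typedef std::unique_ptr<llama_context, llama_context_deleter> llama_context_ptr;
typedef std::unique_ptr<llama_sampler, llama_sampler_deleter> llama_sampler_ptr;
```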
(force-pushed from 32b2999 to bdac00f)
Looks like a good start for an improved chat example.
llama-cpp.h also needs to be added to the list of public headers in CMakeLists.txt.
(force-pushed from bd63eb5 to a9054cb)
Added as public header.
(force-pushed from a9054cb to faeba27)
It's like simple-chat but it uses smart pointers to avoid manual memory cleanup. Fewer memory leaks in the code now. Avoid printing multiple dots. Split code into smaller functions. Uses no exception handling. Signed-off-by: Eric Curtin <[email protected]>
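As a hedged usage sketch of the pattern that commit message describes: the smart-pointer typedefs release the model and context automatically, so early returns need no manual llama_free* calls. The generate helper and the pass-by-reference parameter are illustrative only, and the loading calls follow the C API as it existed around the time of this PR:

```cpp
#include "llama-cpp.h" // assumed to pull in llama.h and the *_ptr typedefs

// Illustrative helper: takes the smart pointer by reference instead of a raw pointer.
static bool generate(llama_context_ptr & ctx /*, prompt, sampler, ... */) {
    // use ctx.get() wherever the C API expects a raw llama_context *
    return ctx != nullptr;
}

int main() {
    llama_model_params mparams = llama_model_default_params();
    llama_model_ptr model(llama_load_model_from_file("model.gguf", mparams));
    if (!model) {
        return 1;               // nothing to free manually on this path
    }

    llama_context_params cparams = llama_context_default_params();
    llama_context_ptr ctx(llama_new_context_with_model(model.get(), cparams));
    if (!ctx) {
        return 1;               // the model is released by its deleter
    }

    generate(ctx);
    return 0;                   // ctx and model are freed automatically
}
```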
(force-pushed from faeba27 to aa0a507)
Happy to merge this @slaren @ggerganov. There will be breaking changes in a follow-on PR, though: I want to change the syntax to this (ramalama/ollama style): … Eventually I also want to add optional prefixes for the model (like file://, hf://; maybe we implement http://, https://, ollama:// eventually too, it's not that hard), but with no prefix we will just assume the string is a file path, I guess.
Sounds good. Btw, to support dynamic loading of backends (#10469), you should add a call to …