Loving llm so far as well! I think this is what I'm seeing too. I followed the steps from this blog post: https://simonwillison.net/2023/Aug/1/llama-2-mac/ and the output I got was truncated at the end, where I expected something more complete, as in the blog post. This may also just be down to my misunderstanding of how these models work; I'm very new to LLMs. Installed via pip.
-
Hi there,
I love llm so far. Thanks for building this!
One request is to add the ability to pass the "ctx_size" option to llama.cpp when running models from hf (for example). Here is the output of llm models list in my environment:
OpenAI Chat: gpt-3.5-turbo (aliases: 3.5, chatgpt)
OpenAI Chat: gpt-3.5-turbo-16k (aliases: chatgpt-16k, 3.5-16k)
OpenAI Chat: gpt-4 (aliases: 4, gpt4)
OpenAI Chat: gpt-4-32k (aliases: 4-32k)
gpt4all: ggml-all-MiniLM-L6-v2-f16 - Bert, 43.41MB download, needs 1GB RAM
gpt4all: orca-mini-3b - Mini Orca (Small), 1.80GB download, needs 4GB RAM
gpt4all: ggml-gpt4all-j-v1 - Groovy, 3.53GB download, needs 8GB RAM
gpt4all: llama-2-7b-chat - Llama-2-7B Chat, 3.53GB download, needs 8GB RAM
...
gpt4all: wizardLM-13B-Uncensored - Wizard Uncensored, 7.58GB download, needs 16GB RAM
LlamaModel: llama-2-7b-chat.ggmlv3.q8_0 (aliases: llama2-chat, l2c)
LlamaModel: Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_0 (aliases: wizard-vicuna-7b, wizard)
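For context, the underlying llama-cpp-python library already accepts the context window and batch size as constructor parameters (n_ctx and n_batch), so the request is essentially to surface these through llm. A minimal sketch of the direct call, assuming llama-cpp-python is installed; the model path in the usage example is a placeholder:

```python
def load_llama(model_path, n_ctx=4096, n_batch=512):
    """Load a GGML model with an explicit context window.

    n_ctx and n_batch are llama-cpp-python constructor parameters;
    the import is done lazily so this sketch stays self-contained.
    """
    from llama_cpp import Llama
    return Llama(model_path=model_path, n_ctx=n_ctx, n_batch=n_batch)

# Hypothetical usage (placeholder path):
# model = load_llama("llama-2-7b-chat.ggmlv3.q8_0.bin")
# print(model("Q: What is the capital of France? A:", max_tokens=1024))
```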
For the last two, the default batch size is 512 tokens. I know the context size is 4096 tokens, but there seems to be a "break" after 512 tokens while the model finalizes its answer. See ggml-org/llama.cpp#1403 for a discussion.
The problem is that llm appears to take this "break" as the end of the response, so I cannot get more than 512 tokens per response from the LlamaModels above.
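A toy model of the behaviour described above (this is just illustrative arithmetic, not how llm or llama.cpp are implemented): if the client treats the first 512-token batch boundary as end-of-response, any longer reply is capped at the batch size.

```python
def tokens_received(total_tokens, batch=512, stop_at_first_break=True):
    """How many tokens the client sees if it treats the first batch
    boundary as end-of-response (illustrative only, not llm's code)."""
    if stop_at_first_break:
        return min(total_tokens, batch)
    return total_tokens

print(tokens_received(1500))                             # 512: truncated at the break
print(tokens_received(1500, stop_at_first_break=False))  # 1500: full response
```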
Any suggestions would be great!
Thanks.