Everything else aside:

-c 3620 -n 12288

That can't work. -c sets the context size to 3,620 tokens, and both the prompt and any generated tokens need to fit within it. (The --keep option can let generation continue past the window, but I wouldn't really recommend it, and even when it works you're unlikely to get coherent output much beyond double the context size.)
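For example, with -c 3620 and a prompt of roughly 1,000 tokens, there's room for at most about 2,620 generated tokens before the window is full.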

Since -c is 3620, -n needs to be a lower value. Your model is LLaMA 2, so it probably supports up to -c 4096. You could also look into RoPE scaling tricks to push -c higher, but expecting 12k tokens of coherent output from a 7B model is pretty optimistic.
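As a sketch of a workable invocation (the model path and prompt file here are placeholders; -n 512 is just one value that leaves room for the prompt inside a 4,096-token window):

./main -m models/llama-2-7b.Q4_0.gguf -c 4096 -n 512 -f prompt.txt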
