-
Actually, we do something even better: KoboldAI already has automatic context handling, and will automatically displace older context as new text exceeds the limit. The context length limit is dynamic and can be configured from the Settings panel inside the Kobold UI. So if you send a longer text, it should be trimmed appropriately to stay within the limit while keeping the newest text. Let me know if it doesn't work.
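For anyone curious what that displacement looks like, here's a minimal sketch, assuming tokens are plain integers. `trim_context` and `ctx_limit` are illustrative names, not KoboldAI's actual API:

```cpp
#include <cstddef>
#include <vector>

// Minimal sketch of automatic context displacement: when the token
// stream grows past the window, drop the oldest tokens so the newest
// ones always fit. Names are hypothetical, not KoboldAI's real code.
std::vector<int> trim_context(std::vector<int> tokens, std::size_t ctx_limit) {
    if (tokens.size() > ctx_limit) {
        // Erase from the front: oldest context is displaced first.
        tokens.erase(tokens.begin(),
                     tokens.begin() + static_cast<long>(tokens.size() - ctx_limit));
    }
    return tokens;
}
```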
-
Title mostly explains itself. llama.cpp implemented 'infinite output' via context swapping when the context size limit is reached. As part of that, an argument called `--keep` was added that lets the user decide how many tokens of the initial prompt should be kept in context after the swap occurs. As far as I can tell, this repository doesn't use or change the default value of `n_keep`, nor does it provide a command-line argument to set it. So I was just wondering whether it might be worth changing the default behavior, or allowing some way to set it without needing to hardcode the value and recompile.
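For reference, my rough reading of how the upstream swap behaves, as a hedged sketch: `swap_context` is a hypothetical helper, and the "carry over about half the remaining budget" detail mirrors my understanding of llama.cpp's behavior rather than its exact code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative llama.cpp-style context swap: once the window is full,
// preserve the first n_keep prompt tokens, then carry over roughly half
// of the remaining budget from the most recent tokens so generation can
// continue past the context limit. Not the library's actual code.
std::vector<int> swap_context(const std::vector<int>& tokens,
                              std::size_t n_ctx, std::size_t n_keep) {
    assert(n_keep <= n_ctx);
    if (tokens.size() < n_ctx) {
        return tokens;                          // window not full yet, no swap
    }
    std::vector<int> out(tokens.begin(),
                         tokens.begin() + static_cast<long>(n_keep));
    std::size_t n_take = (n_ctx - n_keep) / 2;  // carry over half the budget
    out.insert(out.end(),
               tokens.end() - static_cast<long>(n_take), tokens.end());
    return out;
}
```

In llama.cpp itself this is exposed on the command line as, e.g., `--keep 48`, so exposing an equivalent flag here would avoid recompiling just to change the value.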