-
Something just sort of dawned on me and I wasn't 100% sure about it. Do the other formats (gpt-j, gpt2, pyg) still do the heavy V transpose operation that drastically slows inference as the context grows? I was peeking around and it looks like they still do? ggml-org#775
-
Regarding your #4 and #5: I had a nice conversation with Google Bard about koboldcpp summarizing the conversation on the fly, with an automatic copy-and-paste into the "memory / story" section to emulate a larger/indefinite context. According to Bard, "it can be programmed in a matter of hours using the Kobold-Summarizer script, which has a number of options that you can use to control the summary. For example, you can use the -l option to specify the length of the summary in words. You can also use the -t option to specify the type of summary. The available types are:" Google Bard continues: "The code would be relatively simple to write, and it would be a great way to improve the functionality of koboldcpp."
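A rough sketch of that idea, driving it through koboldcpp's KoboldAI-compatible HTTP API rather than Bard's (likely hallucinated) "Kobold-Summarizer script": the /api/v1/generate endpoint and default port 5001 are real, but the prompt wording and length limit below are assumptions.

```python
# Summarize the chat log with the model itself, then feed the summary
# back in as pseudo-memory on the next request.
import json
import urllib.request

API = "http://localhost:5001/api/v1/generate"  # koboldcpp's default port

def generate(prompt: str, max_length: int = 200) -> str:
    """Send one generation request to the local koboldcpp API."""
    payload = json.dumps({"prompt": prompt, "max_length": max_length}).encode()
    req = urllib.request.Request(API, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["results"][0]["text"]

def summarize(chat_log: str) -> str:
    """Ask the model for a short recap of the conversation so far."""
    return generate("Summarize this conversation briefly:\n"
                    + chat_log + "\nSummary:")

# Prepend the summary the way the "memory / story" box would:
memory = summarize("User: hi\nBot: Hello! How can I help?")
reply = generate(memory + "\nUser: What were we talking about?\nBot:")
```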
-
Simulating AGI with koboldcpp. According to Google Bard: Here are some specific ways that you could implement this feature:
By implementing this feature, you could make your LLM more engaging and interactive, and it would give the impression that the LLM is more than just a machine.
-
And if you wanna get nuts [insert Michael Keaton here]: implement permanent conversational history using a Redis database, which runs in RAM. According to Google Bard, this is also a good way to teach your LLM new facts without retraining. Bard thinks it's relatively easy. Of course, I could be being lied to by a hallucinating AI. 😁
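A minimal sketch of the Redis side, assuming a local Redis server and the redis-py client (pip install redis); the key naming scheme is made up for illustration:

```python
# Persist each chat turn in Redis so the history survives restarts.
import redis  # pip install redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def remember(session_id: str, role: str, text: str) -> None:
    """Append one chat turn to the session's history list."""
    r.rpush(f"chat:{session_id}", f"{role}: {text}")

def recall(session_id: str, last_n: int = 20) -> list[str]:
    """Fetch the most recent turns to rebuild the prompt context."""
    return r.lrange(f"chat:{session_id}", -last_n, -1)

remember("demo", "User", "Redis keeps this even after a restart.")
print(recall("demo"))
```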
-
Dynamic context: the process involves starting with c1 in active mode. Once c1 reaches a certain limit (say 512 tokens), it switches to passive mode and a new context c2 is created as active. From this point on, your interactions are stored in c2; however, the bot can still access c1. Once c2 reaches 512 tokens, c1 is destroyed, c2 switches to passive mode, a new context is created as active, and the cycle repeats.
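A small sketch of that rotation, with simplified whitespace token counting; the 512 limit and class shape are illustrative only, not an existing koboldcpp feature:

```python
# Two rotating contexts: the active one collects new interactions;
# once it fills, it becomes passive and the old passive one is dropped.
class DynamicContext:
    def __init__(self, limit: int = 512):
        self.limit = limit
        self.passive: list[str] = []  # c1 once it fills: frozen but readable
        self.active: list[str] = []   # c2: where new interactions go

    def add(self, text: str) -> None:
        self.active.append(text)
        if sum(len(t.split()) for t in self.active) >= self.limit:
            # old passive context is destroyed, active becomes passive,
            # and a fresh context takes over as active
            self.passive = self.active
            self.active = []

    def visible(self) -> str:
        """Everything the bot can currently see: passive + active."""
        return "\n".join(self.passive + self.active)
```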
-
SSL encryption. I don't want my smut being sent as clear text, personally. Edit: to clarify, I have OpenSSL set up for localhost, but the API calls are still plain HTTP; only the browser itself is HTTPS.
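One way to get there without changing koboldcpp itself is a small TLS-terminating proxy in front of the API. This sketch assumes the default API address of http://localhost:5001 and a self-signed cert/key pair for localhost; error handling is omitted:

```python
# TLS-terminating proxy: accepts HTTPS on port 5443 and forwards each
# request to the plain-HTTP koboldcpp API on port 5001.
import http.server
import ssl
import urllib.request

UPSTREAM = "http://localhost:5001"  # assumed koboldcpp API address

class Proxy(http.server.BaseHTTPRequestHandler):
    def _forward(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length) if length else None
        req = urllib.request.Request(UPSTREAM + self.path, data=body,
                                     method=self.command)
        for k, v in self.headers.items():
            if k.lower() not in ("host", "content-length"):
                req.add_header(k, v)
        with urllib.request.urlopen(req) as resp:  # error handling omitted
            data = resp.read()
            self.send_response(resp.status)
            for k, v in resp.headers.items():
                if k.lower() != "transfer-encoding":
                    self.send_header(k, v)
            self.end_headers()
            self.wfile.write(data)

    do_GET = do_POST = _forward

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.load_cert_chain("cert.pem", "key.pem")  # your localhost certificate
server = http.server.HTTPServer(("localhost", 5443), Proxy)
server.socket = ctx.wrap_socket(server.socket, server_side=True)
server.serve_forever()
```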
-
An option to allow characters to automatically interact without user input unless the user is typing something: basically an auto-submit option that pauses while the text box is selected.
-
Using koboldcpp frequently as my chat UI, I would be happy if it could load a standard .json file (with prompts and settings) at launch. At the moment, every time I start koboldcpp and let it launch my browser, I have to load my prompts and settings into the UI by hand.
Please let koboldcpp do this hard work. (The old alpaca.cpp had this ability with the command-line option "--file FNAME".) Thanks.
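A hypothetical sketch of how such a "--file" option could be parsed; the key names inside the JSON are assumptions, not koboldcpp's actual save format:

```python
# Parse a hypothetical "--file" option and read the saved session.
import argparse
import json

parser = argparse.ArgumentParser()
parser.add_argument("--file", metavar="FNAME",
                    help="JSON file with prompts and settings to preload")
args = parser.parse_args()

if args.file:
    with open(args.file, encoding="utf-8") as f:
        session = json.load(f)
    # Hand these to the UI instead of making the user paste them by hand;
    # "prompt", "memory" and "settings" are assumed keys, not the real schema.
    prompt = session.get("prompt", "")
    memory = session.get("memory", "")
    settings = session.get("settings", {})
```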
-
A parameter that allows changing the default smart context buffer size, for example to 512, so that when using a context size of 2048 (or even more in the future), each re-buffering means less waiting time and it takes longer to reach the limit again; usually 512 is good enough.
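A toy model of the trade-off being proposed (not KoboldCpp's actual smart-context code, which by default appears to keep half the context): on each rebuild, retain only the last `buffer` tokens, leaving `max_ctx - buffer` tokens of headroom.

```python
# Count rebuilds and reprocessed tokens under the simplified model above.
def rebuild_stats(total_tokens: int, max_ctx: int = 2048, buffer: int = 512):
    used, rebuilds, reprocessed = 0, 0, 0
    for _ in range(total_tokens):
        used += 1
        if used >= max_ctx:
            used = buffer           # retain the most recent `buffer` tokens
            rebuilds += 1
            reprocessed += buffer   # tokens that must be re-evaluated
    return rebuilds, reprocessed

print(rebuild_stats(20_000, buffer=1024))  # default-like: keep half of 2048
print(rebuild_stats(20_000, buffer=512))   # proposed: rarer, cheaper rebuilds
```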
-
The new generate-memory feature works, but it is implemented impractically. It should not override the entire memory; it should be added at the beginning of the context like the author's note, or simply appended to the end of the existing memory.
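A tiny sketch of the suggested append behavior; the function name and arguments are made up for illustration, not koboldcpp internals:

```python
# Append the generated memory instead of replacing the existing memory.
def update_memory(existing_memory: str, generated: str) -> str:
    if not existing_memory:
        return generated
    return existing_memory.rstrip() + "\n" + generated
```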
-
If you have ideas on how to improve KoboldCpp (performance or functionality), do list them here.
This is not a Wishlist - suggestions should have a way to achieve or work towards them.