GPT4All LangChain Community model trims model output #27021
Hey @Luca-Pozzi! I'm here to help you with any bugs, questions, or contributions you have. A similar issue was discussed in this thread, where the response length was limited. The solution involves adjusting the `max_tokens` parameter when constructing the model. To get longer responses, ensure you set it to a sufficiently large value:

```python
from langchain_community.llms import GPT4All

model = GPT4All(
    model=model_path,
    device='gpu',
    max_tokens=4096,  # Increase this value to allow for longer outputs
)
```

Additionally, check other generation parameters that may cap the output length.
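As a quick check, a minimal usage sketch with the wrapper configured this way might look like the following; the model path and prompt are placeholders, not values from the original discussion:

```python
from langchain_community.llms import GPT4All

# Placeholder path to a locally downloaded GGUF model file.
model_path = "./Meta-Llama-3-8B-Instruct.Q4_0.gguf"

llm = GPT4All(
    model=model_path,
    max_tokens=4096,  # raise this to allow longer completions
)

# LangChain LLMs expose `invoke` for a single prompt/completion call.
response = llm.invoke("Write a detailed, multi-paragraph overview of the Apollo program.")
print(len(response.split()), "words")
```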
Example Code
Description
I am trying to use Llama3 via GPT4All. When I ask the model for a long answer directly via the Python GPT4All SDK (i.e. LANGCHAIN = False in the code), everything works as expected. In the attached file output_SDK.txt you can see a sample response with >700 words.

If I do (what I expect to be) the same thing by calling GPT4All from LangChain, my output is limited to ~200 words. By this, I mean that the response is not just shorter, but interrupted abruptly at a certain length. You can see a sample in the attached file output_LangChain.txt.

Is this behavior expected? How can I modify it to get long answers from Llama3 via LangChain?
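For reference, a minimal sketch of the comparison described above might look like the following. This is an illustrative reconstruction, not the original example code: the LANGCHAIN toggle is taken from the description, while MODEL_FILE, PROMPT, and the specific calls are placeholder assumptions.

```python
from gpt4all import GPT4All as GPT4AllSDK
from langchain_community.llms import GPT4All as GPT4AllLangChain

LANGCHAIN = False  # toggle between the plain SDK and the LangChain wrapper

# Placeholder model file; adjust to whatever GGUF model is available locally.
MODEL_FILE = "Meta-Llama-3-8B-Instruct.Q4_0.gguf"
PROMPT = "Write a detailed, multi-paragraph explanation of how transformers work."

if LANGCHAIN:
    # LangChain Community wrapper: `model` is the path to the model file.
    llm = GPT4AllLangChain(model=MODEL_FILE)
    answer = llm.invoke(PROMPT)
else:
    # Plain GPT4All SDK: pass the model file name directly.
    model = GPT4AllSDK(MODEL_FILE)
    answer = model.generate(PROMPT, max_tokens=4096)

print(len(answer.split()), "words")
print(answer)
```

If the LangChain path stops at roughly 200 words while the SDK path does not, the wrapper's max_tokens setting (see the answer above) is the most likely knob to adjust.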
I tried to inspect the LangChain LLM and BaseLLM classes, to no avail.

System Info
pip freeze | FINDSTR langchain