Replies: 2 comments
-
The solution is to set the parameter in the chain itself: `chain = prompt | llm.bind(pipeline_kwargs={'max_new_tokens': 500}) | parser`. What remains open is the question: why is this necessary at all, given that this parameter is already set in the definition of the LLM?
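For anyone landing here, a minimal end-to-end sketch of that workaround (the model id, prompt template, and question are placeholders; it assumes the `langchain_huggingface` integration package):

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_huggingface import HuggingFacePipeline

# pipeline_kwargs is already set here, in the LLM definition ...
llm = HuggingFacePipeline.from_model_id(
    model_id="gpt2",  # placeholder; any text-generation model id works
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 500},
)

prompt = PromptTemplate.from_template("Question: {question}\nAnswer:")
parser = StrOutputParser()

# ... but it has to be bound again so it actually reaches the streaming call:
chain = prompt | llm.bind(pipeline_kwargs={"max_new_tokens": 500}) | parser

for chunk in chain.stream({"question": "What is LangChain?"}):
    print(chunk, end="", flush=True)
```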
-
To continue my monologue: it looks like the intermediate steps of the chain drop the `pipeline_kwargs` setting. Is this by design? Am I missing something?
-
Description
When using streaming with chains, the output stops after 10 chunks. When streaming directly from the LLM, the same happens (the output stops after 10 chunks). But if the additional argument `pipeline_kwargs` is set at call time (even though it was also set when calling `HuggingFacePipeline.from_model_id`), the streaming output is produced as expected.

How can I set this additional argument when streaming the output of a chain? I tried different ways to pass `pipeline_kwargs` (even using `RunnablePassthrough`), but it either throws an error about unexpected arguments or the argument seems to be ignored.
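For reference, a hedged sketch of the direct-LLM case described above: re-passing `pipeline_kwargs` at call time restores the full streaming output. The model id is a placeholder, and this assumes extra keyword arguments given to `stream()` are forwarded to the underlying pipeline call:

```python
from langchain_huggingface import HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id="gpt2",  # placeholder model id
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 500},  # set here, yet ignored when streaming
)

for chunk in llm.stream(
    "Question: What is LangChain?\nAnswer:",
    pipeline_kwargs={"max_new_tokens": 500},  # set again at call time
):
    print(chunk, end="", flush=True)
```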