Hello,

I'm building a system with LangGraph using an agent that can either call tools or give the final answer. This system uses `ReActSingleInputOutputParser` by default.
However, it is not capable of streaming at the moment, since `ReActSingleInputOutputParser` first collects the full answer from the LLM before it checks for a final answer or a tool call (and returns a corresponding object for either case). For long answers, this can cause a major delay before the user sees a response.
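For illustration, this is roughly how the parser behaves on complete LLM output (the example texts are made up; `ReActSingleInputOutputParser` itself comes from `langchain.agents.output_parsers`):

```python
from langchain.agents.output_parsers import ReActSingleInputOutputParser

parser = ReActSingleInputOutputParser()

# A tool call parses into an AgentAction...
parser.parse(
    "Thought: I should look this up\n"
    "Action: search\n"
    "Action Input: current weather in Berlin"
)

# ...and a final answer parses into an AgentFinish -- but only after the
# complete text is available, so nothing can be streamed any earlier.
parser.parse("Thought: I know the answer\nFinal Answer: It is sunny in Berlin.")
```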
To give faster answers, I would like to enable token streaming, but I was not able to find out how to implement an `AgentOutputParser` that does the same as `ReActSingleInputOutputParser` while also streaming the final answer token by token.
My current approach is to watch the events coming directly from the model:

- `on_chat_model_start` resets a text buffer
- `on_chat_model_stream` appends the chunk content to this buffer, and as soon as the string "Final Answer:" appears in the buffer, all further tokens are yielded to the user output (see the sketch below)
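A minimal sketch of that approach, assuming `astream_events` (v2) on the compiled graph and string chunk content; `agent_executor`, `inputs`, and `handle_token` are placeholders for the actual graph, input dict, and output callback:

```python
async def stream_final_answer(agent_executor, inputs, handle_token):
    """Buffer model tokens until "Final Answer:" shows up, then stream the rest."""
    marker = "Final Answer:"
    buffer, streaming = "", False
    async for event in agent_executor.astream_events(inputs, version="v2"):
        kind = event["event"]
        if kind == "on_chat_model_start":
            # A new model call starts (e.g. after a tool run): reset the buffer.
            buffer, streaming = "", False
        elif kind == "on_chat_model_stream":
            token = event["data"]["chunk"].content
            if streaming:
                handle_token(token)
            else:
                buffer += token
                if marker in buffer:
                    # Everything after the marker belongs to the final answer;
                    # accumulating in a buffer also handles a marker that is
                    # split across several tokens.
                    streaming = True
                    handle_token(buffer.split(marker, 1)[1])
```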
This works quite well; however, it feels like misusing the event system and not following the LangChain philosophy, because I collect the output directly from the LLM. That would prevent me from adding further steps to the chain that do post-processing (and are themselves capable of streaming).
Does anyone have a better idea?