Replies: 1 comment
-
Hey @No41Name! I'm here to help you with any bugs, questions, or contributions you might have. Let's tackle this together!

The difference in results between `chat()` and `stream_chat()` comes down to how each method generates the response and records it in memory. `chat()` waits for the complete LLM response and writes it to memory before returning, while `stream_chat()` returns immediately and writes the streamed response to chat history on a background thread. This difference in modes can lead to variations in the results, especially if the response generation involves multiple steps or function calls.
Supporting code snippets:

```python
@trace_method("chat")
def chat(
    self,
    message: str,
    chat_history: Optional[List[ChatMessage]] = None,
    tool_choice: Union[str, dict] = "auto",
) -> AgentChatResponse:
    if chat_history is not None:
        self._memory.set(chat_history)
    self._memory.put(ChatMessage(content=message, role="user"))
    initial_token_count = len(
        self._memory.tokenizer_fn(
            " ".join([(m.content or "") for m in self._prefix_messages])
        )
    )
    all_messages = self._prefix_messages + self._memory.get(
        initial_token_count=initial_token_count
    )
    chat_response = self._llm.chat(all_messages)
    ai_message = chat_response.message
    self._memory.put(ai_message)
    return AgentChatResponse(response=str(chat_response.message.content))
```
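Note that `chat()` itself puts the complete `ai_message` into memory before returning, whereas `stream_chat()` below delegates that work to `write_response_to_history` running on a separate thread.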
```python
@trace_method("chat")
def stream_chat(
    self,
    message: str,
    chat_history: Optional[List[ChatMessage]] = None,
    tool_choice: Union[str, dict] = "auto",
) -> StreamingAgentChatResponse:
    if chat_history is not None:
        self._memory.set(chat_history)
    self._memory.put(ChatMessage(content=message, role="user"))
    initial_token_count = len(
        self._memory.tokenizer_fn(
            " ".join([(m.content or "") for m in self._prefix_messages])
        )
    )
    all_messages = self._prefix_messages + self._memory.get(
        initial_token_count=initial_token_count
    )
    chat_response = StreamingAgentChatResponse(
        chat_stream=self._llm.stream_chat(all_messages)
    )
    thread = Thread(
        target=chat_response.write_response_to_history, args=(self._memory,)
    )
    thread.start()
    return chat_response
```

Additionally, the asynchronous variants `achat()` and `astream_chat()` follow the same pattern:
```python
@trace_method("chat")
async def achat(
    self, message: str, chat_history: Optional[List[ChatMessage]] = None
) -> AgentChatResponse:
    if chat_history is not None:
        self._memory.set(chat_history)
    self._memory.put(ChatMessage(content=message, role="user"))
    initial_token_count = len(
        self._memory.tokenizer_fn(
            " ".join([(m.content or "") for m in self._prefix_messages])
        )
    )
    all_messages = self._prefix_messages + self._memory.get(
        initial_token_count=initial_token_count
    )
    chat_response = await self._llm.achat(all_messages)
    ai_message = chat_response.message
    self._memory.put(ai_message)
    return AgentChatResponse(response=str(chat_response.message.content))
```
```python
@trace_method("chat")
async def astream_chat(
    self, message: str, chat_history: Optional[List[ChatMessage]] = None
) -> StreamingAgentChatResponse:
    if chat_history is not None:
        self._memory.set(chat_history)
    self._memory.put(ChatMessage(content=message, role="user"))
    initial_token_count = len(
        self._memory.tokenizer_fn(
            " ".join([(m.content or "") for m in self._prefix_messages])
        )
    )
    all_messages = self._prefix_messages + self._memory.get(
        initial_token_count=initial_token_count
    )
    chat_response = StreamingAgentChatResponse(
        achat_stream=await self._llm.astream_chat(all_messages)
    )
    asyncio.create_task(chat_response.awrite_response_to_history(self._memory))
    return chat_response
```

In summary, the different modes (`chat()`, `stream_chat()`, `achat()`, `astream_chat()`) differ in when the assistant message is written to memory: the non-streaming methods store the complete message before returning, while the streaming methods return immediately and write the response to history in the background as it is consumed. These different execution paths can explain why you see different answers even with the same configuration.
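If you want to verify this yourself, here is a minimal sketch (assuming a recent llama-index version; the `./data` path and the question are placeholders) that compares the text and the recorded history for the two paths:

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Build a chat engine; directory path and default chat mode are placeholders.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
chat_engine = index.as_chat_engine()

question = "What is Alice in Wonderland about?"

# Non-streaming path: the full message is generated and stored before returning.
chat_engine.reset()
response = chat_engine.chat(question)
sync_history = list(chat_engine.chat_history)

# Streaming path: consume the generator fully so the background thread has the
# complete response to write into history (the write may finish slightly after
# the generator is exhausted).
chat_engine.reset()
streaming_response = chat_engine.stream_chat(question)
streamed_text = "".join(streaming_response.response_gen)
stream_history = list(chat_engine.chat_history)

# Compare the final outputs produced by the two paths.
print(str(response) == streamed_text)
print(sync_history[-1].content == stream_history[-1].content)
```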
-
Hi,
While implementing a RAG with chat_engine I noticed that the `chat()` and `stream_chat()` methods return different results. I thought that the second one just provided the same response as a generator, to allow for live printing.
Are there additional differences?
Here is an example of what I'm talking about: a simple RAG implementation with two documents (Alice in Wonderland and Shakespeare). You can see that the answers are really different, with the same configuration and the LLM temperature set to zero.
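For reference, a minimal sketch of the kind of setup described (the file paths, model, and question are placeholders, not the original code):

```python
from llama_index.core import Settings, VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI

# Temperature set to zero, as in the reported configuration (model is a placeholder).
Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0)

# Two documents: Alice in Wonderland and Shakespeare (placeholder paths).
documents = SimpleDirectoryReader(
    input_files=["alice_in_wonderland.txt", "shakespeare.txt"]
).load_data()
index = VectorStoreIndex.from_documents(documents)
chat_engine = index.as_chat_engine()

question = "Compare the writing styles of the two books."

# Same question through both methods, starting from a fresh history each time.
print(chat_engine.chat(question))

chat_engine.reset()
streaming_response = chat_engine.stream_chat(question)
streaming_response.print_response_stream()  # prints tokens as they arrive
```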