Replies: 1 comment
🤖 Hello, Thank you for reaching out and providing detailed information about your issue. The gibberish you're seeing when `streaming=True` is enabled is the chain streaming its intermediate steps (its thought process, such as the condensed follow-up question) rather than only the final answer. However, there's a way to only stream the final response and not the thought process. You can achieve this by using a non-streaming LLM for the question generator and a streaming LLM for the document chain:

```python
question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT, verbose=False)
doc_chain = load_qa_with_sources_chain(
    llm=streaming_llm, chain_type="map_reduce", verbose=False
)
```

This will only stream the final response and not the thought process. Please note that when you define your streaming model (the `streaming_llm` passed to the document chain above), you need to set `streaming=True` and attach a streaming callback handler:

```python
streaming_llm = ChatOpenAI(
    model_name="gpt-3.5-turbo",
    temperature=0.1,
    openai_api_key=OPENAI_KEY,
    streaming=True,
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
    verbose=True,
)
```

In this case, the `StreamingStdOutCallbackHandler` prints each token of the final answer to stdout as it is generated, while the question generator's output is never streamed. You can find more information about this in a similar solved issue: Support for streaming when using LLMChain? I hope this helps! If you have any other questions or if the issue persists, please let me know.
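For reference, here is a minimal sketch of how these two pieces might be wired together into a `ConversationalRetrievalChain`. It assumes a legacy (0.0.x) LangChain install; `vectorstore` and `OPENAI_KEY` are placeholders for your own retriever and API key, and import paths vary slightly across 0.0.x versions:

```python
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains import ConversationalRetrievalChain, LLMChain
from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT
from langchain.chains.qa_with_sources import load_qa_with_sources_chain
from langchain.chat_models import ChatOpenAI

# Non-streaming model: condenses the follow-up question, so its
# intermediate output never reaches the console.
llm = ChatOpenAI(
    model_name="gpt-3.5-turbo", temperature=0.1, openai_api_key=OPENAI_KEY
)

# Streaming model: produces the final answer and streams it to stdout.
streaming_llm = ChatOpenAI(
    model_name="gpt-3.5-turbo",
    temperature=0.1,
    openai_api_key=OPENAI_KEY,
    streaming=True,
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
)

question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT)
doc_chain = load_qa_with_sources_chain(llm=streaming_llm, chain_type="map_reduce")

chain = ConversationalRetrievalChain(
    retriever=vectorstore.as_retriever(),  # placeholder: your vector store
    combine_docs_chain=doc_chain,
    question_generator=question_generator,
)

# Only the final answer is streamed token by token; the condensed
# question is generated silently by the non-streaming `llm`.
result = chain({"question": "What does the document say?", "chat_history": []})
```

Because only `streaming_llm` has a callback handler attached, only the document chain's final answer is written to stdout as it is generated.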
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
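As a side note to the reply above: if you want to capture the streamed tokens rather than print them (for example, to forward them to a web client), the same callback interface can be subclassed. A minimal sketch, assuming the legacy LangChain callback API; the class name `TokenCollector` is hypothetical:

```python
from langchain.callbacks.base import BaseCallbackHandler

class TokenCollector(BaseCallbackHandler):
    """Hypothetical handler that collects streamed tokens in a list."""

    def __init__(self):
        self.tokens = []

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # Invoked once per generated token while streaming=True.
        self.tokens.append(token)
```

Passing `CallbackManager([TokenCollector()])` to `streaming_llm` in place of the stdout handler would accumulate the final answer's tokens without echoing anything to the console.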
Hi! I've noticed that when I enable `streaming=True`, the LLM usually streams gibberish to the console and then, a couple of seconds after streaming finishes, returns a coherent response. Is there a way to have it stream the coherent response instead? Here's my current code:

Here's a sample gibberish streaming response outputted to the console:
And here's the coherent response returned after: