-
🤖 Hello, thank you for bringing this to our attention. The issue you're experiencing is due to the way the callback handler's `on_llm_new_token` method writes tokens to stdout. Here is the relevant part of the code:

```python
def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
    """Run on new LLM token. Only available when streaming is enabled."""
    # Remember the last n tokens, where n = len(answer_prefix_tokens)
    self.append_to_last_tokens(token)

    # Check if the last n tokens match the answer_prefix_tokens list ...
    if self.check_if_answer_reached():
        self.answer_reached = True
        if self.stream_prefix:
            for t in self.last_tokens:
                sys.stdout.write(t)
            sys.stdout.flush()
        return

    # ... if yes, then print tokens from now on
    if self.answer_reached:
        sys.stdout.write(token)
        sys.stdout.flush()
```

To add a newline character after each token, you could modify `on_llm_new_token` as follows:

```python
def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
    """Run on new LLM token. Only available when streaming is enabled."""
    # Remember the last n tokens, where n = len(answer_prefix_tokens)
    self.append_to_last_tokens(token)

    # Check if the last n tokens match the answer_prefix_tokens list ...
    if self.check_if_answer_reached():
        self.answer_reached = True
        if self.stream_prefix:
            for t in self.last_tokens:
                sys.stdout.write(t + '\n')
            sys.stdout.flush()
        return

    # ... if yes, then print tokens from now on
    if self.answer_reached:
        sys.stdout.write(token + '\n')
        sys.stdout.flush()
```

This modification adds a newline character after each token, causing each token to appear on a new line in the output. Please note that this is a workaround and might not be the best solution. It would be better to handle this at a higher level, perhaps in the code that calls `on_llm_new_token`, rather than editing the library source directly. I hope this helps! Let me know if you have any other questions.
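If you would rather not patch the library, a minimal sketch of that "higher level" approach is to subclass the stock handler and override `on_llm_new_token` yourself. The class name `NewlineStreamingHandler` below is hypothetical; the base class and method signature are the ones shown above:

```python
import sys
from typing import Any

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler


class NewlineStreamingHandler(StreamingStdOutCallbackHandler):
    """Hypothetical handler that writes each streamed token on its own line."""

    def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
        # Write the token followed by a newline, without modifying library code.
        sys.stdout.write(token + "\n")
        sys.stdout.flush()
```

You would then pass `CallbackManager([NewlineStreamingHandler()])` to the LLM instead of the stock handler.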
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
-
We were able to get a streaming response from LlamaCpp by setting streaming=True and passing CallbackManager([StreamingStdOutCallbackHandler()]). The issue is that the streamed output does not contain any newline characters (\n), so the streamed text appears as one long paragraph.
The content printed to the console looks right, and when we return the complete response it contains newline characters and is properly formatted. Only the streamed output has this issue, so we are unable to display the LLM output properly.
We tried the code below and feel we are missing something, please help.
```python
import json

from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.llms import LlamaCpp
from sse_starlette.sse import EventSourceResponse


def get70BGPULLMObject():
    callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
    llm = LlamaCpp(
        model_path=llm_model_path,
        temperature=0,
        n_gpu_layers=83,
        n_ctx=4096,
        max_tokens=4096,
        n_batch=2048,
        n_gqa=8,
        verbose=True,
        callback_manager=callback_manager,
        streaming=True,
        stream_prefix=True,
    )
    return llm


async def generateStreamingOutput(llm, question):
    for item in llm.stream(json.dumps(question), stop=['Question:']):
        yield item

# inside the FastAPI endpoint we return:
return EventSourceResponse(generateStreamingOutput(llm, question))
```
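One thing worth checking, offered only as a hedged sketch and not a verified fix: the newlines may be getting lost in the SSE framing rather than in LlamaCpp itself, since the SSE protocol uses newlines to delimit events and not every client reassembles multi-line `data:` payloads correctly. A common workaround is to JSON-encode each chunk before yielding it and decode it on the client, so embedded \n characters survive transport. This assumes `llm.stream()` yields plain text chunks, as it appears to in the snippet above; the `{"token": ...}` envelope is illustrative:

```python
import json


async def generateStreamingOutput(llm, question):
    for item in llm.stream(json.dumps(question), stop=['Question:']):
        # Wrap each chunk in JSON so embedded "\n" characters are preserved
        # inside the SSE "data:" payload instead of being treated as framing.
        yield json.dumps({"token": item})
```

On the browser side, the EventSource `onmessage` handler would then `JSON.parse(event.data).token` before appending the text to the UI.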