-
🤖 Hello, thank you for bringing this to our attention. The issue you're experiencing is due to the way the callback handler's `on_llm_new_token` method writes tokens to stdout. Here is the relevant part of the code:

```python
def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
    """Run on new LLM token. Only available when streaming is enabled."""
    # Remember the last n tokens, where n = len(answer_prefix_tokens)
    self.append_to_last_tokens(token)

    # Check if the last n tokens match the answer_prefix_tokens list ...
    if self.check_if_answer_reached():
        self.answer_reached = True
        if self.stream_prefix:
            for t in self.last_tokens:
                sys.stdout.write(t)
            sys.stdout.flush()
        return

    # ... if yes, then print tokens from now on
    if self.answer_reached:
        sys.stdout.write(token)
        sys.stdout.flush()
```

To add a newline character after each token, you could modify `on_llm_new_token` as follows:

```python
def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
    """Run on new LLM token. Only available when streaming is enabled."""
    # Remember the last n tokens, where n = len(answer_prefix_tokens)
    self.append_to_last_tokens(token)

    # Check if the last n tokens match the answer_prefix_tokens list ...
    if self.check_if_answer_reached():
        self.answer_reached = True
        if self.stream_prefix:
            for t in self.last_tokens:
                sys.stdout.write(t + '\n')
            sys.stdout.flush()
        return

    # ... if yes, then print tokens from now on
    if self.answer_reached:
        sys.stdout.write(token + '\n')
        sys.stdout.flush()
```

This modification adds a newline character after each token, causing each token to appear on a new line in the output. Please note that this is a workaround and might not be the best solution. It would be better to handle this at a higher level, perhaps in the code that calls `on_llm_new_token`, rather than editing the library source directly. I hope this helps! Let me know if you have any other questions.
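If you would rather not patch the library, a minimal sketch of that "higher level" approach is to subclass the stock handler and override `on_llm_new_token` yourself. The class name `NewlineStreamingHandler` below is hypothetical; the base class and method signature are the ones shown above:

```python
import sys
from typing import Any

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler


class NewlineStreamingHandler(StreamingStdOutCallbackHandler):
    """Hypothetical handler that writes each streamed token on its own line."""

    def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
        # Write the token followed by a newline, without modifying library code.
        sys.stdout.write(token + "\n")
        sys.stdout.flush()
```

You would then pass `CallbackManager([NewlineStreamingHandler()])` to the LLM instead of the stock handler.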
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
-
We were able to get a streaming response from LlamaCpp by setting streaming=True and passing CallbackManager([StreamingStdOutCallbackHandler()]). The issue is that the streamed output does not contain any newline characters (\n), so the streamed text appears as one long paragraph.
The content printed to the console looks right, and when we return the complete response it contains newline characters and is properly formatted. Only the streamed output has this issue, so we are unable to display the LLM output properly.
We tried the code below and feel we are missing something, please help.
```python
import json

from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.llms import LlamaCpp
from sse_starlette.sse import EventSourceResponse


def get70BGPULLMObject():
    callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
    llm = LlamaCpp(
        model_path=llm_model_path,
        temperature=0,
        n_gpu_layers=83,
        n_ctx=4096,
        max_tokens=4096,
        n_batch=2048,
        n_gqa=8,
        verbose=True,
        callback_manager=callback_manager,
        streaming=True,
        stream_prefix=True,
    )
    return llm


async def generateStreamingOutput(llm, question):
    for item in llm.stream(json.dumps(question), stop=['Question:']):
        yield item

# inside the FastAPI endpoint we return:
return EventSourceResponse(generateStreamingOutput(llm, question))
```
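One thing worth checking, offered only as a hedged sketch and not a verified fix: the newlines may be getting lost in the SSE framing rather than in LlamaCpp itself, since the SSE protocol uses newlines to delimit events and not every client reassembles multi-line `data:` payloads correctly. A common workaround is to JSON-encode each chunk before yielding it and decode it on the client, so embedded \n characters survive transport. This assumes `llm.stream()` yields plain text chunks, as it appears to in the snippet above; the `{"token": ...}` envelope is illustrative:

```python
import json


async def generateStreamingOutput(llm, question):
    for item in llm.stream(json.dumps(question), stop=['Question:']):
        # Wrap each chunk in JSON so embedded "\n" characters are preserved
        # inside the SSE "data:" payload instead of being treated as framing.
        yield json.dumps({"token": item})
```

On the browser side, the EventSource `onmessage` handler would then `JSON.parse(event.data).token` before appending the text to the UI.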