Hi @MadsRC, thanks for this. Curious: do you see this behavior when just using the LiteLLM Python SDK? It might be the way we set the chunk size when calling Bedrock models.
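A quick way to check whether the proxy layer is responsible would be to stream through the SDK alone and watch the inter-chunk latency, e.g. something like this sketch (the model ID's `:0` suffix, the prompt, and the sampling parameters are assumptions):

```python
import time

import litellm

# Stream directly through the LiteLLM Python SDK (no proxy in between)
# and print the delay between consecutive chunks.
response = litellm.completion(
    model="bedrock/us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    messages=[{"role": "user", "content": "Write a short story."}],
    max_tokens=512,
    temperature=0.7,
    stream=True,
)

last = time.monotonic()
for chunk in response:
    now = time.monotonic()
    delta = chunk.choices[0].delta.content or ""
    print(f"+{now - last:.3f}s {delta!r}")
    last = now
```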
---
I am a happy, happy user of LiteLLM, but some of my users have asked me why streaming through LiteLLM seems "slow", or as if tokens are being batched in LiteLLM before being sent over. Not having a great answer (other than that the amount of serialization and deserialization LiteLLM does is bound to hurt in some way...), I decided to look into it a bit.
I wrote a short Python script that requests streamed inference from LiteLLM's OpenAI endpoint and then directly from Bedrock. The prompts are identical for the two requests, and so are the temperature and max tokens. It should be noted that the LiteLLM instance is running locally, from a fresh git clone of main from today (44a69421ead8dcde0f2a5c9be995c49eaa9a1fea). It is configured to forward requests to the same AWS Bedrock Inference Profile that the script also connects to directly. My `config.yaml` is:

Here's a short video of the output:
Screen.Recording.2025-05-30.at.19.59.27.mov
The color signifies the chunk content as it arrives, switching between two tones to mark new chunks. Notice how the OpenAI one (which is LiteLLM in this instance) feels "chunkier" or "laggier", while the Bedrock output feels smoother. Remember, both of these requests are ultimately served by the same model in Bedrock...
The script is written to output the chunks as they arrive.
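For anyone wanting to reproduce the visual effect, the alternating-color trick can be as simple as the following sketch (my own illustration, not the exact code from the script):

```python
import itertools

# Alternate between two ANSI colors on every chunk so chunk
# boundaries become visible in the terminal output.
_colors = itertools.cycle(["\033[36m", "\033[35m"])  # cyan / magenta
RESET = "\033[0m"


def print_chunk(text: str) -> None:
    print(f"{next(_colors)}{text}{RESET}", end="", flush=True)
```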
The model queried is `us.anthropic.claude-3-7-sonnet-20250219-v1`.
I also write the chunks to a log with a timestamp, one chunk per line.
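In rough outline, the script does something like the sketch below (a reconstruction for illustration, not the exact code; the proxy URL and key, the AWS region, the prompt, and the `:0` suffix on the model ID are all assumptions):

```python
import time

import boto3
from openai import OpenAI

PROMPT = "Write a short story about a lighthouse keeper."
MODEL = "us.anthropic.claude-3-7-sonnet-20250219-v1:0"


def log_chunk(log, text):
    # One chunk per line, prefixed with a wall-clock timestamp.
    log.write(f"{time.time():.6f}\t{text!r}\n")


# 1) Through the LiteLLM proxy, via its OpenAI-compatible endpoint.
client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")
with open("openai_chunks.log", "w") as log:
    stream = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": PROMPT}],
        max_tokens=512,
        temperature=0.7,
        stream=True,
    )
    for chunk in stream:
        log_chunk(log, chunk.choices[0].delta.content or "")

# 2) Directly against Bedrock, using the Converse streaming API.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
with open("bedrock_chunks.log", "w") as log:
    resp = bedrock.converse_stream(
        modelId=MODEL,
        messages=[{"role": "user", "content": [{"text": PROMPT}]}],
        inferenceConfig={"maxTokens": 512, "temperature": 0.7},
    )
    for event in resp["stream"]:
        if "contentBlockDelta" in event:
            log_chunk(log, event["contentBlockDelta"]["delta"]["text"])
```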
The line counts (and thus chunk counts) in these logs are:

```
$ wc -l *.log
      57 bedrock_chunks.log
      62 openai_chunks.log
     119 total
```
The content of the `bedrock_chunks.log` file is:

The content of the `openai_chunks.log` file is:

Is anyone else seeing something similar, or does anyone have an explanation for why this is happening?