This repository was archived by the owner on Jul 22, 2025. It is now read-only.
Stream responses are 5s slow to get the first token #55
Replies: 1 comment
Hey, we know that sonar-huge is rather slow. I recommend using sonar-large instead, especially if latency is a priority for you; hopefully answer quality does not substantially degrade.
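If you want to compare the two models on latency, one option is to time how long the first streamed chunk takes to arrive (time-to-first-token). A minimal sketch of that measurement is below; the `measure_ttft` helper and the simulated `slow_stream` generator are hypothetical stand-ins — with a real streaming client you would pass its chunk iterator to `measure_ttft` instead.

```python
import time

def measure_ttft(stream):
    """Consume an iterable of text chunks; return (seconds until the
    first chunk arrived, the full concatenated text)."""
    start = time.monotonic()
    ttft = None
    parts = []
    for chunk in stream:
        if ttft is None:
            ttft = time.monotonic() - start  # latency of the first token
        parts.append(chunk)
    return ttft, "".join(parts)

def slow_stream(chunks, first_delay):
    """Simulated model stream: sleeps before yielding the first chunk,
    mimicking a model that is slow to produce its first token."""
    time.sleep(first_delay)
    for c in chunks:
        yield c

# Simulate a stream whose first token takes ~0.2 s to arrive.
ttft, text = measure_ttft(slow_stream(["Hello", ", ", "world"], 0.2))
print(f"TTFT: {ttft:.2f}s, text: {text!r}")
```

Running the same measurement against each model a few times should make the latency difference between sonar-huge and sonar-large concrete for your workload.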
Hello, for my use case I need very low latency with llama-3.1-sonar-huge-128k-online, but currently the model is very slow to respond to streaming requests. The 70B model would already be fast enough, but it is not comparable to state-of-the-art models like GPT-4o.