This repository was archived by the owner on Jul 22, 2025. It is now read-only.
Stream responses are 5s slow to get the first token #55
Replies: 1 comment
Hey, we know that sonar-huge is rather slow. I recommend using sonar-large instead, especially if latency is a priority for you; hopefully answer quality does not substantially degrade.
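If you want to compare the two models on latency, one option is to time how long the first streamed chunk takes to arrive (time-to-first-token). A minimal sketch of that measurement is below; the `measure_ttft` helper and the simulated `slow_stream` generator are hypothetical stand-ins — with a real streaming client you would pass its chunk iterator to `measure_ttft` instead.

```python
import time

def measure_ttft(stream):
    """Consume an iterable of text chunks; return (seconds until the
    first chunk arrived, the full concatenated text)."""
    start = time.monotonic()
    ttft = None
    parts = []
    for chunk in stream:
        if ttft is None:
            ttft = time.monotonic() - start  # latency of the first token
        parts.append(chunk)
    return ttft, "".join(parts)

def slow_stream(chunks, first_delay):
    """Simulated model stream: sleeps before yielding the first chunk,
    mimicking a model that is slow to produce its first token."""
    time.sleep(first_delay)
    for c in chunks:
        yield c

# Simulate a stream whose first token takes ~0.2 s to arrive.
ttft, text = measure_ttft(slow_stream(["Hello", ", ", "world"], 0.2))
print(f"TTFT: {ttft:.2f}s, text: {text!r}")
```

Running the same measurement against each model a few times should make the latency difference between sonar-huge and sonar-large concrete for your workload.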
Hello, for my use case I need very low latency with llama-3.1-sonar-huge-128k-online, but currently the model is very slow to respond to streaming requests. The 70B model would already be fast enough, but it is not comparable to state-of-the-art models like GPT-4o.