-
I've also noticed this - in testing on Friday the Sonar model was returning responses in an average of 6.6s, but today the same requests are taking 10-15 seconds across the board.
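For anyone wanting to reproduce this kind of comparison, here's a minimal timing sketch against Perplexity's OpenAI-compatible chat completions endpoint. It assumes a `PPLX_API_KEY` environment variable; the prompt and sample count are arbitrary placeholders:

```python
# Minimal sketch: time a single "sonar" request end to end.
# Assumes PPLX_API_KEY is set; uses Perplexity's OpenAI-compatible
# chat completions endpoint.
import os
import time

import requests

API_URL = "https://api.perplexity.ai/chat/completions"


def time_sonar_request(prompt: str) -> float:
    """Send one chat completion request and return elapsed wall-clock seconds."""
    payload = {
        "model": "sonar",
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {"Authorization": f"Bearer {os.environ['PPLX_API_KEY']}"}
    start = time.perf_counter()
    response = requests.post(API_URL, json=payload, headers=headers, timeout=60)
    response.raise_for_status()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Average a few runs to smooth over per-request variance.
    samples = [time_sonar_request("What is the capital of France?") for _ in range(5)]
    print(f"mean latency: {sum(samples) / len(samples):.2f}s")
```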
-
I've just sent Perplexity an email. 20 seconds to complete via the API versus 2.54 seconds in the playground. I chose Sonar for lightning-quick responses to fairly simple requests.
-
+1 for this feature request - there's a real need for a 7/8B-parameter model for low-latency responses, or for inference on the existing sonar model to get significantly faster.
-
Bumping this - the docs seem a little sparse about which parameters we can tweak to drive latency down. Would love to know how to make sure sonar responses stay under 5s.
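Until the docs say more, the only levers I'm aware of are the generic OpenAI-compatible ones, shown in the sketch below: cap `max_tokens` (decode time scales with output length) and use `stream` to cut perceived latency. Whether sonar exposes additional latency-specific tuning is exactly what's unclear; this example only uses parameters known from the OpenAI-compatible API:

```python
# Sketch of generic latency levers on an OpenAI-compatible API.
# max_tokens limits how much the model generates (less output =
# less decode time); streaming is shown separately below.
import os

import requests

API_URL = "https://api.perplexity.ai/chat/completions"

payload = {
    "model": "sonar",
    "messages": [
        {"role": "user", "content": "Answer in one sentence: what is HTTP?"}
    ],
    "max_tokens": 100,  # cap output length; long answers dominate total latency
}
headers = {"Authorization": f"Bearer {os.environ['PPLX_API_KEY']}"}
response = requests.post(API_URL, json=payload, headers=headers, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```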
-
Currently, …
-
Comparing "llama-3.1-sonar-small-128k-online" and "sonar", "sonar" is slower at streaming responses. I would like the streaming speed to be improved.
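A rough way to quantify this: measure time-to-first-chunk and total stream duration for both models. Here's a sketch using the `openai` client pointed at Perplexity's OpenAI-compatible base URL (assumes a `PPLX_API_KEY` env var; the prompt is a placeholder):

```python
# Sketch: compare streaming behavior between two models by measuring
# time-to-first-chunk and total stream duration.
import os
import time

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["PPLX_API_KEY"],
    base_url="https://api.perplexity.ai",
)


def measure_stream(model: str, prompt: str) -> None:
    start = time.perf_counter()
    first_chunk = None
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for _chunk in stream:
        if first_chunk is None:
            first_chunk = time.perf_counter() - start
    total = time.perf_counter() - start
    print(f"{model}: first chunk {first_chunk:.2f}s, full response {total:.2f}s")


for model in ("llama-3.1-sonar-small-128k-online", "sonar"):
    measure_stream(model, "Explain HTTP streaming in two sentences.")
```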
-
My requests used to take ~5 seconds with the …
-
The response speed of "sonar", the new model from Perplexity, is much slower than that of "llama-3.1-sonar-small-128k-online".
Will it improve?