
ASR and TTS v3 update #3598

Open

dhruvladia-sarvam wants to merge 4 commits into pipecat-ai:main from dhruvladia-sarvam:sarvam-v3-update

Conversation

@dhruvladia-sarvam
Contributor

This PR adds support for Sarvam AI's v3 models in both Speech-to-Text (STT) and Text-to-Speech (TTS) services, while maintaining backward compatibility with existing models.

Key additions:

  • STT: Adds saaras:v3 model with new mode parameter, retains saaras:v2.5 (STT-Translate) support
  • TTS: Adds bulbul:v3-beta model with new temperature parameter and 25 new speaker voices

Supported STT Models:

| Model | Language Prompt | Mode | Endpoint |
| --- | --- | --- | --- |
| saarika:v2.5 | Required (default: "unknown") | ✗ | speech_to_text_streaming |
| saaras:v2.5 | Auto-detect | ✗ | speech_to_text_translate_streaming |
| saaras:v3 | Required (default: "en-IN") | ✅ (default: transcribe) | speech_to_text_streaming |

New Features:

  • saaras:v3 model support with new mode parameter
    • Modes: transcribe, translate, verbatim, translit, codemix
    • Default mode: transcribe
  • Retained saaras:v2.5 (STT-Translate) with auto language detection
  • Model-specific validation for parameters (prompt, mode, language)
  • Dynamic endpoint selection based on model type
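The dynamic endpoint selection described above can be sketched roughly as follows. This is an illustration only, not the actual pipecat internals; the function name and set layout are assumptions:

```python
# Hypothetical sketch: saaras:v2.5 is the STT-Translate model and uses its
# own endpoint, while saarika:v2.5 and saaras:v3 share the plain
# streaming endpoint.
TRANSLATE_MODELS = {"saaras:v2.5"}

def endpoint_for_model(model: str) -> str:
    if model in TRANSLATE_MODELS:
        return "speech_to_text_translate_streaming"
    return "speech_to_text_streaming"
```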

API Changes:

  • New mode parameter in InputParams and __init__
  • set_language() raises ValueError for saaras:v2.5 (auto-detects)
  • set_prompt() now supports both saaras:v2.5 and saaras:v3
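As a hedged illustration of the set_language() behavior listed above (the class here is a stand-in, not pipecat's actual service class):

```python
# Illustrative only: mimics the described rule that set_language()
# raises ValueError for saaras:v2.5, which auto-detects language.
class SttSettings:
    def __init__(self, model: str):
        self.model = model
        self.language = None

    def set_language(self, language: str) -> None:
        if self.model == "saaras:v2.5":
            raise ValueError(
                "saaras:v2.5 auto-detects language; set_language() is not supported"
            )
        self.language = language
```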

Supported TTS Models:

| Model | Pitch | Loudness | Pace | Temperature | Default Sample Rate | Default Speaker |
| --- | --- | --- | --- | --- | --- | --- |
| bulbul:v2 | ✅ (-0.75 to 0.75) | ✅ (0.3-3.0) | ✅ (0.3-3.0) | ✗ | 22050 Hz | anushka |
| bulbul:v3-beta | ✗ | ✗ | ✅ (0.5-2.0) | ✅ (0.01-1.0) | 24000 Hz | aditya |

New Features:

  • bulbul:v3-beta model support with temperature control
  • New enums for type safety:
    • SarvamTTSModel: Model variants
    • SarvamTTSSpeakerV2: 7 speakers for v2
    • SarvamTTSSpeakerV3: 25 speakers for v3-beta
  • get_speakers_for_model() helper function
  • Automatic parameter clamping for pace when outside v3 range
  • Model-specific defaults for sample rate, speaker, and preprocessing
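The get_speakers_for_model() helper could look roughly like this. The PR uses enums (SarvamTTSSpeakerV2/SarvamTTSSpeakerV3), so the plain string tuples here are a simplification; only the speaker names themselves come from the PR description:

```python
# Sketch of the described helper; pipecat's real version may return
# enum members rather than plain strings.
SPEAKERS_V2 = ("anushka", "abhilash", "manisha", "vidya", "arya", "karun", "hitesh")
SPEAKERS_V3 = (
    "aditya", "ritu", "priya", "neha", "rahul", "pooja", "rohan", "simran",
    "kavya", "amit", "dev", "ishita", "shreya", "ratan", "varun", "manan",
    "sumit", "roopa", "kabir", "aayan", "shubh", "ashutosh", "advait",
    "amelia", "sophia",
)

def get_speakers_for_model(model: str) -> tuple[str, ...]:
    if model == "bulbul:v2":
        return SPEAKERS_V2
    if model == "bulbul:v3-beta":
        return SPEAKERS_V3
    raise ValueError(f"Unknown Sarvam TTS model: {model}")
```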

Speakers:

  • bulbul:v2 (7): anushka, abhilash, manisha, vidya, arya, karun, hitesh
  • bulbul:v3-beta (25): aditya, ritu, priya, neha, rahul, pooja, rohan, simran, kavya, amit, dev, ishita, shreya, ratan, varun, manan, sumit, roopa, kabir, aayan, shubh, ashutosh, advait, amelia, sophia

API Changes:

  • New temperature parameter in InputParams (0.01-1.0, default 0.6)
  • Warnings logged when using incompatible parameters (e.g., pitch with v3)
  • Both SarvamHttpTTSService and SarvamTTSService (WebSocket) updated
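The warning-and-clamping behavior above (pitch unsupported on v3, pace clamped to the v3 range) might be sketched like this; the function name and dict-based parameter handling are assumptions for illustration:

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical sketch: drop pitch with a warning for bulbul:v3-beta,
# and clamp pace into the v3 range of 0.5-2.0.
V3_PACE_MIN, V3_PACE_MAX = 0.5, 2.0

def normalize_v3_params(params: dict) -> dict:
    out = dict(params)
    if "pitch" in out:
        logger.warning("bulbul:v3-beta does not support pitch; ignoring")
        out.pop("pitch")
    if "pace" in out:
        out["pace"] = min(max(out["pace"], V3_PACE_MIN), V3_PACE_MAX)
    return out
```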


codecov bot commented Jan 30, 2026

Codecov Report

❌ Patch coverage is 0% with 162 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/pipecat/services/sarvam/tts.py | 0.00% | 125 Missing ⚠️ |
| src/pipecat/services/sarvam/stt.py | 0.00% | 37 Missing ⚠️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| src/pipecat/services/sarvam/stt.py | 0.00% <0.00%> (ø) |
| src/pipecat/services/sarvam/tts.py | 0.00% <0.00%> (ø) |

... and 27 files with indirect coverage changes


@dhruvladia-sarvam
Contributor Author

@markbackman can we have this PR reviewed and merged at the earliest, please? It's urgent.

Contributor

@markbackman left a comment

Review of STT changes:

My biggest concern is maintainability. The different model code is spread throughout the class making it harder to understand. Could we instead centralize the model configuration and then modify the code to use the configuration based on the model used?

I spent a few minutes with Claude to demonstrate what I'm thinking. Here's the attached code reworked:
stt.py

You'll see I:

  • Added the ModelConfig as an immutable configuration
  • Then added the MODEL_CONFIGS dictionary using the ModelConfig to specify what the models are capable of
  • Then that MODEL_CONFIGS is used in the source

In this file, I also removed mode from __init__ and removed the repetition in the language settings.

WDYT?
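A minimal sketch of the suggested shape (the attached stt.py is not reproduced here, so the field names below are assumptions, not the reviewer's actual code):

```python
from dataclasses import dataclass
from typing import Optional

# Frozen dataclass makes each per-model configuration immutable,
# and MODEL_CONFIGS becomes the single source of truth that the
# service code looks up instead of branching on model strings.
@dataclass(frozen=True)
class ModelConfig:
    endpoint: str
    supports_mode: bool
    supports_prompt: bool
    default_language: Optional[str]  # None means auto-detect

MODEL_CONFIGS = {
    "saarika:v2.5": ModelConfig("speech_to_text_streaming", False, False, "unknown"),
    "saaras:v2.5": ModelConfig("speech_to_text_translate_streaming", False, True, None),
    "saaras:v3": ModelConfig("speech_to_text_streaming", True, True, "en-IN"),
}
```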

```python
model: str = "saarika:v2.5",
sample_rate: Optional[int] = None,
input_audio_codec: str = "wav",
mode: Optional[
```
Contributor

Why is mode initialized in two places? I would recommend that it be removed from __init__ and kept in InputParams only.
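The recommendation above, sketched as a hypothetical InputParams (a plain dataclass here; pipecat's actual InputParams class may differ):

```python
from dataclasses import dataclass
from typing import Optional

# mode lives only in InputParams, not duplicated in __init__.
@dataclass
class InputParams:
    language: Optional[str] = None
    # Options per the PR: transcribe, translate, verbatim, translit, codemix
    mode: Optional[str] = "transcribe"
```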

Contributor Author

I feel the current implementation of mode sits fine with the required logic for its value assignment.

```python
- "saaras:v3": Advanced STT model (supports mode and prompts)
sample_rate: Audio sample rate. Defaults to 16000 if not specified.
input_audio_codec: Audio codec/format of the input file. Defaults to "wav".
mode: Mode of operation for saaras:v3 models only. Options: transcribe, translate,
```
Contributor

Remove mode from docstring if removing mode from __init__.

```python
"model": self.model_name,
"vad_signals": vad_signals_str,
"high_vad_sensitivity": high_vad_sensitivity_str,
"input_audio_codec": self._input_audio_codec,
```
Contributor

Is this an intentional removal?

Contributor Author

Yes

Contributor

@markbackman left a comment

The TTS implementation has similar maintainability issues due to the models and configurations. I would recommend taking a similar approach to pulling the configuration out into separate code then using it in the service classes.

It could make sense to move the model config code outside of the stt.py and tts.py files into a separate models.py file. This would centralize the model configuration code and then keep the services focused on the core logic.

Can you make those changes and I can take a look again after the changes are made?
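A rough sketch of what such a models.py could hold, under the stated assumptions (the config class and field names here are illustrative, not the actual refactor):

```python
# models.py -- hypothetical layout for the suggested split: shared model
# configuration pulled out of stt.py and tts.py so the service classes
# stay focused on core logic.
from dataclasses import dataclass

@dataclass(frozen=True)
class STTModelConfig:
    endpoint: str
    supports_mode: bool

@dataclass(frozen=True)
class TTSModelConfig:
    default_sample_rate: int
    default_speaker: str

STT_MODELS = {
    "saaras:v3": STTModelConfig("speech_to_text_streaming", True),
}
TTS_MODELS = {
    "bulbul:v2": TTSModelConfig(22050, "anushka"),
    "bulbul:v3-beta": TTSModelConfig(24000, "aditya"),
}
```

stt.py and tts.py would then each do something like `from .models import STT_MODELS` and look up their configuration by model name.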

@dhruvladia-sarvam
Contributor Author

> Review of STT changes:
>
> My biggest concern is maintainability. The different model code is spread throughout the class making it harder to understand. Could we instead centralize the model configuration and then modify the code to use the configuration based on the model used?
>
> I spent a few minutes with Claude to demonstrate what I'm thinking. Here's the attached code reworked: stt.py
>
> You'll see I:
>
>   • Added the ModelConfig as an immutable configuration
>   • Then added the MODEL_CONFIGS dictionary using the ModelConfig to specify what the models are capable of
>   • Then that MODEL_CONFIGS is used in the source
>
> In this file, I also removed mode from __init__ and removed the repetition in the language settings.
>
> WDYT?

I agree, this sounds great in the long run too with a single source of truth

@dhruvladia-sarvam
Contributor Author

> It could make sense to move the model config code outside of the stt.py and tts.py files into a separate models.py file. This would centralize the model configuration code and then keep the services focused on the core logic.
>
> Can you make those changes and I can take a look again after the changes are made?

It's a reasonable refactor, but I feel it's not essential. The current structure works well because:

  1. Each service is self-contained
  2. The configs are semantically distinct
