Conversation
Codecov Report❌ Patch coverage is
... and 27 files with indirect coverage changes 🚀 New features to boost your workflow:
|
|
@markbackman can we have this PR reviewed and merged at the earliest please. It's urgent |
markbackman
left a comment
There was a problem hiding this comment.
Review of STT changes:
My biggest concern is maintainability. The different model code is spread throughout the class making it harder to understand. Could we instead centralize the model configuration and then modify the code to use the configuration based on the model used?
I spent a few minutes with Claude to demonstrate what I'm thinking. Here's the attached code reworked:
stt.py
You'll see I:
- Added the
ModelConfigas an immutable configuration - Then added the
MODEL_CONFIGSdictionary using theModelConfigto specify what the models are capable of - Then that
MODEL_CONFIGSis used in the source
In this file, I also remove mode from __init__ and removed the repetition in the language settings.
WDYT?
src/pipecat/services/sarvam/stt.py
Outdated
| model: str = "saarika:v2.5", | ||
| sample_rate: Optional[int] = None, | ||
| input_audio_codec: str = "wav", | ||
| mode: Optional[ |
There was a problem hiding this comment.
Why is mode initialized in two places. I would recommend that it be removed from __init__ and kept in InputParams only
There was a problem hiding this comment.
I feel the current implementation of mode sits fine with the required logic for it's value assignment
src/pipecat/services/sarvam/stt.py
Outdated
| - "saaras:v3": Advanced STT model (supports mode and prompts) | ||
| sample_rate: Audio sample rate. Defaults to 16000 if not specified. | ||
| input_audio_codec: Audio codec/format of the input file. Defaults to "wav". | ||
| mode: Mode of operation for saaras:v3 models only. Options: transcribe, translate, |
There was a problem hiding this comment.
Remove mode from docstring if removing mode from __init__.
| "model": self.model_name, | ||
| "vad_signals": vad_signals_str, | ||
| "high_vad_sensitivity": high_vad_sensitivity_str, | ||
| "input_audio_codec": self._input_audio_codec, |
There was a problem hiding this comment.
Is this an intentional removal?
markbackman
left a comment
There was a problem hiding this comment.
The TTS implementation has similar maintainability issues due to the models and configurations. I would recommend taking a similar approach to pulling the configuration out into separate code then using it in the service classes.
It could make sense to move the model config code outside of the stt.py and tts.py files into a separate models.py file. This would centralize the model configuration code and then keep the services focused on the core logic.
Can you make those changes and I can take a look again after the changes are made?
I agree, this sounds great in the long run too with a single source of truth |
It's a reasonable refactor but not essential I feel. The current structure works well because:
|
This PR adds support for Sarvam AI's v3 models in both Speech-to-Text (STT) and Text-to-Speech (TTS) services, while maintaining backward compatibility with existing models.
Key additions:
saaras:v3model with newmodeparameter, retainssaaras:v2.5(STT-Translate) supportbulbul:v3-betamodel with newtemperatureparameter and 25 new speaker voicesSupported Models:
saarika:v2.5speech_to_text_streamingsaaras:v2.5speech_to_text_translate_streamingsaaras:v3speech_to_text_streamingNew Features:
saaras:v3model support with newmodeparametertranscribe,translate,verbatim,translit,codemixtranscribesaaras:v2.5(STT-Translate) with auto language detectionAPI Changes:
modeparameter inInputParamsand__init__set_language()raisesValueErrorforsaaras:v2.5(auto-detects)set_prompt()now supports bothsaaras:v2.5andsaaras:v3Supported Models:
bulbul:v2bulbul:v3-betaNew Features:
bulbul:v3-betamodel support with temperature controlSarvamTTSModel: Model variantsSarvamTTSSpeakerV2: 7 speakers for v2SarvamTTSSpeakerV3: 25 speakers for v3-betaget_speakers_for_model()helper functionSpeakers:
API Changes:
temperatureparameter inInputParams(0.01-1.0, default 0.6)SarvamHttpTTSServiceandSarvamTTSService(WebSocket) updated