[sarvam] Expose max_session_duration as a constructor parameter to control WebSocket connection reuse #5268
Description
Feature Type
Would make my life easier
Feature Description
Currently, the WebSocket connection pool has max_session_duration hardcoded to 3600 seconds, so a connection is reused across TTS requests for up to an hour. Setting it to 0 (no reuse) opens a fresh WebSocket session per request, which noticeably improves TTS audio quality, likely because it avoids stale session state or cached packet context on the Sarvam server side.
Workarounds / Alternatives
The only current workaround is reaching into a private attribute after construction:
    tts._pool._max_session_duration = 0
This is fragile and not part of the public API.
Proposed Solution
Add a max_session_duration parameter to the TTS constructor:
    OurTTS = sarvam.TTS(
        target_language_code="hi-IN",
        model="bulbul:v3",
        speaker="ritu",
        max_session_duration=0,  # fresh connection per request
    )
And pass it through to the pool:
    self._pool = utils.ConnectionPool(
        ...
        max_session_duration=max_session_duration,
    )
Additional Context
This was discovered while investigating the WAV codec bug (see issue #5267).
While debugging, max_session_duration was manually set to 0 as a side effect, and TTS output sounded noticeably better. This is a subjective impression, not a measurement: speech was cleaner, with fewer audio artifacts between utterances.
It is possible that the Sarvam WebSocket server maintains some internal state or packet buffer across a reused connection that negatively affects subsequent requests. A fresh connection appears to reset this.
Worth asking: should max_session_duration default to 0 for the Sarvam plugin specifically, rather than inheriting the 3600-second default that may be more appropriate for other providers? The overhead of opening a connection per TTS request is likely negligible compared to the quality gain.
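To make the intended semantics concrete, here is a minimal, self-contained sketch of how the parameter could flow from the constructor into a pool. ToyConnectionPool and ToyTTS are hypothetical stand-ins written for this issue, not the plugin's or the library's actual classes; the real utils.ConnectionPool may treat 0 differently, so the "0 means never reuse" rule below is an assumption taken from the behavior described above.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class ToyConnectionPool(Generic[T]):
    """Hypothetical pool: reuses one connection until it is too old."""

    def __init__(
        self,
        connect: Callable[[], T],
        max_session_duration: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._connect = connect
        self._max_session_duration = max_session_duration
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._conn: Optional[T] = None
        self._opened_at = 0.0

    def get(self) -> T:
        now = self._clock()
        expired = (
            self._conn is None
            or self._max_session_duration <= 0  # assumption: 0 disables reuse
            or now - self._opened_at >= self._max_session_duration
        )
        if expired:
            self._conn = self._connect()
            self._opened_at = now
        return self._conn


class ToyTTS:
    """Hypothetical TTS wrapper that exposes the pool setting publicly."""

    def __init__(self, *, max_session_duration: float = 3600.0) -> None:
        self._sessions_opened = 0
        self._pool = ToyConnectionPool(
            self._open_session,
            max_session_duration=max_session_duration,
        )

    def _open_session(self) -> int:
        # Stand-in for opening a WebSocket; returns a session id.
        self._sessions_opened += 1
        return self._sessions_opened

    def synthesize(self, text: str) -> int:
        # Returns the id of the session that would serve this request.
        return self._pool.get()
```

With max_session_duration=0 every synthesize call in this sketch gets a new session id, while the 3600-second default keeps returning the same one until it expires. Keeping the existing default in the constructor signature would leave current users unaffected unless they opt in.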