Description
I'm experiencing an issue where sending a second text chunk to the same TTS context causes the ongoing audio generation to immediately stop, even when using the `continue_` parameter correctly.
Expected Behavior
When sending multiple text chunks to the same context with `continue_=True`, the audio should continue generating sequentially for all chunks.
Actual Behavior
The audio generation for the first text stops immediately when the second `context.send()` is called.
Environment
Cartesia SDK Version: 2.0.17
Python Version: 3.13.2
Code Example

```python
from cartesia import AsyncCartesia, OutputFormat_RawParams, TtsRequestIdSpecifierParams

# Setup
client = AsyncCartesia(api_key="...")
ws = await client.tts.websocket()
context = ws.context("my-context-id")

# First text - this works fine
await context.send(
    model_id="sonic-3",
    transcript="First text to convert to speech.",
    voice=TtsRequestIdSpecifierParams(mode="id", id="ac197a78-cec7-4c50-93e5-93bdc1910b11"),
    stream=True,
    output_format=OutputFormat_RawParams(
        container="raw",
        encoding="pcm_s16le",
        sample_rate=22050,
    ),
    continue_=False,  # First message
)

# Second text - this causes the audio to stop generating
await context.send(
    model_id="sonic-3",
    transcript="Second text to convert to speech.",
    voice=TtsRequestIdSpecifierParams(mode="id", id="ac197a78-cec7-4c50-93e5-93bdc1910b11"),
    stream=True,
    output_format=OutputFormat_RawParams(
        container="raw",
        encoding="pcm_s16le",
        sample_rate=22050,
    ),
    continue_=True,  # Continuation of previous
)

await context.no_more_inputs()

# Receiving audio
async for output in context.receive():
    if output.audio:
        # Audio for the second text never arrives
        handle_audio(output.audio)
```
Observations
- The first text generates audio successfully
- The audio generation stops exactly when `context.send()` is called for the second text (e.g. if I add a 1 s delay, it streams audio for 1 s and then stops)
- Waiting for the first audio to be completely generated before sending the second text doesn't help either; the second text is not generated at all
- The `continue_` flag is set correctly (`False` for the first send, `True` for subsequent ones)
- I'm receiving audio in a concurrent task that starts before sending any text
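For completeness, the shape of my concurrent send/receive setup is roughly the sketch below. `FakeContext`, `push`, and the sentinel queue are stand-ins I wrote to show the concurrency pattern only; they are not Cartesia SDK API.

```python
import asyncio


class FakeContext:
    """Stand-in for the SDK context: yields queued audio chunks until closed."""

    def __init__(self):
        self._queue: asyncio.Queue = asyncio.Queue()

    async def push(self, chunk):
        # In the real code this would be the server pushing audio frames.
        await self._queue.put(chunk)

    async def receive(self):
        # Async generator: drain the queue until the None sentinel arrives.
        while True:
            chunk = await self._queue.get()
            if chunk is None:  # equivalent of no_more_inputs() being honored
                return
            yield chunk


async def main():
    context = FakeContext()
    received = []

    async def receiver():
        # Started before any text is sent, exactly as in my real code.
        async for chunk in context.receive():
            received.append(chunk)

    task = asyncio.create_task(receiver())
    await context.push(b"audio-for-first-text")
    await context.push(b"audio-for-second-text")
    await context.push(None)  # close the stream
    await task
    return received


if __name__ == "__main__":
    print(asyncio.run(main()))
```

With the fake context both chunks arrive; with the real SDK context, everything after the second `context.send()` is silently dropped.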
Questions
- Is it expected to call `context.send()` multiple times on the same context?
- Does `continue_=True` require a specific timing or pattern between sends?
- Should I be using separate contexts for each text chunk instead? If so, how do I keep the voice consistency/prosody between them?
- Is there a way to queue multiple text chunks for sequential processing (to keep voice consistency/prosody)?
My Use Case
I'm streaming LLM-generated responses to TTS and need to send chunks as they arrive for minimal latency. I want to maintain voice consistency across chunks, which is why I'm trying to use the same context with `continue_=True`.
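For context, my chunking step looks roughly like this minimal sketch (the `flush_chunks` helper and its sentence regex are my own illustrative code, not SDK behavior): buffer LLM output and flush complete sentences, sending the first with `continue_=False` and the rest with `continue_=True`.

```python
import re


def flush_chunks(buffer: str):
    """Split buffered LLM text at sentence boundaries.

    Returns (complete_chunks, remainder). Each complete chunk would be sent
    to the TTS context; the trailing partial sentence stays in the buffer
    until more tokens arrive.
    """
    parts = re.split(r"(?<=[.!?])\s+", buffer)
    if len(parts) <= 1:
        return [], buffer  # no complete sentence yet, keep buffering
    return parts[:-1], parts[-1]


chunks, rest = flush_chunks("First text. Second text. Third is incom")
# chunks == ["First text.", "Second text."], rest == "Third is incom"
```

Each element of `chunks` maps to one `context.send()` call in the example above, which is why multiple sends on one context matter so much for this use case.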
Any guidance would be greatly appreciated!