@vchulski hey, thanks for chatting this morning. If you could give it a go on the latest version of the SDK, we believe this should be supported. Here's an example file that creates an MP3:

Hi! Thanks for the quick follow-up and the example code. Together with @vchulski we tested your script and confirmed that native MP3 generation works well with the latest SDK. However, we discovered a few important limitations during our testing:

1. SSE endpoint doesn't support MP3 format

When we try to use MP3 with the SSE endpoint:

    async for chunk in self.client.tts.sse(
        model_id=model,
        transcript=text,
        voice={"mode": "id", "id": voice_id},
        output_format={
            "container": "mp3",
            "sample_rate": 44100,
            "bit_rate": 192000,
        },
    ):

Error:

2. Audio quality difference

We compared the audio quality between:
The SSE approach produces noticeably better audio quality in terms of voice clarity and background noise reduction.

3. Latency advantage of SSE

Our latency testing shows that SSE provides significantly better first-chunk latency (~200ms faster) compared to the bytes() endpoint, which is critical for our telephony applications.

Feature request: Would it be possible to add MP3 container support to the SSE endpoint? This would give us the best of both worlds: the better latency of SSE with the convenience of native MP3 encoding. This would be valuable for our use case.

Thanks again for your help.

Hi @chongzluong, any news on MP3 support for SSE?
MP3 container implementation
Added MP3 container support, following the Cartesia API documentation.
Changes
Added a new container to the dict inside OutputFormatMapping, modified the get_output_format function inside the TTS class, and updated the test accordingly.
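
For readers who want a picture of what such a change might look like, here is a minimal sketch rather than the actual diff. The internal shape of `OutputFormatMapping`, the preset names (`raw_pcm_f32le_44100`, `mp3_44100_192000`), the `get_format` helper, and the signature of `get_output_format` are all assumptions made for illustration. Only the `container`, `sample_rate`, and `bit_rate` fields are taken from the example earlier in this thread.

```python
from typing import Any, Dict


class OutputFormatMapping:
    """Hypothetical stand-in for the mapping class named in this PR.

    The real class in the repository may be structured differently.
    """

    _format_mapping: Dict[str, Dict[str, Any]] = {
        # An existing-style raw PCM preset (assumed name and fields).
        "raw_pcm_f32le_44100": {
            "container": "raw",
            "encoding": "pcm_f32le",
            "sample_rate": 44100,
        },
        # The new MP3 entry (assumed name): MP3 carries a bit rate
        # instead of a PCM encoding.
        "mp3_44100_192000": {
            "container": "mp3",
            "sample_rate": 44100,
            "bit_rate": 192000,
        },
    }

    @classmethod
    def get_format(cls, name: str) -> Dict[str, Any]:
        """Return a copy of the preset so callers cannot mutate the table."""
        if name not in cls._format_mapping:
            raise ValueError(f"Unsupported output format: {name!r}")
        return dict(cls._format_mapping[name])


def get_output_format(name: str) -> Dict[str, Any]:
    """Resolve a preset name to an output_format dict (hypothetical signature)."""
    return OutputFormatMapping.get_format(name)
```

Under these assumptions, `get_output_format("mp3_44100_192000")` would return a dict that can be passed as `output_format`, and an unknown preset name raises `ValueError` instead of silently falling back to a default.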