@amitsnow amitsnow commented Oct 16, 2025

Summary

Adds Text-to-Speech (TTS) support for OpenAI and Azure OpenAI models in SyGra. Generates high-quality audio from text with configurable voice, audio format, and playback speed. Base64-encoded audio is automatically saved to an organized file structure.

Features implemented:

  • TTS Models: Supported for both OpenAI and Azure OpenAI
  • Auto File Saving: Base64 audio data URLs are automatically saved to task_dir/output_dir/audio/
  • Smart Processing: Recursively handles nested structures and replaces data URLs with file paths
  • Multi-format Support: 9 audio formats + 8 image formats
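
The "Smart Processing" behavior described above can be pictured with a minimal sketch. This is not the code from this PR: the function name `save_data_urls`, the regex, the `image/` subdirectory, and the use of the MIME subtype as the file extension are all assumptions made for illustration.

```python
import base64
import re
from pathlib import Path
from typing import Any, List, Optional

# Matches data URLs such as "data:audio/wav;base64,AAAA..."
DATA_URL_RE = re.compile(r"^data:(audio|image)/([\w.+-]+);base64,(.+)$", re.DOTALL)


def save_data_urls(
    value: Any, out_dir: Path, prefix: str, counter: Optional[List[int]] = None
) -> Any:
    """Return a copy of `value` with base64 data URLs replaced by saved file paths.

    Hypothetical helper: walks dicts and lists recursively and leaves every
    other value untouched.
    """
    if counter is None:
        counter = [0]  # shared, mutable index so each saved file gets a unique suffix
    if isinstance(value, dict):
        return {k: save_data_urls(v, out_dir, prefix, counter) for k, v in value.items()}
    if isinstance(value, list):
        return [save_data_urls(v, out_dir, prefix, counter) for v in value]
    if isinstance(value, str):
        match = DATA_URL_RE.match(value)
        if match:
            kind, subtype, payload = match.groups()
            target_dir = out_dir / kind  # e.g. <out_dir>/audio
            target_dir.mkdir(parents=True, exist_ok=True)
            # Simplification: the MIME subtype is used directly as the extension.
            path = target_dir / f"{prefix}_{counter[0]}.{subtype}"
            path.write_bytes(base64.b64decode(payload))
            counter[0] += 1
            return str(path)
    return value
```

A real implementation would likely map MIME subtypes to extensions (for example `mpeg` to `mp3`) rather than reuse the subtype as-is.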

Performance Impact

  • Minimal memory impact (base64 data is converted to files during checkpoint writing)

How to Test

Unit Tests

# Run TTS tests
pytest tests/core/models/test_openai_client.py::test_create_speech_async -v
pytest tests/utils/test_audio_utils_save.py -v
pytest tests/utils/test_multimodal_processor.py -v

End-to-End Test

  1. Add the model config in models.yaml:

       tts_openai:
         model: tts
         output_type: audio  # This triggers TTS functionality
         model_type: azure_openai  # Use azure_openai or openai model type
         api_version: 2025-03-01-preview
         parameters:
           voice: "alloy"
           response_format: "wav"
  2. Add the URL and API key:
    Define the endpoint URL and API key in the .env file as SYGRA_TTS_OPENAI_URL and SYGRA_TTS_OPENAI_TOKEN.

  3. Run pipeline

  4. File Structure Output

task_dir/
├── output/audio/
│   ├── record_001_generated_audio_0.mp3
│   └── record_002_generated_audio_0.mp3
└── output.json
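
Step 2 pairs the model key `tts_openai` with the variables `SYGRA_TTS_OPENAI_URL` and `SYGRA_TTS_OPENAI_TOKEN`. The sketch below assumes the names follow the pattern `SYGRA_<MODEL_NAME>_URL` / `SYGRA_<MODEL_NAME>_TOKEN`. That pattern is inferred from this one example, and `model_credentials` is a hypothetical helper, not SyGra's lookup code. It can serve as a quick pre-flight check before running the pipeline.

```python
import os
from typing import Dict, Mapping, Optional


def model_credentials(
    model_name: str, env: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Look up SYGRA_<MODEL_NAME>_URL and SYGRA_<MODEL_NAME>_TOKEN for a model key.

    Hypothetical helper; the naming pattern is an assumption based on the
    tts_openai example above.
    """
    env = os.environ if env is None else env
    key = model_name.upper()
    url_var = f"SYGRA_{key}_URL"
    token_var = f"SYGRA_{key}_TOKEN"
    # Report every missing or empty variable at once instead of failing on the first.
    missing = [name for name in (url_var, token_var) if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {"url": env[url_var], "api_key": env[token_var]}
```

Calling `model_credentials("tts_openai")` after loading the .env file raises a `KeyError` naming whichever variable is unset.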

Checklist

  • Lint fixes and unit testing done
  • End to end task testing
  • Documentation updated

Notes

  • Text limit: 4096 characters per request (OpenAI constraint)
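
Because of the 4096-character limit, longer inputs have to be split before they are sent. The sketch below shows one caller-side way to do that. It is not part of this PR, and `split_for_tts` and `MAX_TTS_CHARS` are hypothetical names. It prefers to break at a space and falls back to a hard cut when a single run of characters exceeds the limit.

```python
from typing import List

MAX_TTS_CHARS = 4096  # per-request limit stated in the notes above


def split_for_tts(text: str, limit: int = MAX_TTS_CHARS) -> List[str]:
    """Split text into chunks of at most `limit` characters, preferring space breaks."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks: List[str] = []
    remaining = text.strip()
    while len(remaining) > limit:
        # Find the last space inside the window; fall back to a hard cut if none exists.
        cut = remaining.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

Each chunk would then go out as a separate TTS request, producing one audio file per chunk. Splitting on sentence boundaries instead of spaces would likely give more natural-sounding breaks.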

@amitsnow amitsnow marked this pull request as ready for review October 20, 2025 10:01
@amitsnow amitsnow requested a review from a team as a code owner October 20, 2025 10:01
@amitsnow amitsnow self-assigned this Oct 23, 2025

@psriramsnc psriramsnc left a comment

LGTM 🚀

@zephyrzilla zephyrzilla changed the title Adding support for Openai TTS models [Enhancement] Adding support for Openai TTS models Oct 23, 2025
@amitsnow amitsnow merged commit ed4c701 into main Oct 24, 2025
6 checks passed
@amitsnow amitsnow deleted the scratch/openai_tts branch October 24, 2025 04:58