[Enhancement] Adding support for Openai TTS models #54

amitsnow · 2025-10-16T05:11:56Z

Summary

Adds Text-to-Speech (TTS) support for OpenAI and Azure OpenAI models in SyGra. It generates audio from text with configurable voices, audio formats, and playback speed. Base64-encoded audio is automatically saved as files in an organized directory structure.

Explain the features implemented:

TTS Models: Supports TTS models from both OpenAI and Azure OpenAI
Auto File Saving: Base64 audio data URLs automatically saved to task_dir/output_dir/audio/
Smart Processing: Recursively handles nested structures, replaces data URLs with file paths
Multi-format Support: 9 audio formats + 8 image formats
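The "Auto File Saving" and "Smart Processing" items can be illustrated with a minimal sketch. This is not the code from this PR: `save_data_urls`, its parameters, and the file-naming scheme are hypothetical, and the MIME subtype is used directly as the file extension, which a real implementation would likely map more carefully.

```python
import base64
import re
from pathlib import Path
from typing import Any, List, Optional

# Matches data URLs such as "data:audio/wav;base64,AAAA..."
DATA_URL_RE = re.compile(r"^data:(audio|image)/([\w.+-]+);base64,(.+)$", re.DOTALL)


def save_data_urls(
    value: Any, out_dir: Path, prefix: str, counter: Optional[List[int]] = None
) -> Any:
    """Return a copy of `value` with base64 data URLs replaced by file paths.

    Hypothetical helper: walks dicts and lists recursively, writes each
    decoded payload under out_dir/<audio|image>/, and substitutes the path.
    """
    if counter is None:
        counter = [0]  # shared mutable index so file names stay unique
    if isinstance(value, dict):
        return {k: save_data_urls(v, out_dir, prefix, counter) for k, v in value.items()}
    if isinstance(value, list):
        return [save_data_urls(v, out_dir, prefix, counter) for v in value]
    if isinstance(value, str):
        match = DATA_URL_RE.match(value)
        if match:
            media, subtype, payload = match.groups()
            target_dir = out_dir / media
            target_dir.mkdir(parents=True, exist_ok=True)
            # Assumption: the MIME subtype doubles as the file extension.
            path = target_dir / f"{prefix}_{counter[0]}.{subtype}"
            path.write_bytes(base64.b64decode(payload))
            counter[0] += 1
            return str(path)
    # Non-matching strings and other scalar types pass through unchanged.
    return value
```

Returning a new structure instead of mutating the input keeps the in-memory record, which may still hold data URLs needed by later nodes, separate from what is written to the output file.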

Performance Impact

Minimal memory impact (base64 converted to files during checkpoint writing)

How to Test

Unit Tests

# Run TTS tests
pytest tests/core/models/test_openai_client.py::test_create_speech_async -v
pytest tests/utils/test_audio_utils_save.py -v
pytest tests/utils/test_multimodal_processor.py -v

End-to-End Test

Add the model config in models.yaml:

   tts_openai:
     model: tts
     output_type: audio  # This triggers TTS functionality
     model_type: azure_openai  # Use azure_openai or openai model type
     api_version: 2025-03-01-preview
     parameters:
       voice: "alloy"
       response_format: "wav"

Add the URL and API key:

   Define them in the .env file as SYGRA_TTS_OPENAI_URL and SYGRA_TTS_OPENAI_TOKEN.

Run the pipeline.

File Structure Output

task_dir/
├── output/audio/
│   ├── record_001_generated_audio_0.mp3
│   └── record_002_generated_audio_0.mp3
└── output.json

Checklist

Lint fixes and unit testing done
End to end task testing
Documentation updated

Notes

Text limit: 4096 characters per request (OpenAI constraint)
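The PR does not say whether SyGra splits longer inputs itself, so callers with long text may need to chunk it before sending. A minimal caller-side sketch, assuming plain character counting and whitespace-preferring breaks; `split_for_tts` and `MAX_TTS_CHARS` are hypothetical names, not part of SyGra:

```python
from typing import List

MAX_TTS_CHARS = 4096  # per-request limit stated in the note above


def split_for_tts(text: str, limit: int = MAX_TTS_CHARS) -> List[str]:
    """Split `text` into chunks of at most `limit` characters.

    Prefers to break at the last space inside the window and falls back
    to a hard cut when a single run of characters exceeds the limit.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks: List[str] = []
    remaining = text.strip()
    while len(remaining) > limit:
        # Last space at index <= limit, so the chunk never exceeds `limit`.
        cut = remaining.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no usable space found: hard cut
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

Each chunk would then be sent as its own TTS request, and the resulting audio segments concatenated or stored separately.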

psriramsnc

LGTM 🚀

amitsnow added 8 commits October 16, 2025 10:40

Adding support for Openai TTS models

3f816ab

Linting and format fixes

3fdbf41

Cleaner design to handle multimodal outputs

2c47d87

Adding new tests for custom_models.py

9e64fca

Renaming test files

6d3bdf0

Minor fixes

1ce9b20

Creating base64 encoded data url to ensure reusability in subsequent nodes

d36570e

Changes to store multimodal data in output file as paths in place of b64 encoded urls

355419d

amitsnow marked this pull request as ready for review October 20, 2025 10:01

amitsnow requested a review from a team as a code owner October 20, 2025 10:01

amitsnow added 3 commits October 20, 2025 15:39

linting fixes

73ee2a1

fix lint issues

082e8ae

mypy fixes

a81a4e7

amitsnow self-assigned this Oct 23, 2025

amitsnow and others added 2 commits October 23, 2025 10:53

Adding missing documentation to tts

b015739

Merge branch 'main' into scratch/openai_tts

23f036f

psriramsnc approved these changes Oct 23, 2025

zephyrzilla approved these changes Oct 23, 2025

zephyrzilla changed the title ~~Adding support for Openai TTS models~~ [Enhancement] Adding support for Openai TTS models Oct 23, 2025

Merge branch 'main' into scratch/openai_tts

ce9cd24

vipul-mittal approved these changes Oct 24, 2025

amitsnow merged commit ed4c701 into main Oct 24, 2025
6 checks passed

amitsnow deleted the scratch/openai_tts branch October 24, 2025 04:58
