
[Platform][Agent] Introduce Speech support #943

Open
Guikingone wants to merge 1 commit into symfony:main from Guikingone:agent/voice_provider

Conversation

@Guikingone
Contributor

@Guikingone Guikingone commented Nov 22, 2025

| Q             | A
| ------------- | ---
| Bug fix?      | no
| New feature?  | yes
| Docs?         | yes
| Issues        | --
| License       | MIT
  • Introduce support for TTS, STT and STS for agents
  • Add new configuration options to configure "speech" at agent-level.

Example for an OpenAI-based STS agent:

ai:
    platform:
        elevenlabs:
            api_key: '%env(ELEVEN_LABS_API_KEY)%'
        openai:
            api_key: '%env(OPENAI_API_KEY)%'

    agent:
        sts_openai:
            platform: ai.platform.openai
            model: gpt-4o
            speech:
                enabled: true
                platform: ai.platform.elevenlabs
                tts_model: eleven_multilingual_v2
                tts_options:
                    voice_id: some-voice-id
                stt_model: scribe_v1
  • Agents can either be TTS or STT independently
  • A SpeechConfiguration object handles the speech configuration
  • A SpeechProcessor handles the input/output
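
To make the "TTS or STT independently" bullet concrete, here is a minimal sketch of how a configuration value object could expose the three modes. `SpeechConfigurationSketch` and its method names are hypothetical and are not the class from this PR; only the keys (`enabled`, `tts_model`, `tts_options`, `stt_model`) come from the YAML example above, and the rule that a missing model disables that direction is an assumption. The `platform` key is left out because it refers to a service.

```php
<?php

declare(strict_types=1);

// Hypothetical value object; NOT the SpeechConfiguration class from this PR.
final class SpeechConfigurationSketch
{
    /**
     * @param array<string, mixed> $ttsOptions
     */
    public function __construct(
        public readonly bool $enabled = false,
        public readonly ?string $ttsModel = null,
        public readonly array $ttsOptions = [],
        public readonly ?string $sttModel = null,
    ) {
    }

    /**
     * Builds the object from the array shape used under "speech:" in the YAML example.
     *
     * @param array<string, mixed> $config
     */
    public static function fromArray(array $config): self
    {
        return new self(
            (bool) ($config['enabled'] ?? false),
            $config['tts_model'] ?? null,
            $config['tts_options'] ?? [],
            $config['stt_model'] ?? null,
        );
    }

    // Assumption: a direction is active only when speech is enabled and its model is set.
    public function supportsTextToSpeech(): bool
    {
        return $this->enabled && null !== $this->ttsModel;
    }

    public function supportsSpeechToText(): bool
    {
        return $this->enabled && null !== $this->sttModel;
    }

    // Speech-to-speech needs both directions.
    public function supportsSpeechToSpeech(): bool
    {
        return $this->supportsTextToSpeech() && $this->supportsSpeechToText();
    }
}

// A TTS-only agent: no stt_model is configured.
$ttsOnly = SpeechConfigurationSketch::fromArray([
    'enabled' => true,
    'tts_model' => 'eleven_multilingual_v2',
    'tts_options' => ['voice_id' => 'some-voice-id'],
]);
```

With this shape, `$ttsOnly->supportsTextToSpeech()` is true while `$ttsOnly->supportsSpeechToText()` is false; adding `stt_model` would turn the same agent into an STS agent.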

@OskarStark
Contributor

To me, we should maybe introduce capabilities to platforms as well, rather than having a voice component. As far as I understand, I cannot use the Voice component standalone, right?

I don't think a dedicated component is the way to go here

@Guikingone
Contributor Author

We can introduce it via the Platform, which could be easier: voice can be used without agents, but it will require at least the Platform.

Will update the PR to match this approach 👍🏻

@OskarStark
Contributor

I agree, Agent scope is not needed 👍🏻

@Guikingone Guikingone changed the title from "[Voice] Introduce the component" to "[Platform] Introduce VoiceProviders and VoiceListeners" Nov 23, 2025
@chr-hertel
Member

Hi @Guikingone, i agree that we lack some kind of guidance on how voices work - but the same goes for other binary stuff like creating images or videos.

so two things i would like to understand

  • what's the high-level goal here - like what do you want to build?
  • why is it an extra component and not part of Platform?

btw, "speech" is more common than "voice" isn't it?
btw2, have you seen the demo around audio and video?

@Guikingone
Contributor Author

Guikingone commented Nov 23, 2025

> what's the high-level goal here - like what do you want to build?

The main goal is to let an agent/platform "listen" and answer inputs through voice / speech ("voice" is used as sugar here and could be renamed to "speech"). The workflow: you submit voice, the platform transforms it to speech / text (depending on the situation you're in), and the result is returned to the user without friction.
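
As a rough illustration of that round trip, here is a minimal sketch of a speech-to-speech pipeline. The function name `speechToSpeech` and the three callables are hypothetical stand-ins for an STT model, the agent's text model, and a TTS model; nothing here is the API of this PR, and the stubs return fixed strings so the sketch runs without any API key.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical speech-to-speech round trip (not the API of this PR).
 *
 * @param callable(string): string $transcribe audio bytes -> text (STT)
 * @param callable(string): string $answer     text -> text (the agent's model)
 * @param callable(string): string $synthesize text -> audio bytes (TTS)
 */
function speechToSpeech(string $audio, callable $transcribe, callable $answer, callable $synthesize): string
{
    $question = $transcribe($audio); // speech-to-text
    $reply = $answer($question);     // plain text exchange with the model

    return $synthesize($reply);      // text-to-speech
}

// Deterministic stubs standing in for real platform calls.
$transcribe = static fn (string $audio): string => 'What time is it?';
$answer = static fn (string $text): string => 'You asked: '.$text;
$synthesize = static fn (string $text): string => 'AUDIO('.$text.')';

$audioOut = speechToSpeech('fake-audio-bytes', $transcribe, $answer, $synthesize);
```

Dropping the first step gives a TTS-only flow, and dropping the last one gives an STT-only flow, which is why the three modes can share one code path.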

> why is it an extra component and not part of Platform?

It is now part of Platform, I just pushed an update on it following the comment from @OskarStark.

> btw, "speech" is more common than "voice" isn't it?

Agreed, could be renamed to Speech.

> btw2, have you seen the demo around audio and video?

Yes, the goal is to ease it with a "built-in" approach / API that stays transparent for the user.

@Guikingone Guikingone changed the title from "[Platform] Introduce VoiceProviders and VoiceListeners" to "[Platform] Introduce Speech support via Platform" Nov 23, 2025
@chr-hertel
Member

just realized we should rename the "audio" demo to "speech" as well - and i'm def not really happy with that solution there.

can we make it as easy as the structured output - like with a listener?

i like that starting point:

$result = $platform->invoke('eleven_multilingual_v2', new Text('Hello world'), [
    'voice' => 'Dslrhjl3ZpzrctukrQSN', // Brad (https://elevenlabs.io/app/voice-library?voiceId=Dslrhjl3ZpzrctukrQSN)
]);

echo $result->asVoice();

what would be the return type here? would it be the same as asBinary() or asDataUri()?

@Guikingone
Contributor Author

> can we make it as easy as the structured output - like with a listener?

Could be something to explore, the API is not locked for now.

> what would be the return type here? would it be the same as asBinary() or asDataUri()?

My first approach was to do the same thing as asBinary to ease the usage.
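
For readers unfamiliar with the two shapes being compared, here is a minimal sketch of the difference between a raw binary return and a data URI. `SpeechResultSketch` is a hypothetical wrapper written for this illustration; only the method names `asBinary()` and `asDataUri()` come from the discussion, and the `audio/mpeg` default MIME type is an assumption.

```php
<?php

declare(strict_types=1);

// Hypothetical result wrapper; not a class from the Platform component.
final class SpeechResultSketch
{
    public function __construct(
        private readonly string $bytes,
        private readonly string $mimeType = 'audio/mpeg', // assumed default
    ) {
    }

    // Raw audio bytes, suitable for writing to a file or streaming.
    public function asBinary(): string
    {
        return $this->bytes;
    }

    // Same bytes, base64-encoded and wrapped as a data URI (RFC 2397).
    public function asDataUri(): string
    {
        return sprintf('data:%s;base64,%s', $this->mimeType, base64_encode($this->bytes));
    }
}

$result = new SpeechResultSketch('abc');

$raw = $result->asBinary();  // the bytes as-is
$uri = $result->asDataUri(); // embeddable string, e.g. for an <audio src="...">
```

An `asVoice()` that mirrors `asBinary()` would return the first form, leaving it to the caller to write the bytes to disk or convert them.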

OskarStark added a commit that referenced this pull request Nov 24, 2025
This PR was merged into the main branch.

Discussion
----------

[Demo][Website] Rename audio demo to speech

| Q             | A
| ------------- | ---
| Bug fix?      | no
| New feature?  | no
| Docs?         |
| Issues        |
| License       | MIT

Following a discussion of #943

Commits
-------

ffc2b64 Rename audio demo to speech
@OskarStark OskarStark changed the title from "[Platform] Introduce Speech support via Platform" to "[Platform] Introduce Speech support" Nov 24, 2025
@Guikingone
Contributor Author

Well, might seem weird but here we go, stt, tts and sts are working like a charm ... 👀

@Guikingone Guikingone force-pushed the agent/voice_provider branch 3 times, most recently from 120f391 to 1963409 Compare November 26, 2025 12:42
@Guikingone Guikingone marked this pull request as ready for review November 26, 2025 12:44
@carsonbot carsonbot added the Feature, Platform, and Status: Needs Review labels Nov 26, 2025
@Guikingone Guikingone marked this pull request as draft November 26, 2025 12:46
@Guikingone Guikingone marked this pull request as ready for review November 26, 2025 13:00
@OskarStark
Contributor

@chr-hertel will have a look soon, not sure it will land in 0.3, let's keep it for now

@Guikingone Guikingone force-pushed the agent/voice_provider branch 5 times, most recently from ddb5904 to 5a0a9a2 Compare January 28, 2026 15:24
@Guikingone Guikingone force-pushed the agent/voice_provider branch 2 times, most recently from 059e90f to a3fec4f Compare February 17, 2026 15:22
@Guikingone Guikingone changed the title from "[Platform] Introduce Speech support" to "[Platform][Agent] Introduce Speech support" Feb 17, 2026
@Guikingone Guikingone force-pushed the agent/voice_provider branch 8 times, most recently from f42ac9c to e5d9137 Compare February 17, 2026 18:28
@Guikingone
Contributor Author

Hi @OskarStark @chr-hertel, yes, I know, again 😄

I think this time it's the one. While thinking about #1572 and the comment from Chris, I came back to this PR: the listener approach didn't look like "THE" solution, especially since we have the processors. So I asked Claude (yes, sometimes asking for an external opinion can lead to a solution) for a "reworked implementation" that could ease the user experience and the maintenance. It submitted a solution close to the processors, and I did the final tweaking.

So, what changed?

Now, the speech configuration is moved where it needs to be, at the Agent level: each agent can specify which platform to use, the options and so on. No more "in or out option": the agent handles the configuration. Simple, clever, straight to the point.

The Platform is still an important aspect of the PR (the configuration lives in the Platform but is shared with the Agent). I also simplified the DI experience along with the code in the SpeechProcessor (now the main entry point for speech).

I updated the examples and reworked the documentation; it's much better and makes more sense this way, IMHO.

I'll let you take a look and review it if you think it deserves a review. Is #1572 still needed? Tough question. If this PR is merged, probably not; at least, I don't see a use case that could require it (as speech is now at the agent level), except for the validation/evaluation part (for now). Probably another topic for another day 😄

@Guikingone Guikingone force-pushed the agent/voice_provider branch 7 times, most recently from b3586eb to d9218f0 Compare February 23, 2026 12:15

Labels

Feature, Platform, Status: Needs Review


4 participants