Implemented TTS providers (golem-tts WIT interfaces) #90

harshtech123 · 2025-08-14T14:12:10Z

This PR implements the TTS providers, following the guidelines stated in the issue.
/claim #23
/closes #23

harshtech123 · 2025-08-14T14:18:15Z

AWS

aws.webm

harshtech123 · 2025-08-14T14:19:50Z

DEEPGRAM

deepgram_t0_t13.webm

harshtech123 · 2025-08-14T14:22:13Z

GOOGLE

google.webm

harshtech123 · 2025-08-14T14:23:23Z

ELEVENLABS

elevenlabs.webm

harshtech123 · 2025-08-14T14:50:07Z

@vigoo Up for review, thank you so much!

tts/aws/src/lib.rs

vigoo · 2025-12-02T17:46:55Z

tts/tts/src/durability.rs

+        fn list_voices(filter: Option<VoiceFilter>) -> Result<VoiceResults, TtsError> {
+            init_logging();
+
+            Impl::list_voices(filter)


This should also be persisted, for multiple reasons:

The list of voices coming from the server may not always be the same (even the order can differ, etc.).

Even if that were not the case, we don't want to make actual requests to TTS providers during state recovery.

So please review all the wrapped functions and make everything durable.

Done, made all the wrapped functions durable!
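To make "make everything durable" concrete for a read-only call like list_voices, here is a toy, self-contained model. It does not use golem-rust: `Oplog`, `VoiceResult`, `durable_list_voices`, and the trimmed-down `VoiceInfo` are hypothetical stand-ins. In live mode the provider is called and its result (errors included) is recorded; after a restart, the recorded result is returned in order and the provider is never contacted.

```rust
/// Simplified voice descriptor (hypothetical; not the real WIT record).
#[derive(Clone, Debug, PartialEq)]
pub struct VoiceInfo {
    pub id: String,
    pub name: String,
}

/// Hypothetical stand-in for a worker oplog. Results persisted in live mode
/// are handed back, in the same order, while the worker is replaying.
pub struct Oplog<T: Clone> {
    recorded: Vec<T>,
    cursor: usize,
}

impl<T: Clone> Oplog<T> {
    pub fn new() -> Self {
        Oplog { recorded: Vec::new(), cursor: 0 }
    }

    /// Live once every recorded entry has been replayed.
    pub fn is_live(&self) -> bool {
        self.cursor >= self.recorded.len()
    }

    pub fn persist(&mut self, value: T) {
        self.recorded.push(value);
        self.cursor = self.recorded.len();
    }

    pub fn replay(&mut self) -> T {
        let value = self.recorded[self.cursor].clone();
        self.cursor += 1;
        value
    }

    /// Simulates a worker restart: replay starts again from the beginning.
    pub fn restart(&mut self) {
        self.cursor = 0;
    }
}

pub type VoiceResult = Result<Vec<VoiceInfo>, String>;

/// Durable wrapper around a provider's list_voices call (hypothetical).
pub fn durable_list_voices<F>(oplog: &mut Oplog<VoiceResult>, provider: F) -> VoiceResult
where
    F: FnOnce() -> VoiceResult,
{
    if oplog.is_live() {
        // Live: call the provider and record its result (errors included),
        // so replay reproduces exactly the same voices in the same order.
        let result = provider();
        oplog.persist(result.clone());
        result
    } else {
        // Replay: never contact the provider; return what was recorded.
        oplog.replay()
    }
}
```

Because the recorded value is returned verbatim, a provider that later returns a different list (or a different order) cannot change what the recovered worker sees.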

tts/tts/src/durability.rs

tts/aws/src/lib.rs

vigoo · 2025-12-02T18:19:25Z

tts/tts/src/durability.rs

+        }
+    }
+
+    impl<Impl: ExtendedGuest> GuestPronunciationLexicon for DurablePronunciationLexicon<Impl> {


This needs some more work, because it currently does not handle switching from replay to live mode correctly. Let me show an example:

You create a pronunciation lexicon.

[1] Add a new entry: it persists success, but it does not actually call the underlying implementation, so the entry is not stored anywhere.

[2] Get the entry count: returns 0, because add did not actually add anything.

At this point the worker gets restarted.

It enters replay mode.

Replays [1]: it replays the persisted entry, but there won't be any actual entry in either the wrapped entries or the underlying lexicon.

Replays [2]: returns 0 again.

Out of replay, back to live mode.

Calling add again: still nothing happens.

Note that even if add_entry added an entry to the entries field, it would still not be correct, because it would not add the entry to the underlying wrapped lexicon, and it would not sync with the backend. I added a restart to the example above to point out that some entries may be added during replay and then more entries added in live mode; in the end the lexicon must contain all of these entries (the next live add must sync them).

Hi @vigoo, that makes sense! Do you mean that making add_entry call the real implementation will solve this?

Example:

```rust
fn add_entry(&self, word: String, pronunciation: String) -> Result<(), TtsError> {
    let durability = Durability::<PronunciationEntryInput, TtsError>::new(
        "golem_tts",
        "pronunciation_lexicon_add_entry",
        DurableFunctionType::WriteRemote,
    );
    if durability.is_live() {
        let lexicon_ref = self.lexicon_wrapper.borrow();
        let result = if let Some(ref lexicon) = *lexicon_ref {
            lexicon
                .get::<Impl::PronunciationLexicon>()
                .add_entry(word.clone(), pronunciation.clone())
        } else {
            Err(TtsError::InternalError("Lexicon not available".to_string()))
        };
        let input = PronunciationEntryInput {
            word,
            pronunciation,
        };
        durability.persist(input.clone(), Ok(input))?;
        result
    } else {
        let _persisted_input: PronunciationEntryInput =
            durability.replay::<PronunciationEntryInput, TtsError>()?;
        Ok(())
    }
}
```
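For comparison, here is a toy model of the behaviour described in the review scenario: entries replayed from the oplog are buffered, and the first live call recreates the underlying lexicon and syncs them before applying the new entry. `BackendLexicon`, `DurableLexicon`, `recorded`, `pending`, and `restart` are all hypothetical; this is not golem-rust's `Durability` API, and a real lexicon would be a provider-side resource rather than a `Vec`.

```rust
/// Hypothetical provider-side lexicon (stands in for the remote resource).
#[derive(Default)]
pub struct BackendLexicon {
    pub entries: Vec<(String, String)>,
}

impl BackendLexicon {
    fn add_entry(&mut self, word: &str, pronunciation: &str) -> Result<(), String> {
        self.entries.push((word.to_string(), pronunciation.to_string()));
        Ok(())
    }
}

/// Hypothetical durable wrapper; `recorded` plays the role of the oplog.
#[derive(Default)]
pub struct DurableLexicon {
    backend: Option<BackendLexicon>,  // None until a live call needs it
    recorded: Vec<(String, String)>,  // persisted add_entry inputs
    cursor: usize,                    // replay position in `recorded`
    pending: Vec<(String, String)>,   // replayed but not yet synced
}

impl DurableLexicon {
    fn is_live(&self) -> bool {
        self.cursor >= self.recorded.len()
    }

    pub fn add_entry(&mut self, word: &str, pronunciation: &str) -> Result<(), String> {
        let entry = (word.to_string(), pronunciation.to_string());
        if !self.is_live() {
            // Replay: do not call the provider, but remember the entry so
            // the next live call can bring the real lexicon up to date.
            debug_assert_eq!(self.recorded[self.cursor], entry);
            self.cursor += 1;
            self.pending.push(entry);
            return Ok(());
        }
        // Live: (re)create the underlying lexicon if needed, sync every entry
        // that so far was only replayed, then apply the new entry.
        let backend = self.backend.get_or_insert_with(BackendLexicon::default);
        for (w, p) in &self.pending {
            backend.add_entry(w, p)?;
        }
        self.pending.clear();
        backend.add_entry(word, pronunciation)?;
        // Persist only after the real call succeeded.
        self.recorded.push(entry);
        self.cursor = self.recorded.len();
        Ok(())
    }

    pub fn entry_count(&self) -> usize {
        let synced = self.backend.as_ref().map_or(0, |b| b.entries.len());
        synced + self.pending.len()
    }

    /// Simulates a worker restart: in-memory state is lost, the oplog is kept.
    pub fn restart(&mut self) {
        self.backend = None;
        self.pending.clear();
        self.cursor = 0;
    }

    pub fn backend_entries(&self) -> Vec<(String, String)> {
        self.backend.as_ref().map_or_else(Vec::new, |b| b.entries.clone())
    }
}
```

In this model the count stays correct during replay, and after the first live add the backend holds both the replayed and the new entries.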

vigoo · 2025-12-02T18:23:36Z

tts/tts/src/durability.rs

+                    mut pollables,
+                    stream,
+                }) => {
+                    with_persistence_level(PersistenceLevel::PersistNothing, move || {


This with_persistence_level wrapper in a drop is weird; what was the intention?

Well, at the time I looked at the search / web-search drop implementations and followed them; this one is exactly the same as theirs, and I thought it would give the drop some durability...

For example, web-search has:
    fn drop(&mut self) {
        match self.state.take() {
            Some(DurableSearchSessionState::Live { session }) => {
                with_persistence_level(PersistenceLevel::PersistNothing, move || {
                    drop(session);
                });
            }
            Some(DurableSearchSessionState::Replay { .. }) => {
                // Nothing special to clean up for replay state
            }
            None => {}
        }
    }
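To make the question concrete, here is a toy model of what a scoped persistence-level switch does: it temporarily changes the level, runs the closure, and restores the previous level afterwards (even on panic), so host calls made while dropping the live resource are not recorded. Everything here (the thread-local `LEVEL` and `OPLOG`, `host_call`, `LiveStream`, and this `with_persistence_level`) is hypothetical and only mimics the shape of the golem-rust helper; it does not answer whether the wrapper is actually needed in this drop.

```rust
use std::cell::{Cell, RefCell};

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum PersistenceLevel {
    PersistNothing,
    Smart,
}

thread_local! {
    static LEVEL: Cell<PersistenceLevel> = Cell::new(PersistenceLevel::Smart);
    static OPLOG: RefCell<Vec<String>> = RefCell::new(Vec::new());
}

/// Restores the previous level when dropped, even if the closure panics.
struct RestoreLevel(PersistenceLevel);

impl Drop for RestoreLevel {
    fn drop(&mut self) {
        LEVEL.with(|l| l.set(self.0));
    }
}

/// Toy scoped persistence-level switch (not the golem-rust implementation).
pub fn with_persistence_level<R>(level: PersistenceLevel, f: impl FnOnce() -> R) -> R {
    let previous = LEVEL.with(|l| l.replace(level));
    let _restore = RestoreLevel(previous);
    f()
}

/// Stand-in for a host call that is normally recorded in the oplog.
pub fn host_call(name: &str) {
    if LEVEL.with(|l| l.get()) == PersistenceLevel::Smart {
        OPLOG.with(|log| log.borrow_mut().push(name.to_string()));
    }
}

pub fn oplog_len() -> usize {
    OPLOG.with(|log| log.borrow().len())
}

/// Hypothetical live resource whose cleanup performs a host call.
pub struct LiveStream;

impl Drop for LiveStream {
    fn drop(&mut self) {
        host_call("close-stream");
    }
}
```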

vigoo · 2025-12-02T18:29:28Z

tts/aws/src/lib.rs

+        let params = voice_filter_to_describe_params(filter);
+
+        let response = client.describe_voices(params)?;
+        let voice_infos: Vec<VoiceInfo> = response


You are doing this post-filtering based on query here, but not in the other providers. I'm mentioning it here, but it applies to all search_voices implementations in general:

Is there any difference between query and the search-query in VoiceFilter?

If not, then let's just remove the whole query parameter and only keep the filter. If there is a difference, then make sure all providers handle it the same way, and explain exactly what it means in a doc comment.

There are two main reasons behind this post-filtering and inconsistency:

The search_voices APIs of the four providers are not alike: some allow filtering on different parameters through the native API (e.g. ElevenLabs), while others only make it possible to search within the response.

Some providers, like Deepgram, have no filtering capabilities at all; for those I implemented the search based on the model.

I have now also documented this in doc comments for each provider.

Regarding the difference between query and search-query in VoiceFilter: I don't think there is much difference between them, since both are strings and serve the same purpose. Both ended up in the WIT because some providers call it filter / search-filter, while others call the same thing query.

I have mostly been using our query in all providers to be consistent, but as you suggested I will remove the query parameter, keep only the filter, and use search-query for the same purpose. Thank you so much!
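One way to make search-query behave the same in every provider is to run a single shared post-filter over whatever each provider returns, after any native filtering. This is a sketch under assumptions: `VoiceInfo` and `VoiceFilter` are trimmed-down, hypothetical versions of the WIT records, `apply_search_query` is a hypothetical helper, and the matching rule (case-insensitive substring on id, name, or language) is an illustrative choice, not what the PR implements.

```rust
/// Hypothetical, trimmed-down voice record (not the real WIT type).
#[derive(Clone, Debug, PartialEq)]
pub struct VoiceInfo {
    pub id: String,
    pub name: String,
    pub language: String,
}

/// Hypothetical filter: only the search-query field is modelled here.
#[derive(Default)]
pub struct VoiceFilter {
    pub search_query: Option<String>,
}

/// Shared post-filter so every provider interprets search-query the same way:
/// a case-insensitive substring match on the voice id, name, or language.
/// An absent or blank query keeps every voice.
pub fn apply_search_query(voices: Vec<VoiceInfo>, filter: &VoiceFilter) -> Vec<VoiceInfo> {
    let query = match filter.search_query.as_deref().map(str::trim) {
        Some(q) if !q.is_empty() => q.to_lowercase(),
        _ => return voices,
    };
    voices
        .into_iter()
        .filter(|v| {
            [&v.id, &v.name, &v.language]
                .iter()
                .any(|field| field.to_lowercase().contains(&query))
        })
        .collect()
}
```

Whatever rule is chosen, stating it once in the search-query doc comment and applying one helper everywhere keeps providers with rich native filtering and providers with none (such as Deepgram) consistent.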

tts/aws/src/lib.rs

harshtech123 added 27 commits July 26, 2025 19:21

initial setup

93c1d26

init

df1f58e

wit update ! flags to enum for audio-effects

3b37aaf

durability

6bbc633

durability

09567ef

tts/tts

c3915a0

elevenlabs

e742c13

elevenlabs plus durability improvements

71a7734

env var

b6f37ec

deepgram

4dfaecb

google + warnings + env var

a6f328d

fmt + clippy

7ae85eb

aws polly

7c447aa

test/tts

f0e4055

test/tts

fd6906c

test/tts/lib.rs

ddadb1e

test/tts feuture

7299f72

aws init

d86e2eb

all providers with test !

cdf0ccb

aws completed!

f08b27e

chuncking synthesis

a6d3556

clean up!

daf1636

cleanup test!

4a8857c

clippy + fmt

0c49695

elevenlabs streaming

7cb7ed5

log!

ba16623

fmt

2a768c1

algora-pbc bot added the 🙋 Bounty claim label Aug 14, 2025

harshtech123 added 3 commits August 14, 2025 14:26

cargo.lock conflict !

cb59f45

Merge branch 'main' into tts

8146fde

test fix

dcfbdfb

rutikthakre reviewed Aug 14, 2025


tts/aws/src/lib.rs (outdated, resolved)

harshtech123 added 17 commits August 15, 2025 14:58

clippy

c2132a7

Merge branch 'golemcloud:main' into tts

e6d3d96

cargo.lock!

b52f50b

Merge branch 'main' into tts

37ab28e

cargo.lock !

80888cc

chore: improvements aws!

4ad2133

fmt

9f7235d

Merge branch 'golemcloud:main' into tts

774e4d5

Merge branch 'main' into tts

5633810

clippy!

faab068

refresh !

9a827c0

Merge branch 'main' into tts

9a6ac87

removed bindings.rs!

7ebbbef

tts/makefile.toml updates

bcb371b

final updates for 1.3.0!

f7c2fa5

cargo.lock!

f005a6a

Merge branch 'main' into tts

cec918c

vigoo requested changes Dec 2, 2025


harshtech123 added 4 commits December 3, 2025 18:37

durability fixes!

696c30f

improvements!

5b2b1c1

refresh!

e2727d4

removed query from search-voices!

2aae6b3


3 participants
