Skip to content

Conversation

@harshtech123
Copy link
Contributor

this pr indeed for implementing tts providers following the same guidelines as stated in issue !
/claim #23
/closes #23

@harshtech123
Copy link
Contributor Author

AWS

aws.webm

@harshtech123
Copy link
Contributor Author

DEEPGRAM

deepgram_t0_t13.webm

@harshtech123
Copy link
Contributor Author

GOOGLE

google.webm

@harshtech123
Copy link
Contributor Author

ELEVENLABS

elevenlabs.webm

@harshtech123
Copy link
Contributor Author

@vigoo up for review ,thank you so much !

fn list_voices(filter: Option<VoiceFilter>) -> Result<VoiceResults, TtsError> {
init_logging();

Impl::list_voices(filter)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also be persisted, for multiple reasons:

  • the list of voices coming from the server may not always be the same (or even the order can be different etc)
  • even if this would not be the case - we don't want to make actual requests to TTS providers during state recovery

So please review all the wrapped functions and make everything durable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done , made all the wrapped functions durable !

}
}

impl<Impl: ExtendedGuest> GuestPronunciationLexicon for DurablePronunciationLexicon<Impl> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs some more work because currently it does not handle correctly switching from replay to live mode. Let me show an example:

  • You create a pronunciation lexicon
  • [1] Add a new entry - it persists success, but it does not actually call the underlying thing so the entryies not stored anywhere
  • [2] Get the entry count - returns 0 because add did not actually add anything
  • At this point the worker gets restarted
  • Enters replay mode
  • Replays [1] - it's replaying the persisted entry, but there won't be any actual entry in either the wrapped entries, or the underlying one
  • Replays [2] - returns 0 again
  • Out of replay, back to live mode
  • Calling add again - still nothing happens

Note that even if add_entry would add an entry to the entries field it would not be correct because it would not add an entry to the underlying wrapped lexicon! And it would not sync with the backend. I added a restart in the above example to point out, that it is possible that some entries are added from replay, and then even more entries are added in live mode, and in the end the lexicon must contain all these entries (next live add must sync)

Copy link
Contributor Author

@harshtech123 harshtech123 Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hii @vigoo , that makes sence ! you mean making the add_entry , that calls the real implementation will solves this ?

example

 fn add_entry(&self, word: String, pronunciation: String) -> Result<(), TtsError> {
            let durability = Durability::<PronunciationEntryInput, TtsError>::new(
                "golem_tts",
                "pronunciation_lexicon_add_entry",
                DurableFunctionType::WriteRemote,
            );
            if durability.is_live() {
                let lexicon_ref = self.lexicon_wrapper.borrow();
                let result = if let Some(ref lexicon) = *lexicon_ref {
                    lexicon.get::<Impl::PronunciationLexicon>().add_entry(word.clone(), pronunciation.clone())
                } else {
                    Err(TtsError::InternalError("Lexicon not available".to_string()))
                };
```
                
                let input = PronunciationEntryInput {
                    word,
                    pronunciation,
                };
                durability.persist(input.clone(), Ok(input))?;
                result
            } else {
                let _persisted_input: PronunciationEntryInput = durability.replay::<PronunciationEntryInput, TtsError>()?;
                Ok(())
            }
        }`

mut pollables,
stream,
}) => {
with_persistence_level(PersistenceLevel::PersistNothing, move || {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This with_persistence_level wrapper in a drop is weird, what was the intention?

Copy link
Contributor Author

@harshtech123 harshtech123 Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well , i think at that time i looked at search / web-search drop implementation and followed the previous impl they are exactly similar as it was and i think of having it some durablity ...

like web-search has
fn drop(&mut self) { match self.state.take() { Some(DurableSearchSessionState::Live { session }) => { with_persistence_level(PersistenceLevel::PersistNothing, move || { drop(session); }); } Some(DurableSearchSessionState::Replay { .. }) => { // Nothing special to clean up for replay state } None => {} } }

let params = voice_filter_to_describe_params(filter);

let response = client.describe_voices(params)?;
let voice_infos: Vec<VoiceInfo> = response
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are doing this post-filtering based on query here but not in the other providers. So mentioning it here, but it's in general about all search_voices implementations:

Is there any difference between query and the search-query in VoiceFilter?

If not, then let's just remove the whole query parameter and only have the filter. If there is a difference then make sure all providers are handling it the same, and explain what it means exactly in a doc comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

behind this post-filtering and inconsistency there are two main reason

  • All the four providers search_voices API is not similar some allows to filter based on different parameters from native api ( e.g Elevenlabs ) others may provide searching capability from response .
  • some providers do not have filtering capabilities at all like deepgram but i actually implemented this by searching based on model

now i also mentioned this via doc comments for each provider

regarding difference between query and the search-query in VoiceFilter? well i dont think there is too much difference between them since both are strings and are for same purpose the reasons it was on wit because some provider calls filter / search-filter and others might call same as query .

well mostly i am using our query in all providers to be consistent but as you mentioned i will remove this query para and only have filter and use search-filter for the same purpose , thank you so much !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement Durable Text-to-Speech Provider Components for golem:tts WIT Interface

3 participants