-
Notifications
You must be signed in to change notification settings - Fork 76
Implemented tts providers ( golem-tts WIT interfaces) #90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
AWSaws.webm |
DEEPGRAMdeepgram_t0_t13.webm |
google.webm |
ELEVENLABSelevenlabs.webm |
|
@vigoo up for review ,thank you so much ! |
tts/tts/src/durability.rs
Outdated
| fn list_voices(filter: Option<VoiceFilter>) -> Result<VoiceResults, TtsError> { | ||
| init_logging(); | ||
|
|
||
| Impl::list_voices(filter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should also be persisted, for multiple reasons:
- the list of voices coming from the server may not always be the same (or even the order can be different etc)
- even if this would not be the case - we don't want to make actual requests to TTS providers during state recovery
So please review all the wrapped functions and make everything durable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done , made all the wrapped functions durable !
| } | ||
| } | ||
|
|
||
| impl<Impl: ExtendedGuest> GuestPronunciationLexicon for DurablePronunciationLexicon<Impl> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs some more work because currently it does not handle correctly switching from replay to live mode. Let me show an example:
- You create a pronunciation lexicon
- [1] Add a new entry - it persists success, but it does not actually call the underlying thing so the entryies not stored anywhere
- [2] Get the entry count - returns 0 because add did not actually add anything
- At this point the worker gets restarted
- Enters replay mode
- Replays [1] - it's replaying the persisted entry, but there won't be any actual entry in either the wrapped entries, or the underlying one
- Replays [2] - returns 0 again
- Out of replay, back to live mode
- Calling add again - still nothing happens
Note that even if add_entry would add an entry to the entries field it would not be correct because it would not add an entry to the underlying wrapped lexicon! And it would not sync with the backend. I added a restart in the above example to point out, that it is possible that some entries are added from replay, and then even more entries are added in live mode, and in the end the lexicon must contain all these entries (next live add must sync)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hii @vigoo , that makes sence ! you mean making the add_entry , that calls the real implementation will solves this ?
example
fn add_entry(&self, word: String, pronunciation: String) -> Result<(), TtsError> {
let durability = Durability::<PronunciationEntryInput, TtsError>::new(
"golem_tts",
"pronunciation_lexicon_add_entry",
DurableFunctionType::WriteRemote,
);
if durability.is_live() {
let lexicon_ref = self.lexicon_wrapper.borrow();
let result = if let Some(ref lexicon) = *lexicon_ref {
lexicon.get::<Impl::PronunciationLexicon>().add_entry(word.clone(), pronunciation.clone())
} else {
Err(TtsError::InternalError("Lexicon not available".to_string()))
};
```
let input = PronunciationEntryInput {
word,
pronunciation,
};
durability.persist(input.clone(), Ok(input))?;
result
} else {
let _persisted_input: PronunciationEntryInput = durability.replay::<PronunciationEntryInput, TtsError>()?;
Ok(())
}
}`
| mut pollables, | ||
| stream, | ||
| }) => { | ||
| with_persistence_level(PersistenceLevel::PersistNothing, move || { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This with_persistence_level wrapper in a drop is weird, what was the intention?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well , i think at that time i looked at search / web-search drop implementation and followed the previous impl they are exactly similar as it was and i think of having it some durablity ...
like web-search has
fn drop(&mut self) { match self.state.take() { Some(DurableSearchSessionState::Live { session }) => { with_persistence_level(PersistenceLevel::PersistNothing, move || { drop(session); }); } Some(DurableSearchSessionState::Replay { .. }) => { // Nothing special to clean up for replay state } None => {} } }
| let params = voice_filter_to_describe_params(filter); | ||
|
|
||
| let response = client.describe_voices(params)?; | ||
| let voice_infos: Vec<VoiceInfo> = response |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are doing this post-filtering based on query here but not in the other providers. So mentioning it here, but it's in general about all search_voices implementations:
Is there any difference between query and the search-query in VoiceFilter?
If not, then let's just remove the whole query parameter and only have the filter. If there is a difference then make sure all providers are handling it the same, and explain what it means exactly in a doc comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
behind this post-filtering and inconsistency there are two main reason
- All the four providers
search_voicesAPI is not similar some allows to filter based on different parameters from native api ( e.g Elevenlabs ) others may provide searching capability from response . - some providers do not have filtering capabilities at all like deepgram but i actually implemented this by searching based on model
now i also mentioned this via doc comments for each provider
regarding difference between query and the search-query in VoiceFilter? well i dont think there is too much difference between them since both are strings and are for same purpose the reasons it was on wit because some provider calls filter / search-filter and others might call same as query .
well mostly i am using our query in all providers to be consistent but as you mentioned i will remove this query para and only have filter and use search-filter for the same purpose , thank you so much !
this pr indeed for implementing tts providers following the same guidelines as stated in issue !
/claim #23
/closes #23