* [WIP] Add bark and parler multi support
* Add config files for other models to easily test across models
* Use model loading wrapper function for download_models.py
* Make sure transformers>4.31.0 (required for bark model)
* Add parler dependency
* Use TTSModelWrapper for demo code
* Use TTSModelWrapper for cli
* Add outetts_language attribute
* Add TTSModelWrapper
* Update text_to_speech.py
* Pass model-specific variables as **kwargs
* Rename TTSModelWrapper to TTSInterface
* Update language argument to kwargs
* Remove parler from dependencies
Co-authored-by: David de la Iglesia Castro <daviddelaiglesiacastro@gmail.com>
* Separate inference from TTSModel
* Make sure config model is properly registered
* Decouple loading & inference of TTS model
* Decouple loading & inference of TTS model
* Enable user to exit podcast generation gracefully
* Add Q2 Oute version to TTS_LOADERS
* Add comment for support in TTS_INFERENCE
* Update test_model_loaders.py
* Update test_text_to_speech.py
* Remove extra "use case" examples
* Add bark to readme & note about multilingual support
* Reference a repo that showcases multilingual use cases
* Change default model to 500M
* Remove support for bark and parler models
* Update docs
* Remove unused code
* Remove parler dep from tests
* Update docs
* Lint
* Lint
* Lint
* Remove transformers dependency
* Remove parler reference from docs
---------
Co-authored-by: David de la Iglesia Castro <daviddelaiglesiacastro@gmail.com>
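The commits above ("Decouple loading & inference of TTS model", "Add Q2 Oute version to TTS_LOADERS", "Pass model-specific variables as **kwargs") describe a registry pattern that splits model loading from inference. A rough sketch of that pattern is below; the registry names `TTS_LOADERS` and `TTS_INFERENCE` come from the commit messages, but the function signatures and the dummy loader/inference callables are hypothetical stand-ins, not the project's actual implementation.

```python
# Hypothetical sketch of the loader/inference registries named in the
# commit messages. The real project wires these to the outetts library;
# dummy callables stand in here for illustration.
from typing import Any, Callable

# Maps a model ID to a function that loads and returns the model object.
TTS_LOADERS: dict[str, Callable[..., Any]] = {}
# Maps a model ID to a function that runs inference with a loaded model.
TTS_INFERENCE: dict[str, Callable[..., Any]] = {}


def register_tts(model_id: str,
                 loader: Callable[..., Any],
                 inference: Callable[..., Any]) -> None:
    """Register a loader/inference pair for a model ID."""
    TTS_LOADERS[model_id] = loader
    TTS_INFERENCE[model_id] = inference


def load_model(model_id: str, **kwargs: Any) -> Any:
    # Model-specific options (e.g. language, device) travel as **kwargs,
    # mirroring the "Pass model-specific variables as **kwargs" commit.
    return TTS_LOADERS[model_id](**kwargs)


def text_to_speech(text: str, model_id: str, model: Any, **kwargs: Any) -> Any:
    return TTS_INFERENCE[model_id](model, text, **kwargs)


# Dummy registration showing the flow end to end.
register_tts(
    "dummy/echo",
    loader=lambda **kw: {"opts": kw},
    inference=lambda model, text, **kw: f"audio({text})",
)
```

Keeping loading and inference in separate registries lets a new model family be supported by registering two functions, without touching the demo or CLI call sites.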
@@ -106,10 +106,10 @@ For the complete list of models supported out-of-the-box, visit this [link](http
 ### text-to-speech
 
-We support models from the [OuteAI](https://github.com/edwko/OuteTTS) and [Parler_tts](https://github.com/huggingface/parler-tts) packages. The default text-to-speech model in this repo is [OuteTTS-0.2-500M](https://huggingface.co/OuteAI/OuteTTS-0.2-500M). Note that the `0.1-350M` version has a `CC-By-4.0` (permissive) license, whereas the newer / better `0.2-500M` version has a `CC-By-NC-4.0` (non-commercial) license.
-
-For a complete list of models visit [Oute HF](https://huggingface.co/collections/OuteAI) (only the GGUF versions) and [Parler HF](https://huggingface.co/collections/parler-tts).
+We support models from the [OuteAI](https://github.com/edwko/OuteTTS) package. The default text-to-speech model in this repo is [OuteTTS-0.2-500M](https://huggingface.co/OuteAI/OuteTTS-0.2-500M). Note that the `0.1-350M` version has a `CC-By-4.0` (permissive) license, whereas the newer / better `0.2-500M` version has a `CC-By-NC-4.0` (non-commercial) license.
+
+For a complete list of models visit [Oute HF](https://huggingface.co/collections/OuteAI) (only the GGUF versions).
 
-**Important note:** In order to keep the package dependencies as lightweight as possible, only the Oute interface is installed by default. If you want to use the parler models, please also follow the instructions at https://github.com/huggingface/parler-tts.
+In this [repo](https://github.com/Kostis-S-Z/document-to-podcast) you can see examples of using different TTS models with minimal code changes.
docs/customization.md (1 addition, 0 deletions)
@@ -74,6 +74,7 @@ Looking for inspiration? Check out these examples of how others have customized
 - **[Radio Drama Generator](https://github.com/stefanfrench/radio-drama-generator)**: A creative adaptation that generates radio dramas by customizing the Blueprint parameters.
 - **[Readme-to-Podcast](https://github.com/alexmeckes/readme-to-podcast)**: This project transforms GitHub README files into podcast-style audio, showcasing the Blueprint's ability to handle diverse text inputs.
+- **[Multilingual Podcast](https://github.com/Kostis-S-Z/document-to-podcast/)**: A repo that showcases how to use this package in other languages, like Hindi, Polish, Korean and many more.
docs/future-features-contributions.md (0 additions, 1 deletion)
@@ -15,7 +15,6 @@ The Document-to-Podcast Blueprint is an evolving project designed to grow with t
 This Blueprint is designed to be a foundation you can build upon. By extending its capabilities, you can open the door to new applications, improve user experience, and adapt the Blueprint to address other use cases. Here are a few ideas for how you can expand its potential:
 
--**Multi-language podcast generation:** Add support for multi-language podcast generation to expand the reach of this Blueprint.
 -**New modalities input:** Add support to the Blueprint to be able to handle different input modalities, like audio or images, enabling more flexibility in podcast generation.
 -**Improved audio quality:** Explore and integrate more advanced open-source TTS frameworks to enhance the quality of generated audio, making podcasts sound more natural.
docs/getting-started.md (0 additions, 8 deletions)
@@ -36,11 +36,3 @@ pip install -e .
 ```bash
 python -m streamlit run demo/app.py
 ```
-
-
-### [Optional]: Use Parler models for text-to-speech
-
-
-If you want to use the [parler tts](https://github.com/huggingface/parler-tts) models, you will need to **additionally** install an optional dependency by running:
docs/step-by-step-guide.md (2 additions, 3 deletions)
@@ -160,17 +160,16 @@ In this final step, the generated podcast transcript is brought to life as an au
 **1 - Model Loading**
 
-- The [`model_loader.py`](api.md/#document_to_podcast.inference.model_loaders) module is responsible for loading the `text-to-speech` models using the `outetts` and `parler_tts` libraries.
+- The [`model_loader.py`](api.md/#document_to_podcast.inference.model_loaders) module is responsible for loading the `text-to-text` and `text-to-speech` models.
 
 - The function `load_outetts_model` takes a model ID in the format `{org}/{repo}/(unknown)` and loads the specified model, either on CPU or GPU, based on the `device` parameter. The parameter `language` also enables swapping between the languages the Oute package supports (as of Dec 2024: `en, zh, ja, ko`)
 
-- The function `load_parler_tts_model_and_tokenizer` takes a model ID in the format `{repo}/(unknown)` and loads the specified model and tokenizer, either on CPU or GPU, based on the `device` parameter.
 
 **2 - Text-to-Speech Audio Generation**
 
 - The [`text_to_speech.py`](api.md/#document_to_podcast.inference.text_to_speech) script converts text into audio using a specified TTS model.
 
-- A **speaker profile** defines the voice characteristics (e.g., tone, speed, clarity) for each speaker. This is specific to each TTS package. Oute models require one of the IDs specified [here](https://github.com/edwko/OuteTTS/tree/main/outetts/version/v1/default_speakers). Parler requires a natural language description of the speaker's voice and you have to use a pre-defined name (see [here](https://github.com/huggingface/parler-tts/blob/main/INFERENCE.md#speaker-consistency))
+- A **speaker profile** defines the voice characteristics (e.g., tone, speed, clarity) for each speaker. This is specific to each TTS package. Oute models require one of the IDs specified [here](https://github.com/edwko/OuteTTS/tree/main/outetts/version/v1/default_speakers).
 
 - The function `text_to_speech` takes the input text (e.g. podcast script) and speaker profile, generating a waveform (audio data in a numpy array) that represents the spoken version of the text.
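The guide's description of `text_to_speech` (input text plus speaker profile in, numpy waveform out) could be sketched as follows. This is a minimal illustration with a stub model: `StubTTSModel`, its sine-tone output, and the `"male_1"` profile string are hypothetical stand-ins, not the project's actual outetts-backed implementation.

```python
# Minimal sketch of the text_to_speech flow from the guide: a speaker
# profile plus input text produce a 1-D numpy waveform. StubTTSModel is
# a hypothetical stand-in for a real TTS model, so only the data shapes
# are meaningful here.
import numpy as np


class StubTTSModel:
    sample_rate = 24_000  # Hz; the actual rate depends on the TTS model

    def generate(self, text: str, speaker_profile: str) -> np.ndarray:
        # Stand-in: emit a short sine tone instead of real speech,
        # 50 ms per word of input text.
        duration_s = 0.05 * len(text.split())
        n = int(self.sample_rate * duration_s)
        t = np.linspace(0.0, duration_s, n, endpoint=False)
        return np.sin(2 * np.pi * 220.0 * t).astype(np.float32)


def text_to_speech(input_text: str,
                   model: StubTTSModel,
                   speaker_profile: str) -> np.ndarray:
    """Return a float32 waveform for the given text and speaker."""
    return model.generate(input_text, speaker_profile)


waveform = text_to_speech("Welcome to the podcast.", StubTTSModel(), "male_1")
```

The returned array can then be concatenated with the other speakers' segments and written to a WAV file at the model's sample rate.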