Commit b10e562

further improvements of README
1 parent c40c6de commit b10e562

1 file changed: +24 -19 lines changed

README.md

Lines changed: 24 additions & 19 deletions
@@ -9,12 +9,11 @@
 > We're proud to release Parler-TTS v0.1, our first 300M parameter model, trained on 10.5K hours of audio data.
 > In the coming weeks, we'll be working on scaling up to 50k hours of data, in preparation for the v1 model.
 
-Parler-TTS is a lightweight text-to-speech (TTS) model that can generate high-quality, natural sounding speech in the style of a given speaker (gender, pitch, speaking style, etc). It is a reproduction of work from the paper [Natural language guidance of high-fidelity text-to-speech with synthetic annotations](https://www.text-description-to-speech.com)
-by Dan Lyth and Simon King, from Stability AI and Edinburgh University respectively.
+Parler-TTS is a lightweight text-to-speech (TTS) model that can generate high-quality, natural sounding speech in the style of a given speaker (gender, pitch, speaking style, etc). It is a reproduction of work from the paper [Natural language guidance of high-fidelity text-to-speech with synthetic annotations](https://www.text-description-to-speech.com) by Dan Lyth and Simon King, from Stability AI and Edinburgh University respectively.
 
 Contrarily to other TTS models, Parler-TTS is a **fully open-source** release. All of the datasets, pre-processing, training code and weights are released publicly under permissive license, enabling the community to build on our work and develop their own powerful TTS models.
 
-This repository contains the inference and training code for Parler-TTS. It is designed to accompany the [Data-Speech](https://github.com/ylacombe/dataspeech) repository for dataset annotation.
+This repository contains the inference and training code for Parler-TTS. It is designed to accompany the [Data-Speech](https://github.com/huggingface/dataspeech) repository for dataset annotation.
 
 ## Usage
 
@@ -27,42 +26,35 @@ Using Parler-TTS is as simple as "bonjour". Simply use the following inference s
 from parler_tts import ParlerTTSForConditionalGeneration
 from transformers import AutoTokenizer
 import soundfile as sf
+import torch
 
-model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler_tts_300M_v0.1")
+device = "cuda:0" if torch.cuda.is_available() else "cpu"
+
+model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler_tts_300M_v0.1").to(device)
 tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler_tts_300M_v0.1")
 
 prompt = "Hey, how are you doing today?"
 description = "A female speaker with a slightly low-pitched voice delivers her words quite expressively, in a very confined sounding environment with clear audio quality. She speaks very fast."
 
-input_ids = tokenizer(description, return_tensors="pt").input_ids
-prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids
+input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
+prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
 
 generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
 audio_arr = generation.cpu().numpy().squeeze()
 sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
 ```
 
-
 ## Installation steps
 
 Parler-TTS has light-weight dependencies and can be installed in one line:
-```sh
-pip install parler-tts
-```
-
-## Gradio demo
-
-You can host your own Parler-TTS demo. First, install [`gradio`](https://www.gradio.app/) with:
 
 ```sh
-pip install gradio
+pip install git+https://github.com/huggingface/parler-tts.git
 ```
 
-Then, run:
+## Training
 
-```python
-python helpers/gradio_demo/app.py
-```
+TODO
 
 ## Acknowledgements
 
@@ -96,7 +88,9 @@ Namely, we're looking at ways to improve both quality and speed:
 - Add more evaluation metrics
 
 ## Citation
+
 If you found this repository useful, please consider citing this work and also the original Stability AI paper:
+
 ```
 @misc{lacombe-etal-2024-parler-tts,
 author = {Yoach Lacombe and Vaibhav Srivastav and Sanchit Gandhi},
@@ -107,3 +101,14 @@ If you found this repository useful, please consider citing this work and also t
 howpublished = {\url{https://github.com/huggingface/parler-tts}}
 }
 ```
+
+```
+@misc{lyth2024natural,
+title={Natural language guidance of high-fidelity text-to-speech with synthetic annotations},
+author={Dan Lyth and Simon King},
+year={2024},
+eprint={2402.01912},
+archivePrefix={arXiv},
+primaryClass={cs.SD}
+}
+```
