Commit 36b3776

Merge pull request #4 from ylacombe/main
Update 300M reference to Mini or 600M
2 parents 7df8eb5 + bc5632c commit 36b3776

7 files changed: +17 -17 lines changed

README.md

Lines changed: 4 additions & 4 deletions
@@ -7,7 +7,7 @@ Contrarily to other TTS models, Parler-TTS is a **fully open-source** release. A
 This repository contains the inference and training code for Parler-TTS. It is designed to accompany the [Data-Speech](https://github.com/huggingface/dataspeech) repository for dataset annotation.

 > [!IMPORTANT]
-> We're proud to release Parler-TTS v0.1, our first 300M parameter model, trained on 10.5K hours of audio data.
+> We're proud to release [Parler-TTS Mini v0.1](https://huggingface.co/parler-tts/parler_tts_mini_v0.1), our first 600M parameter model, trained on 10.5K hours of audio data.
 > In the coming weeks, we'll be working on scaling up to 50k hours of data, in preparation for the v1 model.

 ## 📖 Quick Index
@@ -33,8 +33,8 @@ import torch

 device = "cuda:0" if torch.cuda.is_available() else "cpu"

-model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler_tts_300M_v0.1").to(device)
-tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler_tts_300M_v0.1")
+model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler_tts_mini_v0.1").to(device)
+tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler_tts_mini_v0.1")

 prompt = "Hey, how are you doing today?"
 description = "A female speaker with a slightly low-pitched voice delivers her words quite expressively, in a very confined sounding environment with clear audio quality. She speaks very fast."
@@ -63,7 +63,7 @@ The [training folder](/training/) contains all the information to train or fine-
 - [3. A training guide](/training/README.md#3-training)

 > [!IMPORTANT]
-> **TL;DR:** After having followed the [installation steps](/training/README.md#requirements), you can reproduce the Parler-TTS v0.1 training recipe with the following command line:
+> **TL;DR:** After having followed the [installation steps](/training/README.md#requirements), you can reproduce the Parler-TTS Mini v0.1 training recipe with the following command line:

 ```sh
 accelerate launch ./training/run_parler_tts_training.py ./helpers/training_configs/starting_point_0.01.json

helpers/gradio_demo/app.py

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@

 device = "cuda:0" if torch.cuda.is_available() else "cpu"

-repo_id = "parler-tts/parler_tts_300M_v0.1"
+repo_id = "parler-tts/parler_tts_mini_v0.1"

 model = ParlerTTSForConditionalGeneration.from_pretrained(repo_id).to(device)
 tokenizer = AutoTokenizer.from_pretrained(repo_id)

helpers/model_init_scripts/init_model_300M.py renamed to helpers/model_init_scripts/init_model_600M.py

Lines changed: 1 addition & 1 deletion
@@ -64,4 +64,4 @@
 model.config.pad_token_id = encodec_vocab_size
 model.config.decoder_start_token_id = encodec_vocab_size+1

-model.save_pretrained(os.path.join(args.save_directory, "parler-tts-untrained-300M/"))
+model.save_pretrained(os.path.join(args.save_directory, "parler-tts-untrained-600M/"))

helpers/push_to_hub_scripts/push_trained_parler_tts_to_hub.py

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 from transformers import AutoTokenizer, AutoFeatureExtractor

 path = "TODO"
-repo_id = "parler_tts_300M"
+repo_id = "parler_tts_600M"


 AutoFeatureExtractor.from_pretrained("ylacombe/dac_44khZ_8kbps").push_to_hub(repo_id)

helpers/training_configs/librispeech_tts_r_300M_dummy.json

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 {
-"model_name_or_path": "./parler-tts-untrained-300M/parler-tts-untrained-300M/",
+"model_name_or_path": "./parler-tts-untrained-600M/parler-tts-untrained-600M/",
 "save_to_disk": "./tmp_dataset_audio/",
 "temporary_save_to_disk": "./audio_code_tmp/",

helpers/training_configs/starting_point_0.01.json

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 {
-"model_name_or_path": "./parler-tts-untrained-300M/parler-tts-untrained-300M/",
+"model_name_or_path": "./parler-tts-untrained-600M/parler-tts-untrained-600M/",
 "save_to_disk": "./tmp_dataset_audio/",
 "temporary_save_to_disk": "./audio_code_tmp/",

training/README.md

Lines changed: 8 additions & 8 deletions
@@ -1,6 +1,6 @@
 # Training Parler-TTS

-**TL;DR:** After having followed the [installation steps](#requirements), you can reproduce the Parler-TTS v0.1 training recipe with the following command line:
+**TL;DR:** After having followed the [installation steps](#requirements), you can reproduce the [Parler-TTS Mini v0.1](https://huggingface.co/parler-tts/parler_tts_mini_v0.1) training recipe with the following command line:

 ```sh
 accelerate launch ./training/run_parler_tts_training.py ./helpers/training_configs/starting_point_0.01.json
@@ -71,18 +71,18 @@ And then enter an authentication token from https://huggingface.co/settings/toke

 Depending on your compute resources and your dataset, you need to choose between fine-tuning a pre-trained model and training a new model from scratch.

-In that sense, we released a 300M checkpoint trained on 10.5K hours of annotated data under the repository id: [`parler-tts/parler_tts_300M_v0.1`](https://huggingface.co/parler-tts/parler_tts_300M_v0.1), that you can fine-tune for your own use-case.
+In that sense, we released a 600M checkpoint trained on 10.5K hours of annotated data under the repository id: [`parler-tts/parler_tts_mini_v0.1`](https://huggingface.co/parler-tts/parler_tts_mini_v0.1), that you can fine-tune for your own use-case.

 You can also train you own model from scratch. You can find [here](/helpers/model_init_scripts/) examples on how to initialize a model from scratch. For example, you can initialize a dummy model with:

 ```sh
 python helpers/model_init_scripts/init_dummy_model.py ./parler-tts-untrained-dummy --text_model "google-t5/t5-small" --audio_model "parler-tts/dac_44khZ_8kbps"
 ```

-In the rest of this guide, and to reproduce the Parler-TTS v0.1 training recipe, we'll use a 300-M parameters that we'll initialize with:
+In the rest of this guide, and to reproduce the Parler-TTS Mini v0.1 training recipe, we'll use a 600M parameters model that we'll initialize with:

 ```sh
-python helpers/model_init_scripts/init_model_300M.py ./parler-tts-untrained-300M --text_model "google/flan-t5-base" --audio_model "parler-tts/dac_44khZ_8kbps"
+python helpers/model_init_scripts/init_model_600M.py ./parler-tts-untrained-600M --text_model "google/flan-t5-base" --audio_model "parler-tts/dac_44khZ_8kbps"
 ```

@@ -95,7 +95,7 @@ To train your own Parler-TTS, you need datasets with 3 main features:

 Note that we made the choice to use description of the main speech characteristics (speaker pitch, speaking rate, level of noise, etc.) but that you are free to use any handmade or generated text description that makes sense.

-To train Parler-TTS v0.1, we used:
+To train Parler-TTS Mini v0.1, we used:
 * The full [LibriTTS-R dataset](https://huggingface.co/datasets/blabble-io/libritts_r), a 1K hours high-quality speech dataset.
 * A [10K hours subset](https://huggingface.co/datasets/parler-tts/mls_eng_10k) of [Multilingual LibriSpeech](https://huggingface.co/datasets/facebook/multilingual_librispeech).

@@ -109,11 +109,11 @@ The script [`run_parler_tts_training.py`](/training/run_parler_tts_training.py)
 2. pre-compute audio tokens
 3. train Parler-TTS

-To train Parler-TTS v0.1, we roughly used:
+To train Parler-TTS Mini v0.1, we roughly used:

 ```sh
 accelerate launch ./training/run_parler_tts_training.py \
---model_name_or_path "./parler-tts-untrained-300M/parler-tts-untrained-300M/" \
+--model_name_or_path "./parler-tts-untrained-600M/parler-tts-untrained-600M/" \
 --feature_extractor_name "parler-tts/dac_44khZ_8kbps" \
 --description_tokenizer_name "google/flan-t5-base" \
 --prompt_tokenizer_name "google/flan-t5-base" \
@@ -202,4 +202,4 @@ And finally, two additional comments:

 > [!TIP]
 > Fine-tuning is as easy as modifying `model_name_or_path` to a pre-trained model.
-> For example: `--model_name_or_path parler-tts/parler_tts_300M_v0.1`.
+> For example: `--model_name_or_path parler-tts/parler_tts_mini_v0.1`.
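
Since this commit renames the same identifiers across seven files, a quick scan for leftovers can be useful after applying it. The following is a minimal sketch and not part of the commit: the helper name `find_stale_references`, the suffix filter, and the pattern list are assumptions chosen for illustration, based only on the strings removed in the hunks above.

```python
import re
from pathlib import Path

# Old identifiers replaced by this commit, taken from the removed lines above.
# Assumption: extend this pattern if other spellings are in use.
STALE_PATTERN = re.compile(
    r"parler_tts_300M|parler-tts-untrained-300M|init_model_300M"
)


def find_stale_references(root, suffixes=(".py", ".md", ".json")):
    """Return (path, line_number, line) for each line still naming an old 300M identifier."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Only scan regular files with one of the selected extensions.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if STALE_PATTERN.search(line):
                hits.append((str(path), number, line.strip()))
    return hits
```

Calling `find_stale_references(".")` from the repository root returns an empty list when no old identifier remains in the scanned files. The scan looks at file contents only, so a file whose name still contains `300M` is not reported.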
