Commit 5b593f5

Replace 300M references with 600M and Mini

1 parent 613564c

7 files changed: 12 additions, 12 deletions

README.md

Lines changed: 2 additions & 2 deletions
@@ -33,8 +33,8 @@ import torch
 
 device = "cuda:0" if torch.cuda.is_available() else "cpu"
 
-model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler_tts_300M_v0.1").to(device)
-tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler_tts_300M_v0.1")
+model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler_tts_mini_v0.1").to(device)
+tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler_tts_mini_v0.1")
 
 prompt = "Hey, how are you doing today?"
 description = "A female speaker with a slightly low-pitched voice delivers her words quite expressively, in a very confined sounding environment with clear audio quality. She speaks very fast."

helpers/gradio_demo/app.py

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@
 
 device = "cuda:0" if torch.cuda.is_available() else "cpu"
 
-repo_id = "parler-tts/parler_tts_300M_v0.1"
+repo_id = "parler-tts/parler_tts_mini_v0.1"
 
 model = ParlerTTSForConditionalGeneration.from_pretrained(repo_id).to(device)
 tokenizer = AutoTokenizer.from_pretrained(repo_id)

helpers/model_init_scripts/init_model_300M.py

Lines changed: 1 addition & 1 deletion
@@ -64,4 +64,4 @@
 model.config.pad_token_id = encodec_vocab_size
 model.config.decoder_start_token_id = encodec_vocab_size+1
 
-model.save_pretrained(os.path.join(args.save_directory, "parler-tts-untrained-300M/"))
+model.save_pretrained(os.path.join(args.save_directory, "parler-tts-untrained-600M/"))
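The training configs in this commit point at a doubled directory, `./parler-tts-untrained-600M/parler-tts-untrained-600M/`. That follows from the save call above, which joins its own folder name onto `args.save_directory`. Below is a small sketch of that path arithmetic. It assumes `args.save_directory` receives the positional `./parler-tts-untrained-600M` argument that the training README passes to the init script. `untrained_checkpoint_dir` is a hypothetical helper, not part of the repository.

```python
import json
import posixpath


# Hypothetical helper mirroring the save_pretrained join in the init script.
# posixpath is used instead of os.path so the separator is "/" on every OS.
def untrained_checkpoint_dir(save_directory: str) -> str:
    return posixpath.join(save_directory, "parler-tts-untrained-600M/")


# Assumption: this is the positional save directory passed on the command line.
saved_to = untrained_checkpoint_dir("./parler-tts-untrained-600M")

# model_name_or_path as written in the training configs changed by this commit.
config = json.loads(
    '{"model_name_or_path": "./parler-tts-untrained-600M/parler-tts-untrained-600M/"}'
)

print(saved_to)                                   # ./parler-tts-untrained-600M/parler-tts-untrained-600M/
print(saved_to == config["model_name_or_path"])   # True
```

Passing a different save directory to the init script would therefore require the same change in `model_name_or_path`.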

helpers/push_to_hub_scripts/push_trained_parler_tts_to_hub.py

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 from transformers import AutoTokenizer, AutoFeatureExtractor
 
 path = "TODO"
-repo_id = "parler_tts_300M"
+repo_id = "parler_tts_600M"
 
 
 AutoFeatureExtractor.from_pretrained("ylacombe/dac_44khZ_8kbps").push_to_hub(repo_id)

helpers/training_configs/librispeech_tts_r_300M_dummy.json

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 {
-"model_name_or_path": "./parler-tts-untrained-300M/parler-tts-untrained-300M/",
+"model_name_or_path": "./parler-tts-untrained-600M/parler-tts-untrained-600M/",
 "save_to_disk": "./tmp_dataset_audio/",
 "temporary_save_to_disk": "./audio_code_tmp/",
 
helpers/training_configs/starting_point_0.01.json

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 {
-"model_name_or_path": "./parler-tts-untrained-300M/parler-tts-untrained-300M/",
+"model_name_or_path": "./parler-tts-untrained-600M/parler-tts-untrained-600M/",
 "save_to_disk": "./tmp_dataset_audio/",
 "temporary_save_to_disk": "./audio_code_tmp/",
 
training/README.md

Lines changed: 5 additions & 5 deletions
@@ -71,18 +71,18 @@ And then enter an authentication token from https://huggingface.co/settings/toke
 
 Depending on your compute resources and your dataset, you need to choose between fine-tuning a pre-trained model and training a new model from scratch.
 
-In that sense, we released a 300M checkpoint trained on 10.5K hours of annotated data under the repository id: [`parler-tts/parler_tts_300M_v0.1`](https://huggingface.co/parler-tts/parler_tts_300M_v0.1), that you can fine-tune for your own use-case.
+In that sense, we released a 600M checkpoint trained on 10.5K hours of annotated data under the repository id: [`parler-tts/parler_tts_mini_v0.1`](https://huggingface.co/parler-tts/parler_tts_mini_v0.1), that you can fine-tune for your own use-case.
 
 You can also train you own model from scratch. You can find [here](/helpers/model_init_scripts/) examples on how to initialize a model from scratch. For example, you can initialize a dummy model with:
 
 ```sh
 python helpers/model_init_scripts/init_dummy_model.py ./parler-tts-untrained-dummy --text_model "google-t5/t5-small" --audio_model "parler-tts/dac_44khZ_8kbps"
 ```
 
-In the rest of this guide, and to reproduce the Parler-TTS v0.1 training recipe, we'll use a 300-M parameters that we'll initialize with:
+In the rest of this guide, and to reproduce the Parler-TTS v0.1 training recipe, we'll use a 600-M parameters model that we'll initialize with:
 
 ```sh
-python helpers/model_init_scripts/init_model_300M.py ./parler-tts-untrained-300M --text_model "google/flan-t5-base" --audio_model "parler-tts/dac_44khZ_8kbps"
+python helpers/model_init_scripts/init_model_600M.py ./parler-tts-untrained-600M --text_model "google/flan-t5-base" --audio_model "parler-tts/dac_44khZ_8kbps"
 ```
 
 
@@ -113,7 +113,7 @@ To train Parler-TTS v0.1, we roughly used:
 
 ```sh
 accelerate launch ./training/run_parler_tts_training.py \
---model_name_or_path "./parler-tts-untrained-300M/parler-tts-untrained-300M/" \
+--model_name_or_path "./parler-tts-untrained-600M/parler-tts-untrained-600M/" \
 --feature_extractor_name "parler-tts/dac_44khZ_8kbps" \
 --description_tokenizer_name "google/flan-t5-base" \
 --prompt_tokenizer_name "google/flan-t5-base" \
@@ -202,4 +202,4 @@ And finally, two additional comments:
 
 > [!TIP]
 > Fine-tuning is as easy as modifying `model_name_or_path` to a pre-trained model.
-> For example: `--model_name_or_path parler-tts/parler_tts_300M_v0.1`.
+> For example: `--model_name_or_path parler-tts/parler_tts_mini_v0.1`.
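This commit updates references inside file contents, but file names such as `helpers/model_init_scripts/init_model_300M.py` and `helpers/training_configs/librispeech_tts_r_300M_dummy.json` keep the `300M` tag. The sketch below lists what remains in a local checkout. It assumes the relevant files are plain-text `.py`, `.md`, and `.json` files. `find_stale_references` and `demo` are hypothetical helpers, not part of the Parler-TTS repository.

```python
from __future__ import annotations

import re
import tempfile
from pathlib import Path
from typing import Iterator

# Hypothetical check: "300M" is the size tag this commit replaces.
# The lookbehind keeps strings such as "1300M" from matching.
STALE_PATTERN = re.compile(r"(?<!\d)300M")


def find_stale_references(
    root: Path, suffixes: tuple[str, ...] = (".py", ".md", ".json")
) -> Iterator[tuple[str, int, str]]:
    """Yield (relative path, line number, text) for each remaining match.

    A match in the file name itself is reported with line number 0.
    """
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        rel = path.relative_to(root).as_posix()
        if STALE_PATTERN.search(path.name):
            yield rel, 0, "<file name>"
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if STALE_PATTERN.search(line):
                yield rel, lineno, line.strip()


def demo() -> list[tuple[str, int, str]]:
    # Recreates three files touched by this commit, with one post-commit line each.
    files = {
        "helpers/model_init_scripts/init_model_300M.py":
            'model.save_pretrained(os.path.join(args.save_directory, "parler-tts-untrained-600M/"))\n',
        "helpers/training_configs/librispeech_tts_r_300M_dummy.json":
            '{"model_name_or_path": "./parler-tts-untrained-600M/parler-tts-untrained-600M/"}\n',
        "training/README.md":
            "python helpers/model_init_scripts/init_model_600M.py ./parler-tts-untrained-600M\n",
    }
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for rel, content in files.items():
            target = root / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(content, encoding="utf-8")
        # Materialize the results before the temporary directory is removed.
        return list(find_stale_references(root))


if __name__ == "__main__":
    for hit in demo():
        print(hit)
```

To scan a real checkout, call `find_stale_references(Path("."))` from the repository root. In the demo, the file contents are already updated, so only the two file names are reported.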