
Commit 14d47bf

updated dependencies; unified HF vocoder; minor refactoring

1 parent e8280e2 · commit 14d47bf
22 files changed: +91 −598 lines

.dockerignore

Lines changed: 3 additions & 1 deletion

```diff
@@ -7,4 +7,6 @@ config/*.env
 /logs/*
 *.iml
 deepvoice3_pytorch/
-environment.yml
+environment.yml
+*.ipynb
+.ipynb_checkpoints
```

.gitignore

Lines changed: 5 additions & 1 deletion

```diff
@@ -3,4 +3,8 @@ __pycache__/
 config/.env
 *.iml
 deepvoice3_pytorch/
-config/config.*.yaml
+config/config.*.yaml
+.ipynb_checkpoints
+.DS_Store
+.vscode/
+*.ipynb
```

.gitmodules

Lines changed: 3 additions & 6 deletions

```diff
@@ -1,6 +1,3 @@
-[submodule "TransformerTTS"]
-	path = TransformerTTS
-	url = https://github.com/TartuNLP/TransformerTTS.git
-[submodule "tts_preprocess_et"]
-	path = tts_preprocess_et
-	url = https://github.com/TartuNLP/tts_preprocess_et.git
+[submodule "TransformerTTS"]
+	path = TransformerTTS
+	url = https://github.com/TartuNLP/TransformerTTS.git
```

Dockerfile

Lines changed: 3 additions & 2 deletions

```diff
@@ -1,4 +1,5 @@
-FROM python:3.9
+# Latest version of TensorFlow is compatible with Python <= 3.12
+FROM python:3.10
 
 # Install system dependencies
 RUN apt-get update && \
@@ -26,4 +27,4 @@ RUN pip install --user -r requirements.txt && \
 
 COPY --chown=app:app . .
 
-ENTRYPOINT ["python", "main.py"]
+ENTRYPOINT ["python", "main.py", "--max-input-length", "500"]
```

README.md

Lines changed: 8 additions & 27 deletions

````diff
@@ -17,17 +17,9 @@ structure:
 
 ```
 models
-├── hifigan
-│   ├── ljspeech
-│   │   ├── config.json
-│   │   └── model.pt
-│   ├── vctk
-│   │   ├── config.json
-│   │   └── model.pt
-└── tts
-    └── multispeaker
-        ├── config.yaml
-        └── model_weights.hdf5
+└── multispeaker
+    ├── config.yaml
+    └── model_weights.hdf5
 ```
 
 ## Setup
@@ -52,8 +44,7 @@ The following environment variables should be configured when running the container
 - `MKL_NUM_THREADS` (optional) - number of threads used for intra-op parallelism by PyTorch (used for the vocoder model)
   . `16` by default. If set to a blank value, it defaults to the number of CPU cores which may cause computational
   overhead when deployed on larger nodes. Alternatively, the `docker run` flag `--cpuset-cpus` can be used to control
-  this. For more details, refer to the [performance and hardware requirements](#performance-and-hardware-requirements)
-  section below.
+  this.
 
 By default, the container entrypoint is `main.py` without additional arguments, but arguments should be defined with the
 `COMMAND` option. The only required flag is `--model-name` to select which model is loaded by the worker. The full list
@@ -77,7 +68,6 @@ optional arguments:
 The setup can be tested with the following sample `docker-compose.yml` configuration:
 
 ```yaml
-version: '3'
 services:
   rabbitmq:
     image: 'rabbitmq'
@@ -95,6 +85,7 @@ services:
       - '8000:8000'
     depends_on:
       - rabbitmq
+    restart: always
   tts_worker:
     image: ghcr.io/tartunlp/text-to-speech-worker:latest
     environment:
@@ -107,6 +98,7 @@ services:
       - ./models:/app/models
     depends_on:
       - rabbitmq
+    restart: always
 ```
 
 ### Manual setup
@@ -116,23 +108,12 @@ The following steps have been tested on Ubuntu and is both CPU and GPU compatible
 - Clone this repository with submodules
 - Install prerequisites:
   - GNU Compiler Collection (`sudo apt install build-essential`)
-- For a **CPU** installation we recommend using the included `requirements.txt` file in a clean environment (tested with
-  Python 3.9)
+  - For a **GPU** installation, make sure you have CUDA installed (see https://developer.nvidia.com/cuda-downloads)
+- Use the included `requirements.txt` file in a clean environment (check the compatible python version from the `Dockerfile`)
   ```commandline
   pip install -r requirements.txt
   ```
 
-- For a **GPU** installation, use the `environment.yml` file instead.
-  - Make sure you have the following prerequisites installed:
-    - CUDA (see https://developer.nvidia.com/cuda-downloads)
-    - Conda (see https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html)
-
-- Then create and activate a Conda environment with all dependencies:
-  ```commandline
-  conda env create -f environment.yml -n tts
-  conda activate tts
-  ```
-
 - Download the models from the [releases section](https://github.com/TartuNLP/text-to-speech-worker/releases) and
   place inside the `models/` directory.
````

TransformerTTS

config/config.yaml

Lines changed: 14 additions & 39 deletions

```diff
@@ -1,46 +1,21 @@
-vocoders:
-  vctk: models/hifigan/vctk # the directory which should contain a .json and .pt file
-  ljspeech: models/hifigan/ljspeech
 tts_models:
   multispeaker:
-    model_path: models/tts/multispeaker # the directory that contains a yaml and hdf5 files for the model
+    model_path: models/multispeaker # the directory that contains a yaml and hdf5 files for the model
     frontend: 'est'
-    speakers: # a mapping of speaker names (as they will be used in routing keys, speaker-ids in the model and the vocoder to be used)
-      albert:
-        speaker_id: 1
-        vocoder: vctk
-      indrek:
-        speaker_id: 2
-        vocoder: vctk
-      kalev:
-        speaker_id: 3
-        vocoder: vctk
-      kylli:
-        speaker_id: 4
-        vocoder: ljspeech
-      liivika:
-        speaker_id: 5
-        vocoder: ljspeech
-      mari:
-        speaker_id: 6
-        vocoder: ljspeech
-      meelis:
-        speaker_id: 7
-        vocoder: vctk
-      peeter:
-        speaker_id: 8
-        vocoder: vctk
-      tambet:
-        speaker_id: 9
-        vocoder: vctk
-      vesta:
-        speaker_id: 10
-        vocoder: vctk
+    speakers: # a mapping of speaker names as they will be used in routing keys and speaker-ids in the model
+      albert: 1
+      indrek: 2
+      kalev: 3
+      kylli: 4
+      liivika: 5
+      mari: 6
+      meelis: 7
+      peeter: 8
+      tambet: 9
+      vesta: 10
 # single-speaker example:
 #  mari:
-#    model_path: models/tts/mari
+#    model_path: models/mari
 #    frontend: 'est'
 #    speakers:
-#      lee:
-#        speaker_id: 0
-#        vocoder: ljspeech
+#      mari: 0
```
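The new schema collapses each speaker entry to a flat `name: id` pair, since the per-speaker vocoder choice is gone (all speakers now share the unified HF vocoder). A minimal sketch of how a worker might resolve a routing-key speaker name under the new layout — the `resolve_speaker_id` helper is hypothetical, not part of the repo:

```python
# Hypothetical helper; the mapping mirrors config/config.yaml after this
# commit, where each speaker name maps straight to its model speaker ID.
SPEAKERS = {
    "albert": 1, "indrek": 2, "kalev": 3, "kylli": 4, "liivika": 5,
    "mari": 6, "meelis": 7, "peeter": 8, "tambet": 9, "vesta": 10,
}

def resolve_speaker_id(name: str) -> int:
    """Return the model speaker ID for a routing-key speaker name."""
    if name not in SPEAKERS:
        raise ValueError(f"unknown speaker: {name!r}")
    return SPEAKERS[name]

assert resolve_speaker_id("mari") == 6
```

With the old schema the same lookup had to return a `(speaker_id, vocoder)` pair; the flat mapping removes that indirection.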

environment.yml

Lines changed: 0 additions & 25 deletions
This file was deleted.

main.py

Lines changed: 8 additions & 6 deletions

```diff
@@ -1,18 +1,20 @@
 import logging.config
-from argparse import ArgumentParser, FileType
+from argparse import ArgumentParser
 
-from tts_worker import read_model_config, Synthesizer, MQConsumer
+from tts_worker.config import read_model_config
+from tts_worker.synthesizer import Synthesizer
+from tts_worker.mq_consumer import MQConsumer
 
 
 def parse_args():
     parser = ArgumentParser(
         description="A text-to-speech worker that processes incoming TTS requests via RabbitMQ."
     )
-    parser.add_argument('--model-config', type=FileType('r'), default='config/config.yaml',
+    parser.add_argument('--model-config', type=str, default='config/config.yaml',
                         help="The model config YAML file to load.")
     parser.add_argument('--model-name', type=str,
                         help="The model to load. Refers to the model name in the config file.")
-    parser.add_argument('--log-config', type=FileType('r'), default='config/logging.prod.ini',
+    parser.add_argument('--log-config', type=str, default='config/logging.prod.ini',
                         help="Path to log config file.")
     parser.add_argument('--max-input-length', type=int, default=0,
                         help="Optional max input length configuration - "
@@ -26,8 +28,8 @@ def parse_args():
 
 def main():
     args = parse_args()
-    logging.config.fileConfig(args.log_config.name)
-    model_config = read_model_config(args.model_config.name, args.model_name)
+    logging.config.fileConfig(args.log_config)
+    model_config = read_model_config(args.model_config, args.model_name)
 
     tts = Synthesizer(model_config, args.max_input_length)
     consumer = MQConsumer(tts)
```
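The `FileType('r') -> str` change is worth noting: `FileType` opens the file at argument-parsing time and holds the handle open (hence the `.name` round-trip to recover the path), whereas plain `str` just passes the path to callees like `logging.config.fileConfig`, which open the file themselves. A minimal sketch of the new pattern, not the repo's actual `main.py`:

```python
from argparse import ArgumentParser

# Sketch: with type=str, argparse only stores the path string; nothing is
# opened at parse time, unlike FileType('r') which opened the file eagerly.
parser = ArgumentParser(description="TTS worker argument sketch")
parser.add_argument('--model-config', type=str, default='config/config.yaml')
parser.add_argument('--log-config', type=str, default='config/logging.prod.ini')

args = parser.parse_args([])  # empty argv: defaults come back as plain strings
assert isinstance(args.model_config, str)  # no open file object anywhere
```

A side benefit is that the defaults no longer need to exist on disk unless they are actually used, since nothing is opened during parsing.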

requirements.txt

Lines changed: 17 additions & 14 deletions

```diff
@@ -1,14 +1,17 @@
-librosa==0.9.2
-tensorflow-cpu==2.11.0
-nltk==3.8.1
-estnltk==1.6.9.1b0
-pika==1.3.1
-torch==1.13.1
-torchvision
-torchaudio
-pyyaml==6.0
-pydantic==1.10.4
-python-dotenv==0.21.0
-ruamel.yaml==0.17.21
-phonemizer==3.2.1
-unidecode==1.3.6
+# TransformerTTS requirements
+librosa==0.11.0
+tensorflow==2.13.0
+ruamel.yaml
+# Worker requirements:
+nltk==3.9.2
+pika==1.3.2
+pydantic
+pydantic-settings
+python-dotenv
+# Preprocessing requirements:
+git+https://github.com/TartuNLP/tts_preprocess_et.git@v1.1.0
+# Vocoder requirements:
+speechbrain==1.0.2
+torch==2.1.2
+torchaudio==2.1.2
+huggingface-hub==0.29.2
```
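Note that `torch` and `torchaudio` are pinned to the same release (2.1.2): torchaudio wheels are built against one specific torch version, so the two pins must move together. A small illustrative check of that invariant — the parser below is a sketch, not part of the repo:

```python
# Illustrative parser for pinned entries; the sample text mirrors the
# vocoder group of requirements.txt after this commit.
REQUIREMENTS = """\
# Vocoder requirements:
speechbrain==1.0.2
torch==2.1.2
torchaudio==2.1.2
huggingface-hub==0.29.2
"""

def parse_pins(text: str) -> dict:
    """Map package name -> pinned version, skipping comments and unpinned lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # comments, blanks, and git/unpinned entries carry no pin
        name, version = line.split("==", 1)
        pins[name] = version
    return pins

pins = parse_pins(REQUIREMENTS)
# torchaudio is built against a specific torch release, so the pins must match:
assert pins["torch"] == pins["torchaudio"]
```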
