
Commit f835118

Merge branch 'main' into sentencepiece
2 parents ed1b117 + 3525598


66 files changed: +4882 −1814 lines

README.md

Lines changed: 16 additions & 8 deletions
@@ -59,8 +59,8 @@ TensorFlowASR implements some automatic speech recognition architectures such as

 ### Baselines

-- **CTCModel** (End2end models using CTC Loss for training)
-- **Transducer Models** (End2end models using RNNT Loss for training)
+- **CTCModel** (End2end models using CTC Loss for training, currently supported DeepSpeech2, Jasper)
+- **Transducer Models** (End2end models using RNNT Loss for training, currently supported Conformer, ContextNet, Streaming Transducer)

 ### Publications

@@ -110,7 +110,9 @@ pip install .

 - For _training, testing and using_ **CTC Models**, run `./scripts/install_ctc_decoders.sh`

-- For _training_ **Transducer Models**, run `export CUDA_HOME=/usr/local/cuda && ./scripts/install_rnnt_loss.sh` (**Note**: only `export CUDA_HOME` when you have CUDA)
+- For _training_ **Transducer Models** with RNNT Loss from [warp-transducer](https://github.com/HawkAaron/warp-transducer), run `export CUDA_HOME=/usr/local/cuda && ./scripts/install_rnnt_loss.sh` (**Note**: only `export CUDA_HOME` when you have CUDA)
+
+- For _training_ **Transducer Models** with RNNT Loss in TF, make sure that [warp-transducer](https://github.com/HawkAaron/warp-transducer) **is not installed** (by simply run `pip3 uninstall warprnnt-tensorflow`)

 - For _mixed precision training_, use flag `--mxp` when running python scripts from [examples](./examples)

@@ -166,11 +168,17 @@ speech_config: ...
 model_config: ...
 decoder_config: ...
 learning_config:
-  augmentations: ...
-  dataset_config:
-    train_paths: ...
-    eval_paths: ...
-    test_paths: ...
+  train_dataset_config:
+    augmentation_config: ...
+    data_paths: ...
+    tfrecords_dir: ...
+  eval_dataset_config:
+    augmentation_config: ...
+    data_paths: ...
+    tfrecords_dir: ...
+  test_dataset_config:
+    augmentation_config: ...
+    data_paths: ...
     tfrecords_dir: ...
   optimizer_config: ...
   running_config:
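The last hunk above changes the documented `learning_config` layout from a single shared `dataset_config` block to per-stage `train_dataset_config`, `eval_dataset_config` and `test_dataset_config` blocks. As a rough sketch of how the new layout reads (using plain PyYAML here rather than the repository's own `Config` wrapper, whose constructor is not shown in this commit; the file path is only an example):

```python
import yaml  # PyYAML

# Example path; any YAML file that follows the new learning_config layout works.
with open("examples/conformer/config.yml") as f:
    config = yaml.safe_load(f)

learning = config["learning_config"]
# Each stage now carries its own data_paths, tfrecords_dir and (for training) augmentation_config.
train_cfg = learning["train_dataset_config"]
test_cfg = learning["test_dataset_config"]
print(train_cfg["data_paths"], test_cfg.get("tfrecords_dir"))
```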

examples/conformer/README.md

Lines changed: 5 additions & 78 deletions
@@ -6,81 +6,7 @@ Reference: [https://arxiv.org/abs/2005.08100](https://arxiv.org/abs/2005.08100)

 ## Example Model YAML Config

-```yaml
-speech_config:
-  sample_rate: 16000
-  frame_ms: 25
-  stride_ms: 10
-  feature_type: log_mel_spectrogram
-  num_feature_bins: 80
-  preemphasis: 0.97
-  normalize_signal: True
-  normalize_feature: True
-  normalize_per_feature: False
-
-decoder_config:
-  vocabulary: null
-  target_vocab_size: 1024
-  max_subword_length: 4
-  blank_at_zero: True
-  beam_width: 5
-  norm_score: True
-
-model_config:
-  name: conformer
-  subsampling:
-    type: conv2
-    kernel_size: 3
-    strides: 2
-    filters: 144
-  positional_encoding: sinusoid_concat
-  dmodel: 144
-  num_blocks: 16
-  head_size: 36
-  num_heads: 4
-  mha_type: relmha
-  kernel_size: 32
-  fc_factor: 0.5
-  dropout: 0.1
-  embed_dim: 320
-  embed_dropout: 0.0
-  num_rnns: 1
-  rnn_units: 320
-  rnn_type: lstm
-  layer_norm: True
-  joint_dim: 320
-
-learning_config:
-  augmentations:
-    after:
-      time_masking:
-        num_masks: 10
-        mask_factor: 100
-        p_upperbound: 0.2
-      freq_masking:
-        num_masks: 1
-        mask_factor: 27
-
-  dataset_config:
-    train_paths: ...
-    eval_paths: ...
-    test_paths: ...
-    tfrecords_dir: ...
-
-  optimizer_config:
-    warmup_steps: 10000
-    beta1: 0.9
-    beta2: 0.98
-    epsilon: 1e-9
-
-  running_config:
-    batch_size: 4
-    num_epochs: 22
-    outdir: ...
-    log_interval_steps: 400
-    save_interval_steps: 400
-    eval_interval_steps: 1000
-```
+Go to [config.yml](./config.yml)

 ## Usage

@@ -108,9 +34,10 @@ TFLite Conversion, see `python examples/conformer/tflite_*.py --help`

 **Error Rates**

-| **Test-clean** | WER (%)   | CER (%)    |
-| :------------: | :-------: | :--------: |
-| _Greedy_       | 6.4476862 | 2.51828337 |
+| **Test-clean** |  WER (%)   |  CER (%)   |
+| :------------: | :--------: | :--------: |
+|    _Greedy_    | 6.37933683 | 2.4757576  |
+|  _Greedy V2_   | 7.86670732 | 2.82563138 |

 | **Test-other** | WER (%) | CER (%) |
 | :------------: | :--------: | :--------: |
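For orientation, WER and CER in the tables above are word-level and character-level edit distances normalized by the reference length, expressed as percentages. The repository computes these with its own metric utilities; the snippet below is only a minimal reference sketch of the standard WER definition, not the project's code:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length, as a percentage."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return 100.0 * dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("the cat sat", "the cat sit"))  # 33.33...
```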

examples/conformer/config.yml

Lines changed: 43 additions & 19 deletions
@@ -24,12 +24,16 @@ speech_config:
   normalize_per_feature: False

 decoder_config:
-  vocabulary: null
-  target_vocab_size: 1024
+  vocabulary: ./vocabularies/librispeech_train_4_4076.subwords
+  target_vocab_size: 4096
   max_subword_length: 4
   blank_at_zero: True
   beam_width: 5
   norm_score: True
+  corpus_files:
+    - /media/nlhuy/Data/ML/ASR/Raw/LibriSpeech/LibriSpeech/train-clean-100/transcripts.tsv
+    - /media/nlhuy/Data/ML/ASR/Raw/LibriSpeech/LibriSpeech/train-clean-360/transcripts.tsv
+    - /media/nlhuy/Data/ML/ASR/Raw/LibriSpeech/LibriSpeech/train-other-500/transcripts.tsv

 model_config:
   name: conformer

@@ -53,31 +57,51 @@ model_config:
   prediction_rnn_units: 320
   prediction_rnn_type: lstm
   prediction_rnn_implementation: 2
-  prediction_layer_norm: True
+  prediction_layer_norm: False
   prediction_projection_units: 0
-  joint_dim: 320
+  joint_dim: 640
   joint_activation: tanh

 learning_config:
-  augmentations:
-    after:
-      time_masking:
-        num_masks: 10
-        mask_factor: 100
-        p_upperbound: 0.05
-      freq_masking:
-        num_masks: 1
-        mask_factor: 27
-
-  dataset_config:
-    train_paths:
+  train_dataset_config:
+    use_tf: True
+    augmentation_config:
+      after:
+        time_masking:
+          num_masks: 10
+          mask_factor: 100
+          p_upperbound: 0.05
+        freq_masking:
+          num_masks: 1
+          mask_factor: 27
+    data_paths:
       - /mnt/Miscellanea/Datasets/Speech/LibriSpeech/train-clean-100/transcripts.tsv
-    eval_paths:
+    tfrecords_dir: /mnt/Miscellanea/Datasets/Speech/LibriSpeech/tfrecords-test
+    shuffle: True
+    cache: True
+    buffer_size: 100
+    drop_remainder: True
+
+  eval_dataset_config:
+    use_tf: True
+    data_paths:
       - /mnt/Miscellanea/Datasets/Speech/LibriSpeech/dev-clean/transcripts.tsv
       - /mnt/Miscellanea/Datasets/Speech/LibriSpeech/dev-other/transcripts.tsv
-    test_paths:
+    tfrecords_dir: /mnt/Miscellanea/Datasets/Speech/LibriSpeech/tfrecords-test
+    shuffle: False
+    cache: True
+    buffer_size: 100
+    drop_remainder: True
+
+  test_dataset_config:
+    use_tf: True
+    data_paths:
       - /mnt/Miscellanea/Datasets/Speech/LibriSpeech/test-clean/transcripts.tsv
-    tfrecords_dir: /mnt/Miscellanea/Datasets/Speech/LibriSpeech/tfrecords
+    tfrecords_dir: /mnt/Miscellanea/Datasets/Speech/LibriSpeech/tfrecords-test
+    shuffle: False
+    cache: True
+    buffer_size: 100
+    drop_remainder: True

   optimizer_config:
     warmup_steps: 40000
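The new per-stage keys (`shuffle`, `cache`, `buffer_size`, `drop_remainder`) describe how each dataset pipeline is built. The sketch below shows how such flags are conventionally applied to a `tf.data` pipeline; it is an illustration of what the options mean, not the repository's `ASRDataset` implementation, and the function name is made up:

```python
import tensorflow as tf

def build_pipeline(ds: tf.data.Dataset, batch_size: int, shuffle: bool,
                   cache: bool, buffer_size: int, drop_remainder: bool) -> tf.data.Dataset:
    if cache:
        ds = ds.cache()                # keep parsed examples in memory after the first pass
    if shuffle:
        ds = ds.shuffle(buffer_size)   # buffer_size controls how many examples are shuffled at once
    # drop_remainder keeps the final batch the same shape as the others (useful for static shapes)
    ds = ds.batch(batch_size, drop_remainder=drop_remainder)
    return ds.prefetch(tf.data.experimental.AUTOTUNE)
```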

examples/conformer/test_conformer.py

Lines changed: 14 additions & 26 deletions
@@ -25,26 +25,19 @@

 parser = argparse.ArgumentParser(prog="Conformer Testing")

-parser.add_argument("--config", type=str, default=DEFAULT_YAML,
-                    help="The file path of model configuration file")
+parser.add_argument("--config", type=str, default=DEFAULT_YAML, help="The file path of model configuration file")

-parser.add_argument("--saved", type=str, default=None,
-                    help="Path to saved model")
+parser.add_argument("--saved", type=str, default=None, help="Path to saved model")

-parser.add_argument("--tfrecords", default=False, action="store_true",
-                    help="Whether to use tfrecords as dataset")
+parser.add_argument("--tfrecords", default=False, action="store_true", help="Whether to use tfrecords as dataset")

-parser.add_argument("--mxp", default=False, action="store_true",
-                    help="Enable mixed precision")
+parser.add_argument("--mxp", default=False, action="store_true", help="Enable mixed precision")

-parser.add_argument("--device", type=int, default=0,
-                    help="Device's id to run test on")
+parser.add_argument("--device", type=int, default=0, help="Device's id to run test on")

-parser.add_argument("--cpu", default=False, action="store_true",
-                    help="Whether to only use cpu")
+parser.add_argument("--cpu", default=False, action="store_true", help="Whether to only use cpu")

-parser.add_argument("--output_name", type=str, default="test",
-                    help="Result filename name prefix")
+parser.add_argument("--output_name", type=str, default="test", help="Result filename name prefix")

 args = parser.parse_args()

@@ -53,7 +46,7 @@
 setup_devices([args.device], cpu=args.cpu)

 from tensorflow_asr.configs.config import Config
-from tensorflow_asr.datasets.asr_dataset import ASRTFRecordTestDataset, ASRSliceTestDataset
+from tensorflow_asr.datasets.asr_dataset import ASRTFRecordDataset, ASRSliceDataset
 from tensorflow_asr.featurizers.speech_featurizers import TFSpeechFeaturizer
 from tensorflow_asr.featurizers.text_featurizers import CharFeaturizer
 from tensorflow_asr.runners.base_runners import BaseTester

@@ -67,19 +60,14 @@
 assert args.saved

 if args.tfrecords:
-    test_dataset = ASRTFRecordTestDataset(
-        data_paths=config.learning_config.dataset_config.test_paths,
-        tfrecords_dir=config.learning_config.dataset_config.tfrecords_dir,
-        speech_featurizer=speech_featurizer,
-        text_featurizer=text_featurizer,
-        stage="test", shuffle=False
+    test_dataset = ASRTFRecordDataset(
+        speech_featurizer=speech_featurizer, text_featurizer=text_featurizer,
+        **vars(config.learning_config.test_dataset_config)
     )
 else:
-    test_dataset = ASRSliceTestDataset(
-        data_paths=config.learning_config.dataset_config.test_paths,
-        speech_featurizer=speech_featurizer,
-        text_featurizer=text_featurizer,
-        stage="test", shuffle=False
+    test_dataset = ASRSliceDataset(
+        speech_featurizer=speech_featurizer, text_featurizer=text_featurizer,
+        **vars(config.learning_config.test_dataset_config)
     )

 # build model
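The refactor above stops reading individual `dataset_config` fields and instead unpacks the whole per-stage config object into the dataset constructor with `**vars(...)`, so every attribute of `test_dataset_config` becomes a keyword argument. A minimal standalone illustration of the idiom (the names and values here are hypothetical, not the repository's classes):

```python
from types import SimpleNamespace

# Stand-in for config.learning_config.test_dataset_config (hypothetical values)
test_dataset_config = SimpleNamespace(
    data_paths=["/data/LibriSpeech/test-clean/transcripts.tsv"],
    tfrecords_dir="/data/LibriSpeech/tfrecords",
    shuffle=False,
    cache=True,
)

def make_dataset(data_paths, tfrecords_dir=None, shuffle=False, cache=False, **kwargs):
    # Receives exactly the attributes defined on the config object
    return {"data_paths": data_paths, "shuffle": shuffle, "cache": cache}

# vars(obj) returns obj.__dict__, so every config attribute becomes a keyword argument
dataset = make_dataset(**vars(test_dataset_config))
print(dataset)
```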

examples/conformer/test_subword_conformer.py

Lines changed: 15 additions & 28 deletions
@@ -25,31 +25,23 @@

 parser = argparse.ArgumentParser(prog="Conformer Testing")

-parser.add_argument("--config", type=str, default=DEFAULT_YAML,
-                    help="The file path of model configuration file")
+parser.add_argument("--config", type=str, default=DEFAULT_YAML, help="The file path of model configuration file")

-parser.add_argument("--saved", type=str, default=None,
-                    help="Path to saved model")
+parser.add_argument("--saved", type=str, default=None, help="Path to saved model")

-parser.add_argument("--tfrecords", default=False, action="store_true",
-                    help="Whether to use tfrecords as dataset")
+parser.add_argument("--tfrecords", default=False, action="store_true", help="Whether to use tfrecords as dataset")

-parser.add_argument("--mxp", default=False, action="store_true",
-                    help="Enable mixed precision")
+parser.add_argument("--mxp", default=False, action="store_true", help="Enable mixed precision")

 parser.add_argument("--sentence_piece", default=False, action="store_true", help="Whether to use `SentencePiece` model")

-parser.add_argument("--device", type=int, default=0,
-                    help="Device's id to run test on")
+parser.add_argument("--device", type=int, default=0, help="Device's id to run test on")

-parser.add_argument("--cpu", default=False, action="store_true",
-                    help="Whether to only use cpu")
+parser.add_argument("--cpu", default=False, action="store_true", help="Whether to only use cpu")

-parser.add_argument("--subwords", type=str, default=None,
-                    help="Path to file that stores generated subwords")
+parser.add_argument("--subwords", type=str, default=None, help="Path to file that stores generated subwords")

-parser.add_argument("--output_name", type=str, default="test",
-                    help="Result filename name prefix")
+parser.add_argument("--output_name", type=str, default="test", help="Result filename name prefix")

 args = parser.parse_args()

@@ -58,7 +50,7 @@
 setup_devices([args.device], cpu=args.cpu)

 from tensorflow_asr.configs.config import Config
-from tensorflow_asr.datasets.asr_dataset import ASRTFRecordTestDataset, ASRSliceTestDataset
+from tensorflow_asr.datasets.asr_dataset import ASRTFRecordDataset, ASRSliceDataset
 from tensorflow_asr.featurizers.speech_featurizers import TFSpeechFeaturizer
 from tensorflow_asr.featurizers.text_featurizers import SubwordFeaturizer, SentencePieceFeaturizer
 from tensorflow_asr.runners.base_runners import BaseTester

@@ -80,19 +72,14 @@
 assert args.saved

 if args.tfrecords:
-    test_dataset = ASRTFRecordTestDataset(
-        data_paths=config.learning_config.dataset_config.test_paths,
-        tfrecords_dir=config.learning_config.dataset_config.tfrecords_dir,
-        speech_featurizer=speech_featurizer,
-        text_featurizer=text_featurizer,
-        stage="test", shuffle=False
+    test_dataset = ASRTFRecordDataset(
+        speech_featurizer=speech_featurizer, text_featurizer=text_featurizer,
+        **vars(config.learning_config.test_dataset_config)
    )
 else:
-    test_dataset = ASRSliceTestDataset(
-        data_paths=config.learning_config.dataset_config.test_paths,
-        speech_featurizer=speech_featurizer,
-        text_featurizer=text_featurizer,
-        stage="test", shuffle=False
+    test_dataset = ASRSliceDataset(
+        speech_featurizer=speech_featurizer, text_featurizer=text_featurizer,
+        **vars(config.learning_config.test_dataset_config)
     )

 # build model
