Commit 3525598

🚀 update documents

1 parent f5e3ae9

File tree

6 files changed: +29 −409 lines


README.md

Lines changed: 16 additions & 8 deletions
```diff
@@ -59,8 +59,8 @@ TensorFlowASR implements some automatic speech recognition architectures such as
 
 ### Baselines
 
-- **CTCModel** (End2end models using CTC Loss for training)
-- **Transducer Models** (End2end models using RNNT Loss for training)
+- **CTCModel** (end-to-end models trained with CTC Loss; currently supports DeepSpeech2 and Jasper)
+- **Transducer Models** (end-to-end models trained with RNNT Loss; currently supports Conformer, ContextNet and Streaming Transducer)
 
 ### Publications
```
```diff
@@ -110,7 +110,9 @@ pip install .
 
 - For _training, testing and using_ **CTC Models**, run `./scripts/install_ctc_decoders.sh`
 
-- For _training_ **Transducer Models**, run `export CUDA_HOME=/usr/local/cuda && ./scripts/install_rnnt_loss.sh` (**Note**: only `export CUDA_HOME` when you have CUDA)
+- For _training_ **Transducer Models** with the RNNT Loss from [warp-transducer](https://github.com/HawkAaron/warp-transducer), run `export CUDA_HOME=/usr/local/cuda && ./scripts/install_rnnt_loss.sh` (**Note**: only `export CUDA_HOME` when you have CUDA)
+
+- For _training_ **Transducer Models** with the RNNT Loss implemented in pure TensorFlow, make sure [warp-transducer](https://github.com/HawkAaron/warp-transducer) is **not** installed (simply run `pip3 uninstall warprnnt-tensorflow`)
 
 - For _mixed precision training_, use the flag `--mxp` when running the python scripts from [examples](./examples)
```
```diff
@@ -166,11 +168,17 @@ speech_config: ...
 model_config: ...
 decoder_config: ...
 learning_config:
-  augmentations: ...
-  dataset_config:
-    train_paths: ...
-    eval_paths: ...
-    test_paths: ...
+  train_dataset_config:
+    augmentation_config: ...
+    data_paths: ...
+    tfrecords_dir: ...
+  eval_dataset_config:
+    augmentation_config: ...
+    data_paths: ...
+    tfrecords_dir: ...
+  test_dataset_config:
+    augmentation_config: ...
+    data_paths: ...
     tfrecords_dir: ...
   optimizer_config: ...
   running_config:
```

examples/conformer/README.md

Lines changed: 5 additions & 78 deletions
````diff
@@ -6,81 +6,7 @@ Reference: [https://arxiv.org/abs/2005.08100](https://arxiv.org/abs/2005.08100)
 
 ## Example Model YAML Config
 
-```yaml
-speech_config:
-  sample_rate: 16000
-  frame_ms: 25
-  stride_ms: 10
-  feature_type: log_mel_spectrogram
-  num_feature_bins: 80
-  preemphasis: 0.97
-  normalize_signal: True
-  normalize_feature: True
-  normalize_per_feature: False
-
-decoder_config:
-  vocabulary: null
-  target_vocab_size: 1024
-  max_subword_length: 4
-  blank_at_zero: True
-  beam_width: 5
-  norm_score: True
-
-model_config:
-  name: conformer
-  subsampling:
-    type: conv2
-    kernel_size: 3
-    strides: 2
-    filters: 144
-  positional_encoding: sinusoid_concat
-  dmodel: 144
-  num_blocks: 16
-  head_size: 36
-  num_heads: 4
-  mha_type: relmha
-  kernel_size: 32
-  fc_factor: 0.5
-  dropout: 0.1
-  embed_dim: 320
-  embed_dropout: 0.0
-  num_rnns: 1
-  rnn_units: 320
-  rnn_type: lstm
-  layer_norm: True
-  joint_dim: 320
-
-learning_config:
-  augmentations:
-    after:
-      time_masking:
-        num_masks: 10
-        mask_factor: 100
-        p_upperbound: 0.2
-      freq_masking:
-        num_masks: 1
-        mask_factor: 27
-
-  dataset_config:
-    train_paths: ...
-    eval_paths: ...
-    test_paths: ...
-    tfrecords_dir: ...
-
-  optimizer_config:
-    warmup_steps: 10000
-    beta1: 0.9
-    beta2: 0.98
-    epsilon: 1e-9
-
-  running_config:
-    batch_size: 4
-    num_epochs: 22
-    outdir: ...
-    log_interval_steps: 400
-    save_interval_steps: 400
-    eval_interval_steps: 1000
-```
+Go to [config.yml](./config.yml)
 
 ## Usage
````
```diff
@@ -108,9 +34,10 @@ TFLite Conversion, see `python examples/conformer/tflite_*.py --help`
 
 **Error Rates**
 
-| **Test-clean** | WER (%)   | CER (%)    |
-| :------------: | :-------: | :--------: |
-| _Greedy_       | 6.4476862 | 2.51828337 |
+| **Test-clean** | WER (%)    | CER (%)    |
+| :------------: | :--------: | :--------: |
+| _Greedy_       | 6.37933683 | 2.4757576  |
+| _Greedy V2_    | 7.86670732 | 2.82563138 |
 
 | **Test-other** | WER (%)    | CER (%)    |
 | :------------: | :--------: | :--------: |
```
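WER and CER figures like those above are conventionally computed as word- or character-level Levenshtein edit distance divided by the reference length. A minimal sketch of that standard definition — not necessarily the exact metric code used by this repo:

```python
def edit_distance(ref, hyp):
    """Classic dynamic-programming Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution
        prev = cur
    return prev[-1]

def wer(ref, hyp):
    """Word error rate in percent: word-level edit distance / reference words."""
    ref_words, hyp_words = ref.split(), hyp.split()
    return 100.0 * edit_distance(ref_words, hyp_words) / len(ref_words)

print(round(wer("the cat sat", "the cat sit"), 2))  # one substitution in three words -> 33.33
```

CER is the same computation over characters instead of words.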

examples/contextnet/README.md

Lines changed: 1 addition & 213 deletions
````diff
@@ -8,219 +8,7 @@ Reference: [http://arxiv.org/abs/2005.03191](http://arxiv.org/abs/2005.03191)
 
 ## Example Model YAML Config
 
-```yaml
-speech_config:
-  sample_rate: 16000
-  frame_ms: 25
-  stride_ms: 10
-  feature_type: log_mel_spectrogram
-  num_feature_bins: 80
-  preemphasis: 0.97
-  normalize_signal: True
-  normalize_feature: True
-  normalize_per_feature: False
-
-decoder_config:
-  vocabulary: null
-  target_vocab_size: 1024
-  max_subword_length: 4
-  blank_at_zero: True
-  beam_width: 5
-  norm_score: True
-
-model_config:
-  name: contextnet
-  encoder_alpha: 0.5
-  encoder_blocks:
-    # C0
-    - nlayers: 1
-      kernel_size: 5
-      filters: 256
-      strides: 1
-      residual: False
-      activation: silu
-    # C1-C2
-    - nlayers: 5
-      kernel_size: 5
-      filters: 256
-      strides: 1
-      residual: True
-      activation: silu
-    - nlayers: 5
-      kernel_size: 5
-      filters: 256
-      strides: 1
-      residual: True
-      activation: silu
-    # C3
-    - nlayers: 5
-      kernel_size: 5
-      filters: 256
-      strides: 2
-      residual: True
-      activation: silu
-    # C4-C6
-    - nlayers: 5
-      kernel_size: 5
-      filters: 256
-      strides: 1
-      residual: True
-      activation: silu
-    - nlayers: 5
-      kernel_size: 5
-      filters: 256
-      strides: 1
-      residual: True
-      activation: silu
-    - nlayers: 5
-      kernel_size: 5
-      filters: 256
-      strides: 1
-      residual: True
-      activation: silu
-    # C7
-    - nlayers: 5
-      kernel_size: 5
-      filters: 256
-      strides: 2
-      residual: True
-      activation: silu
-    # C8 - C10
-    - nlayers: 5
-      kernel_size: 5
-      filters: 256
-      strides: 1
-      residual: True
-      activation: silu
-    - nlayers: 5
-      kernel_size: 5
-      filters: 256
-      strides: 1
-      residual: True
-      activation: silu
-    - nlayers: 5
-      kernel_size: 5
-      filters: 256
-      strides: 1
-      residual: True
-      activation: silu
-    # C11 - C13
-    - nlayers: 5
-      kernel_size: 5
-      filters: 512
-      strides: 1
-      residual: True
-      activation: silu
-    - nlayers: 5
-      kernel_size: 5
-      filters: 512
-      strides: 1
-      residual: True
-      activation: silu
-    - nlayers: 5
-      kernel_size: 5
-      filters: 512
-      strides: 1
-      residual: True
-      activation: silu
-    # C14
-    - nlayers: 5
-      kernel_size: 5
-      filters: 512
-      strides: 2
-      residual: True
-      activation: silu
-    # C15 - C21
-    - nlayers: 5
-      kernel_size: 5
-      filters: 512
-      strides: 1
-      residual: True
-      activation: silu
-    - nlayers: 5
-      kernel_size: 5
-      filters: 512
-      strides: 1
-      residual: True
-      activation: silu
-    - nlayers: 5
-      kernel_size: 5
-      filters: 512
-      strides: 1
-      residual: True
-      activation: silu
-    - nlayers: 5
-      kernel_size: 5
-      filters: 512
-      strides: 1
-      residual: True
-      activation: silu
-    - nlayers: 5
-      kernel_size: 5
-      filters: 512
-      strides: 1
-      residual: True
-      activation: silu
-    - nlayers: 5
-      kernel_size: 5
-      filters: 512
-      strides: 1
-      residual: True
-      activation: silu
-    - nlayers: 5
-      kernel_size: 5
-      filters: 512
-      strides: 1
-      residual: True
-      activation: silu
-    # C22
-    - nlayers: 1
-      kernel_size: 5
-      filters: 640
-      strides: 1
-      residual: False
-      activation: silu
-  prediction_embed_dim: 640
-  prediction_embed_dropout: 0
-  prediction_num_rnns: 1
-  prediction_rnn_units: 640
-  prediction_rnn_type: lstm
-  prediction_rnn_implementation: 1
-  prediction_layer_norm: True
-  prediction_projection_units: 0
-  joint_dim: 640
-
-learning_config:
-  augmentations:
-    after:
-      time_masking:
-        num_masks: 10
-        mask_factor: 100
-        p_upperbound: 0.2
-      freq_masking:
-        num_masks: 1
-        mask_factor: 27
-
-  dataset_config:
-    train_paths: ...
-    eval_paths: ...
-    test_paths: ...
-    tfrecords_dir: ...
-
-  optimizer_config:
-    warmup_steps: 10000
-    beta1: 0.9
-    beta2: 0.98
-    epsilon: 1e-9
-
-  running_config:
-    batch_size: 4
-    num_epochs: 22
-    outdir: ...
-    log_interval_steps: 400
-    save_interval_steps: 400
-    eval_interval_steps: 1000
-```
+Go to [config.yml](./config.yml)
 
 ## Usage
````
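One property worth reading off the removed ContextNet config before it disappears from the README: of the 23 encoder blocks C0..C22, only C3, C7 and C14 use `strides: 2`, so the encoder reduces the time dimension by a factor of 8 overall. A quick check:

```python
import math

# Per-block time strides of the 23 encoder blocks (C0..C22) in the removed
# config: every block uses strides: 1 except C3, C7 and C14, which use 2.
strides = [1] * 23
for c in (3, 7, 14):
    strides[c] = 2

print(math.prod(strides))  # overall time reduction -> 8
```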

examples/deepspeech2/README.md

Lines changed: 3 additions & 19 deletions
````diff
@@ -2,24 +2,9 @@
 
 References: [https://arxiv.org/abs/1512.02595](https://arxiv.org/abs/1512.02595)
 
-## Model YAML Config Structure
-
-```yaml
-model_config:
-  conv_type: conv2d
-  conv_kernels: [[11, 41], [11, 21], [11, 11]]
-  conv_strides: [[2, 2], [1, 2], [1, 2]]
-  conv_filters: [32, 32, 96]
-  conv_dropout: 0.1
-  rnn_nlayers: 5
-  rnn_type: lstm
-  rnn_units: 512
-  rnn_bidirectional: True
-  rnn_rowconv: 0
-  rnn_dropout: 0.1
-  fc_nlayers: 0
-  fc_units: 1024
-```
+## Example YAML Config
+
+Go to [config.yml](./config.yml)
 
 ## Architecture
````

```diff
@@ -30,4 +15,3 @@ model_config:
 See `python examples/deepspeech2/train_*.py --help`
 
 See `python examples/deepspeech2/test_*.py --help`
-
```
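The removed DeepSpeech2 config is still instructive: each `conv_strides` entry appears to be a `[time, frequency]` pair (an assumption, consistent with the `[11, 41]`-style kernels), so the conv stack downsamples time by 2x and the feature axis by 8x. A quick sketch:

```python
import math

# conv_strides from the removed DeepSpeech2 config; each entry is assumed
# to be a [time, frequency] stride pair.
conv_strides = [[2, 2], [1, 2], [1, 2]]

time_reduction = math.prod(s[0] for s in conv_strides)
freq_reduction = math.prod(s[1] for s in conv_strides)
print(time_reduction, freq_reduction)  # -> 2 8
```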

0 commit comments
