Commit e08e208 (parent: b42d6d4)

⚡ Update line endings and add more option for conformer

18 files changed: +1448 −1445 lines

examples/conformer/README.md

Lines changed: 113 additions & 113 deletions
# Conformer: Convolution-augmented Transformer for Speech Recognition

Reference: [https://arxiv.org/abs/2005.08100](https://arxiv.org/abs/2005.08100)

![Conformer Architecture](./figs/arch.png)

## Example Model YAML Config

```yaml
speech_config:
  sample_rate: 16000
  frame_ms: 25
  stride_ms: 10
  feature_type: log_mel_spectrogram
  num_feature_bins: 80
  preemphasis: 0.97
  normalize_signal: True
  normalize_feature: True
  normalize_per_feature: False

decoder_config:
  vocabulary: null
  target_vocab_size: 1024
  max_subword_length: 4
  blank_at_zero: True
  beam_width: 5
  norm_score: True

model_config:
  name: conformer
  subsampling:
    type: conv2
    kernel_size: 3
    strides: 2
    filters: 144
  positional_encoding: sinusoid_concat
  dmodel: 144
  num_blocks: 16
  head_size: 36
  num_heads: 4
  mha_type: relmha
  kernel_size: 32
  fc_factor: 0.5
  dropout: 0.1
  embed_dim: 320
  embed_dropout: 0.0
  num_rnns: 1
  rnn_units: 320
  rnn_type: lstm
  layer_norm: True
  joint_dim: 320

learning_config:
  augmentations:
    after:
      time_masking:
        num_masks: 10
        mask_factor: 100
        p_upperbound: 0.2
      freq_masking:
        num_masks: 1
        mask_factor: 27

  dataset_config:
    train_paths: ...
    eval_paths: ...
    test_paths: ...
    tfrecords_dir: ...

  optimizer_config:
    warmup_steps: 10000
    beta1: 0.9
    beta2: 0.98
    epsilon: 1e-9

  running_config:
    batch_size: 4
    num_epochs: 22
    outdir: ...
    log_interval_steps: 400
    save_interval_steps: 400
    eval_interval_steps: 1000
```
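As a quick sanity check on the numbers above, here is a minimal pure-Python sketch (illustrative only, not part of the library — the actual feature extractor may pad or center frames differently): the `speech_config` framing parameters determine how many feature frames a clip produces, and in `model_config` the attention width `head_size * num_heads` equals `dmodel`.

```python
# Illustrative sanity checks on the config values above.
# Assumption: simple non-padded sliding-window framing.

sample_rate = 16000          # Hz, from speech_config
frame_ms, stride_ms = 25, 10

frame_len = sample_rate * frame_ms // 1000   # 400 samples per frame
stride = sample_rate * stride_ms // 1000     # 160 samples per hop

def num_frames(num_samples: int) -> int:
    """Frames produced by sliding a frame_len window at the given stride."""
    return 1 + (num_samples - frame_len) // stride

print(num_frames(sample_rate))  # 1 second of audio -> 98 frames

# Multi-head attention width: head_size * num_heads matches dmodel.
head_size, num_heads, dmodel = 36, 4, 144
assert head_size * num_heads == dmodel
```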

## Usage

For training, see `python examples/conformer/train_conformer.py --help`

For testing, see `python examples/conformer/test_conformer.py --help`

For TFLite conversion, see `python examples/conformer/tflite_conformer.py --help`

## Conformer Subwords - Results on LibriSpeech

**Summary**

- Number of subwords: 1031
- Maximum length of a subword: 4
- Subwords corpus: all training sets, dev sets and test-clean
- Number of parameters: 10,341,639
- Positional encoding type: sinusoid concatenation

**Pretrained and Config**: go to [drive](https://drive.google.com/drive/folders/1VAihgSB5vGXwIVTl3hkUk95joxY1YbfW?usp=sharing)

**Transducer Loss**

<img src="./figs/subword_conformer_loss.svg" alt="conformer_subword" width="300px" />

**Error Rates**

| Test-clean | WER (%)   | CER (%)    |
| :--------: | :-------: | :--------: |
| _Greedy_   | 6.4476862 | 2.51828337 |
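WER and CER above are edit-distance rates: the Levenshtein distance between hypothesis and reference, divided by the reference length — over words for WER, over characters for CER. A minimal sketch of that computation (illustrative; not the library's own scorer):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences via dynamic programming."""
    prev_row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        row = [i]
        for j, h in enumerate(hyp, 1):
            row.append(min(prev_row[j] + 1,                # deletion
                           row[j - 1] + 1,                 # insertion
                           prev_row[j - 1] + (r != h)))    # substitution
        prev_row = row
    return prev_row[-1]

def wer(ref: str, hyp: str) -> float:
    """Word error rate (%): word-level edit distance over reference words."""
    r, h = ref.split(), hyp.split()
    return 100.0 * edit_distance(r, h) / len(r)

def cer(ref: str, hyp: str) -> float:
    """Character error rate (%): character-level edit distance."""
    return 100.0 * edit_distance(ref, hyp) / len(ref)

print(wer("the cat sat", "the cat sit"))  # one substitution out of 3 words
```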
