
Commit 21ae411

Merge pull request #4847 from sendream/master
Add recipe of Tibetan Amdo dialect
2 parents: 545b1f1 + cbc7284

28 files changed, +1418 −0 lines changed

egs/xbmu_amdo31/README.txt

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
About the XBMU-AMDO31 corpus

XBMU-AMDO31 is an open-source Amdo Tibetan speech corpus published by Northwest Minzu University.

The XBMU-AMDO31 dataset is a speech recognition corpus of the Amdo dialect of Tibetan. The open-source corpus contains 31 hours of speech data and resources for building speech recognition systems, including transcribed texts and a Tibetan pronunciation lexicon. (The lexicon is a Lhasa-dialect Tibetan lexicon, reused for the Amdo dialect because of the uniformity of the Tibetan language.) The dataset can be used to train models for Amdo Tibetan automatic speech recognition (ASR).

The database can be downloaded from OpenSLR:
http://www.openslr.org/133/

For more details, please visit:
https://huggingface.co/datasets/syzym/xbmu_amdo31

This recipe includes several different ASR models trained on XBMU-AMDO31.

egs/xbmu_amdo31/s5/RESULTS

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
%WER 46.16 [ 15522 / 33628, 380 ins, 2208 del, 12934 sub ] exp/mono/decode_test/wer_10_0.0
%WER 24.60 [ 8274 / 33628, 330 ins, 860 del, 7084 sub ] exp/tri1/decode_test/wer_13_0.0
%WER 24.42 [ 8213 / 33628, 323 ins, 847 del, 7043 sub ] exp/tri2/decode_test/wer_13_0.0
%WER 22.93 [ 7712 / 33628, 336 ins, 814 del, 6562 sub ] exp/tri3a/decode_test/wer_12_0.0
%WER 20.17 [ 6783 / 33628, 275 ins, 764 del, 5744 sub ] exp/tri4a/decode_test/wer_15_0.0
%WER 19.03 [ 6400 / 33628, 292 ins, 667 del, 5441 sub ] exp/tri5a/decode_test/wer_14_0.0
%WER 15.45 [ 5196 / 33628, 229 ins, 646 del, 4321 sub ] exp/nnet3/tdnn_sp/decode_test/wer_16_0.0
%WER 15.57 [ 5235 / 33628, 244 ins, 575 del, 4416 sub ] exp/chain/tdnn_1a_sp/decode_test/wer_11_0.0
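
These are the usual Kaldi best-WER lines. A minimal sketch of how such a RESULTS file is typically regenerated, assuming the standard utils/best_wer.sh helper and the exp/ decode directories listed above (this loop is not part of the committed recipe):

#!/usr/bin/env bash
# Collect the best WER from each test decode directory (sketch; assumes the
# standard Kaldi utils/best_wer.sh helper and the exp/ layout shown above).
for x in exp/*/decode_test exp/*/*/decode_test; do
  [ -d "$x" ] && grep WER "$x"/wer_* 2>/dev/null | utils/best_wer.sh
done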

egs/xbmu_amdo31/s5/cmd.sh

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
# you can change cmd.sh depending on what type of queue you are using.
# If you have no queueing system and want to run on a local machine, you
# can change all instances of 'queue.pl' to 'run.pl' (but be careful and run
# commands one by one: most recipes will exhaust the memory on your
# machine).  queue.pl works with GridEngine (qsub).  slurm.pl works
# with slurm.  Different queues are configured differently, with different
# queue names and different ways of specifying things like memory;
# to account for these differences you can create and edit the file
# conf/queue.conf to match your queue's configuration.  Search for
# conf/queue.conf in http://kaldi-asr.org/doc/queue.html for more information,
# or search for the string 'default_config' in utils/queue.pl or utils/slurm.pl.

export train_cmd="queue.pl --mem 2G"
export decode_cmd="queue.pl --mem 4G"
export mkgraph_cmd="queue.pl --mem 8G"
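
As the comments note, on a single machine with no queueing system you would swap queue.pl for run.pl. A minimal local-machine variant of these exports (a sketch, not part of the committed file):

# Local-machine alternative: run.pl executes jobs directly on this host,
# so keep parallelism low to avoid exhausting memory.
export train_cmd="run.pl --mem 2G"
export decode_cmd="run.pl --mem 4G"
export mkgraph_cmd="run.pl --mem 8G"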
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
beam=11.0        # beam for decoding.  Was 13.0 in the scripts.
first_beam=8.0   # beam for 1st-pass decoding in SAT.

egs/xbmu_amdo31/s5/conf/mfcc.conf

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
--use-energy=false   # only non-default option.
--sample-frequency=16000
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
# config for high-resolution MFCC features, intended for neural network training.
# Note: we keep all cepstra, so it has the same info as filterbank features,
# but MFCC is more easily compressible (because less correlated), which is why
# we prefer this method.
--use-energy=false       # use average of log energy, not energy.
--sample-frequency=16000
--num-mel-bins=40        # similar to Google's setup.
--num-ceps=40            # there is no dimensionality reduction.
--low-freq=40            # low cutoff frequency for mel bins
--high-freq=-200         # high cutoff frequency, relative to Nyquist of 8000 (=7800)
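
In Kaldi recipes a config like this is consumed by the feature-extraction wrapper scripts rather than read directly. A hedged sketch of the usual invocation, assuming this file lives at conf/mfcc_hires.conf; the data-directory names are illustrative, not taken from this recipe's run.sh:

# Sketch: extract 40-dim high-resolution MFCCs for nnet training.
utils/copy_data_dir.sh data/train data/train_hires
steps/make_mfcc.sh --nj 10 --cmd "$train_cmd" \
  --mfcc-config conf/mfcc_hires.conf data/train_hires
steps/compute_cmvn_stats.sh data/train_hires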
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
# configuration file for apply-cmvn-online, used when invoking online2-wav-nnet3-latgen-faster.
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
--sample-frequency=16000
--simulate-first-pass-online=true
--normalization-right-context=25
--frames-per-chunk=10

egs/xbmu_amdo31/s5/conf/pitch.conf

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
--sample-frequency=16000
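
A pitch config like this is typically paired with conf/mfcc.conf when features are extracted with Kaldi's combined MFCC+pitch wrapper. A sketch of how such a config is usually consumed, assuming the config paths above; the data directory is illustrative:

# Sketch: MFCC + pitch extraction using the configs above.
steps/make_mfcc_pitch.sh --nj 10 --cmd "$train_cmd" \
  --mfcc-config conf/mfcc.conf --pitch-config conf/pitch.conf data/train
steps/compute_cmvn_stats.sh data/train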
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
tuning/run_tdnn_1a.sh
