Commit f56003f

More details for TRAINING
1 parent 5944647 commit f56003f

1 file changed: +18 -5 lines changed
README

Lines changed: 18 additions & 5 deletions
@@ -26,28 +26,41 @@ While it is meant to be used as a library, a simple command-line tool is
 provided as an example. It operates on RAW 16-bit (machine endian) mono
 PCM files sampled at 48 kHz. It can be used as:
 
-./examples/rnnoise_demo <noisy speech> <output denoised>
+% ./examples/rnnoise_demo <noisy speech> <output denoised>
 
 The output is also a 16-bit raw PCM file.
 NOTE AGAIN, THE INPUT and OUTPUT ARE IN RAW FORMAT, NOT WAV.
 
 The latest version of the source is available from
-https://gitlab.xiph.org/xiph/rnnoise . The github repository
+https://gitlab.xiph.org/xiph/rnnoise . The GitHub repository
 is a convenience copy.
 
 == TRAINING ==
 
 To train an RNNoise model, you need both clean speech data, and noise data.
 Both need to be sampled at 48 kHz, in 16-bit PCM format (machine endian).
+Clean speech data can be obtained from https://media.xiph.org/rnnoise/data/tts_speech_48k.sw
+The first step is to take the speech and noise, and mix them in a variety of ways
+to simulate real life conditions (including pauses, filtering and more).
 Assuming the files are called speech.pcm and noise.pcm, start by generating
-the training data with
+the training feature data with:
 
 % ./dump_features speech.pcm noise.pcm features.f32 <count>
 where <count> is the number of sequences to process. The number of sequences
-should be at least 10000, but the more the better.
+should be at least 10000, but the more the better (200000 or more is recommended).
+
+Optionally, training can also simulate reverberation, in which case room impulse
+responses (RIR) are also needed. Limited RIR data is available at:
+https://media.xiph.org/rnnoise/data/measured_rirs-v2.tar.gz
+The format for those is raw 32-bit floating-point (files are little endian).
+Assuming a list of all the RIR files is contained in a rir_list.txt file,
+the training feature data can be generated with:
+
+% ./dump_features -rir_list rir_list.txt speech.pcm noise.pcm features.f32 <count>
 
 To make the feature generation faster, you can use the script provided in
-script/dump_features_parallel.sh
+script/dump_features_parallel.sh (you will need to modify the script if you
+want to add RIR augmentation).
 
 To use it:
 % script/dump_features_parallel.sh ./dump_features speech.pcm noise.pcm features.f32 <count> <nb_processes>
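
The README text in this diff stresses that the tools read and write headerless 16-bit PCM, not WAV. For recordings that only exist as WAV, the following is a minimal Python sketch of stripping the header with the standard `wave` module. The helper name `wav_to_raw_pcm` is hypothetical and not part of RNNoise, and the sketch assumes the WAV file is already 48 kHz, mono, 16-bit (it does no resampling or downmixing) and that the host is little endian, so the WAV sample bytes already match machine endian.

```python
import sys
import wave


def wav_to_raw_pcm(wav_path, raw_path):
    """Write the samples of a WAV file as headerless 16-bit PCM.

    Hypothetical helper, not part of RNNoise. Input that is not already
    48 kHz, mono, 16-bit is rejected rather than converted.
    Returns the number of samples written.
    """
    # Assumption: this sketch only covers little-endian hosts, where the
    # little-endian WAV sample data is already in machine byte order.
    if sys.byteorder != "little":
        raise NotImplementedError("sketch only covers little-endian hosts")

    with wave.open(wav_path, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if wav.getframerate() != 48000:
            raise ValueError("expected a 48 kHz sample rate")
        frames = wav.readframes(wav.getnframes())

    # The raw file is just the sample bytes, with no header of any kind.
    with open(raw_path, "wb") as raw:
        raw.write(frames)
    return len(frames) // 2
```

A file produced this way could then be passed as `<noisy speech>` to rnnoise_demo, or used as speech.pcm or noise.pcm for dump_features.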
