@@ -26,28 +26,41 @@ While it is meant to be used as a library, a simple command-line tool is
 provided as an example. It operates on RAW 16-bit (machine endian) mono
 PCM files sampled at 48 kHz. It can be used as:

-./examples/rnnoise_demo <noisy speech> <output denoised>
+% ./examples/rnnoise_demo <noisy speech> <output denoised>

 The output is also a 16-bit raw PCM file.
 NOTE AGAIN, THE INPUT and OUTPUT ARE IN RAW FORMAT, NOT WAV.

 The latest version of the source is available from
-https://gitlab.xiph.org/xiph/rnnoise . The github repository
+https://gitlab.xiph.org/xiph/rnnoise . The GitHub repository
 is a convenience copy.

 == TRAINING ==

 To train an RNNoise model, you need both clean speech data, and noise data.
 Both need to be sampled at 48 kHz, in 16-bit PCM format (machine endian).
+Clean speech data can be obtained from https://media.xiph.org/rnnoise/data/tts_speech_48k.sw
+The first step is to take the speech and noise, and mix them in a variety of ways
+to simulate real life conditions (including pauses, filtering and more).
 Assuming the files are called speech.pcm and noise.pcm, start by generating
-the training data with
+the training feature data with:

 % ./dump_features speech.pcm noise.pcm features.f32 <count>
 where <count> is the number of sequences to process. The number of sequences
-should be at least 10000, but the more the better.
+should be at least 10000, but the more the better (200000 or more is recommended).
+
+Optionally, training can also simulate reverberation, in which case room impulse
+responses (RIR) are also needed. Limited RIR data is available at:
+https://media.xiph.org/rnnoise/data/measured_rirs-v2.tar.gz
+The format for those is raw 32-bit floating-point (files are little endian).
+Assuming a list of all the RIR files is contained in a rir_list.txt file,
+the training feature data can be generated with:
+
+% ./dump_features -rir_list rir_list.txt speech.pcm noise.pcm features.f32 <count>

 To make the feature generation faster, you can use the script provided in
-script/dump_features_parallel.sh
+script/dump_features_parallel.sh (you will need to modify the script if you
+want to add RIR augmentation).

 To use it:
 % script/dump_features_parallel.sh ./dump_features speech.pcm noise.pcm features.f32 <count> <nb_processes>
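
The README text in this diff stresses that rnnoise_demo and dump_features read headerless 16-bit PCM (machine endian, mono, 48 kHz), not WAV. When the source audio is only available as WAV, a conversion step is needed first. The following is a minimal Python sketch of such a step, not part of RNNoise: the helper name wav_to_raw_pcm and the file names are hypothetical, and it assumes the WAV file is already 16-bit, mono and sampled at 48 kHz (it rejects anything else rather than resampling).

```python
import array
import sys
import wave


def wav_to_raw_pcm(wav_path, raw_path):
    """Strip the WAV header and write headerless 16-bit machine-endian PCM.

    Hypothetical helper, not part of RNNoise. Returns the number of samples written.
    """
    with wave.open(wav_path, "rb") as wav:
        # The README asks for 16-bit mono PCM sampled at 48 kHz.
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if wav.getframerate() != 48000:
            raise ValueError("expected a 48 kHz sampling rate")
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h")
    samples.frombytes(frames)  # WAV sample data is stored little endian
    if sys.byteorder == "big":
        samples.byteswap()  # recover the correct values on big-endian hosts
    with open(raw_path, "wb") as out:
        samples.tofile(out)  # written in the machine's native byte order
    return len(samples)
```

Calling wav_to_raw_pcm("speech.wav", "speech.pcm") would then produce a file that can be passed to the commands shown in the diff. Rejecting mismatched input, rather than silently converting it, is a deliberate choice here: a file with the wrong sampling rate or channel count would still be read as raw samples by the tools, just incorrectly.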