# Speed Perturbation #28
Adds speed perturbation.
This introduces non-trivial changes to the pre-processing pipeline, so it's worth giving a bit of background on why I've taken this approach. Some of this repeats an earlier Slack message.
TL;DR: it is necessary to use `torchaudio.sox_effect_chain`, which adds a large amount of complexity.

## 'Simpler' alternatives to using `sox`

I've tried two other ways of performing speed perturbation:

1. `librosa`. NVIDIA use this (https://github.com/ryanleary/mlperf-rnnt-ref/blob/fe0cc4145c240d4f8a8fe1814f397df63095e220/parts/perturb.py#L42).
2. `torchaudio.Resample`, applied directly to the input tensor.

Both of these are very slow: 1. converts to the frequency domain and back, while 2. is even slower when upsampling the signal (i.e. making it slower). For reference, on `copernicus` the methods are respectively x20 and x300 slower (!) than the `sox` implementation, and the dataloaders become the limiting factor during training.

For comparison, the `sox` version does add some overhead, but this is acceptable (+25% time per epoch, and this includes the fact that some sequences are 15% longer).

A third potential method (which NVIDIA also use: https://github.com/ryanleary/mlperf-rnnt-ref/blob/fe0cc4145c240d4f8a8fe1814f397df63095e220/utils/preprocessing_utils.py#L52) is performing the perturbation offline. This seems like a poor choice to me since (see the sketch after this list):
a) Each training sample has a fixed speed change - reducing augmentation effectiveness
b) This isn't scalable with training set size (to 60k/100k hrs) as the multiple dataset copies won't fit on the disk of a single machine.
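For concreteness, here's a minimal sketch of the online approach using the `SoxEffectsChain` API that torchaudio (0.x era) exposed; the function name and factor set are illustrative, and the actual wrapper in this PR may differ:

```python
import random

import torchaudio


def speed_perturb(filepath: str, sample_rate: int, factors=(0.9, 1.0, 1.1)):
    """Decode `filepath` with a randomly chosen speed factor applied by sox.

    sox's `speed` effect changes tempo and pitch; the trailing `rate`
    effect resamples the result back to the original sample rate.
    Assumes `torchaudio.initialize_sox()` has already been called in this
    process (see the `worker_init_fn` discussion below).
    """
    factor = random.choice(factors)
    chain = torchaudio.sox_effects.SoxEffectsChain()
    chain.set_input_file(filepath)  # effects run on the file, not a tensor
    chain.append_effect_to_chain("speed", [str(factor)])
    chain.append_effect_to_chain("rate", [str(sample_rate)])
    waveform, _ = chain.sox_build_flow_effects()  # decode + apply effects
    return waveform
```

Because a fresh factor is drawn on every call, each epoch sees a different mix of speeds, which is exactly what the offline approach can't give you.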
## Necessary changes

The complexity added by `sox_effects_chain` is that it must be applied to a filepath rather than a tensor. To deal with this I've split the audio transforms into two types:

1. `pre_load_transforms` - speed perturbation is of this type
2. `post_load_transforms` - all previous transforms are of this type

FYI, the high-level API treats `speed_perturbation` in exactly the same way as the other steps, but I've found it necessary for the `builders` + `dataset` to have knowledge of the two transform types, roughly as sketched below.
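A hypothetical sketch of that dataset-level dispatch (the class and argument names here are illustrative, not this PR's actual code):

```python
import torch.utils.data
import torchaudio


class AudioDataset(torch.utils.data.Dataset):
    """Map-style dataset that applies the two transform types in order."""

    def __init__(self, filepaths, pre_load_transforms=None,
                 post_load_transforms=None):
        self.filepaths = filepaths
        # filepath -> tensor (e.g. speed perturbation via a sox chain)
        self.pre_load_transforms = pre_load_transforms
        # tensor -> tensor (all of the previously existing transforms)
        self.post_load_transforms = post_load_transforms

    def __len__(self):
        return len(self.filepaths)

    def __getitem__(self, idx):
        path = self.filepaths[idx]
        if self.pre_load_transforms is not None:
            # pre-load transforms own the decode: they take the path
            waveform = self.pre_load_transforms(path)
        else:
            waveform, _ = torchaudio.load(path)
        if self.post_load_transforms is not None:
            waveform = self.post_load_transforms(waveform)
        return waveform
```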
It is also necessary to add a `worker_init_fn` to avoid seg-faults when sox is being used 😱 - I think the lack of this fn led samG to think that `sox` wasn't thread-safe.
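For reference, one plausible shape for that hook, assuming the fix is to initialise sox once per dataloader worker process using torchaudio's (0.x-era) `initialize_sox`; the exact body in this PR may differ:

```python
import torchaudio
from torch.utils.data import DataLoader


def sox_worker_init_fn(worker_id: int) -> None:
    # Each dataloader worker is a separate process; give each its own
    # sox initialisation rather than inheriting state from the parent.
    torchaudio.initialize_sox()


def make_loader(dataset, batch_size: int = 8) -> DataLoader:
    return DataLoader(dataset, batch_size=batch_size, num_workers=4,
                      worker_init_fn=sox_worker_init_fn)
```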