Add SoX as method in pitch_shift #386

@Laubeee

You point to SoX (and others) in your docs. Why not implement them as options, guarded by checks that SoX (lib) and sox (pip) are installed? Users would still have to install both manually to use them. See also https://github.com/marl/pysox
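A minimal sketch of what such an installation check could look like. It assumes the SoX command-line binary is named `sox` and that the pip package imports as `sox` (as pysox does). The names `sox_available` and `require_sox` are hypothetical and not part of audiomentations.

```python
import importlib.util
import shutil


def sox_available() -> bool:
    """Return True only if both the SoX binary and the sox Python package are found."""
    has_binary = shutil.which("sox") is not None  # SoX command-line tool on PATH
    has_package = importlib.util.find_spec("sox") is not None  # pip package "sox"
    return has_binary and has_package


def require_sox() -> None:
    """Raise a descriptive error when the optional SoX dependencies are missing."""
    if not sox_available():
        raise ImportError(
            "The SoX pitch shift methods need the SoX binary on PATH and "
            "the 'sox' Python package (pip install sox)."
        )
```

Calling `require_sox()` only when a SoX-based method is requested would keep SoX an optional dependency for everyone else.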

Currently, I use a copy of your PitchShift class that I adapted to call SoX. It runs faster and the results tend to be better, especially for singing (and it could be even faster with quick=True).

import random
import numpy as np
from numpy.typing import NDArray

import sox

from audiomentations.core.transforms_interface import BaseWaveformTransform


class SoxPitchShift(BaseWaveformTransform):
    """Pitch shift the sound up or down without changing the tempo"""

    supports_multichannel = True

    def __init__(self, min_semitones: float = -4.0, max_semitones: float = 4.0, p: float = 0.5, quick: bool = False):
        """
        :param min_semitones: Minimum semitones to shift. A negative number means shift down.
        :param max_semitones: Maximum semitones to shift. A positive number means shift up.
        :param p: The probability of applying this transform
        :param quick: If True, SoX runs faster at the cost of lower sound quality
        """
        super().__init__(p)
        if min_semitones < -24:
            raise ValueError("min_semitones must be >= -24")
        if max_semitones > 24:
            raise ValueError("max_semitones must be <= 24")
        if min_semitones > max_semitones:
            raise ValueError("min_semitones must not be greater than max_semitones")
        self.min_semitones = min_semitones
        self.max_semitones = max_semitones
        self.quick = quick

    def randomize_parameters(self, samples: NDArray[np.float32], sample_rate: int):
        super().randomize_parameters(samples, sample_rate)
        if self.parameters["should_apply"]:
            self.parameters["num_semitones"] = random.uniform(
                self.min_semitones, self.max_semitones
            )

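    def apply(self, samples: NDArray[np.float32], sample_rate: int) -> NDArray[np.float32]:
        tfm = sox.Transformer()
        tfm.pitch(self.parameters["num_semitones"], quick=self.quick)
        if samples.ndim == 1:
            return tfm.build_array(input_array=samples, sample_rate_in=sample_rate)
        # pysox expects (samples, channels), while audiomentations passes (channels, samples)
        shifted = tfm.build_array(input_array=samples.T, sample_rate_in=sample_rate)
        return shifted.reshape(-1, samples.shape[0]).T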

I think it would be quite easy to add two methods to PitchShift with this: sox and sox_quick.
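As a rough sketch, the two proposed method names could map onto the quick flag like this. The names `SOX_METHODS` and `sox_quick_flag` are hypothetical, and how this would plug into the existing PitchShift is an assumption, not something taken from the audiomentations code.

```python
from typing import Dict

# Proposed method names mapped to the value of pysox's quick flag.
SOX_METHODS: Dict[str, bool] = {"sox": False, "sox_quick": True}


def sox_quick_flag(method: str) -> bool:
    """Translate a proposed method name into the quick argument for tfm.pitch()."""
    if method not in SOX_METHODS:
        raise ValueError(
            f"Unknown SoX method {method!r}, expected one of {sorted(SOX_METHODS)}"
        )
    return SOX_METHODS[method]
```

The transform could then call `tfm.pitch(num_semitones, quick=sox_quick_flag(method))` for either name and keep a single SoX code path.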
