Commit 4f9b552

Add compute_kaldi_pitch to doc (#1260)
1 parent 5efb13e commit 4f9b552

2 files changed: 24 additions and 17 deletions

docs/source/functional.rst

Lines changed: 5 additions & 0 deletions
@@ -203,3 +203,8 @@ vad
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 .. autofunction:: sliding_window_cmn
+
+:hidden:`compute_kaldi_pitch`
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autofunction:: compute_kaldi_pitch

torchaudio/functional/functional.py

Lines changed: 19 additions & 17 deletions
@@ -1025,53 +1025,55 @@ def compute_kaldi_pitch(
 sample_rate (float):
 Sample rate of `waveform`.
 frame_length (float, optional):
-Frame length in milliseconds.
+Frame length in milliseconds. (default: 25.0)
 frame_shift (float, optional):
-Frame shift in milliseconds.
+Frame shift in milliseconds. (default: 10.0)
 min_f0 (float, optional):
-Minimum F0 to search for (Hz)
+Minimum F0 to search for (Hz) (default: 50.0)
 max_f0 (float, optional):
-Maximum F0 to search for (Hz)
+Maximum F0 to search for (Hz) (default: 400.0)
 soft_min_f0 (float, optional):
-Minimum f0, applied in soft way, must not exceed min-f0
+Minimum f0, applied in soft way, must not exceed min-f0 (default: 10.0)
 penalty_factor (float, optional):
-Cost factor for FO change.
+Cost factor for FO change. (default: 0.1)
 lowpass_cutoff (float, optional):
-Cutoff frequency for LowPass filter (Hz)
+Cutoff frequency for LowPass filter (Hz) (default: 1000)
 resample_frequency (float, optional):
 Frequency that we down-sample the signal to. Must be more than twice lowpass-cutoff.
+(default: 4000)
 delta_pitch( float, optional):
-Smallest relative change in pitch that our algorithm measures.
+Smallest relative change in pitch that our algorithm measures. (default: 0.005)
 nccf_ballast (float, optional):
-Increasing this factor reduces NCCF for quiet frames
+Increasing this factor reduces NCCF for quiet frames (default: 7000)
 lowpass_filter_width (int, optional):
 Integer that determines filter width of lowpass filter, more gives sharper filter.
+(default: 1)
 upsample_filter_width (int, optional):
-Integer that determines filter width when upsampling NCCF.
+Integer that determines filter width when upsampling NCCF. (default: 5)
 max_frames_latency (int, optional):
 Maximum number of frames of latency that we allow pitch tracking to introduce into
 the feature processing (affects output only if ``frames_per_chunk > 0`` and
-``simulate_first_pass_online=True``)
+``simulate_first_pass_online=True``) (default: 0)
 frames_per_chunk (int, optional):
-The number of frames used for energy normalization.
+The number of frames used for energy normalization. (default: 0)
 simulate_first_pass_online (bool, optional):
 If true, the function will output features that correspond to what an online decoder
 would see in the first pass of decoding -- not the final version of the features,
-which is the default.
+which is the default. (default: False)
 Relevant if ``frames_per_chunk > 0``.
 recompute_frame (int, optional):
 Only relevant for compatibility with online pitch extraction.
 A non-critical parameter; the frame at which we recompute some of the forward pointers,
 after revising our estimate of the signal energy.
-Relevant if ``frames_per_chunk > 0``.
+Relevant if ``frames_per_chunk > 0``. (default: 500)
 snip_edges (bool, optional):
 If this is set to false, the incomplete frames near the ending edge won't be snipped,
 so that the number of frames is the file size divided by the frame-shift.
-This makes different types of features give the same number of frames.
+This makes different types of features give the same number of frames. (default: True)
 
 Returns:
-Tensor: Pitch feature. Shape: `(batch, frames 2)` where the last dimension
-corresponds to pitch and NCCF.
+Tensor: Pitch feature. Shape: ``(batch, frames 2)`` where the last dimension
+corresponds to pitch and NCCF.
 
 Reference:
 - A pitch extraction algorithm tuned for automatic speech recognition
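
The docstring states two constraints between parameters: `soft_min_f0` must not exceed `min_f0`, and `resample_frequency` must be more than twice `lowpass_cutoff`. The sketch below checks the documented defaults against those two rules. `PitchOptions` and `check_pitch_options` are hypothetical names made up for illustration and are not part of torchaudio; whether the library itself enforces these rules in this form is not shown in the diff.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PitchOptions:
    """Hypothetical container holding a few of the documented defaults."""
    min_f0: float = 50.0
    soft_min_f0: float = 10.0
    lowpass_cutoff: float = 1000.0
    resample_frequency: float = 4000.0


def check_pitch_options(opts: PitchOptions) -> List[str]:
    """Return the documented constraints that `opts` violates (empty if none)."""
    problems = []
    # "must not exceed min-f0"
    if opts.soft_min_f0 > opts.min_f0:
        problems.append("soft_min_f0 must not exceed min_f0")
    # "Must be more than twice lowpass-cutoff."
    if opts.resample_frequency <= 2 * opts.lowpass_cutoff:
        problems.append("resample_frequency must be more than twice lowpass_cutoff")
    return problems


if __name__ == "__main__":
    print(check_pitch_options(PitchOptions()))
    print(check_pitch_options(PitchOptions(resample_frequency=2000.0)))
```

With the defaults listed in the docstring (4000 Hz against a 1000 Hz cutoff, 10 Hz against 50 Hz) both rules hold, so the first call returns an empty list.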

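The `snip_edges` description says that with `snip_edges=False` the number of frames is the file size divided by the frame shift. The sketch below illustrates that arithmetic with the conventional Kaldi-style framing rule. It is an assumption for illustration only: `approx_num_frames` is a hypothetical helper, it works at the input sample rate, and the actual pitch extractor frames the down-sampled signal, so its frame counts may differ slightly.

```python
def approx_num_frames(
    num_samples: int,
    sample_rate: float,
    frame_length_ms: float = 25.0,
    frame_shift_ms: float = 10.0,
    snip_edges: bool = True,
) -> int:
    """Approximate frame count under a conventional framing rule (assumed)."""
    shift = int(sample_rate * frame_shift_ms / 1000)    # samples per frame shift
    length = int(sample_rate * frame_length_ms / 1000)  # samples per frame
    if snip_edges:
        # Only frames that fit entirely inside the signal are kept.
        if num_samples < length:
            return 0
        return 1 + (num_samples - length) // shift
    # Incomplete frames at the end are kept: roughly num_samples / shift.
    return (num_samples + shift // 2) // shift


if __name__ == "__main__":
    one_second = 16000  # one second of audio at 16 kHz
    print(approx_num_frames(one_second, 16000, snip_edges=True))
    print(approx_num_frames(one_second, 16000, snip_edges=False))
```

Under this rule, one second of 16 kHz audio gives 98 frames with `snip_edges=True` and 100 frames with `snip_edges=False`, the latter matching "file size divided by the frame-shift".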