You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
* Add support for Parakeet CTC
* Add parakeet to list of supported models
* Skip special tokens when decoding CTC output from ASR pipeline
* Trim decoded output
* Update tests
Copy file name to clipboardExpand all lines: README.md
+1Lines changed: 1 addition & 0 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -404,6 +404,7 @@ You can refine your search by selecting the task you're interested in (e.g., [te
404
404
1.**[OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit)** (from Google AI) released with the paper [Simple Open-Vocabulary Object Detection with Vision Transformers](https://huggingface.co/papers/2205.06230) by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby.
405
405
1.**[OWLv2](https://huggingface.co/docs/transformers/model_doc/owlv2)** (from Google AI) released with the paper [Scaling Open-Vocabulary Object Detection](https://huggingface.co/papers/2306.09683) by Matthias Minderer, Alexey Gritsenko, Neil Houlsby.
406
406
1.**[PaliGemma](https://huggingface.co/docs/transformers/main/model_doc/paligemma)** (from Google) released with the papers [PaliGemma: A versatile 3B VLM for transfer](https://huggingface.co/papers/2407.07726) and [PaliGemma 2: A Family of Versatile VLMs for Transfer](https://huggingface.co/papers/2412.03555) by the PaliGemma Google team.
407
+
1.**[Parakeet](https://huggingface.co/docs/transformers/main/model_doc/parakeet)** (from NVIDIA) released with the blog post [Introducing the Parakeet ASR family](https://developer.nvidia.com/blog/pushing-the-boundaries-of-speech-recognition-with-nemo-parakeet-asr-models/) by the NVIDIA NeMo team.
407
408
1.**[PatchTSMixer](https://huggingface.co/docs/transformers/main/model_doc/patchtsmixer)** (from IBM) released with the paper [TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series Forecasting](https://huggingface.co/papers/2306.09364) by Vijay Ekambaram, Arindam Jati, Nam Nguyen, Phanwadee Sinthong, Jayant Kalagnanam.
408
409
1.**[PatchTST](https://huggingface.co/docs/transformers/main/model_doc/patchtst)** (from Princeton University, IBM) released with the paper [A Time Series is Worth 64 Words: Long-term Forecasting with Transformers](https://huggingface.co/papers/2211.14730) by Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, Jayant Kalagnanam.
409
410
1.**[Phi](https://huggingface.co/docs/transformers/main/model_doc/phi)** (from Microsoft) released with the papers - [Textbooks Are All You Need](https://huggingface.co/papers/2306.11644) by Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee and Yuanzhi Li, [Textbooks Are All You Need II: phi-1.5 technical report](https://huggingface.co/papers/2309.05463) by Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar and Yin Tat Lee.
Copy file name to clipboardExpand all lines: docs/snippets/6_supported-models.snippet
+1Lines changed: 1 addition & 0 deletions
Original file line number
Diff line number
Diff line change
@@ -118,6 +118,7 @@
118
118
1. **[OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit)** (from Google AI) released with the paper [Simple Open-Vocabulary Object Detection with Vision Transformers](https://huggingface.co/papers/2205.06230) by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby.
119
119
1. **[OWLv2](https://huggingface.co/docs/transformers/model_doc/owlv2)** (from Google AI) released with the paper [Scaling Open-Vocabulary Object Detection](https://huggingface.co/papers/2306.09683) by Matthias Minderer, Alexey Gritsenko, Neil Houlsby.
120
120
1. **[PaliGemma](https://huggingface.co/docs/transformers/main/model_doc/paligemma)** (from Google) released with the papers [PaliGemma: A versatile 3B VLM for transfer](https://huggingface.co/papers/2407.07726) and [PaliGemma 2: A Family of Versatile VLMs for Transfer](https://huggingface.co/papers/2412.03555) by the PaliGemma Google team.
121
+
1. **[Parakeet](https://huggingface.co/docs/transformers/main/model_doc/parakeet)** (from NVIDIA) released with the blog post [Introducing the Parakeet ASR family](https://developer.nvidia.com/blog/pushing-the-boundaries-of-speech-recognition-with-nemo-parakeet-asr-models/) by the NVIDIA NeMo team.
121
122
1. **[PatchTSMixer](https://huggingface.co/docs/transformers/main/model_doc/patchtsmixer)** (from IBM) released with the paper [TSMixer: Lightweight MLP-Mixer Model for Multivariate Time Series Forecasting](https://huggingface.co/papers/2306.09364) by Vijay Ekambaram, Arindam Jati, Nam Nguyen, Phanwadee Sinthong, Jayant Kalagnanam.
122
123
1. **[PatchTST](https://huggingface.co/docs/transformers/main/model_doc/patchtst)** (from Princeton University, IBM) released with the paper [A Time Series is Worth 64 Words: Long-term Forecasting with Transformers](https://huggingface.co/papers/2211.14730) by Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, Jayant Kalagnanam.
123
124
1. **[Phi](https://huggingface.co/docs/transformers/main/model_doc/phi)** (from Microsoft) released with the papers - [Textbooks Are All You Need](https://huggingface.co/papers/2306.11644) by Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee and Yuanzhi Li, [Textbooks Are All You Need II: phi-1.5 technical report](https://huggingface.co/papers/2309.05463) by Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar and Yin Tat Lee.
* Computes the log-Mel spectrogram of the provided audio waveform.
34
+
* @param {Float32Array|Float64Array} waveform The audio waveform to process.
35
+
* @returns {Promise<Tensor>} An object containing the log-Mel spectrogram data as a Float32Array and its dimensions as an array of numbers.
36
+
*/
37
+
async_extract_fbank_features(waveform){
38
+
// Parakeet uses a custom preemphasis strategy: Apply preemphasis to entire waveform at once
39
+
constpreemphasis=this.config.preemphasis;
40
+
waveform=newFloat64Array(waveform);// Clone to avoid destructive changes
41
+
for(letj=waveform.length-1;j>=1;--j){
42
+
waveform[j]-=preemphasis*waveform[j-1];
43
+
}
44
+
45
+
constfeatures=awaitspectrogram(
46
+
waveform,
47
+
this.window,// window
48
+
this.window.length,// frame_length
49
+
this.config.hop_length,// hop_length
50
+
{
51
+
fft_length: this.config.n_fft,
52
+
power: 2.0,
53
+
mel_filters: this.config.mel_filters,
54
+
log_mel: 'log',
55
+
mel_floor: -Infinity,
56
+
pad_mode: 'constant',
57
+
center: true,
58
+
59
+
// Custom
60
+
transpose: true,
61
+
mel_offset: 2**-24,
62
+
}
63
+
)
64
+
65
+
returnfeatures;
66
+
}
67
+
68
+
/**
69
+
* Asynchronously extracts features from a given audio using the provided configuration.
70
+
* @param {Float32Array|Float64Array} audio The audio data as a Float32Array/Float64Array.
71
+
* @returns {Promise<{ input_features: Tensor; attention_mask: Tensor; }>} A Promise resolving to an object containing the extracted input features as a Tensor.
* @param {number} [options.min_num_frames=null] If provided, ensures the number of frames to compute is at least this value.
471
471
* @param {boolean} [options.do_pad=true] If `true`, pads the output spectrogram to have `max_num_frames` frames.
472
472
* @param {boolean} [options.transpose=false] If `true`, the returned spectrogram will have shape `(num_frames, num_frequency_bins/num_mel_filters)`. If `false`, the returned spectrogram will have shape `(num_frequency_bins/num_mel_filters, num_frames)`.
473
+
* @param {number} [options.mel_offset=0] Offset to add to the mel spectrogram to avoid taking the log of zero.
473
474
* @returns {Promise<Tensor>} Spectrogram of shape `(num_frequency_bins, length)` (regular spectrogram) or shape `(num_mel_filters, length)` (mel spectrogram).
474
475
*/
475
476
exportasyncfunctionspectrogram(
@@ -498,6 +499,7 @@ export async function spectrogram(
498
499
max_num_frames =null,
499
500
do_pad =true,
500
501
transpose =false,
502
+
mel_offset =0,
501
503
}={}
502
504
){
503
505
constwindow_length=window.length;
@@ -530,11 +532,23 @@ export async function spectrogram(
530
532
}
531
533
532
534
if(center){
533
-
if(pad_mode!=='reflect'){
534
-
thrownewError(`pad_mode="${pad_mode}" not implemented yet.`)
0 commit comments