Commit e71902b (1 parent: b452e0f)

Add support for DAC

File tree: 5 files changed (+84, −1 lines)


README.md

Lines changed: 2 additions & 1 deletion
```diff
@@ -305,6 +305,7 @@ You can refine your search by selecting the task you're interested in (e.g., [te
 1. **[ConvBERT](https://huggingface.co/docs/transformers/model_doc/convbert)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
 1. **[ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
 1. **[ConvNeXTV2](https://huggingface.co/docs/transformers/model_doc/convnextv2)** (from Facebook AI) released with the paper [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](https://arxiv.org/abs/2301.00808) by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie.
+1. **[DAC](https://huggingface.co/docs/transformers/model_doc/dac)** (from Descript) released with the paper [Descript Audio Codec: High-Fidelity Audio Compression with Improved RVQGAN](https://arxiv.org/abs/2306.06546) by Rithesh Kumar, Prem Seetharaman, Alejandro Luebs, Ishaan Kumar, Kundan Kumar.
 1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
 1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
 1. **[Decision Transformer](https://huggingface.co/docs/transformers/model_doc/decision_transformer)** (from Berkeley/Facebook/Google) released with the paper [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
@@ -359,7 +360,7 @@ You can refine your search by selecting the task you're interested in (e.g., [te
 1. **[mBART-50](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
 1. **[MusicGen](https://huggingface.co/docs/transformers/model_doc/musicgen)** (from Meta) released with the paper [Simple and Controllable Music Generation](https://arxiv.org/abs/2306.05284) by Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi and Alexandre Défossez.
 1. **[MGP-STR](https://huggingface.co/docs/transformers/model_doc/mgp-str)** (from Alibaba Research) released with the paper [Multi-Granularity Prediction for Scene Text Recognition](https://arxiv.org/abs/2209.03592) by Peng Wang, Cheng Da, and Cong Yao.
-1. **[Mimi](https://huggingface.co/docs/transformers/model_doc/mimi)** (from Kyutai) released with the paper [Moshi: a speech-text foundation model for real-time dialogue](https://kyutai.org/Moshi.pdf) by Alexandre Défossez, Laurent Mazaré, Manu Orsini, Amélie Royer, Patrick Pérez, Hervé Jégou, Edouard Grave and Neil Zeghidour.
+1. **[Mimi](https://huggingface.co/docs/transformers/model_doc/mimi)** (from Kyutai) released with the paper [Moshi: a speech-text foundation model for real-time dialogue](https://arxiv.org/abs/2410.00037) by Alexandre Défossez, Laurent Mazaré, Manu Orsini, Amélie Royer, Patrick Pérez, Hervé Jégou, Edouard Grave and Neil Zeghidour.
 1. **[Mistral](https://huggingface.co/docs/transformers/model_doc/mistral)** (from Mistral AI) by The [Mistral AI](https://mistral.ai) team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.
 1. **[MMS](https://huggingface.co/docs/transformers/model_doc/mms)** (from Facebook) released with the paper [Scaling Speech Technology to 1,000+ Languages](https://arxiv.org/abs/2305.13516) by Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli.
 1. **[MobileBERT](https://huggingface.co/docs/transformers/model_doc/mobilebert)** (from CMU/Google Brain) released with the paper [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou.
```
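The DAC entry added above is a neural audio codec built on residual vector quantization (the "RVQGAN" in the paper title). As a toy illustration of the residual-quantization idea only (scalar values and made-up codebooks, not the paper's actual vector codebooks or dimensions):

```javascript
// Toy sketch of residual vector quantization (RVQ): each quantizer stage
// snaps the current residual to its nearest codebook entry, and later
// stages refine what earlier ones missed. Scalars stand in for vectors.
function nearestIndex(codebook, x) {
    let best = 0;
    for (let i = 1; i < codebook.length; i++) {
        if (Math.abs(codebook[i] - x) < Math.abs(codebook[best] - x)) best = i;
    }
    return best;
}

function rvqEncode(codebooks, x) {
    const codes = [];
    let residual = x;
    for (const cb of codebooks) {
        const idx = nearestIndex(cb, residual);
        codes.push(idx);
        residual -= cb[idx]; // each stage quantizes what is left over
    }
    return codes;
}

function rvqDecode(codebooks, codes) {
    // Reconstruction is just the sum of the selected entries per stage.
    return codes.reduce((sum, idx, stage) => sum + codebooks[stage][idx], 0);
}

const codebooks = [
    [-1, 0, 1],       // coarse stage
    [-0.25, 0, 0.25], // finer stage
    [-0.05, 0, 0.05], // finest stage
];
const codes = rvqEncode(codebooks, 0.7); // → [2, 0, 0]
console.log(codes, rvqDecode(codebooks, codes)); // reconstructs ≈ 0.7
```

The number of stages trades bitrate for fidelity, which is why the models below expose the quantizer dimension (`num_quantizers` / `num_codebooks`) explicitly in their output shapes.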

docs/snippets/6_supported-models.snippet

Lines changed: 1 addition & 0 deletions
```diff
@@ -20,6 +20,7 @@
 1. **[ConvBERT](https://huggingface.co/docs/transformers/model_doc/convbert)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
 1. **[ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
 1. **[ConvNeXTV2](https://huggingface.co/docs/transformers/model_doc/convnextv2)** (from Facebook AI) released with the paper [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](https://arxiv.org/abs/2301.00808) by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie.
+1. **[DAC](https://huggingface.co/docs/transformers/model_doc/dac)** (from Descript) released with the paper [Descript Audio Codec: High-Fidelity Audio Compression with Improved RVQGAN](https://arxiv.org/abs/2306.06546) by Rithesh Kumar, Prem Seetharaman, Alejandro Luebs, Ishaan Kumar, Kundan Kumar.
 1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
 1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
 1. **[Decision Transformer](https://huggingface.co/docs/transformers/model_doc/decision_transformer)** (from Berkeley/Facebook/Google) released with the paper [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
```

src/models.js

Lines changed: 77 additions & 0 deletions
```diff
@@ -7190,6 +7190,82 @@ export class MimiDecoderModel extends MimiPreTrainedModel {
 }
 //////////////////////////////////////////////////
 
+
+//////////////////////////////////////////////////
+// Dac models
+export class DacPreTrainedModel extends PreTrainedModel {
+    main_input_name = 'input_values';
+    forward_params = ['input_values'];
+}
+
+export class DacEncoderOutput extends ModelOutput {
+    /**
+     * @param {Object} output The output of the model.
+     * @param {Tensor} output.audio_codes Discrete code embeddings, of shape `(batch_size, num_quantizers, codes_length)`.
+     */
+    constructor({ audio_codes }) {
+        super();
+        this.audio_codes = audio_codes;
+    }
+}
+
+export class DacDecoderOutput extends ModelOutput {
+    /**
+     * @param {Object} output The output of the model.
+     * @param {Tensor} output.audio_values Decoded audio values, of shape `(batch_size, num_channels, sequence_length)`.
+     */
+    constructor({ audio_values }) {
+        super();
+        this.audio_values = audio_values;
+    }
+}
+
+/**
+ * The DAC (Descript Audio Codec) model.
+ */
+export class DacModel extends DacPreTrainedModel {
+    /**
+     * Encodes the input audio waveform into discrete codes.
+     * @param {Object} inputs Model inputs
+     * @param {Tensor} [inputs.input_values] Float values of the input audio waveform, of shape `(batch_size, channels, sequence_length)`.
+     * @returns {Promise<DacEncoderOutput>} The output tensor of shape `(batch_size, num_codebooks, sequence_length)`.
+     */
+    async encode(inputs) {
+        return new DacEncoderOutput(await sessionRun(this.sessions['encoder_model'], inputs));
+    }
+
+    /**
+     * Decodes the given frames into an output audio waveform.
+     * @param {DacEncoderOutput} inputs The encoded audio codes.
+     * @returns {Promise<DacDecoderOutput>} The output tensor of shape `(batch_size, num_channels, sequence_length)`.
+     */
+    async decode(inputs) {
+        return new DacDecoderOutput(await sessionRun(this.sessions['decoder_model'], inputs));
+    }
+}
+
+export class DacEncoderModel extends DacPreTrainedModel {
+    /** @type {typeof PreTrainedModel.from_pretrained} */
+    static async from_pretrained(pretrained_model_name_or_path, options = {}) {
+        return super.from_pretrained(pretrained_model_name_or_path, {
+            ...options,
+            // Update default model file name if not provided
+            model_file_name: options.model_file_name ?? 'encoder_model',
+        });
+    }
+}
+export class DacDecoderModel extends DacPreTrainedModel {
+    /** @type {typeof PreTrainedModel.from_pretrained} */
+    static async from_pretrained(pretrained_model_name_or_path, options = {}) {
+        return super.from_pretrained(pretrained_model_name_or_path, {
+            ...options,
+            // Update default model file name if not provided
+            model_file_name: options.model_file_name ?? 'decoder_model',
+        });
+    }
+}
+//////////////////////////////////////////////////
+
 //////////////////////////////////////////////////
 // AutoModels, used to simplify construction of PreTrainedModels
 // (uses config to instantiate correct class)
@@ -7361,6 +7437,7 @@ const MODEL_MAPPING_NAMES_ENCODER_DECODER = new Map([
 
 const MODEL_MAPPING_NAMES_AUTO_ENCODER = new Map([
     ['mimi', ['MimiModel', MimiModel]],
+    ['dac', ['DacModel', DacModel]],
 ]);
 
 const MODEL_MAPPING_NAMES_DECODER_ONLY = new Map([
```
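The diff above shows that `DacModel.encode`/`decode` each wrap a single ONNX session call in a typed output object, and that the standalone encoder/decoder classes differ only in which model file they load by default. A minimal self-contained sketch of that wrapping pattern, with `sessionRun` and the session handles stubbed out (the shapes and values below are illustrative assumptions, not real codec output):

```javascript
// Sketch of DacModel's encode/decode pattern. `sessionRun` is a stub; in
// the real implementation it executes the exported ONNX encoder/decoder
// graphs. Nested arrays stand in for Tensors.
class ModelOutput { }

class DacEncoderOutput extends ModelOutput {
    constructor({ audio_codes }) {
        super();
        this.audio_codes = audio_codes; // (batch_size, num_quantizers, codes_length)
    }
}

class DacDecoderOutput extends ModelOutput {
    constructor({ audio_values }) {
        super();
        this.audio_values = audio_values; // (batch_size, num_channels, sequence_length)
    }
}

// Stub standing in for an ONNX session runner.
async function sessionRun(session, inputs) {
    return session === 'encoder_model'
        ? { audio_codes: [[[7, 1, 4]]] }
        : { audio_values: [[[0.1, -0.2, 0.3]]] };
}

class DacModel {
    sessions = { encoder_model: 'encoder_model', decoder_model: 'decoder_model' };
    async encode(inputs) {
        return new DacEncoderOutput(await sessionRun(this.sessions['encoder_model'], inputs));
    }
    async decode(inputs) {
        return new DacDecoderOutput(await sessionRun(this.sessions['decoder_model'], inputs));
    }
}

(async () => {
    const model = new DacModel();
    const { audio_codes } = await model.encode({ input_values: [[[0.0, 0.1, 0.2]]] });
    const { audio_values } = await model.decode({ audio_codes });
    console.log(audio_codes, audio_values);
})();
```

In the actual commit, `DacEncoderModel` and `DacDecoderModel` reuse exactly this machinery and only override the default `model_file_name` (`'encoder_model'` vs `'decoder_model'`) in `from_pretrained`, so each half of the codec can ship as its own ONNX file.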
src/models/dac/feature_extraction_dac.js

Lines changed: 3 additions & 0 deletions

```diff
@@ -0,0 +1,3 @@
+import { EncodecFeatureExtractor } from '../encodec/feature_extraction_encodec.js';
+
+export class DacFeatureExtractor extends EncodecFeatureExtractor { }
```

src/models/feature_extractors.js

Lines changed: 1 addition & 0 deletions
```diff
@@ -2,6 +2,7 @@
 export * from './audio_spectrogram_transformer/feature_extraction_audio_spectrogram_transformer.js';
 export * from './encodec/feature_extraction_encodec.js';
 export * from './clap/feature_extraction_clap.js';
+export * from './dac/feature_extraction_dac.js';
 export * from './moonshine/feature_extraction_moonshine.js';
 export * from './pyannote/feature_extraction_pyannote.js';
 export * from './seamless_m4t/feature_extraction_seamless_m4t.js';
```
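`DacFeatureExtractor` is deliberately an empty subclass: DAC reuses EnCodec's raw-waveform preprocessing unchanged, and only the class name differs so that config-driven auto-loading can resolve it. A self-contained sketch of that alias pattern (the simplified extractor below is a stand-in, not the real `EncodecFeatureExtractor` API):

```javascript
// Sketch of the empty-subclass alias used for DacFeatureExtractor. The
// simplified EncodecFeatureExtractor here only wraps a mono waveform into
// a (batch, channels, samples) nested array, standing in for the real
// preprocessing that DAC inherits unchanged.
class EncodecFeatureExtractor {
    constructor({ sampling_rate = 24000 } = {}) {
        this.sampling_rate = sampling_rate;
    }
    extract(audio) {
        // (batch_size = 1, num_channels = 1, sequence_length)
        return { input_values: [[Array.from(audio)]] };
    }
}

// All behaviour is inherited; only the name differs, so a config whose
// feature_extractor_type is "DacFeatureExtractor" resolves to this class.
class DacFeatureExtractor extends EncodecFeatureExtractor { }

const extractor = new DacFeatureExtractor({ sampling_rate: 16000 });
const { input_values } = extractor.extract(new Float32Array([0.1, 0.2]));
console.log(extractor.constructor.name, extractor.sampling_rate, input_values[0][0].length);
// → DacFeatureExtractor 16000 2
```

The one-line re-export added to feature_extractors.js above is what makes the new class visible to that lookup.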
