Releases: huggingface/transformers.js

3.8.0

19 Nov 16:50
bf09aaf

🚀 Transformers.js v3.8 — SAM2, SAM3, EdgeTAM, Supertonic TTS

  • Add support for EdgeTAM in #1454

  • Add support for Supertonic TTS in #1459

    Example:

    import { pipeline } from '@huggingface/transformers';
    
    // Create a text-to-speech pipeline
    const tts = await pipeline('text-to-speech', 'onnx-community/Supertonic-TTS-ONNX');
    
    // Synthesize speech using a reference voice (speaker embedding)
    const input_text = 'This is really cool!';
    const audio = await tts(input_text, {
        speaker_embeddings: 'https://huggingface.co/onnx-community/Supertonic-TTS-ONNX/resolve/main/voices/F1.bin',
    });
    
    // Save the generated audio to a WAV file
    await audio.save('output.wav');
  • Add support for SAM2 and SAM3 (Tracker) in #1461
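
    The release notes don't include a snippet for these, but SAM-family checkpoints in Transformers.js follow a promptable point-input flow. Below is a minimal sketch using the long-standing SAM API (SamModel, AutoProcessor, post_process_masks) with a hypothetical SAM2 checkpoint id; the actual SAM2/SAM3 classes and repo names may differ, so check the Hub and docs.

    import { SamModel, AutoProcessor, RawImage } from '@huggingface/transformers';
    
    // NOTE: hypothetical repo id; check the Hub for the actual SAM2 ONNX conversions
    const model_id = 'onnx-community/sam2.1-hiera-tiny-ONNX';
    const model = await SamModel.from_pretrained(model_id);
    const processor = await AutoProcessor.from_pretrained(model_id);
    
    // Segment the object under a single (x, y) point prompt
    const image = await RawImage.read('https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/cats.png');
    const inputs = await processor(image, { input_points: [[[340, 250]]] });
    const outputs = await model(inputs);
    
    // Rescale the predicted masks back to the original image size
    const masks = await processor.post_process_masks(
        outputs.pred_masks, inputs.original_sizes, inputs.reshaped_input_sizes,
    );
    console.log(masks);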

  • Remove Metaspace add_prefix_space logic in #1451

  • ImageProcessor preprocess uses image_std for fill value by @NathanKolbas in #1455

Full Changelog: 3.7.6...3.8.0

3.7.6

20 Oct 19:44
4c908ec

Full Changelog: 3.7.5...3.7.6

3.7.5

02 Oct 13:58
c670bb9

What's new?

  • Add support for GraniteMoeHybrid in #1426
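
    GraniteMoeHybrid is a decoder-only architecture, so it should slot into the standard text-generation pipeline. A minimal sketch with a hypothetical ONNX repo id (check the Hub for an actual conversion):

    import { pipeline } from '@huggingface/transformers';
    
    // NOTE: hypothetical repo id
    const generator = await pipeline(
        'text-generation',
        'onnx-community/granite-4.0-tiny-preview-ONNX',
        { dtype: 'q4' },
    );
    
    // Generate a completion for a plain-text prompt
    const output = await generator('Q: What is a mixture-of-experts model?\nA:', {
        max_new_tokens: 128,
    });
    console.log(output[0].generated_text);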

Full Changelog: 3.7.4...3.7.5

3.7.4

29 Sep 17:40
d6b3998

What's new?

  • Correctly assign logits warpers in _get_logits_processor in #1422
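
    Logits warpers are the processors behind sampling options such as temperature, top_k, and top_p, so this fix is what makes those options take effect during generation. A quick illustration, reusing the LFM2 model featured elsewhere on this page:

    import { pipeline } from '@huggingface/transformers';
    
    const generator = await pipeline('text-generation', 'onnx-community/LFM2-350M-ONNX', { dtype: 'q4' });
    
    // do_sample: true routes generation through the logits warpers
    const output = await generator('Once upon a time,', {
        max_new_tokens: 64,
        do_sample: true,
        temperature: 0.7,
        top_k: 50,
    });
    console.log(output[0].generated_text);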

Full Changelog: 3.7.3...3.7.4

3.7.3

12 Sep 20:35
699dcb5

What's new?

  • Unify inference chains in #1399
  • Fix progress tracking bug by @kukudixiaoming in #1405
  • Add support for MobileLLM-R1 (llama4_text) in #1412
  • Add support for VaultGemma in #1413
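
    Both MobileLLM-R1 and VaultGemma are decoder-only models, so they run through the standard text-generation pipeline. A chat-style sketch with a hypothetical ONNX repo id (check the Hub for the actual conversions):

    import { pipeline, TextStreamer } from '@huggingface/transformers';
    
    // NOTE: hypothetical repo id
    const generator = await pipeline(
        'text-generation',
        'onnx-community/MobileLLM-R1-950M-ONNX',
        { dtype: 'q4' },
    );
    
    // Define the list of messages
    const messages = [
        { role: 'user', content: 'Write a haiku about autumn.' },
    ];
    
    // Generate a response, streaming tokens as they arrive
    const output = await generator(messages, {
        max_new_tokens: 256,
        streamer: new TextStreamer(generator.tokenizer, { skip_prompt: true, skip_special_tokens: true }),
    });
    console.log(output[0].generated_text.at(-1).content);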

Full Changelog: 3.7.2...3.7.3

3.7.2

15 Aug 17:58
28852a2

What's new?

  • Add support for DINOv3 in #1390

    See here for the full list of supported models.

    Example: Compute image embeddings

    import { pipeline } from '@huggingface/transformers';
    
    // Create an image feature extraction pipeline
    const image_feature_extractor = await pipeline(
        'image-feature-extraction',
        'onnx-community/dinov3-vits16-pretrain-lvd1689m-ONNX',
    );
    
    // Compute embeddings for an image
    const url = 'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/cats.png';
    const features = await image_feature_extractor(url);
    console.log(features);

    Try it out using our online demo (video: dinov3.mp4).
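
    A common follow-up is to compare embeddings across images, for example with the library's cos_sim helper. The sketch below assumes the pipeline's pool option returns one pooled vector per image for this checkpoint; the second image URL is illustrative.

    import { pipeline, cos_sim } from '@huggingface/transformers';
    
    const extractor = await pipeline(
        'image-feature-extraction',
        'onnx-community/dinov3-vits16-pretrain-lvd1689m-ONNX',
    );
    
    // Embed two images and measure their cosine similarity
    const a = await extractor('https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/cats.png', { pool: true });
    const b = await extractor('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg', { pool: true });
    console.log(cos_sim(a.tolist()[0], b.tolist()[0]));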

Full Changelog: 3.7.1...3.7.2

3.7.1

01 Aug 21:14
8d6c400

Full Changelog: 3.7.0...3.7.1

3.7.0

23 Jul 03:12
0feb5b7

🚀 Transformers.js v3.7 — Voxtral, LFM2, ModernBERT Decoder

🤖 New models

This update adds support for 3 new architectures:

Voxtral

Voxtral Mini is an enhancement of Ministral 3B, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation and audio understanding. ONNX weights for Voxtral-Mini-3B-2507 can be found here. Learn more about Voxtral in the release blog post.

Try it out with our online demo (video: Voxtral.WebGPU.demo.mp4).

Example: Audio transcription

import { VoxtralForConditionalGeneration, VoxtralProcessor, TextStreamer, read_audio } from "@huggingface/transformers";

// Load the processor and model
const model_id = "onnx-community/Voxtral-Mini-3B-2507-ONNX";
const processor = await VoxtralProcessor.from_pretrained(model_id);
const model = await VoxtralForConditionalGeneration.from_pretrained(
    model_id,
    {
        dtype: {
            embed_tokens: "fp16", // "fp32", "fp16", "q8", "q4"
            audio_encoder: "q4", // "fp32", "fp16", "q8", "q4", "q4f16"
            decoder_model_merged: "q4", // "q4", "q4f16"
        },
        device: "webgpu",
    },
);

// Prepare the conversation
const conversation = [
    {
        "role": "user",
        "content": [
            { "type": "audio" },
            { "type": "text", "text": "lang:en [TRANSCRIBE]" },
        ],
    }
];
const text = processor.apply_chat_template(conversation, { tokenize: false });
const audio = await read_audio("https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/mlk.wav", 16000);
const inputs = await processor(text, audio);

// Generate the response
const generated_ids = await model.generate({
    ...inputs,
    max_new_tokens: 256,
    streamer: new TextStreamer(processor.tokenizer, { skip_special_tokens: true, skip_prompt: true }),
});

// Decode the generated tokens
const new_tokens = generated_ids.slice(null, [inputs.input_ids.dims.at(-1), null]);
const generated_texts = processor.batch_decode(
    new_tokens,
    { skip_special_tokens: true },
);
console.log(generated_texts[0]);
// I have a dream that one day this nation will rise up and live out the true meaning of its creed.

Added in #1373 and #1375.

LFM2

LFM2 is a new generation of hybrid models developed by Liquid AI, specifically designed for edge AI and on-device deployment. It sets a new standard in terms of quality, speed, and memory efficiency.

The models, which we have converted to ONNX, come in three different sizes: 350M, 700M, and 1.2B parameters.

Example: Text generation with LFM2-350M

import { pipeline, TextStreamer } from "@huggingface/transformers";

// Create a text generation pipeline
const generator = await pipeline(
  "text-generation",
  "onnx-community/LFM2-350M-ONNX",
  { dtype: "q4" },
);

// Define the list of messages
const messages = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "What is the capital of France?" },
];

// Generate a response
const output = await generator(messages, {
    max_new_tokens: 512,
    do_sample: false,
    streamer: new TextStreamer(generator.tokenizer, { skip_prompt: true, skip_special_tokens: true }),
});
console.log(output[0].generated_text.at(-1).content);
// The capital of France is Paris. It is a vibrant city known for its historical landmarks, art, fashion, and gastronomy.

Added in #1367 and #1369.

ModernBERT Decoder

These models form part of the Ettin suite: the first collection of paired encoder-only and decoder-only models trained with identical data, architecture, and training recipes. Ettin enables fair comparisons between encoder and decoder architectures across multiple scales, providing state-of-the-art performance for open-data models in their respective size categories.

The list of supported models can be found here.

import { pipeline, TextStreamer } from "@huggingface/transformers";

// Create a text generation pipeline
const generator = await pipeline(
  "text-generation",
  "onnx-community/ettin-decoder-150m-ONNX",
  { dtype: "fp32" },
);

// Generate a response
const text = "Q: What is the capital of France?\nA:";
const output = await generator(text, {
  max_new_tokens: 128,
  streamer: new TextStreamer(generator.tokenizer, { skip_prompt: true, skip_special_tokens: true }),
});
console.log(output[0].generated_text);

Added in #1371.

🛠️ Other improvements

  • Add special tokens in the text-generation pipeline if the tokenizer requires them in #1370

Full Changelog: 3.6.3...3.7.0

3.6.3

11 Jul 20:11
467f59c

What's new?

  • Bump @huggingface/jinja to version 0.5.1 for new chat template functionality in #1364
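
    Chat templates are rendered with @huggingface/jinja under the hood, via tokenizer.apply_chat_template. A minimal sketch; any Hub repo with a chat template in its tokenizer config works, and the id below is illustrative:

    import { AutoTokenizer } from '@huggingface/transformers';
    
    // Illustrative repo id; only the tokenizer files are downloaded here
    const tokenizer = await AutoTokenizer.from_pretrained('HuggingFaceTB/SmolLM2-135M-Instruct');
    
    const messages = [
        { role: 'system', content: 'You are a helpful assistant.' },
        { role: 'user', content: 'Hello!' },
    ];
    
    // Render the template to a prompt string instead of token ids
    const prompt = tokenizer.apply_chat_template(messages, {
        tokenize: false,
        add_generation_prompt: true,
    });
    console.log(prompt);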

Full Changelog: 3.6.2...3.6.3