Releases: huggingface/transformers.js
2.5.4
What's new?
- Add support for 3 new vision architectures (Swin, DeiT, Yolos) in #262. Check out the Hugging Face Hub to see which models you can use!
- Swin for image classification. e.g.:
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg';
let classifier = await pipeline('image-classification', 'Xenova/swin-base-patch4-window7-224-in22k');
let output = await classifier(url, { topk: null });
// [
//   { label: 'Bengal_tiger', score: 0.2258443683385849 },
//   { label: 'tiger, Panthera_tigris', score: 0.21161635220050812 },
//   { label: 'predator, predatory_animal', score: 0.09135803580284119 },
//   { label: 'tigress', score: 0.08038495481014252 },
//   // ... 21838 more items
// ]
- DeiT for image classification. e.g.:
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg';
let classifier = await pipeline('image-classification', 'Xenova/deit-tiny-distilled-patch16-224');
let output = await classifier(url);
// [{ label: 'tiger, Panthera tigris', score: 0.9804046154022217 }]
- Yolos for object detection. e.g.:
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cats.jpg';
let detector = await pipeline('object-detection', 'Xenova/yolos-small-300');
let output = await detector(url);
// [
//   { label: 'remote', score: 0.9837935566902161, box: { xmin: 331, ymin: 80, xmax: 367, ymax: 192 } },
//   { label: 'cat', score: 0.94994056224823, box: { xmin: 8, ymin: 57, xmax: 316, ymax: 470 } },
//   { label: 'couch', score: 0.9843178987503052, box: { xmin: 0, ymin: 0, xmax: 639, ymax: 474 } },
//   { label: 'remote', score: 0.9704685211181641, box: { xmin: 39, ymin: 71, xmax: 179, ymax: 114 } },
//   { label: 'cat', score: 0.9921762943267822, box: { xmin: 339, ymin: 17, xmax: 642, ymax: 380 } }
// ]
- Documentation improvements by @perborgen in #261
New contributors 🤗
- @perborgen made their first contribution in #261
Full Changelog: 2.5.3...2.5.4
2.5.3
What's new?
- Fix whisper timestamps for non-English languages in #253
- Fix caching for some LFS files from the Hugging Face Hub in #251
- Improve documentation (w/ example code and links) in #255 and #257. Thanks @josephrocca for helping with this!
New contributors 🤗
- @josephrocca made their first contribution in #257
Full Changelog: 2.5.2...2.5.3
2.5.2
What's new?
- Add audio-classification with MMS and Wav2Vec2 in #220. Example usage:
// npm i @xenova/transformers
import { pipeline } from '@xenova/transformers';

// Create audio classification pipeline
let classifier = await pipeline('audio-classification', 'Xenova/mms-lid-4017');

// Run inference
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jeanNL.wav';
let output = await classifier(url);
// [
//   { label: 'fra', score: 0.9995712041854858 },
//   { label: 'hat', score: 0.00003788191679632291 },
//   { label: 'lin', score: 0.00002646935718075838 },
//   { label: 'hun', score: 0.000015628289474989288 },
//   { label: 'bre', score: 0.000007014674793026643 }
// ]
- Add automatic-speech-recognition for Wav2Vec2 models in #220 (MMS coming soon); a minimal example sketch follows this list.
- Add support for the multi-label classification problem type in #249. Thanks @KiterWork for reporting!
- Add M2M100 tokenizer in #250. Thanks @AAnirudh07 for the feature request!
- Documentation improvements
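To illustrate the new automatic-speech-recognition support for Wav2Vec2, here is a minimal sketch. The model ID (Xenova/wav2vec2-base-960h) is an assumption, so substitute any Wav2Vec2 checkpoint converted for transformers.js; the audio URL reuses the JFK sample from the 2.4.0 Whisper example below.
// npm i @xenova/transformers
import { pipeline } from '@xenova/transformers';

// Create automatic speech recognition pipeline
// (model ID is an assumption; any Wav2Vec2 checkpoint converted for transformers.js should work)
let transcriber = await pipeline('automatic-speech-recognition', 'Xenova/wav2vec2-base-960h');

// Run inference on an audio URL
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';
let output = await transcriber(url);
// { text: '...' } (transcription text; casing/punctuation depend on the checkpoint)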
New contributors 🤗
- @celsodias12 made their first contribution in #247
Full Changelog: 2.5.1...2.5.2
2.5.1
What's new?
- Add support for Llama/Llama2 models in #232 (a minimal example sketch follows this list)
- Tokenization performance improvements in #234 (+ The Tokenizer Playground example app)
- Add support for DeBERTa/DeBERTa-v2 models in #244
- Documentation improvements for zero-shot-classification pipeline (link)
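As a quick illustration of the new Llama support, here is a minimal text-generation sketch. The model ID (Xenova/llama2.c-stories15M) and generation parameters are assumptions; substitute any Llama-architecture checkpoint converted for transformers.js.
import { pipeline } from '@xenova/transformers';

// Create text generation pipeline (model ID is an assumption)
let generator = await pipeline('text-generation', 'Xenova/llama2.c-stories15M');

// Generate a continuation of the prompt
let output = await generator('Once upon a time,', { max_new_tokens: 50 });
// [{ generated_text: 'Once upon a time, ...' }]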
Full Changelog: 2.5.0...2.5.1
2.5.0
What's new?
Support for computing CLIP image and text embeddings separately (#227)
You can now compute CLIP text and vision embeddings separately, allowing for faster inference when you only need to query one of the modalities. We've also released a demo application for semantic image search to showcase this functionality.

Example: Compute text embeddings with CLIPTextModelWithProjection.
import { AutoTokenizer, CLIPTextModelWithProjection } from '@xenova/transformers';
// Load tokenizer and text model
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/clip-vit-base-patch16');
const text_model = await CLIPTextModelWithProjection.from_pretrained('Xenova/clip-vit-base-patch16');
// Run tokenization
let texts = ['a photo of a car', 'a photo of a football match'];
let text_inputs = tokenizer(texts, { padding: true, truncation: true });
// Compute embeddings
const { text_embeds } = await text_model(text_inputs);
// Tensor {
// dims: [ 2, 512 ],
// type: 'float32',
// data: Float32Array(1024) [ ... ],
// size: 1024
// }
Example: Compute vision embeddings with CLIPVisionModelWithProjection.
import { AutoProcessor, CLIPVisionModelWithProjection, RawImage } from '@xenova/transformers';
// Load processor and vision model
const processor = await AutoProcessor.from_pretrained('Xenova/clip-vit-base-patch16');
const vision_model = await CLIPVisionModelWithProjection.from_pretrained('Xenova/clip-vit-base-patch16');
// Read image and run processor
let image = await RawImage.read('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/football-match.jpg');
let image_inputs = await processor(image);
// Compute embeddings
const { image_embeds } = await vision_model(image_inputs);
// Tensor {
// dims: [ 1, 512 ],
// type: 'float32',
// data: Float32Array(512) [ ... ],
// size: 512
// }
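As a follow-up, here is a minimal sketch of how the two outputs above might be compared for semantic image search. It assumes the texts, text_embeds, and image_embeds variables from the two examples, and the cosineSimilarity helper is written inline here rather than taken from the library.
// Hypothetical helper: cosine similarity between two vectors (e.g. Float32Arrays)
function cosineSimilarity(a, b) {
    let dot = 0, normA = 0, normB = 0;
    for (let i = 0; i < a.length; ++i) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Compare each text embedding (dims [2, 512]) against the single image embedding (dims [1, 512])
const dim = text_embeds.dims[1];
for (let i = 0; i < text_embeds.dims[0]; ++i) {
    const textVector = text_embeds.data.slice(i * dim, (i + 1) * dim);
    console.log(texts[i], cosineSimilarity(textVector, image_embeds.data));
}
// The football-match caption should score higher than the car caption for this image.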
Improved browser extension example/template (#196)
We've updated the source code for our example browser extension, making the following improvements:
- Custom model caching - meaning you don't need to ship the model weights with the extension. In addition to a smaller bundle size, users won't need to redownload the weights when the extension updates! (A minimal sketch follows this list.)
- Use ES6 module syntax (vs. CommonJS) - much cleaner code!
- Persistent service worker - fixed an issue where the service worker would go to sleep after a period of inactivity.
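For reference, here is a hedged sketch of what plugging in a custom model cache can look like. The env.useCustomCache and env.customCache property names are assumptions based on the extension template; the cache object just needs to expose match() and put() methods. An in-memory Map is used purely for illustration, whereas the extension persists downloads so they survive updates.
import { env, pipeline } from '@xenova/transformers';

// Route model file downloads through a custom cache instead of the default browser Cache API.
// NOTE: the property names below are assumptions based on the extension template.
env.useCustomCache = true;
env.customCache = {
    _store: new Map(),
    // Return a previously cached response for this key (or undefined on a cache miss)
    async match(key) {
        return this._store.get(key);
    },
    // Store the response; a real extension would persist it (e.g. to extension storage)
    async put(key, response) {
        this._store.set(key, response);
    },
};

// Model files fetched by the pipeline will now go through the custom cache.
let classifier = await pipeline('text-classification', 'Xenova/distilbert-base-uncased-finetuned-sst-2-english');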
Summary of updates since last minor release (2.4.0):
- (2.4.1) Improved documentation
- (2.4.2) Support for private/gated models (#202)
- (2.4.3) Example Next.js applications (#211) + MPNet model support (#221)
- (2.4.4) StarCoder models + example application (release; demo + source code)
Misc bug fixes and improvements
- Fixed floating-point-precision edge-case for resizing images
- Fixed RawImage.save()
- Fixed BPE tokenization for weird whitespace characters (#208)
2.4.4
What's new?
- New model: StarCoder (Xenova/starcoderbase-1b and Xenova/tiny_starcoder_py); a minimal example sketch follows this list.
- In-browser code completion example application (demo and source code)
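For completeness, here is a minimal sketch of running the smaller checkpoint above with the text-generation pipeline; the prompt and max_new_tokens value are arbitrary.
import { pipeline } from '@xenova/transformers';

// Create text generation pipeline with the tiny StarCoder model
let generator = await pipeline('text-generation', 'Xenova/tiny_starcoder_py');

// Complete some Python code
let output = await generator('def fibonacci(n):', { max_new_tokens: 40 });
// [{ generated_text: 'def fibonacci(n):\n    ...' }]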
Full Changelog: 2.4.3...2.4.4
2.4.3
What's new?
- Example Next.js applications in #211
  - Demo: client-side or server-side
  - Source code: client-side or server-side
Full Changelog: 2.4.2...2.4.3
2.4.2
What's new?
- Add support for private/gated model access by @xenova in #202
- Fix BPE tokenization for weird whitespace characters by @xenova in #208
- Thanks to @fozziethebeat for reporting and helping to debug
- Minor documentation improvements
Full Changelog: 2.4.1...2.4.2
2.4.1
What's new?
- Documentation improvements
Full Changelog: 2.4.0...2.4.1
2.4.0
What's new?
Word-level timestamps for Whisper automatic-speech-recognition 🤯
This release adds the ability to predict word-level timestamps for our Whisper automatic-speech-recognition models by analyzing the cross-attentions and applying dynamic time warping. Our implementation is adapted from this PR, which added this functionality to the 🤗 transformers Python library.
Example usage: (see docs)
import { pipeline } from '@xenova/transformers';
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';
let transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en', {
revision: 'output_attentions',
});
let output = await transcriber(url, { return_timestamps: 'word' });
// {
// "text": " And so my fellow Americans ask not what your country can do for you ask what you can do for your country.",
// "chunks": [
// { "text": " And", "timestamp": [0, 0.78] },
// { "text": " so", "timestamp": [0.78, 1.06] },
// { "text": " my", "timestamp": [1.06, 1.46] },
// ...
// { "text": " for", "timestamp": [9.72, 9.92] },
// { "text": " your", "timestamp": [9.92, 10.22] },
// { "text": " country.", "timestamp": [10.22, 13.5] }
// ]
// }
Note: For now, you need to choose the output_attentions revision (see above). In the future, we may merge these models into the main branch. Also, we currently do not have exports for the medium and large models, simply because I don't have enough RAM to do the export myself (>25GB needed) 😅. So, if you would like to use our conversion script to do the conversion yourself, please make a PR on the Hub with these new models (under a new output_attentions branch)!
From our testing, the JS implementation exactly matches the output produced by the Python implementation (when using the same model of course)! 🥳
Python (left) vs. JavaScript (right)
I'm excited to see what you all build with this! Please tag me on Twitter if you use it in your project - I'd love to see it! I'm also planning on adding this as an option to whisper-web, so stay tuned! 🚀
Misc bug fixes and improvements
- Fix loading of grayscale images in node.js (#178)


