01 Jul 03:04

xenova

aceab9b

2.3.1

What's new?

New models and tokenizers

Models:
- MobileViT for image classification
- Roberta for token classification (thanks @julien-c)
- XLMRoberta for masked language modelling, sequence classification, token classification, and question answering
Tokenizers: FalconTokenizer, GPTNeoXTokenizer

Improved documentation

Details on how to discover and share transformers.js models on the hub (link)
Example text-generation code (link)
Example image-classification code (link)

Misc bug fixes

Fix conversion to grayscale (commit)
Align .generate() function output with the original Python implementation
Fix issue with non-greedy samplers
Use WASM SIMD on iOS != 16.4.x (thanks @lsb)
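
To illustrate the last item, here is a minimal sketch of a version gate that enables SIMD everywhere except iOS 16.4.x. The function names `isIOS164` and `shouldUseSimd` are hypothetical, and the user-agent pattern is an assumption about how iOS browsers usually identify themselves; the library's actual detection logic may differ.

```javascript
// Hypothetical check, not the library's implementation.
function isIOS164(userAgent) {
    // iOS user agents typically contain e.g. "iPhone OS 16_4_1 like Mac OS X".
    return /(?:iPhone|iPad|iPod).*OS 16_4(?:_\d+)? like Mac OS X/.test(userAgent);
}

// SIMD is allowed on every platform except iOS 16.4.x.
function shouldUseSimd(userAgent) {
    return !isIOS164(userAgent);
}
```

In a browser this could be called as `shouldUseSimd(navigator.userAgent)`.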

New Contributors

@julien-c made their first contribution in #170
@lsb made their first contribution in #174

Full Changelog: 2.3.0...2.3.1

Contributors

lsb and julien-c

22 Jun 15:21

xenova

2.3.0

8d6622e

What's new?

Improved 🤗 Hub integration and model discoverability!

All Transformers.js-compatible models are now displayed with a super cool tag! To indicate your model is compatible with the library, simply add the "transformers.js" library tag in your README (example).

This also means you can now search for and filter these models by task!

For example,

https://huggingface.co/models?library=transformers.js lists all Transformers.js models
https://huggingface.co/models?library=transformers.js&pipeline_tag=feature-extraction lists all models which can be used in the feature-extraction pipeline!
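
As a rough sketch, the filter URLs above can also be built programmatically. The helper name `hubSearchUrl` is hypothetical; only the `library` and `pipeline_tag` query parameters come from the links above.

```javascript
// Hypothetical helper: builds a Hub model-search URL like the ones listed above.
function hubSearchUrl(pipelineTag) {
    const url = new URL('https://huggingface.co/models');
    url.searchParams.set('library', 'transformers.js');
    if (pipelineTag) {
        // Optional task filter, e.g. 'feature-extraction'.
        url.searchParams.set('pipeline_tag', pipelineTag);
    }
    return url.toString();
}
```

Calling `hubSearchUrl('feature-extraction')` reproduces the second link above.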

And lastly, clicking the "Use in Transformers.js" button will show some sample code for how to use the model!

Chroma 🤝 Transformers.js

You can now use all Transformers.js-compatible feature-extraction models for embeddings computation directly in Chroma! For example:

const {ChromaClient, TransformersEmbeddingFunction} = require('chromadb');
const client = new ChromaClient();

// Create the embedder. In this case, I just use the defaults, but you can change the model,
// quantization, revision, or add a progress callback, if desired.
const embedder = new TransformersEmbeddingFunction({ /* Configuration goes here */ });

const main = async () => {
    // Empties and completely resets the database.
    await client.reset()

    // Create the collection
    const collection = await client.createCollection({name: "my_collection", embeddingFunction: embedder})

    // Add some data to the collection
    await collection.add({
        ids: ["id1", "id2", "id3"],
        metadatas: [{"source": "my_source"}, {"source": "my_source"},  {"source": "my_source"}],
        documents: ["I love walking my dog", "This is another document", "This is a legal document"],
    }) 
    
    // Query the collection
    const results = await collection.query({
        nResults: 2, 
        queryTexts: ["This is a query document"]
    }) 
    console.log(results)
    // {
    //     ids: [ [ 'id2', 'id3' ] ],
    //     embeddings: null,
    //     documents: [ [ 'This is another document', 'This is a legal document' ] ],
    //     metadatas: [ [ [Object], [Object] ] ],
    //     distances: [ [ 1.0109775066375732, 1.0756263732910156 ] ]
    // }
}

main();

Better alignment with the Python library for calling decoder-only models

You can now call decoder-only models loaded via AutoModel.from_pretrained(...):

import { AutoModel, AutoTokenizer } from '@xenova/transformers';

// Choose model to use
let model_id = "Xenova/gpt2";

// Load model and tokenizer
let tokenizer = await AutoTokenizer.from_pretrained(model_id);
let model = await AutoModel.from_pretrained(model_id);

// Tokenize text and call
let model_inputs = await tokenizer('Once upon a time');
let output = await model(model_inputs);

console.log(output);
// {
//     logits: Tensor {
//         dims: [ 1, 4, 50257 ],
//         type: 'float32',
//         data: Float32Array(201028) [
//             -20.166624069213867, -19.662782669067383, -23.189680099487305,
//             ...
//         ],
//         size: 201028
//     },
//     past_key_values: { ... }
// }

Examples for computing perplexity: #137 (comment)
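
For orientation, here is a minimal sketch of how perplexity could be derived from logits like those shown above. It is not taken from the linked comment. The function name `perplexity` is hypothetical, and it assumes the logits are a flat, row-major array of shape [sequence length, vocabulary size] for a single sequence.

```javascript
// Sketch: perplexity from flat logits (row-major [seqLen, vocabSize]) and token ids.
function perplexity(logits, inputIds, vocabSize) {
    const seqLen = inputIds.length;
    let totalNll = 0;
    // Position t predicts token t + 1, so the last row of logits is unused.
    for (let t = 0; t < seqLen - 1; ++t) {
        const offset = t * vocabSize;
        // Numerically stable log-sum-exp over the vocabulary.
        let max = -Infinity;
        for (let v = 0; v < vocabSize; ++v) {
            max = Math.max(max, logits[offset + v]);
        }
        let sumExp = 0;
        for (let v = 0; v < vocabSize; ++v) {
            sumExp += Math.exp(logits[offset + v] - max);
        }
        const logSumExp = max + Math.log(sumExp);
        // Number() also accepts BigInt ids.
        const target = Number(inputIds[t + 1]);
        // Negative log-likelihood of the token that actually came next.
        totalNll += logSumExp - logits[offset + target];
    }
    return Math.exp(totalNll / (seqLen - 1));
}
```

With the example above, the arguments would presumably be `output.logits.data`, `model_inputs.input_ids.data`, and `output.logits.dims[2]`; treat those property names as assumptions to check against the library.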

More accurate quantization parameters for whisper models

We've updated the quantization parameters used for the pre-converted Whisper models on the hub. You can test them out with whisper web! Thanks to @jozefchutka for reporting this issue.

Misc bug fixes and improvements

Do not use spread operator to concatenate large arrays (#154)
Set chunk timestamp to rounded time by @PushpenderSaini0 (#160)
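
Some background on the first item: `target.push(...source)` passes every element of `source` as a separate function argument, which can throw a RangeError in common JavaScript engines once the array is large enough. The sketch below shows one way to concatenate without spreading. The helper name `concatFloat32` is hypothetical, and this is not necessarily the change made in #154.

```javascript
// Sketch: concatenate chunks without spreading them into function arguments.
function concatFloat32(chunks) {
    let total = 0;
    for (const chunk of chunks) {
        total += chunk.length;
    }
    const result = new Float32Array(total);
    let offset = 0;
    for (const chunk of chunks) {
        // TypedArray.prototype.set copies the chunk in place at the given offset.
        result.set(chunk, offset);
        offset += chunk.length;
    }
    return result;
}
```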

Contributors

jozefchutka and PushpenderSaini0

09 Jun 16:22

xenova

2.2.0

035f69f

What's new?

Multilingual speech recognition and translation w/ Whisper

You can now transcribe and translate speech for over 100 different languages, directly in your browser, with Whisper! Play around with our demo application here.

Example: Transcribe English.

import { pipeline } from '@xenova/transformers';

let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';
let transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en');
let output = await transcriber(url);
// { text: " And so my fellow Americans ask not what your country can do for you, ask what you can do for your country." }

Example: Transcribe English w/ timestamps.

import { pipeline } from '@xenova/transformers';

let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';
let transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en');
let output = await transcriber(url, { return_timestamps: true });
// {
//   text: " And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.",
//   chunks: [
//     { timestamp: [0, 8],  text: " And so my fellow Americans ask not what your country can do for you" },
//     { timestamp: [8, 11], text: " ask what you can do for your country." }
//   ]
// }
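
As a small usage sketch, the `chunks` array from the output above could be turned into readable transcript lines. The helper name `formatChunks` is hypothetical, and it assumes every chunk has a two-element `timestamp` (in seconds) and a `text` field, as in the sample output.

```javascript
// Hypothetical helper: format timestamped chunks as "[start -> end] text" lines.
function formatChunks(chunks) {
    return chunks.map(({ timestamp: [start, end], text }) =>
        `[${start}s -> ${end}s] ${text.trim()}`
    );
}
```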

Example: Transcribe French.

import { pipeline } from '@xenova/transformers';

let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/french-audio.mp3';
let transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-small');
let output = await transcriber(url, { language: 'french', task: 'transcribe' });
// { text: " J'adore, j'aime, je n'aime pas, je déteste." }

Example: Translate French to English.

import { pipeline } from '@xenova/transformers';

let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/french-audio.mp3';
let transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-small');
let output = await transcriber(url, { language: 'french', task: 'translate' });
// { text: " I love, I like, I don't like, I hate." }

Misc

Aligned .generate() function with the original Python implementation
Minor improvements to documentation (+ some examples). More to come in the future.

Full Changelog: 2.1.1...2.2.0

02 Jun 00:27

xenova

2.1.1

e5e460b

Minor patch for v2.1.0 to fix an issue with browser caching.

01 Jun 13:18

xenova

2.1.0

34b0e8b

What's new?

Improved feature extraction pipeline for Embeddings

You can now perform feature extraction on models other than sentence-transformers! All you need to do is target a repo (and/or revision) that was exported with --task default. Also be sure to use the correct quantization for your use-case!

Example: Run feature extraction with bert-base-uncased (without pooling/normalization).

import { pipeline } from '@xenova/transformers';

let extractor = await pipeline('feature-extraction', 'Xenova/bert-base-uncased', { revision: 'default' });
let result = await extractor('This is a simple test.');
console.log(result);
// Tensor {
//     type: 'float32',
//     data: Float32Array [0.05939924716949463, 0.021655935794115067, ...],
//     dims: [1, 8, 768]
// }

Example: Run feature extraction with bert-base-uncased (with pooling/normalization).

import { pipeline } from '@xenova/transformers';

let extractor = await pipeline('feature-extraction', 'Xenova/bert-base-uncased', { revision: 'default' });
let result = await extractor('This is a simple test.', { pooling: 'mean', normalize: true });
console.log(result);
// Tensor {
//     type: 'float32',
//     data: Float32Array [0.03373778983950615, -0.010106077417731285, ...],
//     dims: [1, 768]
// }
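
To show why the output shape changes from [1, 8, 768] to [1, 768] between the two examples above, here is a minimal sketch of mean pooling followed by L2 normalization. The function name `meanPoolAndNormalize` is hypothetical. It assumes a single sequence with no padding tokens and a non-zero pooled vector, so it is an illustration rather than the library's implementation.

```javascript
// Sketch: mean-pool token embeddings (flat, row-major [numTokens, hiddenSize]),
// then scale the pooled vector to unit length.
function meanPoolAndNormalize(data, numTokens, hiddenSize) {
    const pooled = new Float32Array(hiddenSize);
    for (let t = 0; t < numTokens; ++t) {
        for (let h = 0; h < hiddenSize; ++h) {
            // Average each hidden dimension over all tokens.
            pooled[h] += data[t * hiddenSize + h] / numTokens;
        }
    }
    let norm = 0;
    for (const x of pooled) {
        norm += x * x;
    }
    norm = Math.sqrt(norm);
    return pooled.map((x) => x / norm);
}
```

The token dimension is averaged away, which leaves one vector of length `hiddenSize` per input.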

Example: Calculating embeddings with sentence-transformers models.

import { pipeline } from '@xenova/transformers';

let extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
let result = await extractor('This is a simple test.', { pooling: 'mean', normalize: true });
console.log(result);
// Tensor {
//     type: 'float32',
//     data: Float32Array [0.09094982594251633, -0.014774246141314507, ...],
//     dims: [1, 384]
// }

This also means you can do things like semantic search directly in JavaScript/TypeScript! Check out the Pinecone docs for an example app which uses Transformers.js!
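
A minimal sketch of the ranking step behind semantic search, assuming the embeddings have already been computed as plain numeric arrays. The function names `cosineSimilarity` and `rank` are hypothetical, and this is not the code used in the Pinecone example.

```javascript
// Sketch: cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a, b) {
    let dot = 0;
    let normA = 0;
    let normB = 0;
    for (let i = 0; i < a.length; ++i) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank documents from most to least similar to the query embedding.
function rank(queryEmbedding, docEmbeddings) {
    return docEmbeddings
        .map((embedding, index) => ({ index, score: cosineSimilarity(queryEmbedding, embedding) }))
        .sort((x, y) => y.score - x.score);
}
```

When embeddings are produced with `normalize: true`, as in the examples above, the cosine similarity reduces to a plain dot product.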

Over 100 Transformers.js models on the hub!

We now have 109 models to choose from! Check them out at https://huggingface.co/models?other=transformers.js. If you'd like to contribute models (exported with Optimum), you can tag them with library_name: "transformers.js"! Let's make ML more web-friendly!

Misc

Fixed various quantization/exporting issues

Full Changelog: 2.0.2...2.1.0

29 May 11:05

xenova

2.0.2

8b89766

Fixes issues stemming from ORT's recent release of a buggy version 1.15.0 🙄 (https://www.npmjs.com/package/onnxruntime-web)

Also freezes examples and updates links to use the latest stable wasm files.

20 May 14:17

xenova

2.0.1

d7c353d

Minor bug fixes

BERT Tokenization for strings with numbers
Demo site for token-classification (#116)

NPM package

Update keywords

17 May 17:48

xenova

2.0.0

8d8f511

Transformers.js v2.0.0

It's finally here! 🔥

Run Hugging Face transformers directly in your browser, with no need for a server!

GitHub: https://github.com/xenova/transformers.js
Demo site: https://xenova.github.io/transformers.js/
Documentation: https://huggingface.co/docs/transformers.js

Main features:

🛠️ Complete ES6 rewrite
📄 Documentation and examples
🤗 Improved Hugging Face Hub integration
🖥️ Server-side model caching (in Node.js)

Dev-related features:

🧪 Improved testing framework w/ Jest
⚙️ CI/CD with GitHub actions

16 May 22:46

xenova

2.0.0-alpha.4

20b6e6d

Pre-release

Pre-release for Transformers.js v2.0.0

Same as https://github.com/xenova/transformers.js/releases/tag/2.0.0-alpha.3 with various improvements, including:

GH actions to build demo site (https://xenova.github.io/transformers.js/)
Calculate Whisper mel filters when not present in the processor's config.json

16 May 06:16

xenova

2.0.0-alpha.3

507ec57

Pre-release

Pre-release for Transformers.js v2.0.0

Same as https://github.com/xenova/transformers.js/releases/tag/2.0.0-alpha.2 but with added allowLocalModels setting and improved handling of errors (e.g., CORS errors).

Releases: huggingface/transformers.js
