diff --git a/README.md b/README.md index 49776b05d..6f7ed3f70 100644 --- a/README.md +++ b/README.md @@ -11,25 +11,19 @@

- - NPM - - - NPM Downloads - - - jsDelivr Hits - - - License - - - Documentation - + NPM + NPM Downloads + jsDelivr Hits + License + Documentation

-State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server! +

+

State-of-the-art Machine Learning for the Web

+

+ +Run 🤗 Transformers directly in your browser, with no need for a server! Transformers.js is designed to be functionally equivalent to Hugging Face's [transformers](https://github.com/huggingface/transformers) python library, meaning you can run the same pretrained models using a very similar API. These models support common tasks in different modalities, such as: - 📝 **Natural Language Processing**: text classification, named entity recognition, question answering, language modeling, summarization, translation, multiple choice, and text generation. @@ -42,6 +36,22 @@ Transformers.js uses [ONNX Runtime](https://onnxruntime.ai/) to run models in th For more information, check out the full [documentation](https://huggingface.co/docs/transformers.js). +## Installation + + +To install via [NPM](https://www.npmjs.com/package/@huggingface/transformers), run: +```bash +npm i @huggingface/transformers +``` + +Alternatively, you can use it in vanilla JS, without any bundler, by using a CDN or static hosting. For example, using [ES Modules](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Modules), you can import the library with: +```html + +``` + + ## Quick tour @@ -72,9 +82,9 @@ out = pipe('I love transformers!') import { pipeline } from '@huggingface/transformers'; // Allocate a pipeline for sentiment-analysis -let pipe = await pipeline('sentiment-analysis'); +const pipe = await pipeline('sentiment-analysis'); -let out = await pipe('I love transformers!'); +const out = await pipe('I love transformers!'); // [{'label': 'POSITIVE', 'score': 0.999817686}] ``` @@ -86,29 +96,40 @@ let out = await pipe('I love transformers!'); You can also use a different model by specifying the model id or path as the second argument to the `pipeline` function. For example: ```javascript // Use a different model for sentiment-analysis -let pipe = await pipeline('sentiment-analysis', 'Xenova/bert-base-multilingual-uncased-sentiment'); +const pipe = await pipeline('sentiment-analysis', 'Xenova/bert-base-multilingual-uncased-sentiment'); ``` +By default, when running in the browser, the model will be run on your CPU (via WASM). If you would like +to run the model on your GPU (via WebGPU), you can do this by setting `device: 'webgpu'`, for example: +```javascript +// Run the model on WebGPU +const pipe = await pipeline('sentiment-analysis', 'Xenova/distilbert-base-uncased-finetuned-sst-2-english', { + device: 'webgpu', +}); +``` -## Installation +For more information, check out the [WebGPU guide](https://huggingface.co/docs/transformers.js/guides/webgpu). +> [!WARNING] +> The WebGPU API is still experimental in many browsers, so if you run into any issues, +> please file a [bug report](https://github.com/huggingface/transformers.js/issues/new?title=%5BWebGPU%5D%20Error%20running%20MODEL_ID_GOES_HERE&assignees=&labels=bug,webgpu&projects=&template=1_bug-report.yml). -To install via [NPM](https://www.npmjs.com/package/@huggingface/transformers), run: -```bash -npm i @huggingface/transformers -``` - -Alternatively, you can use it in vanilla JS, without any bundler, by using a CDN or static hosting. For example, using [ES Modules](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Modules), you can import the library with: -```html - +In resource-constrained environments, such as web browsers, it is advisable to use a quantized version of +the model to lower bandwidth and optimize performance. 
This can be achieved by adjusting the `dtype` option, +which allows you to select the appropriate data type for your model. While the available options may vary +depending on the specific model, typical choices include `"fp32"` (default for WebGPU), `"fp16"`, `"q8"` +(default for WASM), and `"q4"`. For more information, check out the [quantization guide](https://huggingface.co/docs/transformers.js/guides/dtypes). +```javascript +// Run the model at 4-bit quantization +const pipe = await pipeline('sentiment-analysis', 'Xenova/distilbert-base-uncased-finetuned-sst-2-english', { + dtype: 'q4', +}); ``` ## Examples -Want to jump straight in? Get started with one of our sample applications/templates: +Want to jump straight in? Get started with one of our sample applications/templates, which can be found [here](https://github.com/huggingface/transformers.js-examples). | Name | Description | Links | |-------------------|----------------------------------|-------------------------------| diff --git a/docs/scripts/build_readme.py b/docs/scripts/build_readme.py index 611c5b3f6..84bb30cf0 100644 --- a/docs/scripts/build_readme.py +++ b/docs/scripts/build_readme.py @@ -13,33 +13,23 @@

- - NPM - - - NPM Downloads - - - jsDelivr Hits - - - License - - - Documentation - + NPM + NPM Downloads + jsDelivr Hits + License + Documentation

{intro} -## Quick tour - -{quick_tour} - ## Installation {installation} +## Quick tour + +{quick_tour} + ## Examples {examples} diff --git a/docs/snippets/0_introduction.snippet b/docs/snippets/0_introduction.snippet index d25a0e513..34d71bccb 100644 --- a/docs/snippets/0_introduction.snippet +++ b/docs/snippets/0_introduction.snippet @@ -1,5 +1,9 @@ -State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server! +

+

State-of-the-art Machine Learning for the Web

+

+ +Run 🤗 Transformers directly in your browser, with no need for a server! Transformers.js is designed to be functionally equivalent to Hugging Face's [transformers](https://github.com/huggingface/transformers) python library, meaning you can run the same pretrained models using a very similar API. These models support common tasks in different modalities, such as: - 📝 **Natural Language Processing**: text classification, named entity recognition, question answering, language modeling, summarization, translation, multiple choice, and text generation. diff --git a/docs/snippets/1_quick-tour.snippet b/docs/snippets/1_quick-tour.snippet index 2e906a0f1..ddf4d5744 100644 --- a/docs/snippets/1_quick-tour.snippet +++ b/docs/snippets/1_quick-tour.snippet @@ -26,9 +26,9 @@ out = pipe('I love transformers!') import { pipeline } from '@huggingface/transformers'; // Allocate a pipeline for sentiment-analysis -let pipe = await pipeline('sentiment-analysis'); +const pipe = await pipeline('sentiment-analysis'); -let out = await pipe('I love transformers!'); +const out = await pipe('I love transformers!'); // [{'label': 'POSITIVE', 'score': 0.999817686}] ``` @@ -40,5 +40,32 @@ let out = await pipe('I love transformers!'); You can also use a different model by specifying the model id or path as the second argument to the `pipeline` function. For example: ```javascript // Use a different model for sentiment-analysis -let pipe = await pipeline('sentiment-analysis', 'Xenova/bert-base-multilingual-uncased-sentiment'); +const pipe = await pipeline('sentiment-analysis', 'Xenova/bert-base-multilingual-uncased-sentiment'); +``` + +By default, when running in the browser, the model will be run on your CPU (via WASM). If you would like +to run the model on your GPU (via WebGPU), you can do this by setting `device: 'webgpu'`, for example: +```javascript +// Run the model on WebGPU +const pipe = await pipeline('sentiment-analysis', 'Xenova/distilbert-base-uncased-finetuned-sst-2-english', { + device: 'webgpu', +}); +``` + +For more information, check out the [WebGPU guide](/guides/webgpu). + +> [!WARNING] +> The WebGPU API is still experimental in many browsers, so if you run into any issues, +> please file a [bug report](https://github.com/huggingface/transformers.js/issues/new?title=%5BWebGPU%5D%20Error%20running%20MODEL_ID_GOES_HERE&assignees=&labels=bug,webgpu&projects=&template=1_bug-report.yml). + +In resource-constrained environments, such as web browsers, it is advisable to use a quantized version of +the model to lower bandwidth and optimize performance. This can be achieved by adjusting the `dtype` option, +which allows you to select the appropriate data type for your model. While the available options may vary +depending on the specific model, typical choices include `"fp32"` (default for WebGPU), `"fp16"`, `"q8"` +(default for WASM), and `"q4"`. For more information, check out the [quantization guide](/guides/dtypes). +```javascript +// Run the model at 4-bit quantization +const pipe = await pipeline('sentiment-analysis', 'Xenova/distilbert-base-uncased-finetuned-sst-2-english', { + dtype: 'q4', +}); ``` diff --git a/docs/snippets/3_examples.snippet b/docs/snippets/3_examples.snippet index f8bf7ed1c..4138482f6 100644 --- a/docs/snippets/3_examples.snippet +++ b/docs/snippets/3_examples.snippet @@ -1,4 +1,4 @@ -Want to jump straight in? Get started with one of our sample applications/templates: +Want to jump straight in? 
Get started with one of our sample applications/templates, which can be found [here](https://github.com/huggingface/transformers.js-examples). | Name | Description | Links | |-------------------|----------------------------------|-------------------------------| diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml index 4458c049b..d0b622528 100644 --- a/docs/source/_toctree.yml +++ b/docs/source/_toctree.yml @@ -23,10 +23,14 @@ title: Server-side Inference in Node.js title: Tutorials - sections: + - local: guides/webgpu + title: Running models on WebGPU + - local: guides/dtypes + title: Using quantized models (dtypes) - local: guides/private title: Accessing Private/Gated Models - local: guides/node-audio-processing - title: Server-side Audio Processing in Node.js + title: Server-side Audio Processing title: Developer Guides - sections: - local: api/transformers diff --git a/docs/source/guides/dtypes.md b/docs/source/guides/dtypes.md new file mode 100644 index 000000000..a479e1f95 --- /dev/null +++ b/docs/source/guides/dtypes.md @@ -0,0 +1,130 @@ +# Using quantized models (dtypes) + +Before Transformers.js v3, we used the `quantized` option to specify whether to use a quantized (q8) or full-precision (fp32) variant of the model by setting `quantized` to `true` or `false`, respectively. Now, we've added the ability to select from a much larger list with the `dtype` parameter. + +The list of available quantizations depends on the model, but some common ones are: full-precision (`"fp32"`), half-precision (`"fp16"`), 8-bit (`"q8"`, `"int8"`, `"uint8"`), and 4-bit (`"q4"`, `"bnb4"`, `"q4f16"`). + +
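+For example, a model that was previously loaded with the boolean `quantized` option maps onto `dtype` roughly as follows. This is a minimal sketch; the model id is simply the sentiment-analysis checkpoint used elsewhere in these docs, and the exact set of available dtypes depends on the files published with the model:
+
+```js
+import { pipeline } from "@huggingface/transformers";
+
+// v2: `quantized: true` selected the 8-bit variant, `quantized: false` the full-precision one.
+// v3: express the same choice (and more) with `dtype`.
+const classifier = await pipeline(
+  "sentiment-analysis",
+  "Xenova/distilbert-base-uncased-finetuned-sst-2-english",
+  { dtype: "q8" }, // roughly the old `quantized: true`; use "fp32" for the old `quantized: false`
+);
+```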

+ *(Figure: available dtypes for mixedbread-ai/mxbai-embed-xsmall-v1)*

+ +## Basic usage + +**Example:** Run Qwen2.5-0.5B-Instruct in 4-bit quantization ([demo](https://v2.scrimba.com/s0dlcpv0ci)) + +```js +import { pipeline } from "@huggingface/transformers"; + +// Create a text generation pipeline +const generator = await pipeline( + "text-generation", + "onnx-community/Qwen2.5-0.5B-Instruct", + { dtype: "q4", device: "webgpu" }, +); + +// Define the list of messages +const messages = [ + { role: "system", content: "You are a helpful assistant." }, + { role: "user", content: "Tell me a funny joke." }, +]; + +// Generate a response +const output = await generator(messages, { max_new_tokens: 128 }); +console.log(output[0].generated_text.at(-1).content); +``` + +## Per-module dtypes + +Some encoder-decoder models, like Whisper or Florence-2, are extremely sensitive to quantization settings: especially of the encoder. For this reason, we added the ability to select per-module dtypes, which can be done by providing a mapping from module name to dtype. + +**Example:** Run Florence-2 on WebGPU ([demo](https://v2.scrimba.com/s0pdm485fo)) + +```js +import { Florence2ForConditionalGeneration } from "@huggingface/transformers"; + +const model = await Florence2ForConditionalGeneration.from_pretrained( + "onnx-community/Florence-2-base-ft", + { + dtype: { + embed_tokens: "fp16", + vision_encoder: "fp16", + encoder_model: "q4", + decoder_model_merged: "q4", + }, + device: "webgpu", + }, +); +``` + +

+ *(Demo: Florence-2 running on WebGPU)*

+ +
+
+See full code example:
+
+```js
+import {
+  Florence2ForConditionalGeneration,
+  AutoProcessor,
+  AutoTokenizer,
+  RawImage,
+} from "@huggingface/transformers";
+
+// Load model, processor, and tokenizer
+const model_id = "onnx-community/Florence-2-base-ft";
+const model = await Florence2ForConditionalGeneration.from_pretrained(
+  model_id,
+  {
+    dtype: {
+      embed_tokens: "fp16",
+      vision_encoder: "fp16",
+      encoder_model: "q4",
+      decoder_model_merged: "q4",
+    },
+    device: "webgpu",
+  },
+);
+const processor = await AutoProcessor.from_pretrained(model_id);
+const tokenizer = await AutoTokenizer.from_pretrained(model_id);
+
+// Load image and prepare vision inputs
+const url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg";
+const image = await RawImage.fromURL(url);
+const vision_inputs = await processor(image);
+
+// Specify task and prepare text inputs
+const task = "<MORE_DETAILED_CAPTION>";
+const prompts = processor.construct_prompts(task);
+const text_inputs = tokenizer(prompts);
+
+// Generate text
+const generated_ids = await model.generate({
+  ...text_inputs,
+  ...vision_inputs,
+  max_new_tokens: 100,
+});
+
+// Decode generated text
+const generated_text = tokenizer.batch_decode(generated_ids, {
+  skip_special_tokens: false,
+})[0];
+
+// Post-process the generated text
+const result = processor.post_process_generation(
+  generated_text,
+  task,
+  image.size,
+);
+console.log(result);
+// { '<MORE_DETAILED_CAPTION>': 'A green car is parked in front of a tan building. The building has a brown door and two brown windows. The car is a two door and the door is closed. The green car has black tires.' }
+```
+
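+The same per-module mapping also works with the high-level `pipeline()` API. As a sketch (assuming the Whisper checkpoint below ships `encoder_model` and `decoder_model_merged` modules with fp16/q4 weights, as the onnx-community Whisper exports typically do):
+
+```js
+import { pipeline } from "@huggingface/transformers";
+
+// Keep the quantization-sensitive encoder at fp16 and use 4-bit weights for the decoder
+const transcriber = await pipeline(
+  "automatic-speech-recognition",
+  "onnx-community/whisper-tiny.en",
+  {
+    dtype: {
+      encoder_model: "fp16",
+      decoder_model_merged: "q4",
+    },
+    device: "webgpu",
+  },
+);
+
+const url = "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav";
+const output = await transcriber(url);
+console.log(output.text);
+```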
diff --git a/docs/source/guides/webgpu.md b/docs/source/guides/webgpu.md
new file mode 100644
index 000000000..378d3ee8b
--- /dev/null
+++ b/docs/source/guides/webgpu.md
@@ -0,0 +1,87 @@
+# Running models on WebGPU
+
+WebGPU is a new web standard for accelerated graphics and compute. The [API](https://developer.mozilla.org/en-US/docs/Web/API/WebGPU_API) enables web developers to use the underlying system's GPU to carry out high-performance computations directly in the browser. WebGPU is the successor to [WebGL](https://developer.mozilla.org/en-US/docs/Web/API/WebGL_API) and provides significantly better performance, because it allows for more direct interaction with modern GPUs. Lastly, it supports general-purpose GPU computations, which makes it just perfect for machine learning!
+
+> [!WARNING]
+> As of October 2024, global WebGPU support is around 70% (according to [caniuse.com](https://caniuse.com/webgpu)), meaning some users may not be able to use the API.
+>
+> If the following demos do not work in your browser, you may need to enable it using a feature flag:
+>
+> - Firefox: with the `dom.webgpu.enabled` flag (see [here](https://developer.mozilla.org/en-US/docs/Mozilla/Firefox/Experimental_features#:~:text=tested%20by%20Firefox.-,WebGPU%20API,-The%20WebGPU%20API)).
+> - Safari: with the `WebGPU` feature flag (see [here](https://webkit.org/blog/14879/webgpu-now-available-for-testing-in-safari-technology-preview/)).
+> - Older Chromium browsers (on Windows, macOS, Linux): with the `enable-unsafe-webgpu` flag (see [here](https://developer.chrome.com/docs/web-platform/webgpu/troubleshooting-tips)).
+
+## Usage in Transformers.js v3
+
+Thanks to our collaboration with [ONNX Runtime Web](https://www.npmjs.com/package/onnxruntime-web), enabling WebGPU acceleration is as simple as setting `device: 'webgpu'` when loading a model. Let's see some examples!
+
+**Example:** Compute text embeddings on WebGPU ([demo](https://v2.scrimba.com/s06a2smeej))
+
+```js
+import { pipeline } from "@huggingface/transformers";
+
+// Create a feature-extraction pipeline
+const extractor = await pipeline(
+  "feature-extraction",
+  "mixedbread-ai/mxbai-embed-xsmall-v1",
+  { device: "webgpu" },
+);
+
+// Compute embeddings
+const texts = ["Hello world!", "This is an example sentence."];
+const embeddings = await extractor(texts, { pooling: "mean", normalize: true });
+console.log(embeddings.tolist());
+// [
+//   [-0.016986183822155, 0.03228696808218956, -0.0013630966423079371, ... ],
+//   [0.09050482511520386, 0.07207386940717697, 0.05762749910354614, ... ],
+// ]
+```
+
+**Example:** Perform automatic speech recognition with OpenAI whisper on WebGPU ([demo](https://v2.scrimba.com/s0oi76h82g))
+
+```js
+import { pipeline } from "@huggingface/transformers";
+
+// Create automatic speech recognition pipeline
+const transcriber = await pipeline(
+  "automatic-speech-recognition",
+  "onnx-community/whisper-tiny.en",
+  { device: "webgpu" },
+);
+
+// Transcribe audio from a URL
+const url = "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav";
+const output = await transcriber(url);
+console.log(output);
+// { text: ' And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.' }
+```
+
+**Example:** Perform image classification with MobileNetV4 on WebGPU ([demo](https://v2.scrimba.com/s0fv2uab1t))
+
+```js
+import { pipeline } from "@huggingface/transformers";
+
+// Create image classification pipeline
+const classifier = await pipeline(
+  "image-classification",
+  "onnx-community/mobilenetv4_conv_small.e2400_r224_in1k",
+  { device: "webgpu" },
+);
+
+// Classify an image from a URL
+const url = "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg";
+const output = await classifier(url);
+console.log(output);
+// [
+//   { label: 'tiger, Panthera tigris', score: 0.6149784922599792 },
+//   { label: 'tiger cat', score: 0.30281734466552734 },
+//   { label: 'tabby, tabby cat', score: 0.0019135422771796584 },
+//   { label: 'lynx, catamount', score: 0.0012161266058683395 },
+//   { label: 'Egyptian cat', score: 0.0011465961579233408 }
+// ]
+```
+
+## Reporting bugs and providing feedback
+
+Due to the experimental nature of the WebGPU API, especially in non-Chromium browsers, you may experience issues when trying to run a model. If this happens, please open an issue on [GitHub](https://github.com/huggingface/transformers.js/issues/new) so we can investigate.
diff --git a/docs/source/pipelines.md b/docs/source/pipelines.md index 0c1b3d584..3e1ad6b15 100644 --- a/docs/source/pipelines.md +++ b/docs/source/pipelines.md @@ -16,7 +16,7 @@ Start by creating an instance of `pipeline()` and specifying a task you want to ```javascript import { pipeline } from '@huggingface/transformers'; -let classifier = await pipeline('sentiment-analysis'); +const classifier = await pipeline('sentiment-analysis'); ``` When running for the first time, the `pipeline` will download and cache the default pretrained model associated with the task. This can take a while, but subsequent calls will be much faster. @@ -30,14 +30,14 @@ By default, models will be downloaded from the [Hugging Face Hub](https://huggin You can now use the classifier on your target text by calling it as a function: ```javascript -let result = await classifier('I love transformers!'); +const result = await classifier('I love transformers!'); // [{'label': 'POSITIVE', 'score': 0.9998}] ``` If you have multiple inputs, you can pass them as an array: ```javascript -let result = await classifier(['I love transformers!', 'I hate transformers!']); +const result = await classifier(['I love transformers!', 'I hate transformers!']); // [{'label': 'POSITIVE', 'score': 0.9998}, {'label': 'NEGATIVE', 'score': 0.9982}] ``` @@ -46,9 +46,9 @@ You can also specify a different model to use for the pipeline by passing it as ```javascript -let reviewer = await pipeline('sentiment-analysis', 'Xenova/bert-base-multilingual-uncased-sentiment'); +const reviewer = await pipeline('sentiment-analysis', 'Xenova/bert-base-multilingual-uncased-sentiment'); -let result = await reviewer('The Shawshank Redemption is a true masterpiece of cinema.'); +const result = await reviewer('The Shawshank Redemption is a true masterpiece of cinema.'); // [{label: '5 stars', score: 0.8167929649353027}] ``` @@ -59,10 +59,10 @@ The `pipeline()` function is a great way to quickly use a pretrained model for i ```javascript // Allocate a pipeline for Automatic Speech Recognition -let transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-small.en'); +const transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-small.en'); // Transcribe an audio file, loaded from a URL.
-let result = await transcriber('https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac'); +const result = await transcriber('https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac'); // {text: ' I have a dream that one day this nation will rise up and live out the true meaning of its creed.'} ``` @@ -86,7 +86,7 @@ You can also specify which revision of the model to use, by passing a `revision` Since the Hugging Face Hub uses a git-based versioning system, you can use any valid git revision specifier (e.g., branch name or commit hash) ```javascript -let transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en', { +const transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en', { revision: 'output_attentions', }); ``` @@ -100,17 +100,17 @@ Many pipelines have additional options that you can specify. For example, when u ```javascript // Allocation a pipeline for translation -let translator = await pipeline('translation', 'Xenova/nllb-200-distilled-600M'); +const translator = await pipeline('translation', 'Xenova/nllb-200-distilled-600M'); // Translate from English to Greek -let result = await translator('I like to walk my dog.', { +const result = await translator('I like to walk my dog.', { src_lang: 'eng_Latn', tgt_lang: 'ell_Grek' }); // [ { translation_text: 'Μου αρέσει να περπατάω το σκυλί μου.' } ] // Translate back to English -let result2 = await translator(result[0].translation_text, { +const result2 = await translator(result[0].translation_text, { src_lang: 'ell_Grek', tgt_lang: 'eng_Latn' }); @@ -125,8 +125,8 @@ For example, to generate a poem using `LaMini-Flan-T5-783M`, you can do: ```javascript // Allocate a pipeline for text2text-generation -let poet = await pipeline('text2text-generation', 'Xenova/LaMini-Flan-T5-783M'); -let result = await poet('Write me a love poem about cheese.', { +const poet = await pipeline('text2text-generation', 'Xenova/LaMini-Flan-T5-783M'); +const result = await poet('Write me a love poem about cheese.', { max_new_tokens: 200, temperature: 0.9, repetition_penalty: 2.0,