diff --git a/README.md b/README.md index 848b999dc..0c1a17d26 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,7 @@
# WebLLM + [![NPM Package](https://img.shields.io/badge/NPM_Package-Published-cc3534)](https://www.npmjs.com/package/@mlc-ai/web-llm) [!["WebLLM Chat Deployed"](https://img.shields.io/badge/WebLLM_Chat-Deployed-%2332a852)](https://chat.webllm.ai/) [![Join Discord](https://img.shields.io/badge/Join-Discord-7289DA?logo=discord&logoColor=white)](https://discord.gg/9Xpy2HGBuD) @@ -9,12 +10,12 @@ **High-Performance In-Browser LLM Inference Engine.** - [Documentation](https://webllm.mlc.ai/docs/) | [Blogpost](https://blog.mlc.ai/2024/06/13/webllm-a-high-performance-in-browser-llm-inference-engine) | [Paper](https://arxiv.org/abs/2412.15803) | [Examples](examples)
## Overview + WebLLM is a high-performance in-browser LLM inference engine that brings language model inference directly onto web browsers with hardware acceleration. Everything runs inside the browser with no server support and is accelerated with WebGPU. @@ -33,13 +34,14 @@ You can use WebLLM as a base [npm package](https://www.npmjs.com/package/@mlc-ai ## Key Features + - **In-Browser Inference**: WebLLM is a high-performance, in-browser language model inference engine that leverages WebGPU for hardware acceleration, enabling powerful LLM operations directly within web browsers without server-side processing. - [**Full OpenAI API Compatibility**](#full-openai-compatibility): Seamlessly integrate your app with WebLLM using OpenAI API with functionalities such as streaming, JSON-mode, logit-level control, seeding, and more. - **Structured JSON Generation**: WebLLM supports state-of-the-art JSON mode structured generation, implemented in the WebAssembly portion of the model library for optimal performance. Check [WebLLM JSON Playground](https://huggingface.co/spaces/mlc-ai/WebLLM-JSON-Playground) on HuggingFace to try generating JSON output with custom JSON schema. -- [**Extensive Model Support**](#built-in-models): WebLLM natively supports a range of models including Llama 3, Phi 3, Gemma, Mistral, Qwen(通义千问), and many others, making it versatile for various AI tasks. For the complete supported model list, check [MLC Models](https://mlc.ai/models). +- [**Extensive Model Support**](#built-in-models): WebLLM natively supports a range of models including Llama 3, Phi 3, Gemma, Mistral, Qwen(通义千问), and many others, making it versatile for various AI tasks. For the complete supported model list, check [MLC Models](https://webllm.mlc.ai/models). - [**Custom Model Integration**](#custom-models): Easily integrate and deploy custom models in MLC format, allowing you to adapt WebLLM to specific needs and scenarios, enhancing flexibility in model deployment. 
@@ -53,7 +55,7 @@ You can use WebLLM as a base [npm package](https://www.npmjs.com/package/@mlc-ai ## Built-in Models -Check the complete list of available models on [MLC Models](https://mlc.ai/models). WebLLM supports a subset of these available models and the list can be accessed at [`prebuiltAppConfig.model_list`](https://github.com/mlc-ai/web-llm/blob/main/src/config.ts#L293). +Check the complete list of available models on [MLC Models](https://webllm.mlc.ai/models). WebLLM supports a subset of these available models and the list can be accessed at [`prebuiltAppConfig.model_list`](https://github.com/mlc-ai/web-llm/blob/main/src/config.ts#L293). Here are the primary families of models currently supported: @@ -67,7 +69,7 @@ If you need more models, [request a new model via opening an issue](https://gith ## Jumpstart with Examples -Learn how to use WebLLM to integrate large language models into your application and generate chat completions through this simple Chatbot example: +Learn how to use WebLLM to integrate large language models into your application and generate chat completions through this simple Chatbot example: [![Example Chatbot on JSFiddle](https://img.shields.io/badge/Example-JSFiddle-blue?logo=jsfiddle&logoColor=white)](https://jsfiddle.net/neetnestor/4nmgvsa2/) [![Example Chatbot on Codepen](https://img.shields.io/badge/Example-Codepen-gainsboro?logo=codepen)](https://codepen.io/neetnestor/pen/vYwgZaG) @@ -110,9 +112,11 @@ Thanks to [jsdelivr.com](https://www.jsdelivr.com/package/npm/@mlc-ai/web-llm), ```javascript import * as webllm from "https://esm.run/@mlc-ai/web-llm"; ``` + It can also be dynamically imported as: + ```javascript -const webllm = await import ("https://esm.run/@mlc-ai/web-llm"); +const webllm = await import("https://esm.run/@mlc-ai/web-llm"); ``` ### Create MLCEngine @@ -127,7 +131,7 @@ import { CreateMLCEngine } from "@mlc-ai/web-llm"; // Callback function to update model loading progress const initProgressCallback = 
(initProgress) => { console.log(initProgress); -} +}; const selectedModel = "Llama-3.1-8B-Instruct-q4f32_1-MLC"; const engine = await CreateMLCEngine( @@ -143,7 +147,7 @@ import { MLCEngine } from "@mlc-ai/web-llm"; // This is a synchronous call that returns immediately const engine = new MLCEngine({ - initProgressCallback: initProgressCallback + initProgressCallback: initProgressCallback, }); // This is an asynchronous call and can take a long time to finish @@ -151,16 +155,16 @@ await engine.reload(selectedModel); ``` ### Chat Completion + After successfully initializing the engine, you can now invoke chat completions using OpenAI style chat APIs through the `engine.chat.completions` interface. For the full list of parameters and their descriptions, check [section below](#full-openai-compatibility) and [OpenAI API reference](https://platform.openai.com/docs/api-reference/chat/create). (Note: The `model` parameter is not supported and will be ignored here. Instead, call `CreateMLCEngine(model)` or `engine.reload(model)` instead as shown in the [Create MLCEngine](#create-mlcengine) above.) - ```typescript const messages = [ { role: "system", content: "You are a helpful AI assistant." }, { role: "user", content: "Hello!" }, -] +]; const reply = await engine.chat.completions.create({ messages, @@ -177,7 +181,7 @@ WebLLM also supports streaming chat completion generating. To use it, simply pas const messages = [ { role: "system", content: "You are a helpful AI assistant." }, { role: "user", content: "Hello!" 
}, -] +]; // Chunks is an AsyncGenerator object const chunks = await engine.chat.completions.create({ @@ -240,12 +244,9 @@ import { CreateWebWorkerMLCEngine } from "@mlc-ai/web-llm"; async function main() { // Use a WebWorkerMLCEngine instead of MLCEngine here const engine = await CreateWebWorkerMLCEngine( - new Worker( - new URL("./worker.ts", import.meta.url), - { - type: "module", - } - ), + new Worker(new URL("./worker.ts", import.meta.url), { + type: "module", + }), selectedModel, { initProgressCallback }, // engineConfig ); @@ -264,7 +265,6 @@ your application's offline experience. We create a handler in the worker thread that communicates with the frontend while handling the requests. - ```typescript // sw.ts import { ServiceWorkerMLCEngineHandler } from "@mlc-ai/web-llm"; @@ -282,28 +282,32 @@ Then in the main logic, we register the service worker and create the engine usi ```typescript // main.ts -import { MLCEngineInterface, CreateServiceWorkerMLCEngine } from "@mlc-ai/web-llm"; +import { + MLCEngineInterface, + CreateServiceWorkerMLCEngine, +} from "@mlc-ai/web-llm"; if ("serviceWorker" in navigator) { navigator.serviceWorker.register( - new URL("sw.ts", import.meta.url), // worker script + new URL("sw.ts", import.meta.url), // worker script { type: "module" }, ); } -const engine: MLCEngineInterface = - await CreateServiceWorkerMLCEngine( - selectedModel, - { initProgressCallback }, // engineConfig - ); +const engine: MLCEngineInterface = await CreateServiceWorkerMLCEngine( + selectedModel, + { initProgressCallback }, // engineConfig +); ``` You can find a complete example on how to run WebLLM in service worker in [examples/service-worker](examples/service-worker/). ### Chrome Extension + You can also find examples of building Chrome extension with WebLLM in [examples/chrome-extension](examples/chrome-extension/) and [examples/chrome-extension-webgpu-service-worker](examples/chrome-extension-webgpu-service-worker/). 
The latter one leverages service worker, so the extension is persistent in the background. Additionally, you can explore another full project of a Chrome extension, WebLLM Assistant, which leverages WebLLM [here](https://github.com/mlc-ai/web-llm-assistant). ## Full OpenAI Compatibility + WebLLM is designed to be fully compatible with [OpenAI API](https://platform.openai.com/docs/api-reference/chat). Thus, besides building a simple chatbot, you can also have the following functionalities with WebLLM: - [streaming](examples/streaming): return output as chunks in real-time in the form of an AsyncGenerator @@ -313,10 +317,10 @@ WebLLM is designed to be fully compatible with [OpenAI API](https://platform.ope ## Custom Models -WebLLM works as a companion project of [MLC LLM](https://github.com/mlc-ai/mlc-llm) and it supports custom models in MLC format. +WebLLM works as a companion project of [MLC LLM](https://github.com/mlc-ai/mlc-llm) and it supports custom models in MLC format. It reuses the model artifact and builds the flow of MLC LLM. To compile and use your own models with WebLLM, please check out [MLC LLM document](https://llm.mlc.ai/docs/deploy/webllm.html) -on how to compile and deploy new model weights and libraries to WebLLM. +on how to compile and deploy new model weights and libraries to WebLLM. Here, we go over the high-level idea. There are two elements of the WebLLM package that enable new models and weight variants. @@ -400,16 +404,17 @@ WebLLM's runtime largely depends on TVMjs: https://github.com/apache/tvm/tree/ma While it is also available as an npm package: https://www.npmjs.com/package/@mlc-ai/web-runtime, you can build it from source if needed by following the steps below. 1. Install [emscripten](https://emscripten.org). It is an LLVM-based compiler that compiles C/C++ source code to WebAssembly. 
- - Follow the [installation instruction](https://emscripten.org/docs/getting_started/downloads.html#installation-instructions-using-the-emsdk-recommended) to install the latest emsdk. - - Source `emsdk_env.sh` by `source path/to/emsdk_env.sh`, so that `emcc` is reachable from PATH and the command `emcc` works. + - Follow the [installation instruction](https://emscripten.org/docs/getting_started/downloads.html#installation-instructions-using-the-emsdk-recommended) to install the latest emsdk. + - Source `emsdk_env.sh` by `source path/to/emsdk_env.sh`, so that `emcc` is reachable from PATH and the command `emcc` works. + + We can verify the successful installation by trying out `emcc` terminal. - We can verify the successful installation by trying out `emcc` terminal. + Note: We recently found that using the latest `emcc` version may run into issues during runtime. Use `./emsdk install 3.1.56` instead of `./emsdk install latest` for now as a workaround. The error may look like - Note: We recently found that using the latest `emcc` version may run into issues during runtime. Use `./emsdk install 3.1.56` instead of `./emsdk install latest` for now as a workaround. The error may look like - ``` - Init error, LinkError: WebAssembly.instantiate(): Import #6 module="wasi_snapshot_preview1" - function="proc_exit": function import requires a callable - ``` + ``` + Init error, LinkError: WebAssembly.instantiate(): Import #6 module="wasi_snapshot_preview1" + function="proc_exit": function import requires a callable + ``` 2. In `./package.json`, change from `"@mlc-ai/web-runtime": "0.18.0-dev2",` to `"@mlc-ai/web-runtime": "file:./tvm_home/web",`. 
@@ -422,6 +427,7 @@ While it is also available as an npm package: https://www.npmjs.com/package/@mlc ``` In this step, if `$TVM_SOURCE_DIR` is not defined in the environment, we will execute the following line to build `tvmjs` dependency: + ```shell git clone https://github.com/mlc-ai/relax 3rdparty/tvm-unity --recursive ``` @@ -456,17 +462,18 @@ This project is initiated by members from CMU Catalyst, UW SAMPL, SJTU, OctoML, This project is only possible thanks to the shoulders open-source ecosystems that we stand on. We want to thank the Apache TVM community and developers of the TVM Unity effort. The open-source ML community members made these models publicly available. PyTorch and Hugging Face communities make these models accessible. We would like to thank the teams behind Vicuna, SentencePiece, LLaMA, and Alpaca. We also would like to thank the WebAssembly, Emscripten, and WebGPU communities. Finally, thanks to Dawn and WebGPU developers. ## Citation + If you find this project to be useful, please cite: ``` @misc{ruan2024webllmhighperformanceinbrowserllm, - title={WebLLM: A High-Performance In-Browser LLM Inference Engine}, + title={WebLLM: A High-Performance In-Browser LLM Inference Engine}, author={Charlie F. 
Ruan and Yucheng Qin and Xun Zhou and Ruihang Lai and Hongyi Jin and Yixin Dong and Bohan Hou and Meng-Shiun Yu and Yiyan Zhai and Sudeep Agarwal and Hangrui Cao and Siyuan Feng and Tianqi Chen},
       year={2024},
       eprint={2412.15803},
       archivePrefix={arXiv},
       primaryClass={cs.LG},
-      url={https://arxiv.org/abs/2412.15803}, 
+      url={https://arxiv.org/abs/2412.15803},
 }
 ```
diff --git a/generate-models-json.mjs b/generate-models-json.mjs
new file mode 100644
index 000000000..9cb092f0d
--- /dev/null
+++ b/generate-models-json.mjs
@@ -0,0 +1,196 @@
+/**
+ * Generates the models list as JSON.
+ * Run from the web-llm repo root (no build needed — reads src/config.ts directly):
+ *   node generate-models-json.mjs
+ *
+ * Outputs: site/models-data.json
+ */
+
+import { readFileSync, writeFileSync } from "fs";
+
+const src = readFileSync("./src/config.ts", "utf8");
+
+const listStart = src.indexOf("model_list: [");
+const listEnd = src.indexOf("\n  ],\n};");
+const listSrc = src.slice(listStart, listEnd);
+
+const entries = [];
+const modelListContent = listSrc.substring(listSrc.indexOf("[") + 1);
+
+let braceLevel = 0;
+let currentBlockStart = -1;
+const modelBlocks = [];
+
+for (let i = 0; i < modelListContent.length; i++) {
+  if (modelListContent[i] === "{") {
+    if (braceLevel === 0) currentBlockStart = i;
+    braceLevel++;
+  } else if (modelListContent[i] === "}") {
+    braceLevel--;
+    if (braceLevel === 0 && currentBlockStart !== -1) {
+      modelBlocks.push(modelListContent.slice(currentBlockStart, i + 1));
+      currentBlockStart = -1;
+    }
+  }
+}
+
+for (const block of modelBlocks) {
+  const modelMatch = block.match(/model:\s*"([^"]+)"/);
+  const modelIdMatch = block.match(/model_id:\s*"([^"]+)"/);
+  if (!modelMatch || !modelIdMatch) continue;
+
+  const hf_url = modelMatch[1];
+  const model_id = modelIdMatch[1];
+
+  const vramMatch = block.match(/vram_required_MB:\s*([\d.]+)/);
+  const lowMatch = block.match(/low_resource_required:\s*(true|false)/);
+  const ctxMatch =
block.match(/context_window_size:\s*(\d+)/);
+  const typeMatch = block.match(/model_type:\s*ModelType\.(\w+)/);
+  const featMatch = block.match(/required_features:\s*\[([^\]]+)\]/);
+
+  const vramMB = vramMatch ? parseFloat(vramMatch[1]) : null;
+  const ctx = ctxMatch ? parseInt(ctxMatch[1]) : null;
+
+  entries.push({
+    model_id,
+    hf_url,
+    vram: vramMB
+      ? vramMB >= 1024
+        ? `${(vramMB / 1024).toFixed(1)} GB`
+        : `${Math.round(vramMB)} MB`
+      : null,
+    low_resource: lowMatch ? lowMatch[1] === "true" : false,
+    context_window: ctx
+      ? (ctx >= 1024 ? `${(ctx / 1024).toFixed(0)}K` : `${ctx}`)
+      : null,
+    model_type: typeMatch
+      ? typeMatch[1] === "embedding"
+        ? "embedding"
+        : typeMatch[1] === "VLM"
+          ? "vlm"
+          : "llm"
+      : "llm",
+    required_features: featMatch
+      ? featMatch[1]
+          .split(",")
+          .map((s) => s.trim().replace(/"/g, ""))
+          .filter(Boolean)
+      : [],
+  });
+}
+
+console.log(`Parsed ${entries.length} model entries from src/config.ts`);
+
+const FAMILIES = [
+  {
+    id: "deepseek",
+    label: "DeepSeek",
+    emoji: "🧠",
+    match: (id) => id.startsWith("DeepSeek"),
+  },
+  {
+    id: "llama3",
+    label: "Llama 3",
+    emoji: "🦙",
+    match: (id) => /^(Llama-3|Hermes-[23]|Hermes-2-Theta)/.test(id),
+  },
+  {
+    id: "llama2",
+    label: "Llama 2",
+    emoji: "🦙",
+    match: (id) => id.startsWith("Llama-2"),
+  },
+  {
+    id: "qwen3",
+    label: "Qwen3",
+    emoji: "🌊",
+    match: (id) => id.startsWith("Qwen3"),
+  },
+  {
+    id: "qwen25",
+    label: "Qwen 2.5",
+    emoji: "🌊",
+    match: (id) => id.startsWith("Qwen2.5"),
+  },
+  {
+    id: "qwen2",
+    label: "Qwen 2",
+    emoji: "🌊",
+    match: (id) => id.startsWith("Qwen2-") || id.startsWith("Qwen2."),
+  },
+  { id: "phi", label: "Phi", emoji: "🔷", match: (id) => /^[Pp]hi/.test(id) },
+  {
+    id: "gemma",
+    label: "Gemma",
+    emoji: "💎",
+    match: (id) => id.startsWith("gemma"),
+  },
+  {
+    id: "mistral",
+    label: "Mistral & Friends",
+    emoji: "💨",
+    match: (id) =>
+      ["Mistral", "OpenHermes", "NeuralHermes", "WizardMath", "Ministral"].some(
+        (p) => id.startsWith(p),
+      ),
+  
}, + { + id: "smollm", + label: "SmolLM2", + emoji: "🔬", + match: (id) => id.startsWith("SmolLM"), + }, + { + id: "stablelm", + label: "StableLM", + emoji: "🔩", + match: (id) => id.startsWith("stablelm"), + }, + { + id: "tinyllama", + label: "TinyLlama", + emoji: "🐣", + match: (id) => id.startsWith("TinyLlama"), + }, + { + id: "redpajama", + label: "RedPajama", + emoji: "🟥", + match: (id) => id.startsWith("RedPajama"), + }, + { + id: "embedding", + label: "Embedding", + emoji: "📐", + match: (id) => id.startsWith("snowflake"), + }, +]; + +const familyMap = new Map(FAMILIES.map((f) => [f.id, { ...f, models: [] }])); +const other = { id: "other", label: "Other", emoji: "📦", models: [] }; + +for (const entry of entries) { + const family = FAMILIES.find((f) => f.match(entry.model_id)); + if (family) { + familyMap.get(family.id).models.push(entry); + } else { + other.models.push(entry); + console.warn(` ⚠️ No family matched: ${entry.model_id}`); + } +} + +const families = [ + ...familyMap.values(), + ...(other.models.length ? 
[other] : []), +].filter((f) => f.models.length > 0); + +const output = { + generated_at: new Date().toISOString(), + total_models: entries.length, + families, +}; + +writeFileSync("site/models-data.json", JSON.stringify(output, null, 2)); +console.log( + `✅ Wrote ${entries.length} models in ${families.length} families → site/models-data.json`, +); diff --git a/site/models-data.json b/site/models-data.json new file mode 100644 index 000000000..34baebb52 --- /dev/null +++ b/site/models-data.json @@ -0,0 +1,1467 @@ +{ + "generated_at": "2026-03-12T15:12:33.561Z", + "total_models": 145, + "families": [ + { + "id": "deepseek", + "label": "DeepSeek", + "emoji": "🧠", + "models": [ + { + "model_id": "DeepSeek-R1-Distill-Qwen-1.5B-q4f16_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/DeepSeek-R1-Distill-Qwen-1.5B-q4f16_1-MLC", + "vram": "1.6 GB", + "low_resource": true, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "DeepSeek-R1-Distill-Qwen-1.5B-q4f32_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/DeepSeek-R1-Distill-Qwen-1.5B-q4f32_1-MLC", + "vram": "1.8 GB", + "low_resource": true, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "DeepSeek-R1-Distill-Qwen-7B-q4f16_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/DeepSeek-R1-Distill-Qwen-7B-q4f16_1-MLC", + "vram": "5.0 GB", + "low_resource": false, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "DeepSeek-R1-Distill-Qwen-7B-q4f32_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/DeepSeek-R1-Distill-Qwen-7B-q4f32_1-MLC", + "vram": "5.8 GB", + "low_resource": false, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "DeepSeek-R1-Distill-Llama-8B-q4f32_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/DeepSeek-R1-Distill-Llama-8B-q4f32_1-MLC", + "vram": "6.0 GB", + "low_resource": false, + "context_window": "4K", + 
"model_type": "llm", + "required_features": [] + }, + { + "model_id": "DeepSeek-R1-Distill-Llama-8B-q4f16_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/DeepSeek-R1-Distill-Llama-8B-q4f16_1-MLC", + "vram": "4.9 GB", + "low_resource": false, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + } + ] + }, + { + "id": "llama3", + "label": "Llama 3", + "emoji": "🦙", + "models": [ + { + "model_id": "Llama-3.2-1B-Instruct-q4f32_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Llama-3.2-1B-Instruct-q4f32_1-MLC", + "vram": "1.1 GB", + "low_resource": true, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Llama-3.2-1B-Instruct-q4f16_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Llama-3.2-1B-Instruct-q4f16_1-MLC", + "vram": "879 MB", + "low_resource": true, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Llama-3.2-1B-Instruct-q0f32-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Llama-3.2-1B-Instruct-q0f32-MLC", + "vram": "5.0 GB", + "low_resource": true, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Llama-3.2-1B-Instruct-q0f16-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Llama-3.2-1B-Instruct-q0f16-MLC", + "vram": "2.5 GB", + "low_resource": true, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Llama-3.2-3B-Instruct-q4f32_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Llama-3.2-3B-Instruct-q4f32_1-MLC", + "vram": "2.9 GB", + "low_resource": true, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Llama-3.2-3B-Instruct-q4f16_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Llama-3.2-3B-Instruct-q4f16_1-MLC", + "vram": "2.2 GB", + "low_resource": true, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": 
"Llama-3.1-8B-Instruct-q4f32_1-MLC-1k", + "hf_url": "https://huggingface.co/mlc-ai/Llama-3.1-8B-Instruct-q4f32_1-MLC", + "vram": "5.2 GB", + "low_resource": true, + "context_window": "1K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Llama-3.1-8B-Instruct-q4f16_1-MLC-1k", + "hf_url": "https://huggingface.co/mlc-ai/Llama-3.1-8B-Instruct-q4f16_1-MLC", + "vram": "4.5 GB", + "low_resource": true, + "context_window": "1K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Llama-3.1-8B-Instruct-q4f32_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Llama-3.1-8B-Instruct-q4f32_1-MLC", + "vram": "6.0 GB", + "low_resource": false, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Llama-3.1-8B-Instruct-q4f16_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Llama-3.1-8B-Instruct-q4f16_1-MLC", + "vram": "4.9 GB", + "low_resource": false, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Hermes-2-Theta-Llama-3-8B-q4f16_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Hermes-2-Theta-Llama-3-8B-q4f16_1-MLC", + "vram": "4.9 GB", + "low_resource": false, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Hermes-2-Theta-Llama-3-8B-q4f32_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Hermes-2-Theta-Llama-3-8B-q4f32_1-MLC", + "vram": "5.9 GB", + "low_resource": false, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Hermes-2-Pro-Llama-3-8B-q4f16_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Hermes-2-Pro-Llama-3-8B-q4f16_1-MLC", + "vram": "4.9 GB", + "low_resource": false, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Hermes-2-Pro-Llama-3-8B-q4f32_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Hermes-2-Pro-Llama-3-8B-q4f32_1-MLC", + "vram": "5.9 GB", + "low_resource": 
false, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Hermes-3-Llama-3.2-3B-q4f32_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Hermes-3-Llama-3.2-3B-q4f32_1-MLC", + "vram": "2.9 GB", + "low_resource": true, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Hermes-3-Llama-3.2-3B-q4f16_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Hermes-3-Llama-3.2-3B-q4f16_1-MLC", + "vram": "2.2 GB", + "low_resource": true, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Hermes-3-Llama-3.1-8B-q4f32_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Hermes-3-Llama-3.1-8B-q4f32_1-MLC", + "vram": "5.6 GB", + "low_resource": false, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Hermes-3-Llama-3.1-8B-q4f16_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Hermes-3-Llama-3.1-8B-q4f16_1-MLC", + "vram": "4.8 GB", + "low_resource": false, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Hermes-2-Pro-Mistral-7B-q4f16_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Hermes-2-Pro-Mistral-7B-q4f16_1-MLC", + "vram": "3.9 GB", + "low_resource": false, + "context_window": "4K", + "model_type": "llm", + "required_features": [ + "shader-f16" + ] + }, + { + "model_id": "Llama-3.1-70B-Instruct-q3f16_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Llama-3.1-70B-Instruct-q3f16_1-MLC", + "vram": "30.4 GB", + "low_resource": false, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Llama-3-8B-Instruct-q4f32_1-MLC-1k", + "hf_url": "https://huggingface.co/mlc-ai/Llama-3-8B-Instruct-q4f32_1-MLC", + "vram": "5.2 GB", + "low_resource": true, + "context_window": "1K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Llama-3-8B-Instruct-q4f16_1-MLC-1k", + "hf_url": 
"https://huggingface.co/mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC", + "vram": "4.5 GB", + "low_resource": true, + "context_window": "1K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Llama-3-8B-Instruct-q4f32_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Llama-3-8B-Instruct-q4f32_1-MLC", + "vram": "6.0 GB", + "low_resource": false, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Llama-3-8B-Instruct-q4f16_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC", + "vram": "4.9 GB", + "low_resource": false, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Llama-3-70B-Instruct-q3f16_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Llama-3-70B-Instruct-q3f16_1-MLC", + "vram": "30.4 GB", + "low_resource": false, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + } + ] + }, + { + "id": "llama2", + "label": "Llama 2", + "emoji": "🦙", + "models": [ + { + "model_id": "Llama-2-7b-chat-hf-q4f32_1-MLC-1k", + "hf_url": "https://huggingface.co/mlc-ai/Llama-2-7b-chat-hf-q4f32_1-MLC", + "vram": "5.2 GB", + "low_resource": false, + "context_window": "1K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Llama-2-7b-chat-hf-q4f16_1-MLC-1k", + "hf_url": "https://huggingface.co/mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC", + "vram": "4.5 GB", + "low_resource": false, + "context_window": "1K", + "model_type": "llm", + "required_features": [ + "shader-f16" + ] + }, + { + "model_id": "Llama-2-7b-chat-hf-q4f32_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Llama-2-7b-chat-hf-q4f32_1-MLC", + "vram": "8.9 GB", + "low_resource": false, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Llama-2-7b-chat-hf-q4f16_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC", + "vram": "6.6 GB", + "low_resource": false, + 
"context_window": "4K", + "model_type": "llm", + "required_features": [ + "shader-f16" + ] + }, + { + "model_id": "Llama-2-13b-chat-hf-q4f16_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Llama-2-13b-chat-hf-q4f16_1-MLC", + "vram": "11.5 GB", + "low_resource": false, + "context_window": "4K", + "model_type": "llm", + "required_features": [ + "shader-f16" + ] + } + ] + }, + { + "id": "qwen3", + "label": "Qwen3", + "emoji": "🌊", + "models": [ + { + "model_id": "Qwen3-0.6B-q4f16_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Qwen3-0.6B-q4f16_1-MLC", + "vram": "1.4 GB", + "low_resource": true, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Qwen3-0.6B-q4f32_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Qwen3-0.6B-q4f32_1-MLC", + "vram": "1.9 GB", + "low_resource": true, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Qwen3-0.6B-q0f16-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Qwen3-0.6B-q0f16-MLC", + "vram": "2.2 GB", + "low_resource": true, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Qwen3-0.6B-q0f32-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Qwen3-0.6B-q0f32-MLC", + "vram": "3.8 GB", + "low_resource": true, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Qwen3-1.7B-q4f16_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Qwen3-1.7B-q4f16_1-MLC", + "vram": "2.0 GB", + "low_resource": true, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Qwen3-1.7B-q4f32_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Qwen3-1.7B-q4f32_1-MLC", + "vram": "2.6 GB", + "low_resource": true, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Qwen3-4B-q4f16_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Qwen3-4B-q4f16_1-MLC", + "vram": "3.4 GB", + 
"low_resource": true, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Qwen3-4B-q4f32_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Qwen3-4B-q4f32_1-MLC", + "vram": "4.2 GB", + "low_resource": true, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Qwen3-8B-q4f16_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Qwen3-8B-q4f16_1-MLC", + "vram": "5.6 GB", + "low_resource": false, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Qwen3-8B-q4f32_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Qwen3-8B-q4f32_1-MLC", + "vram": "6.7 GB", + "low_resource": false, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + } + ] + }, + { + "id": "qwen25", + "label": "Qwen 2.5", + "emoji": "🌊", + "models": [ + { + "model_id": "Qwen2.5-0.5B-Instruct-q4f16_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Qwen2.5-0.5B-Instruct-q4f16_1-MLC", + "vram": "945 MB", + "low_resource": true, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Qwen2.5-0.5B-Instruct-q4f32_1-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Qwen2.5-0.5B-Instruct-q4f32_1-MLC", + "vram": "1.0 GB", + "low_resource": true, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Qwen2.5-0.5B-Instruct-q0f16-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Qwen2.5-0.5B-Instruct-q0f16-MLC", + "vram": "1.6 GB", + "low_resource": true, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Qwen2.5-0.5B-Instruct-q0f32-MLC", + "hf_url": "https://huggingface.co/mlc-ai/Qwen2.5-0.5B-Instruct-q0f32-MLC", + "vram": "2.6 GB", + "low_resource": true, + "context_window": "4K", + "model_type": "llm", + "required_features": [] + }, + { + "model_id": "Qwen2.5-1.5B-Instruct-q4f16_1-MLC", + "hf_url": 
"https://huggingface.co/mlc-ai/Qwen2.5-1.5B-Instruct-q4f16_1-MLC",
+          "vram": "1.6 GB",
+          "low_resource": true,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "Qwen2.5-1.5B-Instruct-q4f32_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Qwen2.5-1.5B-Instruct-q4f32_1-MLC",
+          "vram": "1.8 GB",
+          "low_resource": true,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "Qwen2.5-3B-Instruct-q4f16_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Qwen2.5-3B-Instruct-q4f16_1-MLC",
+          "vram": "2.4 GB",
+          "low_resource": true,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "Qwen2.5-3B-Instruct-q4f32_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Qwen2.5-3B-Instruct-q4f32_1-MLC",
+          "vram": "2.8 GB",
+          "low_resource": true,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "Qwen2.5-7B-Instruct-q4f16_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Qwen2.5-7B-Instruct-q4f16_1-MLC",
+          "vram": "5.0 GB",
+          "low_resource": false,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "Qwen2.5-7B-Instruct-q4f32_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Qwen2.5-7B-Instruct-q4f32_1-MLC",
+          "vram": "5.8 GB",
+          "low_resource": false,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "Qwen2.5-Coder-0.5B-Instruct-q4f16_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Qwen2.5-Coder-0.5B-Instruct-q4f16_1-MLC",
+          "vram": "945 MB",
+          "low_resource": true,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "Qwen2.5-Coder-0.5B-Instruct-q4f32_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Qwen2.5-Coder-0.5B-Instruct-q4f32_1-MLC",
+          "vram": "1.0 GB",
+          "low_resource": true,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "Qwen2.5-Coder-0.5B-Instruct-q0f16-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Qwen2.5-Coder-0.5B-Instruct-q0f16-MLC",
+          "vram": "1.6 GB",
+          "low_resource": true,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "Qwen2.5-Coder-1.5B-Instruct-q4f16_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Qwen2.5-Coder-1.5B-Instruct-q4f16_1-MLC",
+          "vram": "1.6 GB",
+          "low_resource": false,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "Qwen2.5-Coder-1.5B-Instruct-q4f32_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Qwen2.5-Coder-1.5B-Instruct-q4f32_1-MLC",
+          "vram": "1.8 GB",
+          "low_resource": false,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "Qwen2.5-Coder-3B-Instruct-q4f16_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Qwen2.5-Coder-3B-Instruct-q4f16_1-MLC",
+          "vram": "2.4 GB",
+          "low_resource": true,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "Qwen2.5-Coder-3B-Instruct-q4f32_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Qwen2.5-Coder-3B-Instruct-q4f32_1-MLC",
+          "vram": "2.8 GB",
+          "low_resource": true,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "Qwen2.5-Coder-7B-Instruct-q4f16_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Qwen2.5-Coder-7B-Instruct-q4f16_1-MLC",
+          "vram": "5.0 GB",
+          "low_resource": false,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "Qwen2.5-Coder-7B-Instruct-q4f32_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Qwen2.5-Coder-7B-Instruct-q4f32_1-MLC",
+          "vram": "5.8 GB",
+          "low_resource": false,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "Qwen2.5-Math-1.5B-Instruct-q4f16_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Qwen2.5-Math-1.5B-Instruct-q4f16_1-MLC",
+          "vram": "1.6 GB",
+          "low_resource": true,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "Qwen2.5-Math-1.5B-Instruct-q4f32_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Qwen2.5-Math-1.5B-Instruct-q4f32_1-MLC",
+          "vram": "1.8 GB",
+          "low_resource": true,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        }
+      ]
+    },
+    {
+      "id": "qwen2",
+      "label": "Qwen 2",
+      "emoji": "🌊",
+      "models": [
+        {
+          "model_id": "Qwen2-0.5B-Instruct-q4f16_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Qwen2-0.5B-Instruct-q4f16_1-MLC",
+          "vram": "945 MB",
+          "low_resource": true,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "Qwen2-0.5B-Instruct-q0f16-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Qwen2-0.5B-Instruct-q0f16-MLC",
+          "vram": "1.6 GB",
+          "low_resource": true,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "Qwen2-0.5B-Instruct-q0f32-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Qwen2-0.5B-Instruct-q0f32-MLC",
+          "vram": "2.6 GB",
+          "low_resource": true,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "Qwen2-1.5B-Instruct-q4f16_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Qwen2-1.5B-Instruct-q4f16_1-MLC",
+          "vram": "1.6 GB",
+          "low_resource": true,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "Qwen2-1.5B-Instruct-q4f32_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Qwen2-1.5B-Instruct-q4f32_1-MLC",
+          "vram": "1.8 GB",
+          "low_resource": true,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "Qwen2-7B-Instruct-q4f16_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Qwen2-7B-Instruct-q4f16_1-MLC",
+          "vram": "5.0 GB",
+          "low_resource": false,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "Qwen2-7B-Instruct-q4f32_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Qwen2-7B-Instruct-q4f32_1-MLC",
+          "vram": "5.8 GB",
+          "low_resource": false,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "Qwen2-Math-1.5B-Instruct-q4f16_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Qwen2-Math-1.5B-Instruct-q4f16_1-MLC",
+          "vram": "1.6 GB",
+          "low_resource": true,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "Qwen2-Math-1.5B-Instruct-q4f32_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Qwen2-Math-1.5B-Instruct-q4f32_1-MLC",
+          "vram": "1.8 GB",
+          "low_resource": true,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "Qwen2-Math-7B-Instruct-q4f16_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Qwen2-Math-7B-Instruct-q4f16_1-MLC",
+          "vram": "5.0 GB",
+          "low_resource": false,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "Qwen2-Math-7B-Instruct-q4f32_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Qwen2-Math-7B-Instruct-q4f32_1-MLC",
+          "vram": "5.8 GB",
+          "low_resource": false,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        }
+      ]
+    },
+    {
+      "id": "phi",
+      "label": "Phi",
+      "emoji": "🔷",
+      "models": [
+        {
+          "model_id": "Phi-3.5-mini-instruct-q4f16_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Phi-3.5-mini-instruct-q4f16_1-MLC",
+          "vram": "3.6 GB",
+          "low_resource": false,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "Phi-3.5-mini-instruct-q4f32_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Phi-3.5-mini-instruct-q4f32_1-MLC",
+          "vram": "5.4 GB",
+          "low_resource": false,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "Phi-3.5-mini-instruct-q4f16_1-MLC-1k",
+          "hf_url": "https://huggingface.co/mlc-ai/Phi-3.5-mini-instruct-q4f16_1-MLC",
+          "vram": "2.5 GB",
+          "low_resource": true,
+          "context_window": "1K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "Phi-3.5-mini-instruct-q4f32_1-MLC-1k",
+          "hf_url": "https://huggingface.co/mlc-ai/Phi-3.5-mini-instruct-q4f32_1-MLC",
+          "vram": "3.1 GB",
+          "low_resource": true,
+          "context_window": "1K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "Phi-3.5-vision-instruct-q4f16_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Phi-3.5-vision-instruct-q4f16_1-MLC",
+          "vram": "3.9 GB",
+          "low_resource": true,
+          "context_window": "4K",
+          "model_type": "vlm",
+          "required_features": []
+        },
+        {
+          "model_id": "Phi-3.5-vision-instruct-q4f32_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Phi-3.5-vision-instruct-q4f32_1-MLC",
+          "vram": "5.7 GB",
+          "low_resource": true,
+          "context_window": "4K",
+          "model_type": "vlm",
+          "required_features": []
+        },
+        {
+          "model_id": "Phi-3-mini-4k-instruct-q4f16_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Phi-3-mini-4k-instruct-q4f16_1-MLC",
+          "vram": "3.6 GB",
+          "low_resource": false,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "Phi-3-mini-4k-instruct-q4f32_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Phi-3-mini-4k-instruct-q4f32_1-MLC",
+          "vram": "5.4 GB",
+          "low_resource": false,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "Phi-3-mini-4k-instruct-q4f16_1-MLC-1k",
+          "hf_url": "https://huggingface.co/mlc-ai/Phi-3-mini-4k-instruct-q4f16_1-MLC",
+          "vram": "2.5 GB",
+          "low_resource": true,
+          "context_window": "1K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "Phi-3-mini-4k-instruct-q4f32_1-MLC-1k",
+          "hf_url": "https://huggingface.co/mlc-ai/Phi-3-mini-4k-instruct-q4f32_1-MLC",
+          "vram": "3.1 GB",
+          "low_resource": true,
+          "context_window": "1K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "phi-2-q4f16_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/phi-2-q4f16_1-MLC",
+          "vram": "3.0 GB",
+          "low_resource": false,
+          "context_window": "2K",
+          "model_type": "llm",
+          "required_features": [
+            "shader-f16"
+          ]
+        },
+        {
+          "model_id": "phi-2-q4f32_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/phi-2-q4f32_1-MLC",
+          "vram": "3.9 GB",
+          "low_resource": false,
+          "context_window": "2K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "phi-2-q4f16_1-MLC-1k",
+          "hf_url": "https://huggingface.co/mlc-ai/phi-2-q4f16_1-MLC",
+          "vram": "2.1 GB",
+          "low_resource": true,
+          "context_window": "1K",
+          "model_type": "llm",
+          "required_features": [
+            "shader-f16"
+          ]
+        },
+        {
+          "model_id": "phi-2-q4f32_1-MLC-1k",
+          "hf_url": "https://huggingface.co/mlc-ai/phi-2-q4f32_1-MLC",
+          "vram": "2.7 GB",
+          "low_resource": true,
+          "context_window": "1K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "phi-1_5-q4f16_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/phi-1_5-q4f16_1-MLC",
+          "vram": "1.2 GB",
+          "low_resource": true,
+          "context_window": "2K",
+          "model_type": "llm",
+          "required_features": [
+            "shader-f16"
+          ]
+        },
+        {
+          "model_id": "phi-1_5-q4f32_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/phi-1_5-q4f32_1-MLC",
+          "vram": "1.6 GB",
+          "low_resource": true,
+          "context_window": "2K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "phi-1_5-q4f16_1-MLC-1k",
+          "hf_url": "https://huggingface.co/mlc-ai/phi-1_5-q4f16_1-MLC",
+          "vram": "1.2 GB",
+          "low_resource": true,
+          "context_window": "1K",
+          "model_type": "llm",
+          "required_features": [
+            "shader-f16"
+          ]
+        },
+        {
+          "model_id": "phi-1_5-q4f32_1-MLC-1k",
+          "hf_url": "https://huggingface.co/mlc-ai/phi-1_5-q4f32_1-MLC",
+          "vram": "1.6 GB",
+          "low_resource": true,
+          "context_window": "1K",
+          "model_type": "llm",
+          "required_features": []
+        }
+      ]
+    },
+    {
+      "id": "gemma",
+      "label": "Gemma",
+      "emoji": "💎",
+      "models": [
+        {
+          "model_id": "gemma-2-2b-it-q4f16_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/gemma-2-2b-it-q4f16_1-MLC",
+          "vram": "1.9 GB",
+          "low_resource": false,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": [
+            "shader-f16"
+          ]
+        },
+        {
+          "model_id": "gemma-2-2b-it-q4f32_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/gemma-2-2b-it-q4f32_1-MLC",
+          "vram": "2.4 GB",
+          "low_resource": false,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "gemma-2-2b-it-q4f16_1-MLC-1k",
+          "hf_url": "https://huggingface.co/mlc-ai/gemma-2-2b-it-q4f16_1-MLC",
+          "vram": "1.5 GB",
+          "low_resource": true,
+          "context_window": "1K",
+          "model_type": "llm",
+          "required_features": [
+            "shader-f16"
+          ]
+        },
+        {
+          "model_id": "gemma-2-2b-it-q4f32_1-MLC-1k",
+          "hf_url": "https://huggingface.co/mlc-ai/gemma-2-2b-it-q4f32_1-MLC",
+          "vram": "1.8 GB",
+          "low_resource": true,
+          "context_window": "1K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "gemma-2-9b-it-q4f16_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/gemma-2-9b-it-q4f16_1-MLC",
+          "vram": "6.3 GB",
+          "low_resource": false,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": [
+            "shader-f16"
+          ]
+        },
+        {
+          "model_id": "gemma-2-9b-it-q4f32_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/gemma-2-9b-it-q4f32_1-MLC",
+          "vram": "8.2 GB",
+          "low_resource": false,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "gemma-2-2b-jpn-it-q4f16_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/gemma-2-2b-jpn-it-q4f16_1-MLC",
+          "vram": "1.9 GB",
+          "low_resource": true,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": [
+            "shader-f16"
+          ]
+        },
+        {
+          "model_id": "gemma-2-2b-jpn-it-q4f32_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/gemma-2-2b-jpn-it-q4f32_1-MLC",
+          "vram": "2.4 GB",
+          "low_resource": true,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "gemma-2b-it-q4f16_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/gemma-2b-it-q4f16_1-MLC",
+          "vram": "1.4 GB",
+          "low_resource": false,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": [
+            "shader-f16"
+          ]
+        },
+        {
+          "model_id": "gemma-2b-it-q4f32_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/gemma-2b-it-q4f32_1-MLC",
+          "vram": "1.7 GB",
+          "low_resource": false,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "gemma-2b-it-q4f16_1-MLC-1k",
+          "hf_url": "https://huggingface.co/mlc-ai/gemma-2b-it-q4f16_1-MLC",
+          "vram": "1.4 GB",
+          "low_resource": true,
+          "context_window": "1K",
+          "model_type": "llm",
+          "required_features": [
+            "shader-f16"
+          ]
+        },
+        {
+          "model_id": "gemma-2b-it-q4f32_1-MLC-1k",
+          "hf_url": "https://huggingface.co/mlc-ai/gemma-2b-it-q4f32_1-MLC",
+          "vram": "1.7 GB",
+          "low_resource": true,
+          "context_window": "1K",
+          "model_type": "llm",
+          "required_features": []
+        }
+      ]
+    },
+    {
+      "id": "mistral",
+      "label": "Mistral & Friends",
+      "emoji": "💨",
+      "models": [
+        {
+          "model_id": "Mistral-7B-Instruct-v0.3-q4f16_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Mistral-7B-Instruct-v0.3-q4f16_1-MLC",
+          "vram": "4.5 GB",
+          "low_resource": false,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": [
+            "shader-f16"
+          ]
+        },
+        {
+          "model_id": "Mistral-7B-Instruct-v0.3-q4f32_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Mistral-7B-Instruct-v0.3-q4f32_1-MLC",
+          "vram": "5.5 GB",
+          "low_resource": false,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "Mistral-7B-Instruct-v0.2-q4f16_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Mistral-7B-Instruct-v0.2-q4f16_1-MLC",
+          "vram": "4.5 GB",
+          "low_resource": false,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": [
+            "shader-f16"
+          ]
+        },
+        {
+          "model_id": "OpenHermes-2.5-Mistral-7B-q4f16_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/OpenHermes-2.5-Mistral-7B-q4f16_1-MLC",
+          "vram": "4.5 GB",
+          "low_resource": false,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": [
+            "shader-f16"
+          ]
+        },
+        {
+          "model_id": "NeuralHermes-2.5-Mistral-7B-q4f16_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/NeuralHermes-2.5-Mistral-7B-q4f16_1-MLC",
+          "vram": "4.5 GB",
+          "low_resource": false,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": [
+            "shader-f16"
+          ]
+        },
+        {
+          "model_id": "WizardMath-7B-V1.1-q4f16_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/WizardMath-7B-V1.1-q4f16_1-MLC",
+          "vram": "4.5 GB",
+          "low_resource": false,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": [
+            "shader-f16"
+          ]
+        },
+        {
+          "model_id": "Ministral-3-3B-Base-2512-q4f16_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Ministral-3-3B-Base-2512-q4f16_1-MLC",
+          "vram": null,
+          "low_resource": false,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "Ministral-3-3B-Reasoning-2512-q4f16_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Ministral-3-3B-Reasoning-2512-q4f16_1-MLC",
+          "vram": null,
+          "low_resource": false,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "Ministral-3-3B-Instruct-2512-BF16-q4f16_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/Ministral-3-3B-Instruct-2512-BF16-q4f16_1-MLC",
+          "vram": null,
+          "low_resource": false,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        }
+      ]
+    },
+    {
+      "id": "smollm",
+      "label": "SmolLM2",
+      "emoji": "🔬",
+      "models": [
+        {
+          "model_id": "SmolLM2-1.7B-Instruct-q4f16_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/SmolLM2-1.7B-Instruct-q4f16_1-MLC",
+          "vram": "1.7 GB",
+          "low_resource": true,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": [
+            "shader-f16"
+          ]
+        },
+        {
+          "model_id": "SmolLM2-1.7B-Instruct-q4f32_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/SmolLM2-1.7B-Instruct-q4f32_1-MLC",
+          "vram": "2.6 GB",
+          "low_resource": true,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "SmolLM2-360M-Instruct-q0f16-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/SmolLM2-360M-Instruct-q0f16-MLC",
+          "vram": "872 MB",
+          "low_resource": true,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": [
+            "shader-f16"
+          ]
+        },
+        {
+          "model_id": "SmolLM2-360M-Instruct-q0f32-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/SmolLM2-360M-Instruct-q0f32-MLC",
+          "vram": "1.7 GB",
+          "low_resource": true,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "SmolLM2-360M-Instruct-q4f16_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/SmolLM2-360M-Instruct-q4f16_1-MLC",
+          "vram": "376 MB",
+          "low_resource": true,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": [
+            "shader-f16"
+          ]
+        },
+        {
+          "model_id": "SmolLM2-360M-Instruct-q4f32_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/SmolLM2-360M-Instruct-q4f32_1-MLC",
+          "vram": "580 MB",
+          "low_resource": true,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "SmolLM2-135M-Instruct-q0f16-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/SmolLM2-135M-Instruct-q0f16-MLC",
+          "vram": "360 MB",
+          "low_resource": true,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": [
+            "shader-f16"
+          ]
+        },
+        {
+          "model_id": "SmolLM2-135M-Instruct-q0f32-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/SmolLM2-135M-Instruct-q0f32-MLC",
+          "vram": "719 MB",
+          "low_resource": true,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        }
+      ]
+    },
+    {
+      "id": "stablelm",
+      "label": "StableLM",
+      "emoji": "🔩",
+      "models": [
+        {
+          "model_id": "stablelm-2-zephyr-1_6b-q4f16_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC",
+          "vram": "2.0 GB",
+          "low_resource": false,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "stablelm-2-zephyr-1_6b-q4f32_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f32_1-MLC",
+          "vram": "2.9 GB",
+          "low_resource": false,
+          "context_window": "4K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "stablelm-2-zephyr-1_6b-q4f16_1-MLC-1k",
+          "hf_url": "https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f16_1-MLC",
+          "vram": "1.5 GB",
+          "low_resource": true,
+          "context_window": "1K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "stablelm-2-zephyr-1_6b-q4f32_1-MLC-1k",
+          "hf_url": "https://huggingface.co/mlc-ai/stablelm-2-zephyr-1_6b-q4f32_1-MLC",
+          "vram": "1.8 GB",
+          "low_resource": true,
+          "context_window": "1K",
+          "model_type": "llm",
+          "required_features": []
+        }
+      ]
+    },
+    {
+      "id": "tinyllama",
+      "label": "TinyLlama",
+      "emoji": "🐣",
+      "models": [
+        {
+          "model_id": "TinyLlama-1.1B-Chat-v1.0-q4f16_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/TinyLlama-1.1B-Chat-v1.0-q4f16_1-MLC",
+          "vram": "697 MB",
+          "low_resource": true,
+          "context_window": "2K",
+          "model_type": "llm",
+          "required_features": [
+            "shader-f16"
+          ]
+        },
+        {
+          "model_id": "TinyLlama-1.1B-Chat-v1.0-q4f32_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/TinyLlama-1.1B-Chat-v1.0-q4f32_1-MLC",
+          "vram": "840 MB",
+          "low_resource": true,
+          "context_window": "2K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "TinyLlama-1.1B-Chat-v1.0-q4f16_1-MLC-1k",
+          "hf_url": "https://huggingface.co/mlc-ai/TinyLlama-1.1B-Chat-v1.0-q4f16_1-MLC",
+          "vram": "675 MB",
+          "low_resource": true,
+          "context_window": "1K",
+          "model_type": "llm",
+          "required_features": [
+            "shader-f16"
+          ]
+        },
+        {
+          "model_id": "TinyLlama-1.1B-Chat-v1.0-q4f32_1-MLC-1k",
+          "hf_url": "https://huggingface.co/mlc-ai/TinyLlama-1.1B-Chat-v1.0-q4f32_1-MLC",
+          "vram": "796 MB",
+          "low_resource": true,
+          "context_window": "1K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "TinyLlama-1.1B-Chat-v0.4-q4f16_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/TinyLlama-1.1B-Chat-v0.4-q4f16_1-MLC",
+          "vram": "697 MB",
+          "low_resource": true,
+          "context_window": "2K",
+          "model_type": "llm",
+          "required_features": [
+            "shader-f16"
+          ]
+        },
+        {
+          "model_id": "TinyLlama-1.1B-Chat-v0.4-q4f32_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/TinyLlama-1.1B-Chat-v0.4-q4f32_1-MLC",
+          "vram": "840 MB",
+          "low_resource": true,
+          "context_window": "2K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "TinyLlama-1.1B-Chat-v0.4-q4f16_1-MLC-1k",
+          "hf_url": "https://huggingface.co/mlc-ai/TinyLlama-1.1B-Chat-v0.4-q4f16_1-MLC",
+          "vram": "675 MB",
+          "low_resource": true,
+          "context_window": "1K",
+          "model_type": "llm",
+          "required_features": [
+            "shader-f16"
+          ]
+        },
+        {
+          "model_id": "TinyLlama-1.1B-Chat-v0.4-q4f32_1-MLC-1k",
+          "hf_url": "https://huggingface.co/mlc-ai/TinyLlama-1.1B-Chat-v0.4-q4f32_1-MLC",
+          "vram": "796 MB",
+          "low_resource": true,
+          "context_window": "1K",
+          "model_type": "llm",
+          "required_features": []
+        }
+      ]
+    },
+    {
+      "id": "redpajama",
+      "label": "RedPajama",
+      "emoji": "🟥",
+      "models": [
+        {
+          "model_id": "RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC",
+          "vram": "2.9 GB",
+          "low_resource": false,
+          "context_window": "2K",
+          "model_type": "llm",
+          "required_features": [
+            "shader-f16"
+          ]
+        },
+        {
+          "model_id": "RedPajama-INCITE-Chat-3B-v1-q4f32_1-MLC",
+          "hf_url": "https://huggingface.co/mlc-ai/RedPajama-INCITE-Chat-3B-v1-q4f32_1-MLC",
+          "vram": "3.8 GB",
+          "low_resource": false,
+          "context_window": "2K",
+          "model_type": "llm",
+          "required_features": []
+        },
+        {
+          "model_id": "RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC-1k",
+          "hf_url": "https://huggingface.co/mlc-ai/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC",
+          "vram": "2.0 GB",
+          "low_resource": true,
+          "context_window": "1K",
+          "model_type": "llm",
+          "required_features": [
+            "shader-f16"
+          ]
+        },
+        {
+          "model_id": "RedPajama-INCITE-Chat-3B-v1-q4f32_1-MLC-1k",
+          "hf_url": "https://huggingface.co/mlc-ai/RedPajama-INCITE-Chat-3B-v1-q4f32_1-MLC",
+          "vram": "2.5 GB",
+          "low_resource": true,
+          "context_window": "1K",
+          "model_type": "llm",
+          "required_features": []
+        }
+      ]
+    },
+    {
+      "id": "embedding",
+      "label": "Embedding",
+      "emoji": "📐",
+      "models": [
+        {
+          "model_id": "snowflake-arctic-embed-m-q0f32-MLC-b32",
+          "hf_url": "https://huggingface.co/mlc-ai/snowflake-arctic-embed-m-q0f32-MLC",
+          "vram": "1.4 GB",
+          "low_resource": false,
+          "context_window": null,
+          "model_type": "embedding",
+          "required_features": []
+        },
+        {
+          "model_id": "snowflake-arctic-embed-m-q0f32-MLC-b4",
+          "hf_url": "https://huggingface.co/mlc-ai/snowflake-arctic-embed-m-q0f32-MLC",
+          "vram": "539 MB",
+          "low_resource": false,
+          "context_window": null,
+          "model_type": "embedding",
+          "required_features": []
+        },
+        {
+          "model_id": "snowflake-arctic-embed-s-q0f32-MLC-b32",
+          "hf_url": "https://huggingface.co/mlc-ai/snowflake-arctic-embed-s-q0f32-MLC",
+          "vram": "1023 MB",
+          "low_resource": false,
+          "context_window": null,
+          "model_type": "embedding",
+          "required_features": []
+        },
+        {
+          "model_id": "snowflake-arctic-embed-s-q0f32-MLC-b4",
+          "hf_url": "https://huggingface.co/mlc-ai/snowflake-arctic-embed-s-q0f32-MLC",
+          "vram": "239 MB",
+          "low_resource": false,
+          "context_window": null,
+          "model_type": "embedding",
+          "required_features": []
+        }
+      ]
+    }
+  ]
+}
\ No newline at end of file
diff --git a/site/models.html b/site/models.html
new file mode 100644
index 000000000..4a760bef2
--- /dev/null
+++ b/site/models.html
@@ -0,0 +1,505 @@
+---
+layout: default
+title: Built-in Models
+notitle: true
+---
+
+
+
+

Built-in Models

+

+        All models supported by WebLLM via
+        prebuiltAppConfig.model_list. This page is generated directly
+        from the source, so it is always up to date. Need something
+        that is not listed?
+        Open an issue
+        or see
+        Custom Models.
+

+ +
Loading model list…
+ + + +
+

Don't see what you need?

+        Request a Model
+        Custom Models →
+
+
+ +
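The loader script for the page above is elided from this diff. As a rough sketch of the client-side logic it implies (the function and variable names here are illustrative, not the actual script in `site/models.html`), the page fetches the generated JSON and keeps only the models whose `required_features` are all reported by the local WebGPU adapter:

```javascript
// Sketch only: a model is considered runnable when every entry in its
// required_features list (e.g. "shader-f16") is available on this device.
// The config shape matches the JSON model list added in this diff.
function selectRunnableModels(families, availableFeatures) {
  const have = new Set(availableFeatures);
  return families.flatMap((family) =>
    family.models
      .filter((m) => m.required_features.every((f) => have.has(f)))
      .map((m) => ({ family: family.label, ...m })),
  );
}

// Example with two entries shaped like the config in this diff.
const families = [
  {
    id: "gemma",
    label: "Gemma",
    models: [
      { model_id: "gemma-2-2b-it-q4f16_1-MLC", required_features: ["shader-f16"] },
      { model_id: "gemma-2-2b-it-q4f32_1-MLC", required_features: [] },
    ],
  },
];

// An adapter without shader-f16 support can only run the q4f32 variant.
const runnable = selectRunnableModels(families, []);
console.log(runnable.map((m) => m.model_id)); // ["gemma-2-2b-it-q4f32_1-MLC"]
```

In a browser, `availableFeatures` would come from `(await navigator.gpu.requestAdapter()).features`; the q4f32 variants exist in the list precisely so that devices without the `shader-f16` WebGPU feature still have a runnable option.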