Commit ace6160

More cleanups (#114)
1 parent 8ce1d8d commit ace6160

1 file changed

+32
-59
lines changed

README.md

Lines changed: 32 additions & 59 deletions
@@ -4,20 +4,20 @@
 | [NPM Package](https://www.npmjs.com/package/@mlc-ai/web-llm) | [Get Started](#get-started) | [MLC LLM](https://github.com/mlc-ai/mlc-llm) | [Discord][discord-url]

 WebLLM is a modular, customizable javascript package that directly
-bring language model chats directly onto web browsers with hardware acceleration.
+brings language model chats directly onto web browsers with hardware acceleration.
 **Everything runs inside the browser with no server support and accelerated with WebGPU.**
 We can bring a lot of fun opportunities to build AI assistants for everyone and enable privacy while enjoying GPU acceleration.

 **[Check out our demo webpage to try out!](https://mlc.ai/web-llm/)**
 This project is a companion project of [MLC LLM](https://github.com/mlc-ai/mlc-llm),
-our companion project that runs LLMs natively on iphone and other native local environments.
+our companion project that runs LLMs natively on iPhone and other native local environments.


 <img src="site/img/fig/demo.gif">

 ## Get Started

-WebLLM offers a minimalist and modular interface to access the chatbot in browser.
+WebLLM offers a minimalist and modular interface to access the chatbot in the browser.
 The following code demonstrates the basic usage.

 ```typescript
@@ -35,9 +35,9 @@ async function main() {
 ```

 The WebLLM package itself does not come with UI, and is designed in a
-modular way to hooked to any of the UI component. The following code snippet
-is contains part of the program that generate streaming response on a webpage.
-You can checkout [examples/get-started](examples/get-started/) to see the complete example.
+modular way to hook to any of the UI components. The following code snippet
+contains part of the program that generates a streaming response on a webpage.
+You can check out [examples/get-started](examples/get-started/) to see the complete example.

 ```typescript
 async function main() {
@@ -47,7 +47,7 @@ async function main() {
 chat.setInitProgressCallback((report: InitProgressReport) => {
 setLabel("init-label", report.text);
 });
-// pick a model, here we use red-pajama
+// pick a model. Here we use red-pajama
 const localId = "RedPajama-INCITE-Chat-3B-v1-q4f32_0";
 await chat.reload(localId);

@@ -64,40 +64,39 @@ async function main() {
 const reply1 = await chat.generate(prompt1, generateProgressCallback)
 console.log(reply1);

-// We can print out the statis
+// We can print out the status
 console.log(await chat.runtimeStatsText());
 }
 ```

 Finally, you can find a complete
-You can also find a complete chat-app in [examples/simple-chat](examples/simple-chat/).
+You can also find a complete chat app in [examples/simple-chat](examples/simple-chat/).

 ## Customized Model Weights

-WebLLM works a companion project of [MLC LLM](https://github.com/mlc-ai/mlc-llm).
-It reuses the model artifact and build flow of MLC LLM, please checkout MLC LLM document
-on how to build a new model weights and libraries (MLC LLM document will come in the incoming weeks).
+WebLLM works as a companion project of [MLC LLM](https://github.com/mlc-ai/mlc-llm).
+It reuses the model artifact and builds flow of MLC LLM, please check out MLC LLM document
+on how to build new model weights and libraries (MLC LLM document will come in the incoming weeks).
 To generate the wasm needed by WebLLM, you can run with `--target webgpu` in the mlc llm build.
-There are two elements of WebLLM package that enables new models and weight variants.
+There are two elements of the WebLLM package that enables new models and weight variants.

 - model_url: Contains a URL to model artifacts, such as weights and meta-data.
-- model_lib: The webassembly libary that contains the executables to accelerate the model computations.
+- model_lib: The web assembly libary that contains the executables to accelerate the model computations.

 Both are customizable in the WebLLM.

 ```typescript
 async main() {
 const myLlamaUrl = "/url/to/my/llama";
 const appConfig = {
-"model_list": [
-{
-"model_url": myLlamaUrl,
-"local_id": "MyLlama-3b-v1-q4f32_0"
-}
-],
-"model_lib_map": {
-"llama-v1-3b-q4f32_0": "/url/to/myllama3b.wasm",
-}
+"model_list": [
+{
+"model_url": myLlamaUrl,
+"local_id": "MyLlama-3b-v1-q4f32_0"
+}
+],
+"model_lib_map": {
+"llama-v1-3b-q4f32_0": "/url/to/myllama3b.wasm",
 };
 // override default
 const chatOpts = {
@@ -117,10 +116,10 @@ async main() {
 }
 ```

-In many cases we only want to supply the model weight variant, but
+In many cases, we only want to supply the model weight variant, but
 not necessarily a new model. In such cases, we can reuse the model lib.
 In such cases, we can just pass in the `model_list` field and skip the model lib,
-and make sure the `mlc-chat-config.json` in the model url have a model lib
+and make sure the `mlc-chat-config.json` in the model url has a model lib
 that points to a prebuilt version, right now the prebuilt lib includes

 - `vicuna-v1-7b-q4f32_0`: llama-7b models.
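
As a rough illustration of the weights-only case described in the hunk above, the sketch below builds a config that carries only `model_list` and leaves out `model_lib_map`. The `ModelRecord` and `AppConfig` interfaces, the `weightsOnlyConfig` helper, and the example URL and local id are hypothetical names introduced here for illustration; they are not taken from the WebLLM package, and how the config is handed to the chat module is not shown because that part of the README is outside this diff.

```typescript
// Hypothetical types that mirror the field names used in the README snippet.
interface ModelRecord {
  model_url: string; // URL to model artifacts (weights, metadata)
  local_id: string;  // identifier later passed to chat.reload(...)
}

interface AppConfig {
  model_list: ModelRecord[];
  // Assumption: optional, and omitted when the mlc-chat-config.json found at
  // model_url already names a prebuilt model lib.
  model_lib_map?: Record<string, string>;
}

// Hypothetical helper: build a config that supplies only a new weight variant.
function weightsOnlyConfig(modelUrl: string, localId: string): AppConfig {
  return { model_list: [{ model_url: modelUrl, local_id: localId }] };
}

// Placeholder URL and id, chosen only for this example.
const appConfig = weightsOnlyConfig("/url/to/my/vicuna", "MyVicuna-7b-v1-q4f32_0");
console.log(JSON.stringify(appConfig, null, 2));
```

Keeping `model_lib_map` optional in the type makes the two cases from the README (new model lib versus reused prebuilt lib) explicit at the call site.
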
@@ -131,16 +130,16 @@ that points to a prebuilt version, right now the prebuilt lib includes

 WebLLM package is a web runtime designed for [MLC LLM](https://github.com/mlc-ai/mlc-llm).

-1. Install all the prerequisite for web deployment:
-1. [emscripten](https://emscripten.org). It is an LLVM-based compiler which compiles C/C++ source code to WebAssembly.
+1. Install all the prerequisites for compilation:
+1. [emscripten](https://emscripten.org). It is an LLVM-based compiler that compiles C/C++ source code to WebAssembly.
 - Follow the [installation instruction](https://emscripten.org/docs/getting_started/downloads.html#installation-instructions-using-the-emsdk-recommended) to install the latest emsdk.
 - Source `emsdk_env.sh` by `source path/to/emsdk_env.sh`, so that `emcc` is reachable from PATH and the command `emcc` works.
 4. Install jekyll by following the [official guides](https://jekyllrb.com/docs/installation/). It is the package we use for website.
 5. Install jekyll-remote-theme by command. Try [gem mirror](https://gems.ruby-china.com/) if install blocked.
 ```shell
 gem install jekyll-remote-theme
 ```
-We can verify the success installation by trying out `emcc` and `jekyll` in terminal respectively.
+We can verify the successful installation by trying out `emcc` and `jekyll` in terminal, respectively.

 2. Setup necessary environment

@@ -155,40 +154,14 @@ WebLLM package is a web runtime designed for [MLC LLM](https://github.com/mlc-ai
 npm run build
 ```

-4. Validate some of the sub packages
+4. Validate some of the sub-packages

-You can then go to the subfolders in [examples] to validate some of the sub packages.
-We use Parcelv2 for bundling. Although parcel is not very good at tracking parent directory
-changes sometimes. When you made a change in the WebLLM package, try to edit the `package.json`
+You can then go to the subfolders in [examples] to validate some of the sub-packages.
+We use Parcelv2 for bundling. Although Parcel is not very good at tracking parent directory
+changes sometimes. When you make a change in the WebLLM package, try to edit the `package.json`
 of the subfolder and save it, which will trigger Parcel to rebuild.


-## How
-
-The key technology here is machine learning compilation (MLC). Our solution builds on the shoulders of the open source ecosystem, including Hugging Face, model variants from LLaMA and Vicuna, wasm and WebGPU. The main flow builds on Apache TVM Unity, an exciting ongoing development in the [Apache TVM Community](https://github.com/apache/tvm/).
-
-- We bake a language model's IRModule in TVM with native dynamic shape support, avoiding the need of padding to max length and reducing both computation amount and memory usage.
-- Each function in TVM’s IRModule can be further transformed and generate runnable code that can be deployed universally on any environment that is supported by minimum tvm runtime (JavaScript being one of them).
-- [TensorIR](https://arxiv.org/abs/2207.04296) is the key technique used to generate optimized programs. We provide productive solutions by quickly transforming TensorIR programs based on the combination of expert knowledge and automated scheduler.
-- Heuristics are used when optimizing light-weight operators in order to reduce the engineering pressure.
-- We utilize int4 quantization techniques to compress the model weights so that they can fit into memory.
-- We build static memory planning optimizations to reuse memory across multiple layers.
-- We use [Emscripten](https://emscripten.org/) and TypeScript to build a TVM web runtime that can deploy generated modules.
-- We also leveraged a wasm port of SentencePiece tokenizer.
-
-<img src="site/img/fig/web-llm.svg" alt="web-llm" />
-
-All parts of this workflow are done in Python, with the exception of course, of the last part that builds a 600 loc JavaScript app that connects things together. This is also a fun process of interactive development, bringing new models.
-
-All these are made possible by the open-source ecosystem that we leverage. Specifically, we make heavy use of [TVM unity](https://discuss.tvm.apache.org/t/establish-tvm-unity-connection-a-technical-strategy/13344), an exciting latest development in the TVM project that enables such Python-first interactive MLC development experiences that allows us to easily compose new optimizations, all in Python, and incrementally bring our app to the web.
-
-TVM unity also provides an easy way to compose new solutions in the ecosystem. We will continue to bring further optimizations such as fused quantization kernels, and bring them to more platforms.
-
-One key characteristic of LLM models is the dynamic nature of the model. As the decoding and prefill process depends on computations that grow with the size of tokens, we leverage the first-class dynamic shape support in TVM unity that represents sequence dimensions through symbolic integers. This allows us to plan ahead to statically allocate all the memory needed for the sequence window of interest without padding.
-
-We also leveraged the integration of tensor expressions to quickly express partial-tensor computations such as rotary embedding directly without materializing them into full-tensor matrix computations.
-
-
 ## Links

 - [Demo page](https://mlc.ai/web-llm/)
@@ -199,4 +172,4 @@ We also leveraged the integration of tensor expressions to quickly express parti

 This project is initiated by members from CMU catalyst, UW SAMPL, SJTU, OctoML and the MLC community. We would love to continue developing and supporting the open-source ML community.

-This project is only possible thanks to the shoulders open-source ecosystems that we stand on. We want to thank the Apache TVM community and developers of the TVM Unity effort. The open-source ML community members made these models publicly available. PyTorch and Hugging Face communities that make these models accessible. We would like to thank the teams behind vicuna, SentencePiece, LLaMA, Alpaca. We also would like to thank the WebAssembly, Emscripten, and WebGPU communities. Finally, thanks to Dawn and WebGPU developers.
+This project is only possible thanks to the shoulders open-source ecosystems that we stand on. We want to thank the Apache TVM community and developers of the TVM Unity effort. The open-source ML community members made these models publicly available. PyTorch and Hugging Face communities make these models accessible. We would like to thank the teams behind vicuna, SentencePiece, LLaMA, Alpaca. We also would like to thank the WebAssembly, Emscripten, and WebGPU communities. Finally, thanks to Dawn and WebGPU developers.
