
Commit 7d4fd3b

Update README.md for web example (microsoft#429)
1 parent 86553e8 commit 7d4fd3b

File tree

1 file changed: +6 −8 lines changed


js/chat/README.md

Lines changed: 6 additions & 8 deletions
@@ -1,11 +1,11 @@
-# Local Chat using Phi3, ONNX Runtime Web and WebGPU
+# Local Chatbot in the browser using Phi3, ONNX Runtime Web and WebGPU
 
 This repository contains an example of running [Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) in your browser using [ONNX Runtime Web](https://github.com/microsoft/onnxruntime) with WebGPU.
 
 You can try out the live demo [here](https://guschmue.github.io/ort-webgpu/chat/index.html).
 
-We keep this example simple and use the onnxruntime-web api directly without a
-higher level framework like [transformers.js](https://github.com/xenova/transformers.js).
+We keep this example simple and use the onnxruntime-web API directly. ONNX Runtime Web has been powering
+higher-level frameworks like [transformers.js](https://github.com/xenova/transformers.js).
 
 ## Getting Started
 
@@ -42,13 +42,11 @@ Point your browser to http://localhost:8080/.
 
 ### The Phi3 ONNX Model
 
-The model used in this example is hosted on [Hugging Face](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx-web). It is slightly different than the ONNX model for CUDA or CPU:
+The model used in this example is hosted on [Hugging Face](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx-web). It is an optimized ONNX version specific to the web, slightly different from the ONNX model for CUDA or CPU:
 1. The model output 'logits' is kept as float32 (even for float16 models) since JavaScript does not support float16.
 2. Our WebGPU implementation uses the custom Multiheaded Attention operator instead of Group Query Attention.
 3. Phi3 is larger than 2GB and we need to use external data files. To keep them cacheable in the browser,
    both model.onnx and model.onnx.data are kept under 2GB.
 
-The model was created using the [ONNX genai model builder](https://github.com/microsoft/onnxruntime-genai/tree/main/src/python/py/models).
-
-If you like to create the model yourself, you can use [Olive](https://github.com/microsoft/Olive/).
-An example how to create the model for ONNX Runtime Web with Olive can be found [here](https://github.com/microsoft/Olive/tree/main/examples/phi3).
+If you would like to optimize your fine-tuned PyTorch Phi-3-mini model, you can use [Olive](https://github.com/microsoft/Olive/), which supports float data type conversion, and the [ONNX genai model builder toolkit](https://github.com/microsoft/onnxruntime-genai/tree/main/src/python/py/models).
+An example of how to optimize the Phi-3-mini model for ONNX Runtime Web with Olive can be found [here](https://github.com/microsoft/Olive/tree/main/examples/phi3).
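
As a companion to the changed paragraph about using the onnxruntime-web API directly, here is a minimal sketch of what loading this model in the browser can look like. It is not the example's actual code: the Hugging Face URL, file names, and option values are assumptions based on the README text in the diff (WebGPU execution provider, external data files kept under 2GB, float32 logits).

```js
// Minimal sketch, not the repo's actual code. The model URL and file names
// are assumptions based on the Hugging Face layout described above.
import * as ort from 'onnxruntime-web/webgpu';

const base =
  'https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx-web/resolve/main/';

const session = await ort.InferenceSession.create(base + 'model.onnx', {
  // Run on the GPU via WebGPU, as the example does.
  executionProviders: ['webgpu'],
  // The weights ship as a separate external-data file, kept under 2GB so the
  // browser can cache it (see point 3 in the model notes above).
  externalData: [{ data: base + 'model.onnx.data', path: 'model.onnx.data' }],
});

// Point 1 above: outputs stay float32 even for float16 models, so the logits
// tensor's data can be read directly as a Float32Array after session.run().
```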
