# Local Chatbot in the browser using Phi3, ONNX Runtime Web and WebGPU

This repository contains an example of running [Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) in your browser using [ONNX Runtime Web](https://github.com/microsoft/onnxruntime) with WebGPU.

You can try out the live demo [here](https://guschmue.github.io/ort-webgpu/chat/index.html).

We keep this example simple and use the onnxruntime-web API directly. ONNX Runtime Web also powers higher-level frameworks like [transformers.js](https://github.com/xenova/transformers.js).
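
For orientation, here is a minimal sketch of what using the onnxruntime-web API directly looks like. It is not the demo's actual code: it assumes a hypothetical model taking a single `input_ids` input (the real Phi-3 session also expects attention-mask and KV-cache inputs), and the demo streams the real model files from Hugging Face rather than loading a local `model.onnx`.

```js
// Minimal sketch, not the demo's actual code.
// The WebGPU-enabled build of onnxruntime-web is exposed under the "webgpu" entry point.
import * as ort from "onnxruntime-web/webgpu";

// Placeholder URL; the demo fetches the real model files from Hugging Face.
const session = await ort.InferenceSession.create("model.onnx", {
  executionProviders: ["webgpu"],
});

// Token ids go in as an int64 tensor of shape [batch, sequence_length].
const inputIds = new ort.Tensor("int64", BigInt64Array.from([1n, 2n, 3n]), [1, 3]);
const results = await session.run({ input_ids: inputIds });
console.log(results.logits.dims);
```

In practice you may list `["webgpu", "wasm"]` as execution providers so session creation can fall back to WebAssembly when WebGPU is unavailable.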

## Getting Started

Point your browser to http://localhost:8080/.

### The Phi3 ONNX Model

The model used in this example is hosted on [Hugging Face](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx-web). It is an ONNX version optimized specifically for the web, and it differs slightly from the ONNX model for CUDA or CPU:
1. The model output 'logits' is kept as float32 (even for float16 models) since JavaScript does not support float16.
2. Our WebGPU implementation uses the custom MultiHeadAttention operator instead of Group Query Attention.
3. Phi3 is larger than 2GB and we need to use external data files. To keep them cacheable in the browser,
   both model.onnx and model.onnx.data are kept under 2GB (see the sketch below).
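
As a sketch of how points 1 and 3 surface in code (file names here assume the Hugging Face layout of `model.onnx` plus `model.onnx.data`), the onnxruntime-web session options accept an `externalData` list that tells the runtime where to find the weight file:

```js
// Sketch only; file names assume the Hugging Face layout.
const session = await ort.InferenceSession.create("model.onnx", {
  executionProviders: ["webgpu"],
  // Map the external weight file so the runtime can resolve it. Keeping both
  // files under 2GB lets the browser cache them.
  externalData: [{ data: "model.onnx.data", path: "model.onnx.data" }],
});
```

After `session.run`, the 'logits' output reads back as a plain `Float32Array`, which is why the model keeps that output in float32.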

If you would like to optimize your own fine-tuned PyTorch Phi-3-mini model, you can use [Olive](https://github.com/microsoft/Olive/), which supports float data type conversion and the [ONNX genai model builder](https://github.com/microsoft/onnxruntime-genai/tree/main/src/python/py/models).
An example of how to optimize the Phi-3-mini model for ONNX Runtime Web with Olive can be found [here](https://github.com/microsoft/Olive/tree/main/examples/phi3).