For a long time I wanted an excuse to see what all the hype around WebGPU and WebAssembly was about. Then I attended a Rust Wasm meetup and was eager to find a project to learn these technologies.
docVec is a fully working client-side semantic search engine, i.e. the model runs ENTIRELY on the client machine. This is NOT a production-ready project.
My goals for the project were to:
- Use Rust for NN inference
- Use the GPU for model inference and see how mature wgpu is: Luckily, I found the amazing project wonnx. I had to hack around some issues with running transformers and implement some missing ONNX operators (cf. PR) for this to work. I am also still re-implementing the project's MatMul broadcasting and, if possible, trying to improve the compute shader performance.
- Implement the whole logic in a WebAssembly module in Rust. The goal here is to understand some wasm internals and the limitations that come with them.
- Keep the JS to a minimum.
- Don't overcomplicate the search engine. For now, a simple flat vector index suffices.
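
To make the last goal concrete, here is a minimal sketch of what a flat vector index could look like. It is not the code used in docVec: the `FlatIndex` type, its methods, and the `squared_l2` helper are hypothetical names chosen for illustration. Every embedding is stored as-is, and a query is answered by scanning all of them and sorting by squared L2 distance.

```rust
/// Hypothetical flat index: embeddings are stored unmodified and
/// every query does a full linear scan.
struct FlatIndex {
    dim: usize,
    vectors: Vec<Vec<f32>>,
    texts: Vec<String>,
}

impl FlatIndex {
    fn new(dim: usize) -> Self {
        FlatIndex { dim, vectors: Vec::new(), texts: Vec::new() }
    }

    /// Store one embedding together with the text it was computed from.
    fn add(&mut self, embedding: Vec<f32>, text: String) {
        assert_eq!(embedding.len(), self.dim, "embedding has the wrong dimension");
        self.vectors.push(embedding);
        self.texts.push(text);
    }

    /// Return the texts of the `k` stored vectors closest to `query`,
    /// nearest first.
    fn search(&self, query: &[f32], k: usize) -> Vec<String> {
        assert_eq!(query.len(), self.dim, "query has the wrong dimension");
        let mut scored: Vec<(f32, usize)> = self
            .vectors
            .iter()
            .enumerate()
            .map(|(i, v)| (squared_l2(query, v), i))
            .collect();
        // total_cmp gives a total order on f32, so NaN distances cannot panic the sort.
        scored.sort_by(|a, b| a.0.total_cmp(&b.0));
        scored
            .into_iter()
            .take(k)
            .map(|(_, i)| self.texts[i].clone())
            .collect()
    }
}

/// Squared Euclidean distance. The square root is skipped because it
/// does not change the ranking.
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}
```

A full scan costs O(n · dim) per query, which is fine for the text chunks of a single page and avoids any index-building logic.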
- Download the `gte-small` model from Hugging Face:

      cd model/
      git clone https://huggingface.co/Supabase/gte-small

- Install the ONNX simplifier, `onnxsim`.
- Simplify the model and fix the input batch size and sequence length:

      python -m onnxsim gte-small/onnx/model.onnx gte-small/onnx/sim_model.onnx \
        --overwrite-input-shape "input_ids:1,512" "attention_mask:1,512" "token_type_ids:1,512"

- Install `wasm-pack`:

      cargo install wasm-pack

- Clone the modified version of `wonnx` (temporary) and check out its branch:

      cd ..
      git clone https://github.com/AmineDiro/wonnx.git
      cd wonnx
      git checkout broadcast-matmul

- Build the WebAssembly module and serve the page:

      cd .. # go to project root
      ./build.sh && python3 -m http.server 8000

Now you can access the semantic search module at http://localhost:8000 🌟
- Backend (wasm):
  - Project scaffolding using `wasm-bindgen`
  - Generate string embedding using `wonnx` and the `gte-small` model:
    - Add `Erf` operator to wonnx
    - Modify `MatMul` broadcasting checks (this is temporary)
    - Reimplement correct `MatMul` with broadcasting
    - Investigate float NaN issues on Vulkan backend for wgpu
  - Tokenize input in wasm with `tokenizers`
  - Build index:
    - Split page text
    - Embed text using `sentence-transformers`
    - Load index in wasm module
  - Implement L2 distance and return k nearest neighbors (with `Vec<String>`)
- Frontend:
  - Download an example wiki page as simple HTML
  - Loop over page elements and search for the matching HTML element
  - Highlight just the text and a little bit of the surrounding content
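
The `MatMul` broadcasting items in the backend list come down to one shape rule. ONNX `MatMul` follows NumPy `matmul` semantics: the last two dimensions are the matrix dimensions, and the leading batch dimensions are broadcast against each other. The sketch below only computes the output shape under that rule. It is an illustration, not wonnx code: the function name `matmul_broadcast_shape` is hypothetical, and it assumes both inputs have rank 2 or higher (the 1-D special cases are left out).

```rust
/// Output shape of `a x b` under NumPy-style matmul broadcasting.
/// Hypothetical helper: handles only inputs of rank >= 2 and returns
/// `None` when the shapes are incompatible.
fn matmul_broadcast_shape(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    if a.len() < 2 || b.len() < 2 {
        return None;
    }
    // Split each shape into batch dims and the trailing matrix dims.
    let (a_batch, a_mat) = a.split_at(a.len() - 2);
    let (b_batch, b_mat) = b.split_at(b.len() - 2);

    // [.., m, k] x [.., k, n]: the inner dimensions must agree.
    if a_mat[1] != b_mat[0] {
        return None;
    }

    let rank = a_batch.len().max(b_batch.len());
    let mut out = Vec::with_capacity(rank + 2);
    for i in 0..rank {
        // Right-align the batch dims; a missing leading dim behaves like 1.
        let da = if i + a_batch.len() >= rank {
            a_batch[i + a_batch.len() - rank]
        } else {
            1
        };
        let db = if i + b_batch.len() >= rank {
            b_batch[i + b_batch.len() - rank]
        } else {
            1
        };
        // Two dims are compatible when they are equal or one of them is 1.
        let d = match (da, db) {
            (x, y) if x == y => x,
            (1, y) => y,
            (x, 1) => x,
            _ => return None,
        };
        out.push(d);
    }
    out.push(a_mat[0]); // m
    out.push(b_mat[1]); // n
    Some(out)
}
```

For example, a `[2, 3, 4]` input multiplied by a `[4, 5]` input yields `[2, 3, 5]`, while `[2, 3, 4]` with `[3, 4, 5]` is rejected because the batch dimensions 2 and 3 cannot be broadcast.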
