DocVec - Wasm meets Semantic search

For a long time I wanted an excuse to see what all the hype around WebGPU and WebAssembly was about. Then I attended a Rust Wasm meetup and came away eager to find a project for learning these technologies.

DocVec is a fully working client-side semantic search engine, i.e. the model runs ENTIRELY on the client machine. This is NOT a production-ready project.

My goals for the project were to:

- Use Rust for NN inference.
- Use the GPU for model inference and see how mature wgpu is. Luckily, I found the amazing project wonnx. I had to hack around some issues with running transformers and implement some missing ONNX operators (cf. PR) for this to work. I am still re-implementing the project's MatMul broadcasting and, where possible, trying to improve compute shader performance.
- Implement the whole logic in a WebAssembly module written in Rust. The goal here is to understand some Wasm internals and the limitations that come with them.
- Keep the JS to a minimum.
- Don't overcomplicate the search engine. For now, a simple flat vector index suffices.
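
To make the MatMul broadcasting goal more concrete, here is a minimal sketch of the shape rule that ONNX MatMul inherits from `numpy.matmul`: the last two dimensions are multiplied as matrices and the leading (batch) dimensions are broadcast. The function name `matmul_output_shape` is hypothetical and this is not wonnx code. It also assumes both inputs have rank 2 or higher and leaves out the 1-D promotion case.

```rust
/// Output shape of a batched matrix product with NumPy-style broadcasting.
/// Hypothetical helper for illustration; not part of wonnx.
/// Returns None if the shapes are incompatible or either rank is below 2.
fn matmul_output_shape(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    if a.len() < 2 || b.len() < 2 {
        return None;
    }
    let (a_batch, a_mat) = a.split_at(a.len() - 2);
    let (b_batch, b_mat) = b.split_at(b.len() - 2);

    // Inner dimensions must agree: [.., m, k] x [.., k, n].
    if a_mat[1] != b_mat[0] {
        return None;
    }

    let rank = a_batch.len().max(b_batch.len());
    let mut out = Vec::with_capacity(rank + 2);
    for i in 0..rank {
        // Align batch dims from the right; missing leading dims count as 1.
        let da = if i + a_batch.len() >= rank {
            a_batch[i + a_batch.len() - rank]
        } else {
            1
        };
        let db = if i + b_batch.len() >= rank {
            b_batch[i + b_batch.len() - rank]
        } else {
            1
        };
        // Two batch dims are compatible if they are equal or one of them is 1.
        let d = match (da, db) {
            (x, y) if x == y => x,
            (1, y) => y,
            (x, 1) => x,
            _ => return None,
        };
        out.push(d);
    }
    out.push(a_mat[0]); // m
    out.push(b_mat[1]); // n
    Some(out)
}
```

For example, `matmul_output_shape(&[1, 12, 512, 64], &[64, 512])` gives `Some(vec![1, 12, 512, 512])`, while shapes whose batch dimensions differ and are not 1 give `None`.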

Maintainer

Download the gte-small model from Hugging Face

cd model/
git clone https://huggingface.co/Supabase/gte-small

Install the ONNX simplifier, onnxsim

Simplify the model and fix the input batch size and sequence length

python -m onnxsim gte-small/onnx/model.onnx gte-small/onnx/sim_model.onnx \
 --overwrite-input-shape "input_ids:1,512" "attention_mask:1,512" "token_type_ids:1,512"

Install wasm-pack
```
cargo install wasm-pack
```

Clone the modified version of wonnx (temporary)

cd ..
git clone https://github.com/AmineDiro/wonnx.git
cd wonnx
git checkout broadcast-matmul

Build the WebAssembly module and serve the page

cd ..  # go to project root
./build.sh && python3 -m http.server 8000

Now you can access the semantic search module at http://localhost:8000 🌟

TODO:

Backend (wasm):
- Project scaffolding using wasm-bindgen
- Generate string embedding using wonnx and gte-small model:
  - Add Erf operator to wonnx
  - Modify MatMul broadcasting checks (this is temporary)
  - Reimplement correct MatMul with broadcasting
  - Investigate float NaN issues on Vulkan backend for wgpu
- Tokenize input in wasm tokenizers
- Build index:
  - Split page text
  - Embed text using sentence-transformers
  - Load index in wasm module
- Implement L2 distance and return k nearest neighbors (with Vec<String>)

Frontend:
- Download example wiki page as simple html
- Loop over page elements and search for matching html element
- Highlight just the matching text and a little of the surrounding context
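
The flat index and L2 nearest-neighbor items from the backend list can be sketched roughly as follows. This is an illustrative sketch and not the project's code: `FlatIndex`, `add`, `search`, and `squared_l2` are hypothetical names. It assumes embeddings are plain `Vec<f32>` values of a fixed dimension. It ranks by squared L2 distance, which gives the same ordering as L2 distance without the square root.

```rust
/// A flat index: every chunk's embedding is stored next to its text,
/// and a query is compared against all of them (brute force).
/// Hypothetical type for illustration.
struct FlatIndex {
    dim: usize,
    vectors: Vec<Vec<f32>>,
    texts: Vec<String>,
}

impl FlatIndex {
    fn new(dim: usize) -> Self {
        FlatIndex { dim, vectors: Vec::new(), texts: Vec::new() }
    }

    /// Store one text chunk together with its embedding.
    fn add(&mut self, text: &str, vector: Vec<f32>) {
        assert_eq!(vector.len(), self.dim, "embedding has the wrong dimension");
        self.vectors.push(vector);
        self.texts.push(text.to_string());
    }

    /// Return the texts of the k stored vectors closest to `query`,
    /// nearest first.
    fn search(&self, query: &[f32], k: usize) -> Vec<String> {
        assert_eq!(query.len(), self.dim, "query has the wrong dimension");
        let mut scored: Vec<(f32, usize)> = self
            .vectors
            .iter()
            .enumerate()
            .map(|(i, v)| (squared_l2(query, v), i))
            .collect();
        // total_cmp gives a total order on f32, so NaN distances cannot panic the sort.
        scored.sort_by(|a, b| a.0.total_cmp(&b.0));
        scored
            .into_iter()
            .take(k)
            .map(|(_, i)| self.texts[i].clone())
            .collect()
    }
}

/// Squared Euclidean distance between two equally sized slices.
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}
```

A brute-force scan like this is linear in the number of stored chunks, which is usually acceptable for a single page of text and keeps the Wasm module free of extra indexing dependencies.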
