Static embedding search

A self-contained solution for context-aware search over static files. By default it targets static deployments of LaTeX documents whose sources are public, but it can be adapted to any static file type.

Usage

Pick a specific embedding model

python3 save_model.py --model <your_model_name>

Generate the embeddings for your static files

python3 generate.py ../notes/*/*.tex --strip_paths 3

Here the --strip_paths 3 option strips the first 3 path components from each file name. The saved file links are automatically adjusted to point to the PDF files; the suffix can be changed manually with the --path_suffix option (for example, --path_suffix .pdf).
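To make the effect of these two options concrete, here is a minimal sketch of the path rewriting they describe. It is not the code from generate.py: the function name strip_components and its exact semantics (drop a number of leading path components, then swap the extension) are assumptions made for illustration only.

```python
from pathlib import PurePosixPath
from typing import Optional


def strip_components(path: str, count: int, suffix: Optional[str] = None) -> str:
    """Drop the first `count` path components and optionally replace the extension.

    Hypothetical helper: it mirrors what --strip_paths and --path_suffix are
    described as doing, not how generate.py actually implements them.
    """
    parts = PurePosixPath(path).parts
    if count >= len(parts):
        raise ValueError(f"cannot strip {count} components from {path!r}")
    stripped = PurePosixPath(*parts[count:])
    if suffix is not None:
        # with_suffix expects the leading dot, e.g. ".pdf"
        stripped = stripped.with_suffix(suffix)
    return str(stripped)


# ".." counts as a component, so stripping 2 removes "../notes"
print(strip_components("../notes/algebra/groups.tex", 2, ".pdf"))  # algebra/groups.pdf
```

Choosing the count so that the remaining path matches where the file is served from is what keeps the stored links valid after deployment.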

Serve the embeddings

npm run dev
npm run build

Alternatively, you can run ./build.sh to build the embeddings and the frontend into a single output directory, dist/.

./build.sh ../notes/*/*.tex --strip_paths 3
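After a build it can be useful to confirm that the output directory contains what the frontend expects. The sketch below assumes that the /model/ and /embeddings.json locations described under Architecture correspond to dist/model and dist/embeddings.json; that mapping, and the helper name missing_artifacts, are assumptions rather than documented behaviour of build.sh.

```python
import tempfile
from pathlib import Path

# Assumed layout, derived from the Architecture section: /model/ and /embeddings.json
EXPECTED = ("model", "embeddings.json")


def missing_artifacts(dist_dir: str) -> list:
    """Return the expected entries that are absent from dist_dir (hypothetical check)."""
    root = Path(dist_dir)
    return [name for name in EXPECTED if not (root / name).exists()]


# Demonstration on a temporary directory that only contains the model folder
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "model").mkdir()
    demo_missing = missing_artifacts(tmp)

print(demo_missing)  # ['embeddings.json']
```

Running missing_artifacts("dist") after ./build.sh and getting an empty list would indicate that both pieces are in place under the assumed layout.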

GitHub Actions

This repository includes a GitHub Action (action.yml), which can be used to automate the embedding and deployment process.

name: Test Static Embed DB Action

on: push # assumed trigger; the original snippet does not specify one

jobs:
  build-embed-db:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: TCA166/static-embed-db@main # or @v1 after tagging
        with:
          glob: '../*/*.tex'
          strip_paths: 1
          dist_dir: .

The above generates embeddings from the .tex files in subdirectories of the current directory and builds the frontend into dist/ in the current directory.

Architecture

First, in Python, a given embedding model is loaded and converted to ONNX format. After that, the provided files are loaded, preprocessed using lexers provided by pygments, and converted into embeddings using the ONNX model. With the embeddings generated, the frontend is built, under the following assumptions:

The model is available under /model/
The embedding DB is available as /embeddings.json
The indexed files in the DB are available under the paths provided to generate.py. Here, the --strip_paths option may come in handy to adjust the file paths stored in the DB.
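To illustrate how a query could be matched against the embedding DB, here is a minimal retrieval sketch. Several things in it are assumptions: the schema (a list of objects with "path" and "embedding" keys) is hypothetical and may differ from the real embeddings.json, cosine similarity is a common choice rather than a confirmed detail of this project, and the actual search runs in the browser frontend, not in Python.

```python
import json
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(query_vec, entries, k=3):
    """Return the k (score, path) pairs most similar to query_vec, best first."""
    scored = [(cosine(query_vec, e["embedding"]), e["path"]) for e in entries]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical stand-in for the contents of /embeddings.json
sample_db = json.loads(
    '[{"path": "algebra/groups.pdf", "embedding": [1.0, 0.0]},'
    ' {"path": "analysis/limits.pdf", "embedding": [0.0, 1.0]}]'
)

# In the real pipeline the query vector would come from the ONNX model
results = top_matches([0.9, 0.1], sample_db, k=1)
print(results[0][1])  # algebra/groups.pdf
```

Because each entry carries its own path, the result of a search can link straight to the served file, which is why the paths stored in the DB need to match the deployed layout.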

Feel free to reference my use case, deployed on GitHub Pages.
