A simple search engine implementation in Python, built for illustrative purposes to accompany this blog post.

Requires Python 3.10 or later, and uv.
Install dependencies:

```shell
uv sync
```

Run the full-text search from the command line. On first run, the Wikipedia dataset (~20 GB) will be downloaded from Hugging Face and cached automatically:
```shell
uv run python run.py
```

Run the semantic (vector) search:
```shell
uv run python run_semantic.py
```

On first run, this builds a vector index by embedding all 6.4M documents. Embeddings are checkpointed to `data/checkpoints/` so you can resume if interrupted. The finished index is saved to `data/vector_index.*` and memory-mapped on subsequent runs.
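For intuition about why memory-mapping matters here, below is a minimal, self-contained sketch. The file name, index layout, and `search` helper are invented for the example (the real `data/vector_index.*` format may differ): rows of a unit-normalized embedding matrix are mapped lazily from disk and queried by cosine similarity, so the full 6.4M-row matrix never has to be loaded into RAM at once.

```python
import os
import tempfile

import numpy as np

# Tiny stand-in index: 5 documents with 4-dim embeddings (illustrative only).
rng = np.random.default_rng(0)
vecs = rng.normal(size=(5, 4)).astype(np.float32)
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # unit-normalize rows

path = os.path.join(tempfile.mkdtemp(), "tiny_index.npy")
np.save(path, vecs)

# mmap_mode="r" maps the file instead of reading it into memory, so only
# the rows a query actually touches are paged in from disk.
index = np.load(path, mmap_mode="r")

def search(query_vec: np.ndarray, k: int = 3) -> list[int]:
    """Return indices of the k most similar rows by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    scores = index @ q  # dot product == cosine for unit-normalized rows
    return list(np.argsort(scores)[::-1][:k])

top = search(vecs[2])  # querying with document 2's own vector ranks it first
```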
To skip the multi-hour encoding step, download the pre-computed embeddings from Hugging Face, place the JSON and `.npy` files in `data/checkpoints/`, and run `uv run python run_semantic.py`.
If you'd like to download the dataset separately (e.g. before a demo):
```shell
uv run python download.py
```

To get higher download rate limits, set a Hugging Face token:
```shell
export HF_TOKEN=hf_...
```

Run from an interactive console:
```shell
uv run ipython
```

```python
In [1]: run run.py
In [2]: index.search('python programming language', rank=True)[:5]
```

Lint and type check:
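For a rough idea of what a ranked full-text query does under the hood, here is a toy TF-IDF-style scorer. It is purely illustrative — the corpus, `idf` smoothing, and `search` function below are invented for the sketch, and the repo's actual scoring in `search/` may differ:

```python
import math
from collections import Counter

# Toy corpus standing in for the Wikipedia dataset.
docs = [
    "python is a programming language",
    "the python snake is a reptile",
    "rust is a systems programming language",
]
tokenized = [d.split() for d in docs]
N = len(docs)

def idf(term: str) -> float:
    """Smoothed inverse document frequency: rarer terms weigh more."""
    df = sum(term in doc for doc in tokenized)
    return math.log((N + 1) / (df + 1)) + 1

def score(query: str, doc: list[str]) -> float:
    """Sum of term-frequency * IDF over the query terms."""
    tf = Counter(doc)
    return sum(tf[t] * idf(t) for t in query.split())

def search(query: str, k: int = 5) -> list[int]:
    """Return document indices sorted by descending score."""
    ranked = sorted(range(N), key=lambda i: score(query, tokenized[i]),
                    reverse=True)
    return ranked[:k]

search("python programming language")  # → [0, 2, 1]
```

Document 0 matches all three query terms, so it outranks document 2 (two matches) and document 1 (one match).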
```shell
uv run ruff check .
uv run mypy search/
```

Run tests:
```shell
uv run pytest -v
```