
In the era of Large Language Models and vector search, this repository offers tools and processes to answer these three questions: 1) How can I build a dataset with rated queries to measure the quality of my search engine? 2) Is the embedding model the problem? 3) Is my implementation of approximate nearest neighbour search the problem?

SeaseLtd/llm-search-quality-evaluation

LLM Search Quality Evaluation

Overview

  • Dataset Generator
  • Vector Search Doctor
    • Embedding Model Evaluator
    • Approximate Search Evaluator

Dataset Generator

The Dataset Generator is a flexible command-line tool for building relevance datasets for search evaluation. It can retrieve documents from a search engine, generate synthetic queries, and score the relevance of document-query pairs using LLMs.

Vector Search Doctor

This tool helps diagnose and optimize vector search performance by evaluating both embedding models and search configurations. It consists of two sub-tools that work together to identify bottlenecks and improve retrieval quality in your vector search pipeline.

Embedding Model Evaluator

This sub-tool extends the MTEB benchmarking tool to test the performance of a HuggingFace embedding model on both Retrieval and Reranking tasks, using custom datasets.
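As a toy illustration of what a Retrieval-task evaluation measures (this is not the MTEB harness itself), the sketch below ranks hand-made document vectors by cosine similarity to a query and computes recall@k. All vectors, document IDs, and relevance sets here are invented for the example; a real run would use embeddings produced by the HuggingFace model under test.

```python
# Toy illustration of a retrieval metric, NOT the actual MTEB harness.
import math

def cosine(a, b):
    # cosine similarity between two dense vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def recall_at_k(query_vec, doc_vecs, relevant_ids, k):
    # rank documents by similarity to the query, then check how many
    # of the truly relevant ones appear in the top k
    ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    top_ids = {doc_id for doc_id, _ in ranked[:k]}
    return len(top_ids & relevant_ids) / len(relevant_ids)

# made-up 2-d "embeddings" standing in for model output
docs = [("d1", [1.0, 0.1]), ("d2", [0.0, 1.0]), ("d3", [0.9, 0.2])]
print(recall_at_k([1.0, 0.0], docs, {"d1", "d3"}, k=2))  # -> 1.0
```

A low score here with a good ANN configuration points at the embedding model itself, which is exactly the bottleneck this sub-tool is meant to isolate.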

Approximate Search Evaluator

This sub-tool provides a flexible way to deploy RRE and extract metrics to test your search engine collection, given a template.
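A common way to diagnose approximate nearest-neighbour search is to compare the ANN result list against exact brute-force results and report their overlap (recall). The sketch below is a self-contained illustration of that idea with made-up vectors and a hand-picked "approximate" result set; it is not the RRE tooling this sub-tool deploys.

```python
# Toy sketch of an ANN diagnostic: overlap between approximate and exact
# k-nearest-neighbour results. Everything below is invented for illustration.
import math

def l2(a, b):
    # Euclidean distance between two vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def exact_knn(query, index, k):
    # brute-force ground truth: sort every document by distance to the query
    ranked = sorted(index.items(), key=lambda kv: l2(query, kv[1]))
    return [doc_id for doc_id, _ in ranked[:k]]

def ann_recall(approx_ids, exact_ids):
    # fraction of the true top-k that the approximate search recovered
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

index = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0], "d": [5.0, 5.0]}
exact = exact_knn([0.1, 0.1], index, k=3)  # brute-force top 3
approx = ["a", "b", "d"]                   # pretend the ANN index missed "c"
print(ann_recall(approx, exact))
```

If the embedding model scores well but this recall is poor, the approximate index configuration (not the model) is the likely culprit.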

Quickstart: tools installation

  • uv: a fast Python package installer and resolver. To install uv, follow the instructions here.
  • Python 3.10: the version is pinned for the project; see the .python-version file.

First, create a virtual environment and install the dependencies declared in pyproject.toml using uv:

# install dependencies (for users)
uv sync

# install development dependencies as well (e.g., mypy and ruff)
uv sync --group dev

# remove all cached packages
uv cache clean

Running Dataset Generator

Before running the command below, you need a running search engine instance (Solr, OpenSearch, Elasticsearch, or Vespa).

For a detailed description of how to fill in your configuration file, see the Dataset Generator README.

Execute the main script via CLI, pointing to your DAGE configuration file:

uv run dataset_generator --config <path-to-config-yaml>

By default, the CLI points to the example file inside the examples/configs/ directory.

To know more about all the possible CLI parameters, execute:

uv run dataset_generator --help

Running Embedding Model Evaluator

For a detailed description of how to fill in the configuration file, see the README.

Execute the main script via CLI, pointing to your configuration file:

uv run embedding_model_evaluator --config <path-to-config-yaml>

By default, the CLI points to the example file inside the examples/configs/ directory.

Running Approximate Search Evaluator

For a detailed description of how to fill in the configuration file, see the README.

uv run approximate_search_evaluator --config <path-to-config-yaml>

By default, the CLI points to the example file inside the examples/configs/ directory.

Running tests

1. Unit Tests

Execute pytest command as follows:

uv run pytest

When executed, the Dataset Generator script will:

  1. Fetch documents from the specified search engine.
  2. Generate or load queries.
  3. Score the relevance for each (document, query) pair.
  4. Save the output to the destination (specified in the config file).
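The four steps above can be sketched with stand-in components. The real tool talks to a search engine and an LLM; fetch_documents, generate_queries, and score_relevance below are invented stubs used only to show the shape of the pipeline.

```python
# Minimal sketch of the dataset-generation flow with stand-in components.
def fetch_documents():                       # step 1: stub for the search engine
    return [{"id": "d1", "text": "vector search with HNSW"}]

def generate_queries(doc):                   # step 2: stub for the query LLM
    return [f"what is {doc['text'].split()[-1]}?"]

def score_relevance(doc, query):             # step 3: stub for the LLM judge
    return 3 if doc["text"].split()[-1] in query else 0

def build_dataset():
    rows = []
    for doc in fetch_documents():            # step 1
        for query in generate_queries(doc):  # step 2
            rows.append({"doc_id": doc["id"],
                         "query": query,
                         "rating": score_relevance(doc, query)})  # step 3
    return rows                              # step 4: caller persists the rows

print(build_dataset())
```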

Code Quality Tools

This project uses:

  • Ruff for linting.
  • Mypy for static type checking.

Linting with Ruff

# Check for issues
uv run ruff check .

# Auto-fix fixable issues
uv run ruff check --fix .

# Format code (if enabled)
uv run ruff format .

Type Checking with Mypy

# Run type checking
uv run mypy .

Config Files

  • ruff.toml: Ruff linting rules and settings.
  • mypy.ini: Mypy type checking rules and settings.
