⚠️ **Early Development Notice**: This repository is a work in progress. Things may be incomplete, unstable, or subject to change.
Bench Lab is a framework for evaluating large language models (LLMs), agents, and RAG systems across a range of benchmarks. It provides a unified interface for running benchmarks, along with statistical tools for analyzing and improving system performance.
A simple example of the API:
```python
import random

from benchlab.library.math_qa._benchmark import MathQABench


def mock_model(instance, s: str) -> str:
    # Return a random answer so the pipeline can be exercised end to end.
    random_answer = random.randint(1, 10)
    return f"The answer for question {instance.id} is {random_answer}. Or {s}"


def main():
    # Initialize the benchmark with a small number of instances.
    benchmark = MathQABench(n_instance=5)

    # Run your implementation on the benchmark; extra keyword arguments
    # are forwarded to the model callable.
    execution = benchmark.run(mock_model, kwargs={"s": "I don't know"})

    # Score the recorded outputs.
    evaluation = execution.evaluate()

    # Finally, aggregate the results and print the benchmark report.
    report = evaluation.report()
    report.summary()


if __name__ == "__main__":
    main()
```
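
The callable passed to `run` only needs to accept a benchmark instance (plus any extra keyword arguments forwarded through `kwargs`) and return an answer string. Below is a rough sketch of what a real model function could look like in place of the mock, assuming an OpenAI-compatible client and that each instance exposes a `question` attribute; neither assumption is confirmed by the example above, so adapt it to the benchmark's actual instance schema.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def llm_model(instance, system_prompt: str) -> str:
    # `instance.question` is an assumption here; check the benchmark's instance fields.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": instance.question},
        ],
    )
    return response.choices[0].message.content


# Plug it into the same loop as the mock:
# execution = benchmark.run(llm_model, kwargs={"system_prompt": "Answer with a number."})
```

Because `run` forwards `kwargs` to the callable, prompts or other configuration can be passed in without changing the benchmark loop itself.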