This repository provides a framework for running evaluations, including OpenAI's SimpleQA evaluation. This code was used to evaluate the APIs in this You.com blog post. To reproduce those numbers or add new samplers (see the sampler sketch after the usage examples), follow the instructions below to install and run the code.
- Clone this repository:

  ```bash
  git clone https://github.com/youdotcom-oss/evals.git
  cd evals
  ```
- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  pip install -e .
  ```
- Set the required API keys as environment variables or in a `.env` file:

  ```bash
  export OPENAI_API_KEY=your_openai_api_key
  export YOU_API_KEY=your_you_api_key
  export TAVILY_API_KEY=your_tavily_api_key
  export EXA_API_KEY=your_exa_api_key
  export SERP_API_KEY=your_serp_api_key
  ```
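If you keep the keys in a `.env` file, a loader such as python-dotenv can read them into the process environment at startup. This is a minimal sketch under the assumption that the code reads keys via `os.environ`; python-dotenv is not necessarily a dependency of this repository:

```python
# Minimal sketch, assuming the runner reads keys from os.environ.
# python-dotenv is an assumed helper, not a confirmed dependency of this repo.
import os

from dotenv import load_dotenv

load_dotenv()  # loads variables from a .env file in the working directory

# After loading, keys are available like any other environment variable:
openai_key = os.environ["OPENAI_API_KEY"]
you_key = os.environ["YOU_API_KEY"]
```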
To run a SimpleQA evaluation, run `src/simpleqa/simpleqa_runner.py` with your desired arguments:
```bash
# View available arguments and samplers
python src/simpleqa/simpleqa_runner.py --help

# Run the SimpleQA evaluation on the entire problem set for all available samplers with default settings
python src/simpleqa/simpleqa_runner.py

# Run the SimpleQA evaluation on just You.com for 5 random problems
python src/simpleqa/simpleqa_runner.py --samplers you --limit 5
```
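To add a new sampler, you wrap the API you want to evaluate behind the interface the runner expects. The actual base class and registration step live in this repository's source, so the following is only an illustrative sketch; every name in it (`MySampler`, `sample`, the endpoint URL, `MY_API_KEY`) is an assumption rather than the repo's real interface:

```python
# Hypothetical sketch of a new sampler; the real base class and registration
# mechanism are defined in this repo (see src/simpleqa), so adapt accordingly.
# All names below (MySampler, sample, the endpoint) are illustrative assumptions.
import os

import requests


class MySampler:
    """Wraps a search/answer API behind the interface the runner calls."""

    def __init__(self, api_key: str | None = None):
        self.api_key = api_key or os.environ["MY_API_KEY"]

    def sample(self, question: str) -> str:
        # Call the API and return a plain-text answer for grading.
        resp = requests.post(
            "https://api.example.com/answer",  # placeholder endpoint
            headers={"Authorization": f"Bearer {self.api_key}"},
            json={"query": question},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["answer"]
```

Once the wrapper exists, it would need to be registered wherever the runner builds the sampler list exposed through `--samplers`.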
After a successful run of SimpleQA, result files are placed in `simpleqa/results`. Files following the pattern `raw_results_{sampler}.csv` contain the raw results for each individual sampler, and `simpleqa_results.csv` contains aggregated results with various metrics useful for analysis.
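Both kinds of CSV load directly into pandas for further analysis. A minimal sketch; the exact column names depend on what the runner writes, and `raw_results_you.csv` is just one example of the file-name pattern above:

```python
# Sketch: inspect SimpleQA output CSVs with pandas.
# File paths follow the patterns described above; columns depend on the run.
import pandas as pd

# Aggregated metrics across samplers
summary = pd.read_csv("simpleqa/results/simpleqa_results.csv")
print(summary.head())

# Raw per-problem results for a single sampler (here: the You.com sampler)
raw = pd.read_csv("simpleqa/results/raw_results_you.csv")
print(f"{len(raw)} graded problems")
```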