This repository provides a framework for running evaluations, including OpenAI's SimpleQA evaluation. This code was used to evaluate the APIs in this You.com blog post. To reproduce those numbers or add new samplers (see the sampler sketch after the usage examples), follow the instructions below to install and run the code.
- Clone this repository:

  ```bash
  git clone https://github.com/youdotcom-oss/evals.git
  cd evals
  ```
- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  pip install -e .
  ```
- Set the required API keys as environment variables or in a `.env` file:

  ```bash
  export OPENAI_API_KEY=your_openai_api_key
  export YOU_API_KEY=your_you_api_key
  export TAVILY_API_KEY=your_tavily_api_key
  export EXA_API_KEY=your_exa_api_key
  export SERP_API_KEY=your_serp_api_key
  ```
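If you keep the keys in a `.env` file, a loader such as python-dotenv can read them into the process environment at startup. This is a minimal sketch under the assumption that the code reads keys via `os.environ`; python-dotenv is not necessarily a dependency of this repository:

```python
# Minimal sketch, assuming the runner reads keys from os.environ.
# python-dotenv is an assumed helper, not a confirmed dependency of this repo.
import os

from dotenv import load_dotenv

load_dotenv()  # loads variables from a .env file in the working directory

# After loading, keys are available like any other environment variable:
openai_key = os.environ["OPENAI_API_KEY"]
you_key = os.environ["YOU_API_KEY"]
```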
To run a SimpleQA evaluation, run `src/simpleqa/simpleqa_runner.py` with your desired arguments:
```bash
# View available arguments and samplers
python src/simpleqa/simpleqa_runner.py --help

# Run the SimpleQA evaluation on the entire problem set for all available samplers with default settings
python src/simpleqa/simpleqa_runner.py

# Run the SimpleQA evaluation on just You.com for 5 random problems
python src/simpleqa/simpleqa_runner.py --samplers you --limit 5
```
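To add a new sampler, you wrap the API you want to evaluate behind the interface the runner expects. The actual base class and registration step live in this repository's source, so the following is only an illustrative sketch; every name in it (`MySampler`, `sample`, the endpoint URL, `MY_API_KEY`) is an assumption rather than the repo's real interface:

```python
# Hypothetical sketch of a new sampler; the real base class and registration
# mechanism are defined in this repo (see src/simpleqa), so adapt accordingly.
# All names below (MySampler, sample, the endpoint) are illustrative assumptions.
import os

import requests


class MySampler:
    """Wraps a search/answer API behind the interface the runner calls."""

    def __init__(self, api_key: str | None = None):
        self.api_key = api_key or os.environ["MY_API_KEY"]

    def sample(self, question: str) -> str:
        # Call the API and return a plain-text answer for grading.
        resp = requests.post(
            "https://api.example.com/answer",  # placeholder endpoint
            headers={"Authorization": f"Bearer {self.api_key}"},
            json={"query": question},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["answer"]
```

Once the wrapper exists, it would need to be registered wherever the runner builds the sampler list exposed through `--samplers`.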
After a successful run of SimpleQA, result files are placed in `simpleqa/results`. Files following the pattern `raw_results_{sampler}.csv` contain the raw results for each individual sampler, and `simpleqa_results.csv` contains aggregated results with various metrics useful for analysis.
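Both kinds of CSV load directly into pandas for further analysis. A minimal sketch; the exact column names depend on what the runner writes, and `raw_results_you.csv` is just one example of the file-name pattern above:

```python
# Sketch: inspect SimpleQA output CSVs with pandas.
# File paths follow the patterns described above; columns depend on the run.
import pandas as pd

# Aggregated metrics across samplers
summary = pd.read_csv("simpleqa/results/simpleqa_results.csv")
print(summary.head())

# Raw per-problem results for a single sampler (here: the You.com sampler)
raw = pd.read_csv("simpleqa/results/raw_results_you.csv")
print(f"{len(raw)} graded problems")
```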