
API Evaluations

This repository provides a framework for running evaluations, including OpenAI's SimpleQA evaluation. This code was used to evaluate the APIs in this You.com blog post.

If you would like to reproduce the numbers or add new samplers, follow the installation and usage instructions below.

Installation

  1. Clone this repository:

    git clone https://github.com/youdotcom-oss/evals.git
    cd evals
  2. Install the required dependencies:

    pip install -r requirements.txt
    pip install -e .
  3. Set the required API keys, either as environment variables or in a .env file (an example .env file is shown after these steps):

    export OPENAI_API_KEY=your_openai_api_key
    export YOU_API_KEY=your_you_api_key
    export TAVILY_API_KEY=your_tavily_api_key
    export EXA_API_KEY=your_exa_api_key
    export SERP_API_KEY=your_serp_api_key
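
For the .env option, put the same keys in a file named .env, typically at the repository root (a minimal sketch using the same placeholder values as above; include only the keys you actually need):

    OPENAI_API_KEY=your_openai_api_key
    YOU_API_KEY=your_you_api_key
    TAVILY_API_KEY=your_tavily_api_key
    EXA_API_KEY=your_exa_api_key
    SERP_API_KEY=your_serp_api_key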

Running a SimpleQA evaluation

To run a SimpleQA evaluation, run src/simpleqa/simpleqa_runner.py with your desired arguments.

View available arguments and samplers:

    python src/simpleqa/simpleqa_runner.py --help

Run the SimpleQA evaluation on the entire problem set, for all available samplers, with default settings:

    python src/simpleqa/simpleqa_runner.py

Run the SimpleQA evaluation on just You.com for 5 random problems:

    python src/simpleqa/simpleqa_runner.py --samplers you --limit 5
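
The sampler interface itself is not documented in this README. If the framework follows the convention of OpenAI's simple-evals, from which the SimpleQA evaluation originates, a new sampler is a callable object that maps a chat message list to a response string. A hypothetical sketch (the class name, endpoint, and response schema below are illustrative assumptions, not this repository's actual API):

    # Hypothetical sampler sketch: the real base class and registration
    # mechanism live in this repository's source, not in this README.
    import requests


    class MyApiSampler:
        """Answers each SimpleQA question via a placeholder search API."""

        def __init__(self, api_key: str):
            self.api_key = api_key

        def __call__(self, message_list: list[dict]) -> str:
            # SimpleQA sends a single user question; forward it to the API.
            question = message_list[-1]["content"]
            resp = requests.post(
                "https://api.example.com/answer",  # placeholder endpoint
                headers={"Authorization": f"Bearer {self.api_key}"},
                json={"query": question},
                timeout=30,
            )
            resp.raise_for_status()
            return resp.json()["answer"]  # placeholder response schema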

Interpreting Results

Result files are written to simpleqa/results after a successful SimpleQA run. Files matching the pattern raw_results_{sampler}.csv hold the raw results for each individual sampler, while simpleqa_results.csv contains the aggregated results with various metrics useful for analysis.
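
Both file types are plain CSV, so they open in any spreadsheet tool or load directly into pandas. A minimal sketch for inspecting the aggregated metrics (the results path follows the description above; the column set depends on your run, so it is printed rather than assumed):

    import pandas as pd

    # Load the aggregated metrics written by a SimpleQA run.
    df = pd.read_csv("simpleqa/results/simpleqa_results.csv")

    # The exact column set depends on the run, so inspect it first.
    print(df.columns.tolist())
    print(df.head())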
