The GAIA module evaluates agents and workflows built with the Agent Framework. It provides built-in benchmarks as well as utilities for running custom evaluations.
Note: This module is part of the consolidated `agent-framework-lab` package. Install the package with the `gaia` extra to use this module.
Install the `agent-framework-lab` package with GAIA dependencies:

```shell
pip install "agent-framework-lab[gaia]"
```

Set up your Hugging Face token:
```shell
export HF_TOKEN="hf_..."  # must have access to gaia-benchmark/GAIA
```

Create a Python script (e.g., `run_gaia.py`) with the following content:
```python
import asyncio

from agent_framework.lab.gaia import GAIA, Task, Prediction, GAIATelemetryConfig


async def run_task(task: Task) -> Prediction:
    # Replace this stub with your agent's actual answer for the task
    return Prediction(prediction="answer here", messages=[])


async def main() -> None:
    # Optional: Enable telemetry for detailed tracing
    telemetry_config = GAIATelemetryConfig(
        enable_tracing=True,
        trace_to_file=True,
        file_path="gaia_traces.jsonl",
    )
    runner = GAIA(telemetry_config=telemetry_config)
    # Run 5 level-1 tasks with 2 tasks in flight at a time
    await runner.run(run_task, level=1, max_n=5, parallel=2)


if __name__ == "__main__":
    asyncio.run(main())
```

See `gaia_sample.py` for more detail.
We provide a console viewer for reading GAIA results:
```shell
uv run gaia_viewer "gaia_results_<timestamp>.jsonl" --detailed
```
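Beyond the console viewer, the results file can be post-processed directly. A hedged sketch of exact-match scoring over a results JSONL, assuming hypothetical `prediction` and `answer` fields (check the field names in the actual output):

```python
import json
from typing import Iterable


def exact_match_accuracy(lines: Iterable[str]) -> float:
    """Case-insensitive exact-match accuracy over JSONL result records.

    The "prediction" and "answer" field names are assumptions about the
    results schema; adjust them to match the real output format.
    """
    total = 0
    correct = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        total += 1
        pred = str(record.get("prediction", "")).strip().lower()
        gold = str(record.get("answer", "")).strip().lower()
        if pred == gold:
            correct += 1
    return correct / total if total else 0.0


# Self-contained demo with synthetic records (one correct, one wrong):
sample = [
    '{"prediction": "Paris", "answer": "paris"}',
    '{"prediction": "42", "answer": "41"}',
]
accuracy = exact_match_accuracy(sample)  # → 0.5
```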