Merged
Changes from 2 commits
293 changes: 197 additions & 96 deletions README.md

Large diffs are not rendered by default.

6 changes: 4 additions & 2 deletions apps/prepare-benchmark/.env.template
@@ -1,3 +1,5 @@
# use absolute path
DATA_DIR="../../data" # relative to this file
# For downloading datasets from Hugging Face
HF_TOKEN="<your-huggingface-token>"

# [Optional] Data loading directory, by default `../../data`
DATA_DIR="../../data" # relative to this file
32 changes: 18 additions & 14 deletions apps/run-agent/.env.template
@@ -1,27 +1,31 @@
# third party API keys
SERPER_API_KEY=""
JINA_API_KEY=""
E2B_API_KEY=""

# Google
GEMINI_API_KEY=""
# Using OpenRouter to provide primary agent model
OPENROUTER_API_KEY=""
OPENROUTER_BASE_URL="https://openrouter.ai/api/v1"

# Anthropic
# Anthropic, for vision tools
ANTHROPIC_API_KEY=""
ANTHROPIC_BASE_URL="https://api.anthropic.com"

# openAI
# OpenAI, for audio tools, intent recognition, and answer extraction
OPENAI_API_KEY=""
OPENAI_BASE_URL="https://api.openai.com/v1"

# openrouter
OPENROUTER_API_KEY=""
OPENROUTER_BASE_URL="https://openrouter.ai/api/v1"
# Gemini, for YouTube tasks
GEMINI_API_KEY=""

# Third party API keys
# For Google search and website scraping
SERPER_API_KEY=""
# For website scraping
JINA_API_KEY=""
# For the Linux sandbox
E2B_API_KEY=""

# NewAPI
# [Optional] NewAPI, alternative to OpenRouter
NEWAPI_API_KEY=""
NEWAPI_BASE_URL=""

# use HTTPS proxy
# [Optional] for network proxy, null by default
HTTPS_PROXY=""
# [Optional] Data loading directory, by default `../../data`
DATA_DIR="../../data"
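Note on the template above: one way to catch an unfilled key before a run is a small shell check. This is only a sketch and not part of this PR. `missing_keys` is a hypothetical helper, and which keys count as required is an assumption drawn from the template comments, which mark only the NewAPI, proxy, and `DATA_DIR` entries as optional.

```shell
# Hypothetical helper (not in this PR): print the names of variables that are unset or empty.
missing_keys() {
  missing=""
  for name in "$@"; do
    # Look up the variable whose name is stored in $name (POSIX-compatible indirection).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      # Append the name, adding a separating space only when the list is non-empty.
      missing="${missing:+$missing }$name"
    fi
  done
  printf '%s\n' "$missing"
}

# Example values: one key filled in, one left as in the template.
OPENROUTER_API_KEY="example-value"
ANTHROPIC_API_KEY=""
missing_keys OPENROUTER_API_KEY ANTHROPIC_API_KEY   # prints: ANTHROPIC_API_KEY
```

In practice the `.env` file would be sourced first, and the run aborted if the helper prints anything.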
Original file line number Diff line number Diff line change
@@ -9,6 +9,7 @@ BENCHMARK_NAME="gaia-validation"
 LLM_PROVIDER="claude_openrouter"
 LLM_MODEL="anthropic/claude-3.7-sonnet"
 AGENT_SET="miroflow"
+MAX_CONCURRENT=5

 RESULTS_DIR="logs/${BENCHMARK_NAME}/${LLM_PROVIDER}_${LLM_MODEL}_${AGENT_SET}"

@@ -32,7 +33,7 @@ for i in $(seq 1 $NUM_RUNS); do
 llm.model_name=$LLM_MODEL \
 llm.async_client=true \
 benchmark.execution.max_tasks=null \
-benchmark.execution.max_concurrent=5 \
+benchmark.execution.max_concurrent=$MAX_CONCURRENT \
 benchmark.execution.pass_at_k=1 \
 agent=$AGENT_SET \
 output_dir="$RESULTS_DIR/$RUN_ID" \
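The script change above moves the concurrency limit into a `MAX_CONCURRENT` variable that is still fixed at 5. If per-run overrides are wanted, a common shell idiom is a default that the caller's environment can replace. The sketch below is a suggestion, not what this PR does, and `max_concurrent_or_default` is a hypothetical name.

```shell
# Hypothetical helper: echo the given value, or 5 (the script's current setting) when it is empty.
max_concurrent_or_default() {
  printf '%s\n' "${1:-5}"
}

# Use the caller's MAX_CONCURRENT if it is exported, otherwise fall back to 5.
MAX_CONCURRENT="$(max_concurrent_or_default "${MAX_CONCURRENT:-}")"
echo "benchmark.execution.max_concurrent=$MAX_CONCURRENT"
```

With this in place, a run could be started as `MAX_CONCURRENT=8` followed by the script invocation, without editing the script itself.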
Binary file added docs/figs/core_component_architecture.png
Binary file added docs/figs/execution_pipeline.png