Summary

I fixed issue #301 by adding support for the LLM_API_KEY environment variable, which overrides the api_key value in LLM configuration files when set. This lets cloud environments inject the API key from secrets (e.g., secrets.LLM_API_KEY_EVAL) without modifying the config files.

Changes Made

New utility function (benchmarks/utils/llm_config.py):
- Created a load_llm_config() function that loads the LLM config from JSON and overrides api_key with the LLM_API_KEY environment variable if it is set
Updated all benchmark run_infer.py files to use the new utility:
- benchmarks/swebench/run_infer.py
- benchmarks/swtbench/run_infer.py
- benchmarks/gaia/run_infer.py
- benchmarks/commit0/run_infer.py
- benchmarks/multiswebench/run_infer.py
- benchmarks/swebenchmultimodal/run_infer.py
- benchmarks/openagentsafety/run_infer.py
Updated validate_cfg.py to use the new utility for consistency
Added comprehensive tests (tests/test_llm_config.py) covering:
- Loading config from file without env var override
- Env var override functionality
- Empty string env var handling
- Error handling for missing config file
- Loading config without api_key in file with env var set
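
For illustration, the override behavior described above could look roughly like the sketch below. This is not the code from the PR: the return type (a plain dict), the FileNotFoundError for a missing file, and the choice to treat an empty LLM_API_KEY as unset are all assumptions made for the example. Only the function name, the JSON source, and the LLM_API_KEY override come from the summary.

```python
import json
import os
from pathlib import Path
from typing import Any


def load_llm_config(config_path: str) -> dict[str, Any]:
    """Load an LLM config from JSON, letting LLM_API_KEY override api_key.

    Sketch only: the real utility may return a typed config object
    rather than a dict.
    """
    path = Path(config_path)
    if not path.is_file():
        # Assumption: a missing config file surfaces as FileNotFoundError.
        raise FileNotFoundError(f"LLM config file not found: {config_path}")

    with path.open(encoding="utf-8") as f:
        config = json.load(f)

    # Assumption: an empty LLM_API_KEY is treated the same as an unset one,
    # so the value from the file (if any) is kept.
    env_key = os.environ.get("LLM_API_KEY")
    if env_key:
        config["api_key"] = env_key

    return config
```

Under these assumptions, a config file without an api_key entry still yields a usable key when LLM_API_KEY is set, which matches the last test case listed above.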

Checklist

✅ Requirements of issue #301 (Configure llm key for running benchmarks) addressed: benchmarks can now use secrets.LLM_API_KEY_EVAL via the LLM_API_KEY environment variable
✅ All pre-commit checks pass (ruff format, ruff lint, pycodestyle, pyright)
✅ All tests pass
✅ Changes are concise and focused on the issue

Pull Request

PR #302

Configure llm key for running benchmarks #301