Commit b9a234b

minor cleanup
1 parent 47b94f3 commit b9a234b

34 files changed

Lines changed: 65 additions & 61 deletions

project/paperbench/.gitignore

Lines changed: 1 addition & 0 deletions
@@ -1,6 +1,7 @@
 # Module-specific ignores
 *.pyc
 run_eval.log
+paperbench/solvers/agent.env

 # Ignore everything in runs directory
 runs/

project/paperbench/README.md

Lines changed: 11 additions & 11 deletions
@@ -87,7 +87,7 @@ source .env

 ### Agent resources

-Some papers require access to the OpenAI API and HuggingFace in order to be fully replicated. To make these API keys available to the agent, first copy `paperbench/agents/agent.env.example` to `paperbench/agents/agent.env` and populate the `OPENAI_API_KEY` and `HF_TOKEN` fields. The `agent.env` is provided in the agent's workspace and the agent is informed of this file in our [default instructions for BasicAgent](paperbench/agents/aisi-basic-agent/templates.py), but `agent.env` is not sourced automatically on behalf of the agent.
+Some papers require access to the OpenAI API and HuggingFace in order to be fully replicated. To make these API keys available to the agent, first copy `paperbench/solvers/agent.env.example` to `paperbench/solvers/agent.env` and populate the `OPENAI_API_KEY` and `HF_TOKEN` fields. The `agent.env` is provided in the agent's workspace and the agent is informed of this file in our [default instructions for BasicAgent](paperbench/solvers/aisi-basic-agent/templates.py), but `agent.env` is not sourced automatically on behalf of the agent.

 The HuggingFace token can be obtained through the website after making an account, and should have authorization to access [Llama-2](https://huggingface.co/meta-llama/Llama-2-7b) and [ImageNet](https://huggingface.co/datasets/ILSVRC/imagenet-1k).

@@ -115,8 +115,8 @@ PaperBench requires the following Docker images to be built before it can run:

 We provide the following agents out-of-the-box:

-- [paperbench/agents/dummy/solver.py](paperbench/agents/dummy/solver.py): A agent for that creates a dummy submission, useful for testing the eval end-to-end.
-- [paperbench/agents/basicagent/solver.py](paperbench/agents/basicagent/solver.py): Simple ReAct style agents with tools available to them.
+- [paperbench/solvers/dummy/solver.py](paperbench/solvers/dummy/solver.py): A agent for that creates a dummy submission, useful for testing the eval end-to-end.
+- [paperbench/solvers/basicagent/solver.py](paperbench/solvers/basicagent/solver.py): Simple ReAct style agents with tools available to them.

 For convenience, we've provided a [script](paperbench/scripts/build-docker-images.sh) that builds all the above images:

@@ -126,7 +126,7 @@ bash paperbench/scripts/build-docker-images.sh

 ## Set agent configurations

-Configurations for each agent are stored in the [paperbench/agents/](paperbench/agents/) directory. Each agent has a `config.yaml` file which sets the agent's possible configurations, e.g. which model to use, how long to run the agent for, etc.
+Configurations for each agent are stored in the [paperbench/solvers/](paperbench/solvers/) directory. Each agent has a `config.yaml` file which sets the agent's possible configurations, e.g. which model to use, how long to run the agent for, etc.

 ## Quickstart

@@ -136,29 +136,29 @@ To see all configurable options for PaperBench, run:
 uv run python -m paperbench.nano.entrypoint --help
 ```

-To use our default solver (`BasicAgentSolver`), you must set `paperbench.solver=paperbench.agents.basicagent.solver:BasicAgentSolver`, specifying to use the `BasicAgentSolver` solver.
+To use our default solver (`BasicAgentSolver`), you must set `paperbench.solver=paperbench.solvers.basicagent.solver:BasicAgentSolver`, specifying to use the `BasicAgentSolver` solver.

 To test that things are set up correctly, run the dummy agent with the dummy judge (`paperbench.judge.scaffold=dummy`) on the debug split. Note this dummy agent is different to the [default nanoeval "dummy" agent](../common/nanoeval/nanoeval/solvers/computer_tasks/solver.py); our dummy agent tests several aspects of the infrastructure and makes a toy submission.

 ```bash
 uv run python -m paperbench.nano.entrypoint \
     paperbench.paper_split=debug \
-    paperbench.solver=paperbench.agents.dummy.solver:PaperBenchDummySolver \
+    paperbench.solver=paperbench.solvers.dummy.solver:PaperBenchDummySolver \
     paperbench.solver.computer_runtime.env=alcatraz.clusters.local:LocalConfig \
     paperbench.solver.computer_runtime.env.pull_from_registry=false \
     paperbench.judge.scaffold=dummy \
     runner.recorder=nanoeval.json_recorder:json_recorder
 ```

-To run BasicAgent, use `paperbench.solver=paperbench.agents.basicagent.solver:BasicAgentSolver`. By default, this will run for 5 minutes and use `gpt-4.1-mini`.
+To run BasicAgent, use `paperbench.solver=paperbench.solvers.basicagent.solver:BasicAgentSolver`. By default, this will run for 5 minutes and use `gpt-4.1-mini`.

 Below, we present a command that you can use for development, which runs the
 above configuration on on the dev split of PaperBench, with short timeouts and
 cheap models.

 ```bash
 uv run python -m paperbench.nano.entrypoint \
-    paperbench.solver=paperbench.agents.basicagent.solver:BasicAgentSolver \
+    paperbench.solver=paperbench.solvers.basicagent.solver:BasicAgentSolver \
     paperbench.solver.computer_runtime.env=alcatraz.clusters.local:LocalConfig \
     paperbench.solver.computer_runtime.env.pull_from_registry=false \
     paperbench.paper_split=dev \
@@ -181,8 +181,8 @@ following command:
 ```bash
 uv run python -m paperbench.nano.entrypoint \
     paperbench.paper_split=all \
-    paperbench.solver=paperbench.agents.basicagent.solver:BasicAgentSolver \
-    paperbench.solver.completer_config=paperbench.agents.basicagent.completer:OpenAIResponsesTurnCompleterConfig \
+    paperbench.solver=paperbench.solvers.basicagent.solver:BasicAgentSolver \
+    paperbench.solver.completer_config=paperbench.solvers.basicagent.completer:OpenAIResponsesTurnCompleterConfig \
     paperbench.solver.completer_config.model='gpt-5-2025-08-07' \
     paperbench.solver.completer_config.tools="[{'type':'web_search'}]" \
     paperbench.solver.completer_config.reasoning=preparedness_turn_completer.oai_responses_turn_completer.completer:ReasoningConfig \
@@ -229,7 +229,7 @@ Then run the paperbench like above but with appropriately configured `PBDirectSu
 ```bash
 uv run python -m paperbench.nano.entrypoint \
     paperbench.paper_split=debug \
-    paperbench.solver=paperbench.agents.direct_submission.solver:PBDirectSubmissionSolver \
+    paperbench.solver=paperbench.solvers.direct_submission.solver:PBDirectSubmissionSolver \
     paperbench.solver.submissions_dir=path/to/foo/ \
     paperbench.solver.computer_runtime.env=alcatraz.clusters.local:LocalConfig \
     paperbench.solver.computer_runtime.env.pull_from_registry=false \

project/paperbench/paperbench/Dockerfile.base

Lines changed: 3 additions & 3 deletions
@@ -15,7 +15,7 @@ RUN mkdir -p ${LOGS_DIR} ${AGENT_DIR} ${SUBMISSION_DIR}
 RUN mkdir -p ${WORKSPACE_BASE}/paper

 RUN mkdir ${WORKSPACE_BASE}/.vscode
-COPY paperbench/agents/launch.json ${WORKSPACE_BASE}/.vscode/launch.json
+COPY paperbench/solvers/launch.json ${WORKSPACE_BASE}/.vscode/launch.json

 # Avoid interactive dialog from apt-get and other packages requiring configuration
 ENV DEBIAN_FRONTEND=noninteractive
@@ -73,15 +73,15 @@ RUN cd ${SUBMISSION_DIR} && \
     git init && \
     mkdir -p .git/hooks

-COPY paperbench/agents/pre-commit ${SUBMISSION_DIR}/.git/hooks/
+COPY paperbench/solvers/pre-commit ${SUBMISSION_DIR}/.git/hooks/
 RUN chmod +x ${SUBMISSION_DIR}/.git/hooks/pre-commit

 # Set up git config
 RUN git config --global user.email "agent@example.com" && \
     git config --global user.name "agent"

 # Set up apply_patch
-COPY paperbench/agents/apply_patch.py ${AGENT_DIR}/apply_patch.py
+COPY paperbench/solvers/apply_patch.py ${AGENT_DIR}/apply_patch.py
 RUN echo "#!/bin/bash\npython ${AGENT_DIR}/apply_patch.py '$@'" > /bin/apply_patch
 RUN chmod +x /bin/apply_patch

project/paperbench/paperbench/nano/task.py

Lines changed: 5 additions & 5 deletions
@@ -25,10 +25,6 @@
     RuntimeConfig,
 )
 from nanoeval.solvers.computer_tasks.task import ComputerTask
-from paperbench.agents.upload import (
-    upload_heavy_logs,
-    upload_status,
-)
 from paperbench.computer_utils import maybe_release_computer, start_computer_with_retry
 from paperbench.constants import (
     AGENT_DIR_CONFIG,
@@ -46,7 +42,11 @@
 )
 from paperbench.nano.utils import get_file_at_duration
 from paperbench.paper_registry import paper_registry
-from paperbench.scripts.run_reproduce import reproduce_on_computer_with_salvaging
+from paperbench.reproduce import reproduce_on_computer_with_salvaging
+from paperbench.solvers.upload import (
+    upload_heavy_logs,
+    upload_status,
+)
 from paperbench.utils import get_agents_dir, purple

 GRADER_OPENAI_API_KEY = os.getenv("GRADER_OPENAI_API_KEY") or os.getenv("OPENAI_API_KEY")
File renamed without changes.
File renamed without changes.

project/paperbench/paperbench/agents/agent.env.example renamed to project/paperbench/paperbench/solvers/agent.env.example

File renamed without changes.
File renamed without changes.
File renamed without changes.

project/paperbench/paperbench/agents/basicagent/api.py renamed to project/paperbench/paperbench/solvers/basicagent/api.py

Lines changed: 2 additions & 2 deletions
@@ -19,8 +19,8 @@
     is_chat_completion_message_param,
 )

-from paperbench.agents.basicagent.completer import TimeTrackingRetryConfig
-from paperbench.agents.basicagent.tools.base import ToolCall
+from paperbench.solvers.basicagent.completer import TimeTrackingRetryConfig
+from paperbench.solvers.basicagent.tools.base import ToolCall

 logger = structlog.stdlib.get_logger(component=__name__)
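Most of the changes in this commit replace the `paperbench.agents` package path with `paperbench.solvers`, both in dotted imports and in file paths. The sketch below illustrates that kind of mechanical rewrite. It is not taken from the commit, and nothing here indicates how the author actually performed the rename; the helper name `rewrite_module_path` is hypothetical.

```python
import re

# Assumption: only the exact segment "paperbench.agents" or "paperbench/agents"
# should change. The separator is captured so it is preserved in the output.
_OLD_PATH = re.compile(r"\bpaperbench([./])agents\b")


def rewrite_module_path(text: str) -> str:
    """Return text with the old agents package path renamed to solvers."""
    return _OLD_PATH.sub(r"paperbench\1solvers", text)


if __name__ == "__main__":
    samples = [
        "from paperbench.agents.upload import (",
        "COPY paperbench/agents/launch.json ${WORKSPACE_BASE}/.vscode/launch.json",
        # Left unchanged: the identifier merely contains the word "agents".
        "from paperbench.utils import get_agents_dir, purple",
    ]
    for line in samples:
        print(rewrite_module_path(line))
```

The word boundaries keep identifiers such as `get_agents_dir` intact, which matches the diff above, where that import is left unchanged.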
