Commit 04e106f

Add gpt-5 (#168)
* Add gpt-5
* Update table
1 parent 3de150a commit 04e106f

5 files changed, +109 -13 lines changed

README.md

Lines changed: 4 additions & 1 deletion
@@ -8,6 +8,8 @@ Deep research has broken out as one of the most popular agent applications. This
 
 ### 🔥 Recent Updates
 
+**August 7, 2025**: Added support for GPT-5 models and updated the Deep Research Bench evaluation to use GPT-5.
+
 **August 2, 2025**: Achieved #6 ranking on the [Deep Research Bench Leaderboard](https://huggingface.co/spaces/Ayanami0730/DeepResearch-Leaderboard) with an overall score of 0.4344.
 
 **July 30, 2025**: Read about the evolution from our original implementations to the current version in our [blog post](https://rlancemartin.github.io/2025/07/30/bitter_lesson/).
@@ -103,9 +105,10 @@ This creates `tests/expt_results/deep_research_bench_model-name.jsonl` with the
 
 | Name | Commit | Summarization | Research | Compression | Total Cost | Total Tokens | RACE Score | Experiment |
 |------|--------|---------------|----------|-------------|------------|--------------|------------|------------|
+| GPT-5 | [168](https://github.com/langchain-ai/open_deep_research/pull/168/commits) | openai:gpt-4.1-mini | openai:gpt-5 | openai:gpt-4.1 | TBD | 204,640,896 | 0.4932 | [Link](https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/datasets/6e4766ca-613c-4bda-8bde-f64f0422bbf3/compare?selectedSessions=4d5941c8-69ce-4f3d-8b3e-e3c99dfbd4cc&baseline=undefined) |
 | Defaults | [6532a41](https://github.com/langchain-ai/open_deep_research/commit/6532a4176a93cc9bb2102b3d825dcefa560c85d9) | openai:gpt-4.1-mini | openai:gpt-4.1 | openai:gpt-4.1 | $45.98 | 58,015,332 | 0.4309 | [Link](https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/datasets/6e4766ca-6[…]ons=cf4355d7-6347-47e2-a774-484f290e79bc&baseline=undefined) |
 | Claude Sonnet 4 | [f877ea9](https://github.com/langchain-ai/open_deep_research/pull/163/commits/f877ea93641680879c420ea991e998b47aab9bcc) | openai:gpt-4.1-mini | anthropic:claude-sonnet-4-20250514 | openai:gpt-4.1 | $187.09 | 138,917,050 | 0.4401 | [Link](https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/datasets/6e4766ca-6[…]ons=04f6002d-6080-4759-bcf5-9a52e57449ea&baseline=undefined) |
-| Deep Research Bench Submission | [c0a160b](https://github.com/langchain-ai/open_deep_research/commit/c0a160b57a9b5ecd4b8217c3811a14d8eff97f72) | openai:gpt-4.1-nano | openai:gpt-4.1 | openai:gpt-4.1 | $87.83 | 207,005,549 | 0.4344 | [Link](https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/datasets/6e4766ca-6[…]ons=e6647f74-ad2f-4cb9-887e-acb38b5f73c0&baseline=undefined) |
+| Deep Research Bench Submission | [c0a160b](https://github.com/langchain-ai/open_deep_research/commit/c0a160b57a9b5ecd4b8217c3811a14d8eff97f72) | openai:gpt-4.1-nano | openai:gpt-4.1 | openai:gpt-4.1 | $87.83 | 207,005,549 | 0.4344 | [Link](https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/datasets/6e4766ca-6[…]ons=e6647f74-ad2f-4cb9-887e-acb38b5f73c0&baseline=undefined) |
 
 ### 🚀 Deployments and Usage
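
Reviewer note on the README table in this hunk: the new GPT-5 row can be put in context against the existing rows with a little arithmetic. The sketch below is not part of the commit. It copies the RACE scores and total token counts from the table as shown, and the `runs` dictionary and `compare` helper are hypothetical names introduced only for illustration. Cost is left out because the GPT-5 row lists it as TBD.

```python
# RACE scores and total token counts copied from the README table rows.
runs = {
    "GPT-5": {"race": 0.4932, "tokens": 204_640_896},
    "Defaults": {"race": 0.4309, "tokens": 58_015_332},
    "Claude Sonnet 4": {"race": 0.4401, "tokens": 138_917_050},
    "Deep Research Bench Submission": {"race": 0.4344, "tokens": 207_005_549},
}


def compare(name, baseline="Defaults"):
    """Return (absolute RACE gain, relative RACE gain, token ratio) vs. a baseline row."""
    run, base = runs[name], runs[baseline]
    gain = run["race"] - base["race"]
    return gain, gain / base["race"], run["tokens"] / base["tokens"]


gain, rel_gain, token_ratio = compare("GPT-5")
print(f"RACE gain: {gain:+.4f} ({rel_gain:+.1%}), tokens: {token_ratio:.2f}x")
```

Against the Defaults row, the GPT-5 run scores roughly 14% higher on RACE while consuming about 3.5 times as many tokens.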

pyproject.toml

Lines changed: 2 additions & 2 deletions
@@ -11,13 +11,13 @@ requires-python = ">=3.10"
 dependencies = [
     "langgraph>=0.5.4",
     "langchain-community>=0.3.9",
-    "langchain-openai>=0.3.7",
+    "langchain-openai>=0.3.28",
     "langchain-anthropic>=0.3.15",
     "langchain-mcp-adapters>=0.1.6",
     "langchain-deepseek>=0.1.2",
     "langchain-tavily",
     "langchain-groq>=0.2.4",
-    "openai>=1.61.0",
+    "openai>=1.99.2",
     "tavily-python>=0.5.0",
     "arxiv>=2.1.3",
     "pymupdf>=1.25.3",

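Reviewer note on the `pyproject.toml` change: the two raised minimums (`langchain-openai>=0.3.28` and `openai>=1.99.2`) mean an existing environment may need upgrading before the GPT-5 configuration runs. The following is a minimal sketch for checking an environment against those floors, not something this commit ships. `MINIMUMS`, `release_tuple`, `meets_minimum`, and `check_installed` are hypothetical names, and the comparison only looks at the leading numeric release segment, so it is an approximation of full PEP 440 ordering.

```python
import re
from importlib import metadata

# Minimum versions taken from the updated dependency list in this commit.
MINIMUMS = {"langchain-openai": "0.3.28", "openai": "1.99.2"}


def release_tuple(version):
    """Turn '1.99.2' into (1, 99, 2), ignoring any pre-release or local suffix."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed, minimum):
    """True if the installed release is at least the required minimum."""
    return release_tuple(installed) >= release_tuple(minimum)


def check_installed(minimums=MINIMUMS):
    """Map each package to True/False, or None when it is not installed."""
    results = {}
    for name, minimum in minimums.items():
        try:
            results[name] = meets_minimum(metadata.version(name), minimum)
        except metadata.PackageNotFoundError:
            results[name] = None
    return results


if __name__ == "__main__":
    for name, ok in check_installed().items():
        print(f"{name}: {ok}")
```

The previous floors from this diff (`0.3.7` and `1.61.0`) fail `meets_minimum` against the new ones, which is the case the check is meant to catch.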
tests/expt_results/deep_research_bench_gpt-5.jsonl

Lines changed: 93 additions & 0 deletions
Large diffs are not rendered by default.

tests/run_evaluate.py

Lines changed: 2 additions & 2 deletions
@@ -22,7 +22,7 @@
 max_react_tool_calls = 10
 summarization_model = "openai:gpt-4.1-mini"
 summarization_model_max_tokens = 8192
-research_model = "openai:gpt-4.1" # "anthropic:claude-sonnet-4-20250514"
+research_model = "openai:gpt-5" # "anthropic:claude-sonnet-4-20250514"
 research_model_max_tokens = 10000
 compression_model = "openai:gpt-4.1"
 compression_model_max_tokens = 10000
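
Reviewer note on the model settings in this hunk: each value is a `provider:model` string, the form that LangChain's `init_chat_model` accepts. The sketch below shows one way to sanity-check such strings before starting a long evaluation run. It assumes nothing about how `run_evaluate.py` consumes them, and `ModelId`, `split_model_id`, and the `models` dictionary are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class ModelId(NamedTuple):
    provider: str
    model: str


def split_model_id(model_id: str) -> ModelId:
    """Split 'provider:model' at the first colon, rejecting malformed input."""
    provider, sep, model = model_id.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider:model', got {model_id!r}")
    return ModelId(provider, model)


# Model settings as they stand after this commit.
models = {
    "summarization": "openai:gpt-4.1-mini",
    "research": "openai:gpt-5",
    "compression": "openai:gpt-4.1",
}

for role, model_id in models.items():
    parsed = split_model_id(model_id)
    print(f"{role}: provider={parsed.provider}, model={parsed.model}")
```

Only the research model changes in this commit. Summarization and compression stay on the GPT-4.1 family, which matches the GPT-5 row added to the README table.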
@@ -65,7 +65,7 @@ async def main():
         target,
         data=dataset_name,
         evaluators=evaluators,
-        experiment_prefix=f"ODR GPT-4.1, Tavily Search, Fix Max Supervisor Iterations",
+        experiment_prefix=f"ODR GPT-5, Tavily Search",
         max_concurrency=10,
         metadata={
             "max_structured_output_retries": max_structured_output_retries,

uv.lock

Lines changed: 8 additions & 8 deletions
Some generated files are not rendered by default.
