Commit 6532a41
Ruff, update prompts, add think tool, update defaults (#163)
* Ruff, update prompts, add think tool, update defaults
* Update with results from deep research bench
* Update README
* Update README.md
* Update think_tool
* Fix w/ updated evals
* Update README.md
* Set default
* Fix prompt
* Update tool processing for parallel tool calls
* Fix exit condition
* Update defaults
* Minor changes
* Update default supervisor max iter for think_tool
* Expt
* Update researcher prompt limits
* Update config

Co-authored-by: nhuang-lc <[email protected]>
1 parent f95d200 commit 6532a41

File tree

13 files changed: +1602 −428 lines


CLAUDE.md

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@ (new file)

# Open Deep Research Repository Overview

## Project Description
Open Deep Research is a configurable, fully open-source deep research agent that works across multiple model providers, search tools, and MCP (Model Context Protocol) servers. It enables automated research with parallel processing and comprehensive report generation.

## Repository Structure

### Root Directory
- `README.md` - Comprehensive project documentation with quickstart guide
- `pyproject.toml` - Python project configuration and dependencies
- `langgraph.json` - LangGraph configuration defining the main graph entry point
- `uv.lock` - UV package manager lock file
- `LICENSE` - MIT license
- `.env.example` - Environment variables template (not tracked)

### Core Implementation (`src/open_deep_research/`)
- `deep_researcher.py` - Main LangGraph implementation (entry point: `deep_researcher`)
- `configuration.py` - Configuration management and settings
- `state.py` - Graph state definitions and data structures
- `prompts.py` - System prompts and prompt templates
- `utils.py` - Utility functions and helpers
- `files/` - Research output and example files

### Legacy Implementations (`src/legacy/`)
Contains two earlier research implementations:
- `graph.py` - Plan-and-execute workflow with human-in-the-loop
- `multi_agent.py` - Supervisor-researcher multi-agent architecture
- `legacy.md` - Documentation for legacy implementations
- `CLAUDE.md` - Legacy-specific Claude instructions
- `tests/` - Legacy-specific tests

### Security (`src/security/`)
- `auth.py` - Authentication handler for LangGraph deployment

### Testing (`tests/`)
- `run_evaluate.py` - Main evaluation script, configured to run on Deep Research Bench
- `evaluators.py` - Specialized evaluation functions
- `prompts.py` - Evaluation prompts and criteria
- `pairwise_evaluation.py` - Comparative evaluation tools
- `supervisor_parallel_evaluation.py` - Multi-threaded evaluation

### Examples (`examples/`)
- `arxiv.md` - ArXiv research example
- `pubmed.md` - PubMed research example
- `inference-market.md` - Inference market analysis examples

## Key Technologies
- **LangGraph** - Workflow orchestration and graph execution
- **LangChain** - LLM integration and tool calling
- **Multiple LLM Providers** - OpenAI, Anthropic, Google, Groq, DeepSeek support
- **Search APIs** - Tavily, OpenAI/Anthropic native search, DuckDuckGo, Exa
- **MCP Servers** - Model Context Protocol for extended capabilities

## Development Commands
- `uvx langgraph dev` - Start development server with LangGraph Studio
- `python tests/run_evaluate.py` - Run comprehensive evaluations
- `ruff check` - Code linting
- `mypy` - Type checking

## Configuration
All settings are configurable via:
- Environment variables (`.env` file)
- Web UI in LangGraph Studio
- Direct configuration modification

Key settings include model selection, search API choice, concurrency limits, and MCP server configurations.
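As a minimal sketch of how an environment variable might override one of these settings, the snippet below mirrors the `SearchAPI` enum from `src/open_deep_research/configuration.py`; the `resolve_search_api` helper and the `SEARCH_API` variable name are illustrative assumptions, not the repository's actual loader:

```python
import os
from enum import Enum


class SearchAPI(Enum):
    """Mirrors the SearchAPI enum in src/open_deep_research/configuration.py."""

    ANTHROPIC = "anthropic"
    OPENAI = "openai"
    TAVILY = "tavily"
    NONE = "none"


def resolve_search_api(default: SearchAPI = SearchAPI.TAVILY) -> SearchAPI:
    """Hypothetical helper: read the search API choice from an env var,
    falling back to the configured default when the variable is unset."""
    raw = os.environ.get("SEARCH_API")
    if raw is None:
        return default
    # Raises ValueError for values outside the enum, surfacing typos early.
    return SearchAPI(raw.lower())
```

The same precedence (environment variable over default) applies to the other settings listed above.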

README.md

Lines changed: 45 additions & 16 deletions
@@ -1,4 +1,4 @@
-# Open Deep Research
+# 🔬 Open Deep Research

<img width="1388" height="298" alt="full_diagram" src="https://github.com/user-attachments/assets/12a2371b-8be2-4219-9b48-90503eb43c69" />

@@ -7,6 +7,10 @@ Deep research has broken out as one of the most popular agent applications. This
* Read more in our [blog](https://blog.langchain.com/open-deep-research/)
* See our [video](https://www.youtube.com/watch?v=agGiWUpxkhg) for a quick overview

+### 🔥 Recent Updates
+
+**August 2, 2025**: Achieved #6 ranking on the [Deep Research Bench Leaderboard](https://huggingface.co/spaces/Ayanami0730/DeepResearch-Leaderboard) with an overall score of 0.4344.
+
### 🚀 Quickstart

1. Clone the repository and activate a virtual environment:
@@ -19,6 +23,8 @@ source .venv/bin/activate  # On Windows: .venv\Scripts\activate

2. Install dependencies:
```bash
+uv sync
+# or
uv pip install -r pyproject.toml
```

@@ -44,9 +50,9 @@ Use this to open the Studio UI:

Ask a question in the `messages` input field and click `Submit`.

-### Configurations
+### ⚙️ Configurations

-Open Deep Research offers extensive configuration options to customize the research process and model behavior. All configurations can be set via the web UI, environment variables, or by modifying the configuration directly.
+Extensive configuration options to customize research behavior. Configure via web UI, environment variables, or direct modification.

#### General Settings

@@ -64,9 +70,9 @@ Open Deep Research offers extensive configuration options to customize the resea

Open Deep Research uses multiple specialized models for different research tasks:

-- **Summarization Model** (default: `openai:gpt-4.1-nano`): Summarizes research results from search APIs
+- **Summarization Model** (default: `openai:gpt-4.1-mini`): Summarizes research results from search APIs
- **Research Model** (default: `openai:gpt-4.1`): Conducts research and analysis
-- **Compression Model** (default: `openai:gpt-4.1-mini`): Compresses research findings from sub-agents
+- **Compression Model** (default: `openai:gpt-4.1`): Compresses research findings from sub-agents
- **Final Report Model** (default: `openai:gpt-4.1`): Writes the final comprehensive report

All models are configured using [init_chat_model() API](https://python.langchain.com/docs/how_to/chat_models_universal_init/) which supports providers like OpenAI, Anthropic, Google Vertex AI, and others.
@@ -117,9 +123,9 @@ mcp-server-filesystem /path/to/allowed/dir1 /path/to/allowed/dir2

Remote servers can be configured as authenticated or unauthenticated and support JWT-based authentication through OAuth endpoints.

-### Evaluation
+### 📊 Evaluation

-A comprehensive batch evaluation system designed for detailed analysis and comparative studies.
+Comprehensive batch evaluation system for detailed analysis and comparative studies.

#### **Features:**
- **Multi-dimensional Scoring**: Specialized evaluators with 0-1 scale ratings
@@ -130,12 +136,37 @@ A comprehensive batch evaluation system designed for detailed analysis and compa
# Run comprehensive evaluation on LangSmith datasets
python tests/run_evaluate.py
```
-#### **Key Files:**
-- `tests/run_evaluate.py`: Main evaluation script
-- `tests/evaluators.py`: Specialized evaluator functions
-- `tests/prompts.py`: Evaluation prompts for each dimension

-### Deployments and Usages
+#### **Deep Research Bench Submission:**
+The evaluation runs against the [Deep Research Bench](https://github.com/Ayanami0730/deep_research_bench), a comprehensive benchmark with 100 PhD-level research tasks across 22 fields.
+
+To submit results to the benchmark:
+
+1. **Run Evaluation**: Execute `python tests/run_evaluate.py` to evaluate against the Deep Research Bench dataset
+2. **Extract Results**: Use the extraction script to generate JSONL output:
+```bash
+python tests/extract_langsmith_data.py --project-name "YOUR_PROJECT_NAME" --model-name "gpt-4.1" --dataset-name "deep_research_bench"
+```
+This creates `tests/expt_results/deep_research_bench_gpt-4.1.jsonl` with the required format.
+3. **Submit to Benchmark**: Move the generated JSONL file to the Deep Research Bench repository and follow their [Quick Start guide](https://github.com/Ayanami0730/deep_research_bench?tab=readme-ov-file#quick-start) for evaluation submission
+
+> **Note:** We submitted results from [this commit](https://github.com/langchain-ai/open_deep_research/commit/c0a160b57a9b5ecd4b8217c3811a14d8eff97f72) to the Deep Research Bench, resulting in an overall score of 0.4344 (#6 on the leaderboard).
+
+Results for current `main` branch utilize more constrained prompting to reduce token spend ~4x while still achieving a score of 0.4268.
+
+#### **Current Results (Main Branch)**
+
+| Metric | Score |
+|--------|-------|
+| Comprehensiveness | 0.4145 |
+| Insight | 0.3854 |
+| Instruction Following | 0.4780 |
+| Readability | 0.4495 |
+| **Overall Score** | **0.4268** |
+
+### 🚀 Deployments and Usage
+
+Multiple deployment options for different use cases.

#### LangGraph Studio

@@ -155,10 +186,10 @@ You can also deploy your own instance of OAP, and make your own custom agents (l
1. [Deploy Open Agent Platform](https://docs.oap.langchain.com/quickstart)
2. [Add Deep Researcher to OAP](https://docs.oap.langchain.com/setup/agents)

-### Updates 🔥
-
### Legacy Implementations 🏛️

+Read about the evolution from our original implementations to the current version in our [blog post](https://rlancemartin.github.io/2025/07/30/bitter_lesson/).
+
The `src/legacy/` folder contains two earlier implementations that provide alternative approaches to automated research:

#### 1. Workflow Implementation (`legacy/graph.py`)
@@ -172,5 +203,3 @@ The `src/legacy/` folder contains two earlier implementations that provide alter
- **Parallel Processing**: Multiple researchers work simultaneously
- **Speed Optimized**: Faster report generation through concurrency
- **MCP Support**: Extensive Model Context Protocol integration
-
-See `src/legacy/legacy.md` for detailed documentation, configuration options, and usage examples for both legacy implementations.
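The model defaults above use the `provider:model` identifier convention accepted by `init_chat_model()`. As a minimal sketch of that convention (the helper below is illustrative; actual resolution is handled inside LangChain's `init_chat_model`):

```python
def split_model_identifier(identifier: str) -> tuple[str, str]:
    """Split a 'provider:model' string such as 'openai:gpt-4.1' into its parts.

    If no provider prefix is present, return an empty provider so the
    caller can fall back to provider inference.
    """
    provider, _, model = identifier.partition(":")
    if not model:
        # No ':' found: partition put the whole string in `provider`.
        return "", provider
    return provider, model
```

For example, the Compression Model default `openai:gpt-4.1` splits into provider `openai` and model `gpt-4.1`.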

pyproject.toml

Lines changed: 2 additions & 1 deletion
@@ -9,7 +9,7 @@ readme = "README.md"
license = { text = "MIT" }
requires-python = ">=3.10"
dependencies = [
-    "langgraph>=0.5.3",
+    "langgraph>=0.5.4",
    "langchain-community>=0.3.9",
    "langchain-openai>=0.3.7",
    "langchain-anthropic>=0.3.15",
@@ -42,6 +42,7 @@ dependencies = [
    "ipykernel>=6.29.5",
    "supabase>=2.15.3",
    "mcp>=1.9.4",
+    "pandas>=2.3.1",
]

[project.optional-dependencies]

src/legacy/configuration.py

Lines changed: 8 additions & 8 deletions
@@ -36,8 +36,8 @@ class Configuration:
    search_api: SearchAPI = SearchAPI.TAVILY
    search_api_config: Optional[Dict[str, Any]] = None
    process_search_results: Literal["summarize", "split_and_rerank"] | None = None
-    summarization_model_provider: str = "anthropic"
-    summarization_model: str = "claude-3-5-haiku-latest"
+    summarization_model_provider: str = "openai"
+    summarization_model: str = "gpt-4.1"
    max_structured_output_retries: int = 3
    include_source_str: bool = False

@@ -47,8 +47,8 @@
    planner_provider: str = "anthropic"
    planner_model: str = "claude-3-7-sonnet-latest"
    planner_model_kwargs: Optional[Dict[str, Any]] = None
-    writer_provider: str = "anthropic"
-    writer_model: str = "claude-3-7-sonnet-latest"
+    writer_provider: str = "openai"
+    writer_model: str = "gpt-4.1"
    writer_model_kwargs: Optional[Dict[str, Any]] = None

    @classmethod
@@ -73,14 +73,14 @@ class MultiAgentConfiguration:
    search_api: SearchAPI = SearchAPI.TAVILY
    search_api_config: Optional[Dict[str, Any]] = None
    process_search_results: Literal["summarize", "split_and_rerank"] | None = None
-    summarization_model_provider: str = "anthropic"
-    summarization_model: str = "claude-3-5-haiku-latest"
+    summarization_model_provider: str = "openai"
+    summarization_model: str = "gpt-4.1"
    include_source_str: bool = False

    # Multi-agent specific configuration
    number_of_queries: int = 2  # Number of search queries to generate per section
-    supervisor_model: str = "anthropic:claude-3-7-sonnet-latest"
-    researcher_model: str = "anthropic:claude-3-7-sonnet-latest"
+    supervisor_model: str = "anthropic:claude-sonnet-4-20250514"
+    researcher_model: str = "anthropic:claude-sonnet-4-20250514"
    ask_for_clarification: bool = False  # Whether to ask for clarification from the user
    # MCP server configuration
    mcp_server_config: Optional[Dict[str, Any]] = None

src/open_deep_research/configuration.py

Lines changed: 35 additions & 11 deletions
@@ -1,16 +1,24 @@
-from pydantic import BaseModel, Field
-from typing import Any, List, Optional
-from langchain_core.runnables import RunnableConfig
+"""Configuration management for the Open Deep Research system."""
+
import os
from enum import Enum
+from typing import Any, List, Optional
+
+from langchain_core.runnables import RunnableConfig
+from pydantic import BaseModel, Field
+

class SearchAPI(Enum):
+    """Enumeration of available search API providers."""
+
    ANTHROPIC = "anthropic"
    OPENAI = "openai"
    TAVILY = "tavily"
    NONE = "none"

class MCPConfig(BaseModel):
+    """Configuration for Model Context Protocol (MCP) servers."""
+
    url: Optional[str] = Field(
        default=None,
        optional=True,
@@ -28,6 +36,8 @@ class MCPConfig(BaseModel):
    """Whether the MCP server requires authentication"""

class Configuration(BaseModel):
+    """Main configuration class for the Deep Research agent."""
+
    # General Configuration
    max_structured_output_retries: int = Field(
        default=3,
@@ -82,11 +92,11 @@ class Configuration(BaseModel):
        }
    )
    max_researcher_iterations: int = Field(
-        default=3,
+        default=6,
        metadata={
            "x_oap_ui_config": {
                "type": "slider",
-                "default": 3,
+                "default": 6,
                "min": 1,
                "max": 10,
                "step": 1,
@@ -95,11 +105,11 @@
        }
    )
    max_react_tool_calls: int = Field(
-        default=5,
+        default=10,
        metadata={
            "x_oap_ui_config": {
                "type": "slider",
-                "default": 5,
+                "default": 10,
                "min": 1,
                "max": 30,
                "step": 1,
@@ -109,11 +119,11 @@
    )
    # Model Configuration
    summarization_model: str = Field(
-        default="openai:gpt-4.1-nano",
+        default="openai:gpt-4.1-mini",
        metadata={
            "x_oap_ui_config": {
                "type": "text",
-                "default": "openai:gpt-4.1-nano",
+                "default": "openai:gpt-4.1-mini",
                "description": "Model for summarizing research results from Tavily search results"
            }
        }
@@ -128,6 +138,18 @@
        }
    )
+    max_content_length: int = Field(
+        default=50000,
+        metadata={
+            "x_oap_ui_config": {
+                "type": "number",
+                "default": 50000,
+                "min": 1000,
+                "max": 200000,
+                "description": "Maximum character length for webpage content before summarization"
+            }
+        }
+    )
    research_model: str = Field(
        default="openai:gpt-4.1",
        metadata={
@@ -149,11 +171,11 @@
        }
    )
    compression_model: str = Field(
-        default="openai:gpt-4.1-mini",
+        default="openai:gpt-4.1",
        metadata={
            "x_oap_ui_config": {
                "type": "text",
-                "default": "openai:gpt-4.1-mini",
+                "default": "openai:gpt-4.1",
                "description": "Model for compressing research findings from sub-agents. NOTE: Make sure your Compression Model supports the selected search API."
            }
        }
@@ -225,4 +247,6 @@ def from_runnable_config(
        return cls(**{k: v for k, v in values.items() if v is not None})

    class Config:
+        """Pydantic configuration."""
+
        arbitrary_types_allowed = True
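The final hunk shows `from_runnable_config` building the instance from non-`None` values. A toy dataclass sketch of the precedence this pattern typically implies (env var over `config["configurable"]` over field default) is below; the real class is a pydantic model with `Field` metadata, and the exact merge order here is an assumption, not taken from the diff:

```python
import os
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class MiniConfig:
    """Toy stand-in for the pydantic Configuration class in configuration.py."""

    max_researcher_iterations: int = 6
    summarization_model: str = "openai:gpt-4.1-mini"

    @classmethod
    def from_runnable_config(cls, config: Optional[dict] = None) -> "MiniConfig":
        """Assumed merge order: env var (upper-cased field name) beats
        config['configurable'], which beats the dataclass default."""
        configurable = (config or {}).get("configurable", {})
        values: dict[str, Any] = {}
        for f in fields(cls):
            env_val: Any = os.environ.get(f.name.upper())
            if env_val is not None and f.type is int:
                env_val = int(env_val)  # env vars are strings; coerce ints
            val = env_val if env_val is not None else configurable.get(f.name)
            if val is not None:
                values[f.name] = val
        # Unset fields fall through to the dataclass defaults.
        return cls(**values)
```

With this sketch, `MiniConfig.from_runnable_config({"configurable": {"max_researcher_iterations": 8}})` yields 8 iterations unless `MAX_RESEARCHER_ITERATIONS` is set in the environment.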
