
Commit 3de150a

Update readme (#165)

1 parent 6532a41 · commit 3de150a

File tree: 2 files changed, +137 −100 lines

README.md

Lines changed: 39 additions & 100 deletions
@@ -2,15 +2,18 @@

<img width="1388" height="298" alt="full_diagram" src="https://github.com/user-attachments/assets/12a2371b-8be2-4219-9b48-90503eb43c69" />

- Deep research has broken out as one of the most popular agent applications. This is a simple, configurable, fully open source deep research agent that works across many model providers, search tools, and MCP servers.
+ Deep research has broken out as one of the most popular agent applications. This is a simple, configurable, fully open source deep research agent that works across many model providers, search tools, and MCP servers. Its performance is on par with many popular deep research agents ([see the Deep Research Bench leaderboard](https://huggingface.co/spaces/Ayanami0730/DeepResearch-Leaderboard)).

- * Read more in our [blog](https://blog.langchain.com/open-deep-research/)
- * See our [video](https://www.youtube.com/watch?v=agGiWUpxkhg) for a quick overview
+ <img width="817" height="666" alt="Screenshot 2025-07-13 at 11 21 12 PM" src="https://github.com/user-attachments/assets/052f2ed3-c664-4a4f-8ec2-074349dcaa3f" />

### 🔥 Recent Updates

**August 2, 2025**: Achieved #6 ranking on the [Deep Research Bench Leaderboard](https://huggingface.co/spaces/Ayanami0730/DeepResearch-Leaderboard) with an overall score of 0.4344.

+ **July 30, 2025**: Read about the evolution from our original implementations to the current version in our [blog post](https://rlancemartin.github.io/2025/07/30/bitter_lesson/).
+
+ **July 16, 2025**: Read more in our [blog](https://blog.langchain.com/open-deep-research/) and watch our [video](https://www.youtube.com/watch?v=agGiWUpxkhg) for a quick overview.
+
### 🚀 Quickstart

1. Clone the repository and activate a virtual environment:
@@ -33,141 +36,79 @@ uv pip install -r pyproject.toml
cp .env.example .env
```

- 4. Launch the assistant with the LangGraph server locally to open LangGraph Studio in your browser:
+ 4. Launch the agent with the LangGraph server locally:

```bash
# Install dependencies and start the LangGraph server
uvx --refresh --from "langgraph-cli[inmem]" --with-editable . --python 3.11 langgraph dev --allow-blocking
```

- Use this to open the Studio UI:
+ This will open the LangGraph Studio UI in your browser.

```
- 🚀 API: http://127.0.0.1:2024
- 🎨 Studio UI: https://smith.langchain.com/studio/?baseUrl=http://127.0.0.1:2024
- 📚 API Docs: http://127.0.0.1:2024/docs
```

- <img width="817" height="666" alt="Screenshot 2025-07-13 at 11 21 12 PM" src="https://github.com/user-attachments/assets/052f2ed3-c664-4a4f-8ec2-074349dcaa3f" />

- Ask a question in the `messages` input field and click `Submit`.
+ Ask a question in the `messages` input field and click `Submit`. Select a different configuration in the "Manage Assistants" tab.
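You can also hit the local server programmatically. Below is a minimal sketch (not from the README) that streams one research run through the LangGraph Python SDK; the graph name "Deep Researcher" is an assumption, so check `langgraph.json` in your checkout for the registered name.

```python
# Minimal sketch: streaming a one-off run against the local LangGraph server.
# Assumes `pip install langgraph-sdk`, a server on port 2024 (see above), and
# a graph registered as "Deep Researcher" in langgraph.json (an assumption).
import asyncio

from langgraph_sdk import get_client


async def main() -> None:
    client = get_client(url="http://127.0.0.1:2024")
    # Passing None as the thread ID runs threadless, without persistent state.
    async for chunk in client.runs.stream(
        None,
        "Deep Researcher",  # assumed graph name
        input={"messages": [{"role": "user", "content": "Research the MCP protocol."}]},
        stream_mode="updates",
    ):
        print(chunk.event)


asyncio.run(main())
```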
### ⚙️ Configurations

- Extensive configuration options to customize research behavior. Configure via web UI, environment variables, or direct modification.
-
- #### General Settings
-
- - **Max Structured Output Retries** (default: 3): Maximum number of retries for structured output calls from models when parsing fails
- - **Allow Clarification** (default: true): Whether to allow the researcher to ask clarifying questions before starting research
- - **Max Concurrent Research Units** (default: 5): Maximum number of research units to run concurrently using sub-agents. Higher values enable faster research but may hit rate limits
-
- #### Research Configuration
-
- - **Search API** (default: Tavily): Choose from Tavily (works with all models), OpenAI Native Web Search, Anthropic Native Web Search, or None
- - **Max Researcher Iterations** (default: 3): Number of times the Research Supervisor will reflect on research and ask follow-up questions
- - **Max React Tool Calls** (default: 5): Maximum number of tool calling iterations in a single researcher step
-
- #### Models
-
- Open Deep Research uses multiple specialized models for different research tasks:
-
- - **Summarization Model** (default: `openai:gpt-4.1-mini`): Summarizes research results from search APIs
- - **Research Model** (default: `openai:gpt-4.1`): Conducts research and analysis
- - **Compression Model** (default: `openai:gpt-4.1`): Compresses research findings from sub-agents
- - **Final Report Model** (default: `openai:gpt-4.1`): Writes the final comprehensive report
-
- All models are configured using [init_chat_model() API](https://python.langchain.com/docs/how_to/chat_models_universal_init/) which supports providers like OpenAI, Anthropic, Google Vertex AI, and others.
-
- **Important Model Requirements:**
+ #### LLM :brain:

- 1. **Structured Outputs**: All models must support structured outputs. Check support [here](https://python.langchain.com/docs/integrations/chat/).
+ Open Deep Research supports a wide range of LLM providers via the [init_chat_model() API](https://python.langchain.com/docs/how_to/chat_models_universal_init/). It uses LLMs for a few different tasks; see the model fields below and in the [configuration.py](https://github.com/langchain-ai/open_deep_research/blob/main/src/open_deep_research/configuration.py) file for more details. These can also be set via the LangGraph Studio UI.

- 2. **Search API Compatibility**: Research and Compression models must support your selected search API:
-    - Anthropic search requires Anthropic models with web search capability
-    - OpenAI search requires OpenAI models with web search capability
-    - Tavily works with all models
+ - **Summarization** (default: `openai:gpt-4.1-mini`): Summarizes search API results
+ - **Research** (default: `openai:gpt-4.1`): Powers the search agent
+ - **Compression** (default: `openai:gpt-4.1`): Compresses research findings
+ - **Final Report** (default: `openai:gpt-4.1`): Writes the final report

- 3. **Tool Calling**: All models must support tool calling functionality
+ > Note: the selected models will need to support [structured outputs](https://python.langchain.com/docs/integrations/chat/) and [tool calling](https://python.langchain.com/docs/how_to/tool_calling/).

- 4. **Special Configurations**:
-    - For OpenRouter: Follow [this guide](https://github.com/langchain-ai/open_deep_research/issues/75#issuecomment-2811472408)
-    - For local models via Ollama: See [setup instructions](https://github.com/langchain-ai/open_deep_research/issues/65#issuecomment-2743586318)
+ > Note: for OpenRouter, follow [this guide](https://github.com/langchain-ai/open_deep_research/issues/75#issuecomment-2811472408); for local models via Ollama, see these [setup instructions](https://github.com/langchain-ai/open_deep_research/issues/65#issuecomment-2743586318).

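As a quick illustration of how these model fields are resolved, here is a minimal sketch (illustrative, not code from this repo) of loading one of the defaults with `init_chat_model()`:

```python
# Minimal sketch: resolving a "provider:model" string via init_chat_model().
from langchain.chat_models import init_chat_model

# The provider prefix ("openai") and model name ("gpt-4.1") are parsed from
# a single identifier string; extra kwargs pass through to the provider.
research_model = init_chat_model("openai:gpt-4.1", temperature=0.0)

# The returned chat model must support tool calling and structured outputs.
print(research_model.invoke("Say hello to Open Deep Research.").content)
```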
- #### Example MCP (Model Context Protocol) Servers
+ #### Search API :mag:

- Open Deep Research supports MCP servers to extend research capabilities.
+ Open Deep Research supports a wide range of search tools. By default it uses the [Tavily](https://www.tavily.com/) search API, has full MCP compatibility, and works with native web search for Anthropic and OpenAI models. See the `search_api` and `mcp_config` fields in the [configuration.py](https://github.com/langchain-ai/open_deep_research/blob/main/src/open_deep_research/configuration.py) file for more details. These can also be set via the LangGraph Studio UI.

- #### Local MCP Servers
+ #### Other

- **Filesystem MCP Server** provides secure file system operations with robust access control:
- - Read, write, and manage files and directories
- - Perform operations like reading file contents, creating directories, moving files, and searching
- - Restrict operations to predefined directories for security
- - Support for both command-line configuration and dynamic MCP roots
+ See the remaining fields in [configuration.py](https://github.com/langchain-ai/open_deep_research/blob/main/src/open_deep_research/configuration.py) for various other settings to customize the behavior of Open Deep Research. These can also be overridden at invocation time, as shown in the sketch below.

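For illustration, a hedged sketch of overriding these fields per run through LangGraph's standard `configurable` mechanism. The field names `search_api` and `mcp_config` come from configuration.py as cited above; the import path, MCP URL, and tool name are hypothetical placeholders.

```python
# Hypothetical sketch: per-run configuration overrides via the standard
# LangGraph `configurable` dict. Field names follow configuration.py; the
# import path, MCP URL, and tool name below are placeholders, not real values.
from open_deep_research.deep_researcher import deep_researcher  # path assumed

config = {
    "configurable": {
        "search_api": "tavily",  # or native web search for OpenAI/Anthropic
        "mcp_config": {
            "url": "https://example.com/mcp",  # placeholder MCP server URL
            "tools": ["example_tool"],         # placeholder tool name
        },
    }
}

result = deep_researcher.invoke(
    {"messages": [{"role": "user", "content": "Compare MCP and plain tool use."}]},
    config=config,
)
print(result["messages"][-1].content)
```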
- Example usage:
- ```bash
- mcp-server-filesystem /path/to/allowed/dir1 /path/to/allowed/dir2
- ```
-
- #### Remote MCP Servers
-
- **Remote MCP servers** enable distributed agent coordination and support streamable HTTP requests. Unlike local servers, they can be multi-tenant and require more complex authentication.
-
- **Arcade MCP Server Example**:
- ```json
- {
-     "url": "https://api.arcade.dev/v1/mcps/ms_0ujssxh0cECutqzMgbtXSGnjorm",
-     "tools": ["Search_SearchHotels", "Search_SearchOneWayFlights", "Search_SearchRoundtripFlights"]
- }
- ```
+ ### 📊 Evaluation

- Remote servers can be configured as authenticated or unauthenticated and support JWT-based authentication through OAuth endpoints.
+ Open Deep Research is configured for evaluation with [Deep Research Bench](https://huggingface.co/spaces/Ayanami0730/DeepResearch-Leaderboard). This benchmark has 100 PhD-level research tasks (50 English, 50 Chinese), crafted by domain experts across 22 fields (e.g., Science & Tech, Business & Finance) to mirror real-world deep-research needs. It has two evaluation metrics, but the leaderboard is based on the RACE score, which uses LLM-as-a-judge (Gemini) to evaluate research reports against a golden set of expert-compiled reports.

- ### 📊 Evaluation
+ #### Usage

- Comprehensive batch evaluation system for detailed analysis and comparative studies.
+ > Warning: Running across the 100 examples can cost ~$20-$100 depending on the model selection.

- #### **Features:**
- - **Multi-dimensional Scoring**: Specialized evaluators with 0-1 scale ratings
- - **Dataset-driven Evaluation**: Batch processing across multiple test cases
+ The dataset is available on [LangSmith via this link](https://smith.langchain.com/public/c5e7a6ad-fdba-478c-88e6-3a388459ce8b/d). To kick off evaluation, run the following command:

- #### **Usage:**
```bash
# Run comprehensive evaluation on LangSmith datasets
python tests/run_evaluate.py
```

- #### **Deep Research Bench Submission:**
- The evaluation runs against the [Deep Research Bench](https://github.com/Ayanami0730/deep_research_bench), a comprehensive benchmark with 100 PhD-level research tasks across 22 fields.
-
- To submit results to the benchmark:
+ This will provide a link to a LangSmith experiment, which will have a name (referred to below as `YOUR_EXPERIMENT_NAME`). Once the experiment is done, extract the results to a JSONL file that can be submitted to the Deep Research Bench.

- 1. **Run Evaluation**: Execute `python tests/run_evaluate.py` to evaluate against the Deep Research Bench dataset
- 2. **Extract Results**: Use the extraction script to generate JSONL output:
- ```bash
- python tests/extract_langsmith_data.py --project-name "YOUR_PROJECT_NAME" --model-name "gpt-4.1" --dataset-name "deep_research_bench"
- ```
- This creates `tests/expt_results/deep_research_bench_gpt-4.1.jsonl` with the required format.
- 3. **Submit to Benchmark**: Move the generated JSONL file to the Deep Research Bench repository and follow their [Quick Start guide](https://github.com/Ayanami0730/deep_research_bench?tab=readme-ov-file#quick-start) for evaluation submission
-
- > **Note:** We submitted results from [this commit](https://github.com/langchain-ai/open_deep_research/commit/c0a160b57a9b5ecd4b8217c3811a14d8eff97f72) to the Deep Research Bench, resulting in an overall score of 0.4344 (#6 on the leaderboard).
+ ```bash
+ python tests/extract_langsmith_data.py --project-name "YOUR_EXPERIMENT_NAME" --model-name "your-model-name" --dataset-name "deep_research_bench"
+ ```

- Results for current `main` branch utilize more constrained prompting to reduce token spend ~4x while still achieving a score of 0.4268.
+ This creates `tests/expt_results/deep_research_bench_your-model-name.jsonl` with the required format. Move the generated JSONL file to a local clone of the Deep Research Bench repository and follow their [Quick Start guide](https://github.com/Ayanami0730/deep_research_bench?tab=readme-ov-file#quick-start) for evaluation submission.

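Before moving the file over, a quick sanity check can catch truncated runs. A small sketch (the output path follows the README; the fields inside each record are not assumed):

```python
# Small sketch: sanity-check the extracted JSONL before submission.
import json
from pathlib import Path

path = Path("tests/expt_results/deep_research_bench_your-model-name.jsonl")
records = [json.loads(line) for line in path.read_text().splitlines() if line.strip()]

print(f"{len(records)} records")  # Deep Research Bench has 100 tasks
if records:
    print("first record keys:", sorted(records[0]))
```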
- #### **Current Results (Main Branch)**
+ #### Results

- | Metric | Score |
- |--------|-------|
- | Comprehensiveness | 0.4145 |
- | Insight | 0.3854 |
- | Instruction Following | 0.4780 |
- | Readability | 0.4495 |
- | **Overall Score** | **0.4268** |
+ | Name | Commit | Summarization | Research | Compression | Total Cost | Total Tokens | RACE Score | Experiment |
+ |------|--------|---------------|----------|-------------|------------|--------------|------------|------------|
+ | Defaults | [6532a41](https://github.com/langchain-ai/open_deep_research/commit/6532a4176a93cc9bb2102b3d825dcefa560c85d9) | openai:gpt-4.1-mini | openai:gpt-4.1 | openai:gpt-4.1 | $45.98 | 58,015,332 | 0.4309 | [Link](https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/datasets/6e4766ca-6[…]ons=cf4355d7-6347-47e2-a774-484f290e79bc&baseline=undefined) |
+ | Claude Sonnet 4 | [f877ea9](https://github.com/langchain-ai/open_deep_research/pull/163/commits/f877ea93641680879c420ea991e998b47aab9bcc) | openai:gpt-4.1-mini | anthropic:claude-sonnet-4-20250514 | openai:gpt-4.1 | $187.09 | 138,917,050 | 0.4401 | [Link](https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/datasets/6e4766ca-6[…]ons=04f6002d-6080-4759-bcf5-9a52e57449ea&baseline=undefined) |
+ | Deep Research Bench Submission | [c0a160b](https://github.com/langchain-ai/open_deep_research/commit/c0a160b57a9b5ecd4b8217c3811a14d8eff97f72) | openai:gpt-4.1-nano | openai:gpt-4.1 | openai:gpt-4.1 | $87.83 | 207,005,549 | 0.4344 | [Link](https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/datasets/6e4766ca-6[…]ons=e6647f74-ad2f-4cb9-887e-acb38b5f73c0&baseline=undefined) |

### 🚀 Deployments and Usage

- Multiple deployment options for different use cases.
-
#### LangGraph Studio

Follow the [quickstart](#-quickstart) to start LangGraph server locally and test the agent out on LangGraph Studio.
@@ -188,9 +129,7 @@ You can also deploy your own instance of OAP, and make your own custom agents (l

### Legacy Implementations 🏛️

- Read about the evolution from our original implementations to the current version in our [blog post](https://rlancemartin.github.io/2025/07/30/bitter_lesson/).
-
- The `src/legacy/` folder contains two earlier implementations that provide alternative approaches to automated research:
+ The `src/legacy/` folder contains two earlier implementations that provide alternative approaches to automated research. They are less performant than the current implementation, but offer alternative ideas for understanding different approaches to deep research.

#### 1. Workflow Implementation (`legacy/graph.py`)
- **Plan-and-Execute**: Structured workflow with human-in-the-loop planning
