Commit 6532a41
Ruff, update prompts, add think tool, update defaults (#163)
* Ruff, update prompts, add think tool, update defaults
* Update with results from deep research bench
* Update README
* Update README.md
* Update think_tool
* Fix w/ updated evals
* Update README.md
* Set default
* Fix prompt
* Update tool processing for parallel tool calls
* Fix exit condition
* Update defaults
* Minor changes
* Update default supervisor max iter for think_tool
* Expt
* Update researcher prompt limits
* Update config

Co-authored-by: nhuang-lc <[email protected]>
1 parent f95d200 commit 6532a41

File tree

13 files changed: +1602 −428 lines


CLAUDE.md

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@ (new file)

# Open Deep Research Repository Overview

## Project Description
Open Deep Research is a configurable, fully open-source deep research agent that works across multiple model providers, search tools, and MCP (Model Context Protocol) servers. It enables automated research with parallel processing and comprehensive report generation.

## Repository Structure

### Root Directory
- `README.md` - Comprehensive project documentation with quickstart guide
- `pyproject.toml` - Python project configuration and dependencies
- `langgraph.json` - LangGraph configuration defining the main graph entry point
- `uv.lock` - UV package manager lock file
- `LICENSE` - MIT license
- `.env.example` - Environment variables template (not tracked)

### Core Implementation (`src/open_deep_research/`)
- `deep_researcher.py` - Main LangGraph implementation (entry point: `deep_researcher`)
- `configuration.py` - Configuration management and settings
- `state.py` - Graph state definitions and data structures
- `prompts.py` - System prompts and prompt templates
- `utils.py` - Utility functions and helpers
- `files/` - Research output and example files

### Legacy Implementations (`src/legacy/`)
Contains two earlier research implementations:
- `graph.py` - Plan-and-execute workflow with human-in-the-loop
- `multi_agent.py` - Supervisor-researcher multi-agent architecture
- `legacy.md` - Documentation for legacy implementations
- `CLAUDE.md` - Legacy-specific Claude instructions
- `tests/` - Legacy-specific tests

### Security (`src/security/`)
- `auth.py` - Authentication handler for LangGraph deployment

### Testing (`tests/`)
- `run_evaluate.py` - Main evaluation script, configured to run on Deep Research Bench
- `evaluators.py` - Specialized evaluation functions
- `prompts.py` - Evaluation prompts and criteria
- `pairwise_evaluation.py` - Comparative evaluation tools
- `supervisor_parallel_evaluation.py` - Multi-threaded evaluation

### Examples (`examples/`)
- `arxiv.md` - ArXiv research example
- `pubmed.md` - PubMed research example
- `inference-market.md` - Inference market analysis examples

## Key Technologies
- **LangGraph** - Workflow orchestration and graph execution
- **LangChain** - LLM integration and tool calling
- **Multiple LLM Providers** - OpenAI, Anthropic, Google, Groq, DeepSeek support
- **Search APIs** - Tavily, OpenAI/Anthropic native search, DuckDuckGo, Exa
- **MCP Servers** - Model Context Protocol for extended capabilities

## Development Commands
- `uvx langgraph dev` - Start development server with LangGraph Studio
- `python tests/run_evaluate.py` - Run comprehensive evaluations
- `ruff check` - Code linting
- `mypy` - Type checking

## Configuration
All settings are configurable via:
- Environment variables (`.env` file)
- Web UI in LangGraph Studio
- Direct configuration modification

Key settings include model selection, search API choice, concurrency limits, and MCP server configurations.
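As a minimal sketch of how an environment variable might override one of these settings, the snippet below mirrors the `SearchAPI` enum from `src/open_deep_research/configuration.py`; the `resolve_search_api` helper and the `SEARCH_API` variable name are illustrative assumptions, not the repository's actual loader:

```python
import os
from enum import Enum


class SearchAPI(Enum):
    """Mirrors the SearchAPI enum in src/open_deep_research/configuration.py."""

    ANTHROPIC = "anthropic"
    OPENAI = "openai"
    TAVILY = "tavily"
    NONE = "none"


def resolve_search_api(default: SearchAPI = SearchAPI.TAVILY) -> SearchAPI:
    """Hypothetical helper: read the search API choice from an env var,
    falling back to the configured default when the variable is unset."""
    raw = os.environ.get("SEARCH_API")
    if raw is None:
        return default
    # Raises ValueError for values outside the enum, surfacing typos early.
    return SearchAPI(raw.lower())
```

The same precedence (environment variable over default) applies to the other settings listed above.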

README.md

Lines changed: 45 additions & 16 deletions
@@ -1,4 +1,4 @@
-# Open Deep Research
+# 🔬 Open Deep Research

<img width="1388" height="298" alt="full_diagram" src="https://github.com/user-attachments/assets/12a2371b-8be2-4219-9b48-90503eb43c69" />

@@ -7,6 +7,10 @@ Deep research has broken out as one of the most popular agent applications. This
* Read more in our [blog](https://blog.langchain.com/open-deep-research/)
* See our [video](https://www.youtube.com/watch?v=agGiWUpxkhg) for a quick overview

+### 🔥 Recent Updates
+
+**August 2, 2025**: Achieved #6 ranking on the [Deep Research Bench Leaderboard](https://huggingface.co/spaces/Ayanami0730/DeepResearch-Leaderboard) with an overall score of 0.4344.
+
### 🚀 Quickstart

1. Clone the repository and activate a virtual environment:
@@ -19,6 +23,8 @@ source .venv/bin/activate  # On Windows: .venv\Scripts\activate

2. Install dependencies:
```bash
+uv sync
+# or
uv pip install -r pyproject.toml
```

@@ -44,9 +50,9 @@ Use this to open the Studio UI:

Ask a question in the `messages` input field and click `Submit`.

-### Configurations
+### ⚙️ Configurations

-Open Deep Research offers extensive configuration options to customize the research process and model behavior. All configurations can be set via the web UI, environment variables, or by modifying the configuration directly.
+Extensive configuration options to customize research behavior. Configure via web UI, environment variables, or direct modification.

#### General Settings

@@ -64,9 +70,9 @@ Open Deep Research offers extensive configuration options to customize the resea

Open Deep Research uses multiple specialized models for different research tasks:

-- **Summarization Model** (default: `openai:gpt-4.1-nano`): Summarizes research results from search APIs
+- **Summarization Model** (default: `openai:gpt-4.1-mini`): Summarizes research results from search APIs
- **Research Model** (default: `openai:gpt-4.1`): Conducts research and analysis
-- **Compression Model** (default: `openai:gpt-4.1-mini`): Compresses research findings from sub-agents
+- **Compression Model** (default: `openai:gpt-4.1`): Compresses research findings from sub-agents
- **Final Report Model** (default: `openai:gpt-4.1`): Writes the final comprehensive report

All models are configured using [init_chat_model() API](https://python.langchain.com/docs/how_to/chat_models_universal_init/) which supports providers like OpenAI, Anthropic, Google Vertex AI, and others.
@@ -117,9 +123,9 @@ mcp-server-filesystem /path/to/allowed/dir1 /path/to/allowed/dir2

Remote servers can be configured as authenticated or unauthenticated and support JWT-based authentication through OAuth endpoints.

-### Evaluation
+### 📊 Evaluation

-A comprehensive batch evaluation system designed for detailed analysis and comparative studies.
+Comprehensive batch evaluation system for detailed analysis and comparative studies.

#### **Features:**
- **Multi-dimensional Scoring**: Specialized evaluators with 0-1 scale ratings
@@ -130,12 +136,37 @@ A comprehensive batch evaluation system designed for detailed analysis and compa
# Run comprehensive evaluation on LangSmith datasets
python tests/run_evaluate.py
```
-#### **Key Files:**
-- `tests/run_evaluate.py`: Main evaluation script
-- `tests/evaluators.py`: Specialized evaluator functions
-- `tests/prompts.py`: Evaluation prompts for each dimension

-### Deployments and Usages
+#### **Deep Research Bench Submission:**
+The evaluation runs against the [Deep Research Bench](https://github.com/Ayanami0730/deep_research_bench), a comprehensive benchmark with 100 PhD-level research tasks across 22 fields.
+
+To submit results to the benchmark:
+
+1. **Run Evaluation**: Execute `python tests/run_evaluate.py` to evaluate against the Deep Research Bench dataset
+2. **Extract Results**: Use the extraction script to generate JSONL output:
+```bash
+python tests/extract_langsmith_data.py --project-name "YOUR_PROJECT_NAME" --model-name "gpt-4.1" --dataset-name "deep_research_bench"
+```
+This creates `tests/expt_results/deep_research_bench_gpt-4.1.jsonl` with the required format.
+3. **Submit to Benchmark**: Move the generated JSONL file to the Deep Research Bench repository and follow their [Quick Start guide](https://github.com/Ayanami0730/deep_research_bench?tab=readme-ov-file#quick-start) for evaluation submission
+
+> **Note:** We submitted results from [this commit](https://github.com/langchain-ai/open_deep_research/commit/c0a160b57a9b5ecd4b8217c3811a14d8eff97f72) to the Deep Research Bench, resulting in an overall score of 0.4344 (#6 on the leaderboard).
+
+Results for current `main` branch utilize more constrained prompting to reduce token spend ~4x while still achieving a score of 0.4268.
+
+#### **Current Results (Main Branch)**
+
+| Metric | Score |
+|--------|-------|
+| Comprehensiveness | 0.4145 |
+| Insight | 0.3854 |
+| Instruction Following | 0.4780 |
+| Readability | 0.4495 |
+| **Overall Score** | **0.4268** |
+
+### 🚀 Deployments and Usage
+
+Multiple deployment options for different use cases.

#### LangGraph Studio

@@ -155,10 +186,10 @@ You can also deploy your own instance of OAP, and make your own custom agents (l
1. [Deploy Open Agent Platform](https://docs.oap.langchain.com/quickstart)
2. [Add Deep Researcher to OAP](https://docs.oap.langchain.com/setup/agents)

-### Updates 🔥
-
### Legacy Implementations 🏛️

+Read about the evolution from our original implementations to the current version in our [blog post](https://rlancemartin.github.io/2025/07/30/bitter_lesson/).
+
The `src/legacy/` folder contains two earlier implementations that provide alternative approaches to automated research:

#### 1. Workflow Implementation (`legacy/graph.py`)
@@ -172,5 +203,3 @@ The `src/legacy/` folder contains two earlier implementations that provide alter
- **Parallel Processing**: Multiple researchers work simultaneously
- **Speed Optimized**: Faster report generation through concurrency
- **MCP Support**: Extensive Model Context Protocol integration
-
-See `src/legacy/legacy.md` for detailed documentation, configuration options, and usage examples for both legacy implementations.
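The model defaults above use the `provider:model` identifier convention accepted by `init_chat_model()`. As a minimal sketch of that convention (the helper below is illustrative; actual resolution is handled inside LangChain's `init_chat_model`):

```python
def split_model_identifier(identifier: str) -> tuple[str, str]:
    """Split a 'provider:model' string such as 'openai:gpt-4.1' into its parts.

    If no provider prefix is present, return an empty provider so the
    caller can fall back to provider inference.
    """
    provider, _, model = identifier.partition(":")
    if not model:
        # No ':' found: partition put the whole string in `provider`.
        return "", provider
    return provider, model
```

For example, the Compression Model default `openai:gpt-4.1` splits into provider `openai` and model `gpt-4.1`.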

pyproject.toml

Lines changed: 2 additions & 1 deletion
@@ -9,7 +9,7 @@ readme = "README.md"
license = { text = "MIT" }
requires-python = ">=3.10"
dependencies = [
-    "langgraph>=0.5.3",
+    "langgraph>=0.5.4",
    "langchain-community>=0.3.9",
    "langchain-openai>=0.3.7",
    "langchain-anthropic>=0.3.15",
@@ -42,6 +42,7 @@ dependencies = [
    "ipykernel>=6.29.5",
    "supabase>=2.15.3",
    "mcp>=1.9.4",
+    "pandas>=2.3.1",
]

[project.optional-dependencies]

src/legacy/configuration.py

Lines changed: 8 additions & 8 deletions
@@ -36,8 +36,8 @@ class Configuration:
    search_api: SearchAPI = SearchAPI.TAVILY
    search_api_config: Optional[Dict[str, Any]] = None
    process_search_results: Literal["summarize", "split_and_rerank"] | None = None
-    summarization_model_provider: str = "anthropic"
-    summarization_model: str = "claude-3-5-haiku-latest"
+    summarization_model_provider: str = "openai"
+    summarization_model: str = "gpt-4.1"
    max_structured_output_retries: int = 3
    include_source_str: bool = False

@@ -47,8 +47,8 @@
    planner_provider: str = "anthropic"
    planner_model: str = "claude-3-7-sonnet-latest"
    planner_model_kwargs: Optional[Dict[str, Any]] = None
-    writer_provider: str = "anthropic"
-    writer_model: str = "claude-3-7-sonnet-latest"
+    writer_provider: str = "openai"
+    writer_model: str = "gpt-4.1"
    writer_model_kwargs: Optional[Dict[str, Any]] = None

    @classmethod
@@ -73,14 +73,14 @@ class MultiAgentConfiguration:
    search_api: SearchAPI = SearchAPI.TAVILY
    search_api_config: Optional[Dict[str, Any]] = None
    process_search_results: Literal["summarize", "split_and_rerank"] | None = None
-    summarization_model_provider: str = "anthropic"
-    summarization_model: str = "claude-3-5-haiku-latest"
+    summarization_model_provider: str = "openai"
+    summarization_model: str = "gpt-4.1"
    include_source_str: bool = False

    # Multi-agent specific configuration
    number_of_queries: int = 2  # Number of search queries to generate per section
-    supervisor_model: str = "anthropic:claude-3-7-sonnet-latest"
-    researcher_model: str = "anthropic:claude-3-7-sonnet-latest"
+    supervisor_model: str = "anthropic:claude-sonnet-4-20250514"
+    researcher_model: str = "anthropic:claude-sonnet-4-20250514"
    ask_for_clarification: bool = False  # Whether to ask for clarification from the user
    # MCP server configuration
    mcp_server_config: Optional[Dict[str, Any]] = None

src/open_deep_research/configuration.py

Lines changed: 35 additions & 11 deletions
@@ -1,16 +1,24 @@
-from pydantic import BaseModel, Field
-from typing import Any, List, Optional
-from langchain_core.runnables import RunnableConfig
+"""Configuration management for the Open Deep Research system."""
+
import os
from enum import Enum
+from typing import Any, List, Optional
+
+from langchain_core.runnables import RunnableConfig
+from pydantic import BaseModel, Field
+

class SearchAPI(Enum):
+    """Enumeration of available search API providers."""
+
    ANTHROPIC = "anthropic"
    OPENAI = "openai"
    TAVILY = "tavily"
    NONE = "none"

class MCPConfig(BaseModel):
+    """Configuration for Model Context Protocol (MCP) servers."""
+
    url: Optional[str] = Field(
        default=None,
        optional=True,
@@ -28,6 +36,8 @@ class MCPConfig(BaseModel):
    """Whether the MCP server requires authentication"""

class Configuration(BaseModel):
+    """Main configuration class for the Deep Research agent."""
+
    # General Configuration
    max_structured_output_retries: int = Field(
        default=3,
@@ -82,11 +92,11 @@ class Configuration(BaseModel):
        }
    )
    max_researcher_iterations: int = Field(
-        default=3,
+        default=6,
        metadata={
            "x_oap_ui_config": {
                "type": "slider",
-                "default": 3,
+                "default": 6,
                "min": 1,
                "max": 10,
                "step": 1,
@@ -95,11 +105,11 @@
        }
    )
    max_react_tool_calls: int = Field(
-        default=5,
+        default=10,
        metadata={
            "x_oap_ui_config": {
                "type": "slider",
-                "default": 5,
+                "default": 10,
                "min": 1,
                "max": 30,
                "step": 1,
@@ -109,11 +119,11 @@
    )
    # Model Configuration
    summarization_model: str = Field(
-        default="openai:gpt-4.1-nano",
+        default="openai:gpt-4.1-mini",
        metadata={
            "x_oap_ui_config": {
                "type": "text",
-                "default": "openai:gpt-4.1-nano",
+                "default": "openai:gpt-4.1-mini",
                "description": "Model for summarizing research results from Tavily search results"
            }
        }
@@ -128,6 +138,18 @@
        }
    )
+    max_content_length: int = Field(
+        default=50000,
+        metadata={
+            "x_oap_ui_config": {
+                "type": "number",
+                "default": 50000,
+                "min": 1000,
+                "max": 200000,
+                "description": "Maximum character length for webpage content before summarization"
+            }
+        }
+    )
    research_model: str = Field(
        default="openai:gpt-4.1",
        metadata={
@@ -149,11 +171,11 @@
        }
    )
    compression_model: str = Field(
-        default="openai:gpt-4.1-mini",
+        default="openai:gpt-4.1",
        metadata={
            "x_oap_ui_config": {
                "type": "text",
-                "default": "openai:gpt-4.1-mini",
+                "default": "openai:gpt-4.1",
                "description": "Model for compressing research findings from sub-agents. NOTE: Make sure your Compression Model supports the selected search API."
            }
        }
@@ -225,4 +247,6 @@ def from_runnable_config(
        return cls(**{k: v for k, v in values.items() if v is not None})

    class Config:
+        """Pydantic configuration."""
+
        arbitrary_types_allowed = True
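The final hunk shows `from_runnable_config` building the instance from non-`None` values. A toy dataclass sketch of the precedence this pattern typically implies (env var over `config["configurable"]` over field default) is below; the real class is a pydantic model with `Field` metadata, and the exact merge order here is an assumption, not taken from the diff:

```python
import os
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class MiniConfig:
    """Toy stand-in for the pydantic Configuration class in configuration.py."""

    max_researcher_iterations: int = 6
    summarization_model: str = "openai:gpt-4.1-mini"

    @classmethod
    def from_runnable_config(cls, config: Optional[dict] = None) -> "MiniConfig":
        """Assumed merge order: env var (upper-cased field name) beats
        config['configurable'], which beats the dataclass default."""
        configurable = (config or {}).get("configurable", {})
        values: dict[str, Any] = {}
        for f in fields(cls):
            env_val: Any = os.environ.get(f.name.upper())
            if env_val is not None and f.type is int:
                env_val = int(env_val)  # env vars are strings; coerce ints
            val = env_val if env_val is not None else configurable.get(f.name)
            if val is not None:
                values[f.name] = val
        # Unset fields fall through to the dataclass defaults.
        return cls(**values)
```

With this sketch, `MiniConfig.from_runnable_config({"configurable": {"max_researcher_iterations": 8}})` yields 8 iterations unless `MAX_RESEARCHER_ITERATIONS` is set in the environment.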
