diff --git a/.gitignore b/.gitignore
index 851a4f94..f27e166a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -170,3 +170,7 @@ nohup.out
# Claude
.claude/
+
+# design folder
+design/
+deep_research/design
diff --git a/deep_research/README.md b/deep_research/README.md
new file mode 100644
index 00000000..c057d85b
--- /dev/null
+++ b/deep_research/README.md
@@ -0,0 +1,604 @@
+# 🔍 ZenML Deep Research Agent
+
+A production-ready MLOps pipeline for conducting deep, comprehensive research on any topic using LLMs and web search capabilities.
+
+*ZenML Deep Research pipeline flow*
+
+## 🎯 Overview
+
+The ZenML Deep Research Agent is a scalable, modular pipeline that automates in-depth research on any topic. It:
+
+- Creates a structured outline based on your research query
+- Researches each section through targeted web searches and LLM analysis
+- Iteratively refines content through reflection cycles
+- Produces a comprehensive, well-formatted research report
+- Visualizes the research process and report structure in the ZenML dashboard
+
+This project transforms exploratory notebook-based research into a production-grade, reproducible, and transparent process using the ZenML MLOps framework.
+
+## 📝 Example Research Results
+
+The Deep Research Agent produces comprehensive, well-structured reports on any topic. Here's an example of research conducted on quantum computing:
+
+*Sample report generated by the Deep Research Agent*
+
+## 🚀 Pipeline Architecture
+
+The pipeline uses a parallel processing architecture for efficiency and breaks down the research process into granular steps for maximum modularity and control:
+
+1. **Initialize Prompts**: Load and track all prompts as versioned artifacts
+2. **Query Decomposition**: Break down the main query into specific sub-questions
+3. **Parallel Information Gathering**: Process multiple sub-questions concurrently for faster results
+4. **Merge Results**: Combine results from parallel processing into a unified state
+5. **Cross-Viewpoint Analysis**: Analyze discrepancies and agreements between different perspectives
+6. **Reflection Generation**: Generate recommendations for improving research quality
+7. **Human Approval** (optional): Get human approval for additional searches
+8. **Execute Approved Searches**: Perform approved additional searches to fill gaps
+9. **Final Report Generation**: Compile all synthesized information into a coherent HTML report
+10. **Collect Tracing Metadata**: Gather comprehensive metrics about token usage, costs, and performance
+
+This architecture enables:
+- Better reproducibility and caching of intermediate results
+- Parallel processing for faster research completion
+- Easier debugging and monitoring of specific research stages
+- More flexible reconfiguration of individual components
+- Enhanced transparency into how the research is conducted
+- Human oversight and control over iterative research expansions
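+
+As an illustration of the fan-out/fan-in pattern (not the project's actual `pipelines/parallel_research_pipeline.py`, whose step names and signatures differ), a minimal ZenML pipeline that decomposes a query, researches sub-questions in parallel, and merges the results could look like this:
+
+```python
+from zenml import pipeline, step
+
+
+@step
+def decompose_query(query: str, max_sub_questions: int) -> list:
+    """Toy decomposition; the real pipeline uses an LLM to produce sub-questions."""
+    return [f"{query} (aspect {i + 1})" for i in range(max_sub_questions)]
+
+
+@step
+def research_sub_question(sub_questions: list, index: int) -> str:
+    """Research a single sub-question; one invocation per index so runs can fan out."""
+    if index >= len(sub_questions):
+        return ""
+    return f"Findings for: {sub_questions[index]}"
+
+
+@step
+def merge_results(first: str, second: str, third: str) -> str:
+    """Fan-in: combine the parallel findings into a single research state."""
+    return "\n\n".join(part for part in (first, second, third) if part)
+
+
+@pipeline
+def toy_parallel_research(query: str = "example topic"):
+    sub_questions = decompose_query(query=query, max_sub_questions=3)
+    # Each invocation becomes its own step run; with a distributed orchestrator
+    # these execute concurrently, on the local orchestrator they run sequentially.
+    first = research_sub_question(sub_questions=sub_questions, index=0)
+    second = research_sub_question(sub_questions=sub_questions, index=1)
+    third = research_sub_question(sub_questions=sub_questions, index=2)
+    merge_results(first=first, second=second, third=third)
+
+
+if __name__ == "__main__":
+    toy_parallel_research()
+```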
+
+## 💡 Under the Hood
+
+- **LLM Integration**: Uses litellm for flexible access to various LLM providers
+- **Web Research**: Utilizes Tavily API for targeted internet searches
+- **ZenML Orchestration**: Manages pipeline flow, artifacts, and caching
+- **Reproducibility**: Track every step, parameter, and output via ZenML
+- **Visualizations**: Interactive visualizations of the research structure and progress
+- **Report Generation**: Uses static HTML templates for consistent, high-quality reports
+- **Human-in-the-Loop**: Optional approval mechanism via ZenML alerters (Discord, Slack, etc.)
+- **LLM Observability**: Integrated Langfuse tracking for monitoring LLM usage, costs, and performance
+
+## 🛠️ Getting Started
+
+### Prerequisites
+
+- Python 3.9+
+- ZenML installed and configured
+- API key for your preferred LLM provider (configured with litellm)
+- Tavily API key
+- Langfuse account for LLM tracking (optional but recommended)
+
+### Installation
+
+```bash
+# Clone the repository
+git clone <repository-url>
+cd zenml_deep_research
+
+# Install dependencies
+pip install -r requirements.txt
+
+# Set up API keys
+export OPENAI_API_KEY=your_openai_key # Or another LLM provider key
+export TAVILY_API_KEY=your_tavily_key # For Tavily search (default)
+export EXA_API_KEY=your_exa_key # For Exa search (optional)
+
+# Set up Langfuse for LLM tracking (optional)
+export LANGFUSE_PUBLIC_KEY=your_public_key
+export LANGFUSE_SECRET_KEY=your_secret_key
+export LANGFUSE_HOST=https://cloud.langfuse.com # Or your self-hosted URL
+
+# Initialize ZenML (if needed)
+zenml init
+```
+
+### Setting up Langfuse for LLM Tracking
+
+The pipeline integrates with [Langfuse](https://langfuse.com) for comprehensive LLM observability and tracking. This allows you to monitor LLM usage, costs, and performance across all pipeline runs.
+
+#### 1. Create a Langfuse Account
+
+1. Sign up at [cloud.langfuse.com](https://cloud.langfuse.com) or set up a self-hosted instance
+2. Create a new project in your Langfuse dashboard (e.g., "deep-research")
+3. Navigate to Settings → API Keys to get your credentials
+
+#### 2. Configure Environment Variables
+
+Set the following environment variables with your Langfuse credentials:
+
+```bash
+export LANGFUSE_PUBLIC_KEY=pk-lf-... # Your public key
+export LANGFUSE_SECRET_KEY=sk-lf-... # Your secret key
+export LANGFUSE_HOST=https://cloud.langfuse.com # Or your self-hosted URL
+```
+
+#### 3. Configure Project Name
+
+The Langfuse project name can be configured in any of the pipeline configuration files:
+
+```yaml
+# configs/enhanced_research.yaml
+langfuse_project_name: "deep-research" # Change to match your Langfuse project
+```
+
+**Note**: The project must already exist in your Langfuse dashboard before running the pipeline.
+
+#### What Gets Tracked
+
+When Langfuse is configured, the pipeline automatically tracks:
+
+- **All LLM calls** with their prompts, responses, and token usage
+- **Pipeline trace information** including:
+ - `trace_name`: The ZenML pipeline run name for easy identification
+ - `trace_id`: The unique ZenML pipeline run ID for correlation
+- **Tagged operations** such as:
+ - `structured_llm_output`: JSON generation calls
+ - `information_synthesis`: Research synthesis operations
+ - `find_most_relevant_string`: Relevance matching operations
+- **Performance metrics**: Latency, token counts, and costs
+- **Project organization**: All traces are organized under your configured project
+
+This integration provides full observability into your research pipeline's LLM usage, making it easy to optimize performance, track costs, and debug issues.
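+
+For illustration, the snippet below shows the general pattern for routing a litellm call to Langfuse with trace metadata. It is a simplified sketch rather than the project's actual `utils/llm_utils.py`; the metadata keys are the ones litellm forwards to Langfuse, the model name is just an example from the configs, and the `langfuse` package plus the `LANGFUSE_*` environment variables are assumed to be in place:
+
+```python
+import litellm
+
+# Send every successful litellm call to Langfuse; credentials are read from
+# LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY and LANGFUSE_HOST in the environment.
+litellm.success_callback = ["langfuse"]
+
+response = litellm.completion(
+    model="openrouter/google/gemini-2.0-flash-lite-001",  # requires OPENROUTER_API_KEY
+    messages=[
+        {"role": "user", "content": "Summarize the key climate policy debates."}
+    ],
+    metadata={
+        # Hypothetical identifiers; in the pipeline these come from the ZenML run.
+        "trace_name": "deep_research_run_example",
+        "trace_id": "example-pipeline-run-id",
+        "tags": ["information_synthesis"],
+    },
+)
+print(response.choices[0].message.content)
+```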
+
+### Running the Pipeline
+
+#### Basic Usage
+
+```bash
+# Run with default configuration
+python run.py
+```
+
+The default configuration and research query are defined in `configs/enhanced_research.yaml`.
+
+#### Using Research Mode Presets
+
+The pipeline includes three pre-configured research modes for different use cases:
+
+```bash
+# Rapid mode - Quick overview with minimal depth
+python run.py --mode rapid
+
+# Balanced mode - Standard research depth (default)
+python run.py --mode balanced
+
+# Deep mode - Comprehensive analysis with maximum depth
+python run.py --mode deep
+```
+
+**Mode Comparison:**
+
+| Mode | Sub-Questions | Search Results* | Additional Searches | Best For |
+|------|---------------|----------------|-------------------|----------|
+| **Rapid** | 5 | 2 per search | 0 | Quick overviews, time-sensitive research |
+| **Balanced** | 10 | 3 per search | 2 | Most research tasks, good depth/speed ratio |
+| **Deep** | 15 | 5 per search | 4 | Comprehensive analysis, academic research |
+
+*Can be overridden with `--num-results`
+
+#### Using Different Configurations
+
+```bash
+# Run with a custom configuration file
+python run.py --config configs/custom_enhanced_config.yaml
+
+# Override the research query from command line
+python run.py --query "My research topic"
+
+# Specify maximum number of sub-questions to process in parallel
+python run.py --max-sub-questions 15
+
+# Combine mode with other options
+python run.py --mode deep --query "Complex topic" --require-approval
+
+# Combine multiple options
+python run.py --config configs/custom_enhanced_config.yaml --query "My research topic" --max-sub-questions 12
+```
+
+### Advanced Options
+
+```bash
+# Enable debug logging
+python run.py --debug
+
+# Disable caching for a fresh run
+python run.py --no-cache
+
+# Specify a log file
+python run.py --log-file research.log
+
+# Enable human-in-the-loop approval for additional research
+python run.py --require-approval
+
+# Set approval timeout (in seconds)
+python run.py --require-approval --approval-timeout 7200
+
+# Use a different search provider (default: tavily)
+python run.py --search-provider exa # Use Exa search
+python run.py --search-provider both # Use both providers
+python run.py --search-provider exa --search-mode neural # Exa with neural search
+
+# Control the number of search results per query
+python run.py --num-results 5 # Get 5 results per search
+python run.py --num-results 10 --search-provider exa # 10 results with Exa
+```
+
+### Search Providers
+
+The pipeline supports multiple search providers for flexibility and comparison:
+
+#### Available Providers
+
+1. **Tavily** (Default)
+ - Traditional keyword-based search
+ - Good for factual information and current events
+ - Requires `TAVILY_API_KEY` environment variable
+
+2. **Exa**
+ - Neural search engine with semantic understanding
+ - Better for conceptual and research-oriented queries
+ - Supports three search modes:
+ - `auto` (default): Automatically chooses between neural and keyword
+ - `neural`: Semantic search for conceptual understanding
+ - `keyword`: Traditional keyword matching
+ - Requires `EXA_API_KEY` environment variable
+
+3. **Both**
+ - Runs searches on both providers
+ - Useful for comprehensive research or comparing results
+ - Requires both API keys
+
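+For reference, the sketch below shows roughly how a query can be dispatched to either SDK. It is illustrative only and assumes the `tavily-python` and `exa_py` packages; the project's `utils/search_utils.py` wraps the providers with more functionality than shown here:
+
+```python
+import os
+
+from exa_py import Exa
+from tavily import TavilyClient
+
+
+def run_search(query: str, provider: str = "tavily", num_results: int = 3) -> list:
+    """Dispatch a query to the selected provider(s) and return simplified results."""
+    results = []
+    if provider in ("tavily", "both"):
+        tavily = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])
+        response = tavily.search(query, max_results=num_results)
+        results.extend(
+            {"title": r["title"], "url": r["url"]} for r in response.get("results", [])
+        )
+    if provider in ("exa", "both"):
+        exa = Exa(api_key=os.environ["EXA_API_KEY"])
+        response = exa.search(query, type="neural", num_results=num_results)
+        results.extend({"title": r.title, "url": r.url} for r in response.results)
+    return results
+
+
+if __name__ == "__main__":
+    for result in run_search("recent climate policy debates", provider="both"):
+        print(result)
+```
+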
+#### Usage Examples
+
+```bash
+# Use Exa with neural search
+python run.py --search-provider exa --search-mode neural
+
+# Compare results from both providers
+python run.py --search-provider both
+
+# Use Exa with keyword search for exact matches
+python run.py --search-provider exa --search-mode keyword
+
+# Combine with other options
+python run.py --mode deep --search-provider exa --require-approval
+```
+
+### Human-in-the-Loop Approval
+
+The pipeline supports human approval for additional research queries identified during the reflection phase:
+
+```bash
+# Enable approval with default 1-hour timeout
+python run.py --require-approval
+
+# Custom timeout (2 hours)
+python run.py --require-approval --approval-timeout 7200
+
+# Approval works with any configuration
+python run.py --config configs/thorough_research.yaml --require-approval
+```
+
+When enabled, the pipeline will:
+1. Pause after the initial research phase
+2. Send an approval request via your configured ZenML alerter (Discord, Slack, etc.)
+3. Present research progress, identified gaps, and proposed additional queries
+4. Wait for your approval before conducting additional searches
+5. Continue with approved queries or finalize the report based on your decision
+
+**Note**: You need a ZenML stack with an alerter configured (e.g., Discord or Slack) for approval functionality to work.
+
+**Tip**: When using `--mode deep`, the pipeline will suggest enabling `--require-approval` for better control over the comprehensive research process.
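+
+The approval mechanism builds on ZenML's alerter abstraction. A stripped-down sketch of the idea (the project's `steps/approval_step.py` adds timeouts, formatted summaries, and richer fallbacks) looks like this:
+
+```python
+from zenml import step
+from zenml.client import Client
+
+
+@step
+def toy_approval_step(proposed_queries: list) -> bool:
+    """Ask a human, via the stack's alerter, whether to run additional searches."""
+    alerter = Client().active_stack.alerter
+    if alerter is None:
+        # No alerter configured on the stack: skip the extra searches instead of blocking.
+        return False
+    message = (
+        "Deep Research wants to run additional searches:\n"
+        + "\n".join(f"- {q}" for q in proposed_queries)
+        + "\nApprove?"
+    )
+    # ask() posts the message to the configured channel (Discord, Slack, ...)
+    # and waits for an approve/disapprove response.
+    return alerter.ask(message)
+```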
+
+## 📊 Visualizing Research Process
+
+The pipeline includes built-in visualizations to help you understand and monitor the research process:
+
+### Viewing Visualizations
+
+After running the pipeline, you can view the visualizations in the ZenML dashboard:
+
+1. Start the ZenML dashboard:
+ ```bash
+ zenml up
+ ```
+
+2. Navigate to the "Runs" tab in the dashboard
+3. Select your pipeline run
+4. Explore visualizations for each step:
+ - **initialize_prompts_step**: View all prompts used in the pipeline
+ - **initial_query_decomposition_step**: See how the query was broken down
+ - **process_sub_question_step**: Track progress for each sub-question
+ - **cross_viewpoint_analysis_step**: View viewpoint analysis results
+ - **generate_reflection_step**: See reflection and recommendations
+ - **get_research_approval_step**: View approval decisions
+ - **pydantic_final_report_step**: Access the final research state
+ - **collect_tracing_metadata_step**: View comprehensive cost and performance metrics
+
+### Visualization Features
+
+The visualizations provide:
+- An overview of the report structure
+- Details of each paragraph's research status
+- Search history and source information
+- Progress through reflection iterations
+- Professionally formatted HTML reports with static templates
+
+### Sample Visualization
+
+Here's what the report structure visualization looks like:
+
+```
+Report Structure:
+├── Introduction
+│   └── Initial understanding of the topic
+├── Historical Background
+│   └── Evolution and key developments
+├── Current State
+│   └── Latest advancements and implementations
+└── Conclusion
+    └── Summary and future implications
+```
+
+## 📁 Project Structure
+
+```
+zenml_deep_research/
+├── assets/                        # Shared CSS for HTML visualizations
+│   └── styles.css
+├── configs/                       # Configuration files
+│   ├── __init__.py
+│   ├── enhanced_research.yaml     # Main configuration file
+│   └── ...                        # Mode-based and specialized configs (see below)
+├── materializers/                 # Custom materializers for artifact storage
+│   ├── __init__.py
+│   ├── analysis_data_materializer.py
+│   ├── approval_decision_materializer.py
+│   ├── final_report_materializer.py
+│   ├── prompt_materializer.py
+│   ├── query_context_materializer.py
+│   ├── search_data_materializer.py
+│   ├── synthesis_data_materializer.py
+│   └── tracing_metadata_materializer.py
+├── pipelines/                     # ZenML pipeline definitions
+│   ├── __init__.py
+│   └── parallel_research_pipeline.py
+├── steps/                         # ZenML pipeline steps
+│   ├── __init__.py
+│   ├── approval_step.py           # Human approval step for additional research
+│   ├── collect_tracing_metadata_step.py   # Gather cost and performance metrics
+│   ├── cross_viewpoint_step.py
+│   ├── execute_approved_searches_step.py  # Execute approved searches
+│   ├── generate_reflection_step.py        # Generate reflection without execution
+│   ├── initialize_prompts_step.py         # Load prompts as versioned artifacts
+│   ├── iterative_reflection_step.py       # Legacy combined reflection step
+│   ├── merge_results_step.py
+│   ├── process_sub_question_step.py
+│   ├── pydantic_final_report_step.py
+│   └── query_decomposition_step.py
+├── utils/                         # Utility functions and helpers
+│   ├── __init__.py
+│   ├── approval_utils.py          # Human approval utilities
+│   ├── css_utils.py               # Shared styling helpers for visualizations
+│   ├── helper_functions.py
+│   ├── llm_utils.py               # LLM integration utilities
+│   ├── prompt_loader.py           # Loads prompts into a tracked bundle
+│   ├── prompt_models.py           # Prompt data models for tracking
+│   ├── prompts.py                 # Prompt templates and HTML templates
+│   ├── pydantic_models.py         # Data models using Pydantic
+│   └── search_utils.py            # Web search functionality
+├── __init__.py
+├── requirements.txt               # Project dependencies
+├── logging_config.py              # Logging configuration
+├── README.md                      # Project documentation
+└── run.py                         # Main script to run the pipeline
+```
+
+## 🔧 Customization
+
+The project supports two levels of customization:
+
+### 1. Command-Line Parameters
+
+You can customize the research behavior directly through command-line parameters:
+
+```bash
+# Specify your research query
+python run.py --query "Your research topic"
+
+# Control parallelism with max-sub-questions
+python run.py --max-sub-questions 15
+
+# Combine multiple options
+python run.py --query "Your research topic" --max-sub-questions 12 --no-cache
+```
+
+These settings control how the parallel pipeline processes your research query.
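+
+Conceptually, `run.py` maps these flags onto the pipeline and its configuration. Below is a minimal sketch of that wiring with `click` (illustrative only; the real `run.py` supports many more options and merges them with the YAML config):
+
+```python
+import click
+
+
+@click.command()
+@click.option("--query", default=None, help="Override the research query from the config.")
+@click.option("--max-sub-questions", default=10, type=int, help="Cap on parallel sub-questions.")
+@click.option("--no-cache", is_flag=True, help="Disable ZenML caching for this run.")
+def main(query: str, max_sub_questions: int, no_cache: bool) -> None:
+    """Toy entrypoint showing how CLI flags could reach the pipeline."""
+    run_options = {"enable_cache": not no_cache}
+    # In the real run.py these values are merged with the selected YAML config
+    # and passed to the parallelized research pipeline before triggering a run.
+    click.echo(f"query={query!r} max_sub_questions={max_sub_questions} options={run_options}")
+
+
+if __name__ == "__main__":
+    main()
+```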
+
+### 2. Pipeline Configuration
+
+For more detailed settings, modify the configuration file:
+
+```yaml
+# configs/enhanced_research.yaml
+
+# Enhanced Deep Research Pipeline Configuration
+enable_cache: true
+
+# Research query parameters
+query: "Climate change policy debates"
+
+# Step configurations
+steps:
+ initial_query_decomposition_step:
+ parameters:
+ llm_model: "sambanova/DeepSeek-R1-Distill-Llama-70B"
+
+ cross_viewpoint_analysis_step:
+ parameters:
+ llm_model: "sambanova/DeepSeek-R1-Distill-Llama-70B"
+ viewpoint_categories: ["scientific", "political", "economic", "social", "ethical", "historical"]
+
+ iterative_reflection_step:
+ parameters:
+ llm_model: "sambanova/DeepSeek-R1-Distill-Llama-70B"
+ max_additional_searches: 2
+ num_results_per_search: 3
+
+ # Human approval configuration (when using --require-approval)
+ get_research_approval_step:
+ parameters:
+ timeout: 3600 # 1 hour timeout for approval
+ max_queries: 2 # Maximum queries to present for approval
+
+ pydantic_final_report_step:
+ parameters:
+ llm_model: "sambanova/DeepSeek-R1-Distill-Llama-70B"
+
+# Environment settings
+settings:
+ docker:
+ requirements:
+ - openai>=1.0.0
+ - tavily-python>=0.2.8
+ - PyYAML>=6.0
+ - click>=8.0.0
+ - pydantic>=2.0.0
+ - typing_extensions>=4.0.0
+```
+
+To use a custom configuration file:
+
+```bash
+python run.py --config configs/custom_research.yaml
+```
+
+### Available Configurations
+
+**Mode-Based Configurations** (automatically selected when using `--mode`):
+
+| Config File | Mode | Description |
+|-------------|------|-------------|
+| `rapid_research.yaml` | `--mode rapid` | Quick overview with minimal depth |
+| `balanced_research.yaml` | `--mode balanced` | Standard research with moderate depth |
+| `deep_research.yaml` | `--mode deep` | Comprehensive analysis with maximum depth |
+
+**Specialized Configurations:**
+
+| Config File | Description | Key Parameters |
+|-------------|-------------|----------------|
+| `enhanced_research.yaml` | Default research configuration | Standard settings, 2 additional searches |
+| `thorough_research.yaml` | In-depth analysis | 12 sub-questions, 5 results per search |
+| `quick_research.yaml` | Faster results | 5 sub-questions, 2 results per search |
+| `daily_trends.yaml` | Research on recent topics | 24-hour search recency, disable cache |
+| `compare_viewpoints.yaml` | Focus on comparing perspectives | Extended viewpoint categories |
+| `parallel_research.yaml` | Optimized for parallel execution | Configured for distributed orchestrators |
+
+You can create additional configuration files by copying and modifying the base configuration files above.
+
+## 🎯 Prompts Tracking and Management
+
+The pipeline tracks every prompt as a versioned artifact in ZenML, giving you observability, version control, and clear visualization of exactly which prompts each research run used.
+
+### Overview
+
+The prompts tracking system enables:
+- **Artifact Tracking**: All prompts are tracked as versioned artifacts in ZenML
+- **Beautiful Visualizations**: HTML interface in the dashboard with search, copy, and expand features
+- **Version Control**: Prompts are versioned alongside your code
+- **Pipeline Integration**: Prompts are passed through the pipeline as artifacts, not hardcoded imports
+
+### Components
+
+1. **PromptsBundle Model** (`utils/prompt_models.py`)
+ - Pydantic model containing all prompts used in the pipeline
+ - Each prompt includes metadata: name, content, description, version, and tags
+
+2. **PromptsBundleMaterializer** (`materializers/prompts_materializer.py`)
+ - Custom materializer creating HTML visualizations in the ZenML dashboard
+ - Features: search, copy-to-clipboard, expandable content, tag categorization
+
+3. **Prompt Loader** (`utils/prompt_loader.py`)
+ - Utility to load prompts from `prompts.py` into a PromptsBundle
+
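+To make the bundle concrete, here is a simplified sketch of what such models can look like. The actual classes in `utils/prompt_models.py` may define more fields and helpers; only `get_prompt_content`, which the steps below rely on, is shown:
+
+```python
+from pydantic import BaseModel, Field
+
+
+class Prompt(BaseModel):
+    """A single prompt plus the metadata tracked for it (illustrative fields)."""
+
+    name: str
+    content: str
+    description: str = ""
+    version: str = "1.0.0"
+    tags: list[str] = Field(default_factory=list)
+
+
+class PromptsBundle(BaseModel):
+    """Container for every prompt used in one pipeline run."""
+
+    pipeline_version: str
+    prompts: dict[str, Prompt] = Field(default_factory=dict)
+
+    def get_prompt_content(self, name: str) -> str:
+        """Return the raw prompt text for a named prompt."""
+        return self.prompts[name].content
+
+
+bundle = PromptsBundle(
+    pipeline_version="1.0.0",
+    prompts={
+        "synthesis_prompt": Prompt(
+            name="synthesis_prompt",
+            content="Synthesize the following search results...",
+            tags=["synthesis"],
+        )
+    },
+)
+print(bundle.get_prompt_content("synthesis_prompt"))
+```
+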
+### Integration Guide
+
+To integrate prompts tracking into a pipeline:
+
+1. **Initialize prompts as the first step:**
+ ```python
+ from steps.initialize_prompts_step import initialize_prompts_step
+
+ @pipeline
+ def my_pipeline():
+ prompts_bundle = initialize_prompts_step(pipeline_version="1.0.0")
+ ```
+
+2. **Update steps to receive prompts_bundle:**
+ ```python
+ @step
+ def my_step(state: ResearchState, prompts_bundle: PromptsBundle):
+ prompt = prompts_bundle.get_prompt_content("synthesis_prompt")
+ # Use prompt in your step logic
+ ```
+
+3. **Pass prompts_bundle through the pipeline:**
+ ```python
+ state = synthesis_step(state=state, prompts_bundle=prompts_bundle)
+ ```
+
+### Benefits
+
+- **Full Tracking**: Every pipeline run tracks which exact prompts were used
+- **Version History**: See how prompts evolved across different runs
+- **Debugging**: Easily identify which prompts produced specific outputs
+- **A/B Testing**: Compare results using different prompt versions
+
+### Visualization Features
+
+The HTML visualization in the ZenML dashboard includes:
+- Pipeline version and creation timestamp
+- Statistics (total prompts, tagged prompts, custom prompts)
+- Search functionality across all prompt content
+- Expandable/collapsible prompt content
+- One-click copy to clipboard
+- Tag-based categorization with visual indicators
+
+## 📊 Cost and Performance Tracking
+
+The pipeline includes comprehensive tracking of costs and performance metrics through the `collect_tracing_metadata_step`, which runs at the end of each pipeline execution.
+
+### Tracked Metrics
+
+- **LLM Costs**: Detailed breakdown by model and prompt type
+- **Search Costs**: Tracking for both Tavily and Exa search providers
+- **Token Usage**: Input/output tokens per model and step
+- **Performance**: Latency and execution time metrics
+- **Cost Attribution**: See which steps and prompts consume the most resources
+
+### Viewing Metrics
+
+After pipeline execution, the tracing metadata is available in the ZenML dashboard:
+
+1. Navigate to your pipeline run
+2. Find the `collect_tracing_metadata_step`
+3. View the comprehensive cost visualization including:
+ - Total pipeline cost (LLM + Search)
+ - Cost breakdown by model
+ - Token usage distribution
+ - Performance metrics
+
+This helps you:
+- Optimize pipeline costs by identifying expensive operations
+- Monitor token usage to stay within limits
+- Track performance over time
+- Make informed decisions about model selection
+
+## 📈 Example Use Cases
+
+- **Academic Research**: Rapidly generate preliminary research on academic topics
+- **Business Intelligence**: Stay informed on industry trends and competitive landscape
+- **Content Creation**: Develop well-researched content for articles, blogs, or reports
+- **Decision Support**: Gather comprehensive information for informed decision-making
+
+## 🔄 Integration Possibilities
+
+This pipeline can integrate with:
+
+- **Document Storage**: Save reports to database or document management systems
+- **Web Applications**: Power research functionality in web interfaces
+- **Alerting Systems**: Schedule research on key topics and receive regular reports
+- **Other ZenML Pipelines**: Chain with downstream analysis or processing
+
+## 📄 License
+
+This project is licensed under the Apache License 2.0.
diff --git a/deep_research/__init__.py b/deep_research/__init__.py
new file mode 100644
index 00000000..e69de29b
diff --git a/deep_research/assets/styles.css b/deep_research/assets/styles.css
new file mode 100644
index 00000000..c0e5523a
--- /dev/null
+++ b/deep_research/assets/styles.css
@@ -0,0 +1,692 @@
+/* ===================================
+ Deep Research Pipeline Global Styles
+ =================================== */
+
+/* 1. CSS Variables / Custom Properties */
+:root {
+ /* Color Palette - ZenML Design System */
+ --color-primary: #7a3ef4;
+ --color-primary-dark: #6b35db;
+ --color-primary-light: #9d6ff7;
+ --color-secondary: #667eea;
+ --color-secondary-dark: #5a63d8;
+ --color-accent: #764ba2;
+
+ /* Status Colors - ZenML Semantic Colors */
+ --color-success: #179f3e;
+ --color-success-light: #d4edda;
+ --color-success-dark: #155724;
+ --color-warning: #a65d07;
+ --color-warning-light: #fff3cd;
+ --color-warning-dark: #856404;
+ --color-danger: #dc3545;
+ --color-danger-light: #f8d7da;
+ --color-danger-dark: #721c24;
+ --color-info: #007bff;
+ --color-info-light: #d1ecf1;
+ --color-info-dark: #004085;
+
+ /* Chart Colors - ZenML Palette */
+ --color-chart-1: #7a3ef4;
+ --color-chart-2: #179f3e;
+ --color-chart-3: #007bff;
+ --color-chart-4: #dc3545;
+ --color-chart-5: #a65d07;
+ --color-chart-6: #6c757d;
+
+ /* Neutrals */
+ --color-text-primary: #333;
+ --color-text-secondary: #666;
+ --color-text-muted: #999;
+ --color-text-light: #7f8c8d;
+ --color-heading: #2c3e50;
+ --color-bg-primary: #f5f7fa;
+ --color-bg-secondary: #f8f9fa;
+ --color-bg-light: #f0f2f5;
+ --color-bg-white: #ffffff;
+ --color-border: #e9ecef;
+ --color-border-light: #dee2e6;
+ --color-border-dark: #ddd;
+
+ /* Typography - ZenML Font Stack */
+ --font-family-base: 'Inter', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif;
+ --font-family-mono: 'Monaco', 'Menlo', 'Ubuntu Mono', 'Consolas', 'source-code-pro', monospace;
+
+ /* Spacing - ZenML 8px Grid System */
+ --spacing-xs: 4px;
+ --spacing-sm: 8px;
+ --spacing-md: 16px;
+ --spacing-lg: 24px;
+ --spacing-xl: 32px;
+ --spacing-xxl: 48px;
+
+ /* Border Radius - ZenML Subtle Corners */
+ --radius-sm: 4px;
+ --radius-md: 6px;
+ --radius-lg: 8px;
+ --radius-xl: 12px;
+ --radius-round: 50%;
+
+ /* Shadows - ZenML Subtle Shadows */
+ --shadow-sm: 0 1px 2px rgba(0, 0, 0, 0.05);
+ --shadow-md: 0 4px 12px rgba(0, 0, 0, 0.1);
+ --shadow-lg: 0 8px 24px rgba(0, 0, 0, 0.12);
+ --shadow-xl: 0 12px 48px rgba(0, 0, 0, 0.15);
+ --shadow-hover: 0 6px 16px rgba(0, 0, 0, 0.1);
+ --shadow-hover-lg: 0 8px 24px rgba(0, 0, 0, 0.15);
+
+ /* Transitions */
+ --transition-base: all 0.3s ease;
+ --transition-fast: all 0.2s ease;
+}
+
+/* 2. Base Styles */
+* {
+ box-sizing: border-box;
+}
+
+body {
+ font-family: var(--font-family-base);
+ font-size: 14px;
+ line-height: 1.6;
+ color: var(--color-text-primary);
+ background-color: var(--color-bg-primary);
+ margin: 0;
+ padding: var(--spacing-md);
+ -webkit-font-smoothing: antialiased;
+ -moz-osx-font-smoothing: grayscale;
+}
+
+/* 3. Layout Components */
+.dr-container {
+ max-width: 1200px;
+ margin: 0 auto;
+ padding: var(--spacing-md);
+}
+
+.dr-container--wide {
+ max-width: 1400px;
+}
+
+.dr-container--narrow {
+ max-width: 900px;
+}
+
+/* 4. Typography */
+.dr-h1, h1 {
+ color: var(--color-heading);
+ font-size: 2em;
+ font-weight: 500;
+ margin: 0 0 var(--spacing-lg) 0;
+ padding-bottom: var(--spacing-sm);
+ border-bottom: 2px solid var(--color-primary);
+}
+
+.dr-h1--no-border {
+ border-bottom: none;
+ padding-bottom: 0;
+}
+
+.dr-h2, h2 {
+ color: var(--color-heading);
+ font-size: 1.4em;
+ font-weight: 500;
+ margin-top: var(--spacing-lg);
+ margin-bottom: var(--spacing-md);
+ border-bottom: 1px solid var(--color-border);
+ padding-bottom: var(--spacing-xs);
+}
+
+.dr-h3, h3 {
+ color: var(--color-primary);
+ font-size: 1.2em;
+ font-weight: 500;
+ margin-top: var(--spacing-md);
+ margin-bottom: var(--spacing-sm);
+}
+
+p {
+ margin: var(--spacing-md) 0;
+ line-height: 1.6;
+ color: var(--color-text-secondary);
+}
+
+/* 5. Card Components */
+.dr-card {
+ background: var(--color-bg-white);
+ border-radius: var(--radius-md);
+ padding: var(--spacing-lg);
+ box-shadow: var(--shadow-md);
+ margin-bottom: var(--spacing-lg);
+ transition: var(--transition-base);
+}
+
+.dr-card:hover {
+ transform: translateY(-2px);
+ box-shadow: var(--shadow-hover);
+}
+
+.dr-card--bordered {
+ border: 1px solid var(--color-border-light);
+}
+
+.dr-card--no-hover:hover {
+ transform: none;
+ box-shadow: var(--shadow-md);
+}
+
+/* Header Cards */
+.dr-header-card {
+ background: white;
+ border-radius: var(--radius-md);
+ padding: var(--spacing-lg);
+ box-shadow: var(--shadow-sm);
+ margin-bottom: var(--spacing-lg);
+ border: 1px solid var(--color-border-light);
+}
+
+/* 6. Grid System */
+.dr-grid {
+ display: grid;
+ gap: var(--spacing-md);
+}
+
+.dr-grid--stats {
+ grid-template-columns: repeat(auto-fit, minmax(200px, 1fr));
+}
+
+.dr-grid--cards {
+ grid-template-columns: repeat(auto-fit, minmax(300px, 1fr));
+}
+
+.dr-grid--metrics {
+ grid-template-columns: repeat(auto-fit, minmax(250px, 1fr));
+}
+
+/* 7. Badges & Tags */
+.dr-badge {
+ display: inline-block;
+ padding: 4px 12px;
+ border-radius: 12px;
+ font-size: 12px;
+ font-weight: 600;
+ text-transform: uppercase;
+ letter-spacing: 0.3px;
+ line-height: 1.5;
+}
+
+.dr-badge--success {
+ background-color: var(--color-success-light);
+ color: var(--color-success-dark);
+}
+
+.dr-badge--warning {
+ background-color: var(--color-warning-light);
+ color: var(--color-warning-dark);
+}
+
+.dr-badge--danger {
+ background-color: var(--color-danger-light);
+ color: var(--color-danger-dark);
+}
+
+.dr-badge--info {
+ background-color: var(--color-info-light);
+ color: var(--color-info-dark);
+}
+
+.dr-badge--primary {
+ background-color: var(--color-primary);
+ color: white;
+}
+
+/* Tag variations */
+.dr-tag {
+ display: inline-block;
+ background-color: #f0f0f0;
+ color: #555;
+ padding: 4px 10px;
+ border-radius: var(--radius-sm);
+ font-size: 12px;
+ font-weight: 500;
+ margin: 2px;
+}
+
+.dr-tag--primary {
+ background-color: #e1f5fe;
+ color: #0277bd;
+}
+
+/* 8. Stat Cards */
+.dr-stat-card {
+ background: var(--color-bg-white);
+ border-radius: var(--radius-md);
+ padding: var(--spacing-lg);
+ text-align: center;
+ transition: var(--transition-base);
+ border: 1px solid var(--color-border);
+ box-shadow: var(--shadow-sm);
+}
+
+.dr-stat-card:hover {
+ transform: translateY(-2px);
+ box-shadow: var(--shadow-hover);
+}
+
+.dr-stat-value {
+ font-size: 2rem;
+ font-weight: 600;
+ color: var(--color-primary);
+ margin-bottom: var(--spacing-xs);
+ display: block;
+}
+
+.dr-stat-label {
+ color: var(--color-text-secondary);
+ font-size: 13px;
+ text-transform: uppercase;
+ letter-spacing: 0.3px;
+ display: block;
+ font-weight: 500;
+}
+
+/* 9. Sections */
+.dr-section {
+ background: var(--color-bg-white);
+ border-radius: var(--radius-md);
+ padding: var(--spacing-lg);
+ margin-bottom: var(--spacing-lg);
+ box-shadow: var(--shadow-sm);
+ border: 1px solid var(--color-border-light);
+}
+
+.dr-section--bordered {
+ border-left: 3px solid var(--color-primary);
+}
+
+.dr-section--info {
+ background-color: #e8f4f8;
+ border-left: 4px solid var(--color-primary);
+}
+
+.dr-section--warning {
+ background-color: var(--color-warning-light);
+ border-left: 4px solid var(--color-warning);
+}
+
+.dr-section--success {
+ background-color: var(--color-success-light);
+ border-left: 4px solid var(--color-success);
+}
+
+.dr-section--danger {
+ background-color: var(--color-danger-light);
+ border-left: 4px solid var(--color-danger);
+}
+
+/* 10. Tables */
+.dr-table {
+ width: 100%;
+ border-collapse: collapse;
+ margin: var(--spacing-md) 0;
+ background: var(--color-bg-white);
+ overflow: hidden;
+}
+
+.dr-table th {
+ background-color: var(--color-primary);
+ color: white;
+ padding: var(--spacing-sm);
+ text-align: left;
+ font-weight: 600;
+}
+
+.dr-table td {
+ padding: var(--spacing-sm);
+ border-bottom: 1px solid var(--color-border);
+}
+
+.dr-table tr:last-child td {
+ border-bottom: none;
+}
+
+.dr-table tr:hover {
+ background-color: var(--color-bg-secondary);
+}
+
+.dr-table--striped tr:nth-child(even) {
+ background-color: #f2f2f2;
+}
+
+/* 11. Buttons */
+.dr-button {
+ background: var(--color-primary);
+ color: white;
+ border: none;
+ padding: 10px 20px;
+ border-radius: var(--radius-md);
+ font-size: 14px;
+ font-weight: 500;
+ cursor: pointer;
+ transition: var(--transition-base);
+ display: inline-flex;
+ align-items: center;
+ gap: var(--spacing-xs);
+ text-decoration: none;
+ position: relative;
+ overflow: hidden;
+}
+
+.dr-button:hover {
+ background: var(--color-primary-dark);
+ transform: translateY(-1px);
+ box-shadow: var(--shadow-hover);
+}
+
+.dr-button:active {
+ transform: translateY(0);
+ box-shadow: var(--shadow-sm);
+}
+
+.dr-button--secondary {
+ background: var(--color-secondary);
+}
+
+.dr-button--secondary:hover {
+ background: var(--color-secondary-dark);
+ box-shadow: var(--shadow-hover);
+}
+
+.dr-button--success {
+ background: var(--color-success);
+}
+
+.dr-button--small {
+ padding: 6px 12px;
+ font-size: 12px;
+}
+
+/* 12. Confidence Indicators */
+.dr-confidence {
+ display: inline-flex;
+ align-items: center;
+ padding: 6px 16px;
+ border-radius: 20px;
+ font-weight: 600;
+ font-size: 13px;
+ gap: var(--spacing-xs);
+ box-shadow: var(--shadow-sm);
+}
+
+.dr-confidence--high {
+ background: linear-gradient(to right, #d4edda, #c3e6cb);
+ color: var(--color-success-dark);
+}
+
+.dr-confidence--medium {
+ background: linear-gradient(to right, #fff3cd, #ffeeba);
+ color: var(--color-warning-dark);
+}
+
+.dr-confidence--low {
+ background: linear-gradient(to right, #f8d7da, #f5c6cb);
+ color: var(--color-danger-dark);
+}
+
+/* 13. Chart Containers */
+.dr-chart-container {
+ position: relative;
+ height: 300px;
+ margin: var(--spacing-md) 0;
+}
+
+/* 14. Code Blocks */
+.dr-code {
+ background-color: #f7f7f7;
+ border: 1px solid #e1e1e8;
+ border-radius: var(--radius-sm);
+ padding: var(--spacing-sm);
+ font-family: var(--font-family-mono);
+ overflow-x: auto;
+ white-space: pre-wrap;
+ word-wrap: break-word;
+}
+
+/* 15. Lists */
+.dr-list {
+ margin: var(--spacing-sm) 0;
+ padding-left: 25px;
+}
+
+.dr-list li {
+ margin: 8px 0;
+ line-height: 1.6;
+}
+
+.dr-list--unstyled {
+ list-style-type: none;
+ padding-left: 0;
+}
+
+/* 16. Notice Boxes */
+.dr-notice {
+ padding: 15px;
+ margin: 20px 0;
+ border-radius: var(--radius-sm);
+}
+
+.dr-notice--info {
+ background-color: #e8f4f8;
+ border-left: 4px solid var(--color-primary);
+ color: var(--color-info-dark);
+}
+
+.dr-notice--warning {
+ background-color: var(--color-warning-light);
+ border-left: 4px solid var(--color-warning);
+ color: var(--color-warning-dark);
+}
+
+/* 17. Loading States */
+.dr-loading {
+ text-align: center;
+ padding: var(--spacing-xxl);
+ color: var(--color-text-secondary);
+ font-style: italic;
+}
+
+/* 18. Empty States */
+.dr-empty {
+ text-align: center;
+ color: var(--color-text-muted);
+ font-style: italic;
+ padding: var(--spacing-xl);
+ background: var(--color-bg-white);
+ border-radius: var(--radius-lg);
+ box-shadow: var(--shadow-md);
+}
+
+/* 19. Utility Classes */
+.dr-text-center { text-align: center; }
+.dr-text-right { text-align: right; }
+.dr-text-left { text-align: left; }
+.dr-text-muted { color: var(--color-text-muted); }
+.dr-text-secondary { color: var(--color-text-secondary); }
+.dr-text-primary { color: var(--color-text-primary); }
+
+/* Margin utilities */
+.dr-mt-xs { margin-top: var(--spacing-xs); }
+.dr-mt-sm { margin-top: var(--spacing-sm); }
+.dr-mt-md { margin-top: var(--spacing-md); }
+.dr-mt-lg { margin-top: var(--spacing-lg); }
+.dr-mt-xl { margin-top: var(--spacing-xl); }
+
+.dr-mb-xs { margin-bottom: var(--spacing-xs); }
+.dr-mb-sm { margin-bottom: var(--spacing-sm); }
+.dr-mb-md { margin-bottom: var(--spacing-md); }
+.dr-mb-lg { margin-bottom: var(--spacing-lg); }
+.dr-mb-xl { margin-bottom: var(--spacing-xl); }
+
+/* Padding utilities */
+.dr-p-sm { padding: var(--spacing-sm); }
+.dr-p-md { padding: var(--spacing-md); }
+.dr-p-lg { padding: var(--spacing-lg); }
+
+/* Display utilities */
+.dr-d-none { display: none; }
+.dr-d-block { display: block; }
+.dr-d-flex { display: flex; }
+.dr-d-grid { display: grid; }
+
+/* Flex utilities */
+.dr-flex-center {
+ display: flex;
+ align-items: center;
+ justify-content: center;
+}
+
+.dr-flex-between {
+ display: flex;
+ align-items: center;
+ justify-content: space-between;
+}
+
+/* 20. Special Components */
+
+/* Mind Map Styles */
+.dr-mind-map {
+ position: relative;
+ margin: var(--spacing-xl) 0;
+}
+
+.dr-mind-map-node {
+ background: linear-gradient(135deg, var(--color-primary) 0%, var(--color-primary-dark) 100%);
+ color: white;
+ padding: var(--spacing-lg);
+ border-radius: var(--radius-md);
+ text-align: center;
+ font-size: 1.25rem;
+ font-weight: 600;
+ box-shadow: var(--shadow-md);
+ margin-bottom: var(--spacing-xl);
+}
+
+/* Result Cards */
+.dr-result-item {
+ background: var(--color-bg-secondary);
+ border-radius: var(--radius-md);
+ padding: 15px;
+ margin-bottom: 15px;
+ border: 1px solid var(--color-border);
+ transition: var(--transition-base);
+}
+
+.dr-result-item:hover {
+ box-shadow: var(--shadow-hover);
+ transform: translateY(-1px);
+}
+
+.dr-result-title {
+ font-weight: 600;
+ color: var(--color-heading);
+ margin-bottom: var(--spacing-xs);
+}
+
+.dr-result-snippet {
+ color: var(--color-text-secondary);
+ font-size: 13px;
+ line-height: 1.6;
+ margin-bottom: var(--spacing-sm);
+}
+
+.dr-result-link {
+ color: var(--color-primary);
+ text-decoration: none;
+ font-size: 13px;
+ font-weight: 500;
+}
+
+.dr-result-link:hover {
+ text-decoration: underline;
+}
+
+/* Timestamp */
+.dr-timestamp {
+ text-align: right;
+ color: var(--color-text-light);
+ font-size: 12px;
+ margin-top: var(--spacing-md);
+ padding-top: var(--spacing-md);
+ border-top: 1px dashed var(--color-border-dark);
+}
+
+/* 21. Gradients */
+.dr-gradient-primary {
+ background: linear-gradient(135deg, var(--color-primary) 0%, var(--color-primary-dark) 100%);
+}
+
+.dr-gradient-header {
+ background: linear-gradient(90deg, var(--color-primary), var(--color-success), var(--color-warning), var(--color-danger));
+ height: 5px;
+}
+
+/* 22. Responsive Design */
+@media (max-width: 768px) {
+ body {
+ padding: var(--spacing-sm);
+ }
+
+ .dr-container {
+ padding: var(--spacing-sm);
+ }
+
+ .dr-grid--stats,
+ .dr-grid--cards,
+ .dr-grid--metrics {
+ grid-template-columns: 1fr;
+ }
+
+ .dr-h1, h1 {
+ font-size: 1.5em;
+ }
+
+ .dr-h2, h2 {
+ font-size: 1.25em;
+ }
+
+ .dr-stat-value {
+ font-size: 1.75rem;
+ }
+
+ .dr-section,
+ .dr-card {
+ padding: var(--spacing-md);
+ }
+
+ .dr-table {
+ font-size: 13px;
+ }
+
+ .dr-table th,
+ .dr-table td {
+ padding: 8px;
+ }
+}
+
+/* 23. Print Styles */
+@media print {
+ body {
+ background: white;
+ color: black;
+ }
+
+ .dr-card,
+ .dr-section {
+ box-shadow: none;
+ border: 1px solid #ddd;
+ }
+
+ .dr-button {
+ display: none;
+ }
+}
\ No newline at end of file
diff --git a/deep_research/configs/balanced_research.yaml b/deep_research/configs/balanced_research.yaml
new file mode 100644
index 00000000..4f8bfa23
--- /dev/null
+++ b/deep_research/configs/balanced_research.yaml
@@ -0,0 +1,79 @@
+# Deep Research Pipeline Configuration - Balanced Mode
+enable_cache: true
+
+# ZenML MCP
+model:
+ name: "deep_research"
+ description: "Parallelized ZenML pipelines for deep research on a given query."
+ tags:
+ [
+ "research",
+ "exa",
+ "tavily",
+ "openrouter",
+ "sambanova",
+ "langfuse",
+ "balanced",
+ ]
+ use_cases: "Research on a given query."
+
+# Langfuse project name for LLM tracking
+langfuse_project_name: "deep-research"
+
+# Research parameters for balanced research
+parameters:
+ query: "Default research query"
+
+steps:
+ initial_query_decomposition_step:
+ parameters:
+ llm_model: "sambanova/DeepSeek-R1-Distill-Llama-70B"
+ max_sub_questions: 10 # Balanced number of sub-questions
+
+ process_sub_question_step:
+ parameters:
+ llm_model_search: "sambanova/Meta-Llama-3.3-70B-Instruct"
+ llm_model_synthesis: "sambanova/DeepSeek-R1-Distill-Llama-70B"
+ cap_search_length: 20000 # Standard cap for search length
+
+ cross_viewpoint_analysis_step:
+ parameters:
+ llm_model: "sambanova/DeepSeek-R1-Distill-Llama-70B"
+ viewpoint_categories:
+ [
+ "scientific",
+ "political",
+ "economic",
+ "social",
+ "ethical",
+ "historical",
+ ] # Standard viewpoints
+
+ generate_reflection_step:
+ parameters:
+ llm_model: "sambanova/DeepSeek-R1-Distill-Llama-70B"
+
+ get_research_approval_step:
+ parameters:
+ timeout: 3600 # 1 hour timeout
+ max_queries: 2 # Moderate additional queries
+
+ execute_approved_searches_step:
+ parameters:
+ llm_model: "sambanova/Meta-Llama-3.3-70B-Instruct"
+ cap_search_length: 20000
+
+ pydantic_final_report_step:
+ parameters:
+ llm_model: "sambanova/DeepSeek-R1-Distill-Llama-70B"
+
+# Environment settings
+settings:
+ docker:
+ requirements:
+ - openai>=1.0.0
+ - tavily-python>=0.2.8
+ - PyYAML>=6.0
+ - click>=8.0.0
+ - pydantic>=2.0.0
+ - typing_extensions>=4.0.0
\ No newline at end of file
diff --git a/deep_research/configs/deep_research.yaml b/deep_research/configs/deep_research.yaml
new file mode 100644
index 00000000..61cc4c2b
--- /dev/null
+++ b/deep_research/configs/deep_research.yaml
@@ -0,0 +1,81 @@
+# Deep Research Pipeline Configuration - Deep Comprehensive Mode
+enable_cache: false # Disable cache for fresh comprehensive analysis
+
+# ZenML MCP
+model:
+ name: "deep_research"
+ description: "Parallelized ZenML pipelines for deep research on a given query."
+ tags:
+ [
+ "research",
+ "exa",
+ "tavily",
+ "openrouter",
+ "sambanova",
+ "langfuse",
+ "deep",
+ ]
+ use_cases: "Research on a given query."
+
+# Langfuse project name for LLM tracking
+langfuse_project_name: "deep-research"
+
+# Research parameters for deep comprehensive research
+parameters:
+ query: "Default research query"
+
+steps:
+ initial_query_decomposition_step:
+ parameters:
+ llm_model: "sambanova/DeepSeek-R1-Distill-Llama-70B"
+ max_sub_questions: 15 # Maximum sub-questions for comprehensive analysis
+
+ process_sub_question_step:
+ parameters:
+ llm_model_search: "sambanova/Meta-Llama-3.3-70B-Instruct"
+ llm_model_synthesis: "sambanova/DeepSeek-R1-Distill-Llama-70B"
+ cap_search_length: 30000 # Higher cap for more comprehensive data
+
+ cross_viewpoint_analysis_step:
+ parameters:
+ llm_model: "sambanova/DeepSeek-R1-Distill-Llama-70B"
+ viewpoint_categories:
+ [
+ "scientific",
+ "political",
+ "economic",
+ "social",
+ "ethical",
+ "historical",
+ "technological",
+ "philosophical",
+ ] # Extended viewpoints for comprehensive analysis
+
+ generate_reflection_step:
+ parameters:
+ llm_model: "sambanova/DeepSeek-R1-Distill-Llama-70B"
+
+ get_research_approval_step:
+ parameters:
+ timeout: 7200 # 2 hour timeout for deep research
+ max_queries: 4 # Maximum additional queries for deep mode
+
+ execute_approved_searches_step:
+ parameters:
+ llm_model: "sambanova/Meta-Llama-3.3-70B-Instruct"
+ cap_search_length: 30000
+
+ pydantic_final_report_step:
+ parameters:
+ llm_model: "sambanova/DeepSeek-R1-Distill-Llama-70B"
+
+# Environment settings
+settings:
+ docker:
+ requirements:
+ - openai>=1.0.0
+ - tavily-python>=0.2.8
+ - PyYAML>=6.0
+ - click>=8.0.0
+ - pydantic>=2.0.0
+ - typing_extensions>=4.0.0
\ No newline at end of file
diff --git a/deep_research/configs/enhanced_research.yaml b/deep_research/configs/enhanced_research.yaml
new file mode 100644
index 00000000..0bfc0a79
--- /dev/null
+++ b/deep_research/configs/enhanced_research.yaml
@@ -0,0 +1,71 @@
+# Enhanced Deep Research Pipeline Configuration
+enable_cache: false
+
+# ZenML MCP
+model:
+ name: "deep_research"
+ description: "Parallelized ZenML pipelines for deep research on a given query."
+ tags:
+ [
+ "research",
+ "exa",
+ "tavily",
+ "openrouter",
+ "sambanova",
+ "langfuse",
+ "enhanced",
+ ]
+ use_cases: "Research on a given query."
+
+# Research query parameters
+query: "Climate change policy debates"
+
+# Langfuse project name for LLM tracking
+langfuse_project_name: "deep-research"
+
+# Step configurations
+steps:
+ initial_query_decomposition_step:
+ parameters:
+ llm_model: "openrouter/google/gemini-2.0-flash-lite-001"
+
+ cross_viewpoint_analysis_step:
+ parameters:
+ llm_model: "openrouter/google/gemini-2.0-flash-lite-001"
+ viewpoint_categories:
+ [
+ "scientific",
+ "political",
+ "economic",
+ "social",
+ "ethical",
+ "historical",
+ ]
+
+ generate_reflection_step:
+ parameters:
+ llm_model: "openrouter/google/gemini-2.0-flash-lite-001"
+
+ get_research_approval_step:
+ parameters:
+ timeout: 3600
+ max_queries: 2
+
+ execute_approved_searches_step:
+ parameters:
+ llm_model: "openrouter/google/gemini-2.0-flash-lite-001"
+
+ pydantic_final_report_step:
+ parameters:
+ llm_model: "openrouter/google/gemini-2.0-flash-lite-001"
+
+# Environment settings
+settings:
+ docker:
+ requirements:
+ - openai>=1.0.0
+ - tavily-python>=0.2.8
+ - PyYAML>=6.0
+ - click>=8.0.0
+ - pydantic>=2.0.0
+ - typing_extensions>=4.0.0
diff --git a/deep_research/configs/enhanced_research_with_approval.yaml b/deep_research/configs/enhanced_research_with_approval.yaml
new file mode 100644
index 00000000..73d6fe42
--- /dev/null
+++ b/deep_research/configs/enhanced_research_with_approval.yaml
@@ -0,0 +1,77 @@
+# Enhanced Deep Research Pipeline Configuration with Human Approval
+enable_cache: false
+
+# ZenML MCP
+model:
+ name: "deep_research"
+ description: "Parallelized ZenML pipelines for deep research on a given query."
+ tags:
+ [
+ "research",
+ "exa",
+ "tavily",
+ "openrouter",
+ "sambanova",
+ "langfuse",
+ "enhanced_approval",
+ ]
+ use_cases: "Research on a given query."
+
+# Langfuse project name for LLM tracking
+langfuse_project_name: "deep-research"
+
+# Research query parameters
+query: "Climate change policy debates"
+
+# Pipeline parameters
+parameters:
+ require_approval: true # Enable human-in-the-loop approval
+ approval_timeout: 1800 # 30 minutes timeout for approval
+ max_additional_searches: 3 # Allow up to 3 additional searches
+
+# Step configurations
+steps:
+ initial_query_decomposition_step:
+ parameters:
+ llm_model: "sambanova/DeepSeek-R1-Distill-Llama-70B"
+
+ cross_viewpoint_analysis_step:
+ parameters:
+ llm_model: "sambanova/DeepSeek-R1-Distill-Llama-70B"
+ viewpoint_categories:
+ [
+ "scientific",
+ "political",
+ "economic",
+ "social",
+ "ethical",
+ "historical",
+ ]
+
+ # New reflection steps (replacing iterative_reflection_step)
+ generate_reflection_step:
+ parameters:
+ llm_model: "sambanova/DeepSeek-R1-Distill-Llama-70B"
+
+ get_research_approval_step:
+ parameters:
+ alerter_type: "slack" # or "email" if configured
+
+ execute_approved_searches_step:
+ parameters:
+ llm_model: "sambanova/DeepSeek-R1-Distill-Llama-70B"
+
+ pydantic_final_report_step:
+ parameters:
+ llm_model: "sambanova/DeepSeek-R1-Distill-Llama-70B"
+
+# Environment settings
+settings:
+ docker:
+ requirements:
+ - openai>=1.0.0
+ - tavily-python>=0.2.8
+ - PyYAML>=6.0
+ - click>=8.0.0
+ - pydantic>=2.0.0
+ - typing_extensions>=4.0.0
\ No newline at end of file
diff --git a/deep_research/configs/quick_research.yaml b/deep_research/configs/quick_research.yaml
new file mode 100644
index 00000000..b210f18f
--- /dev/null
+++ b/deep_research/configs/quick_research.yaml
@@ -0,0 +1,59 @@
+# Deep Research Pipeline Configuration - Quick Research
+enable_cache: true
+
+# ZenML MCP
+model:
+ name: "deep_research"
+ description: "Parallelized ZenML pipelines for deep research on a given query."
+ tags:
+ [
+ "research",
+ "exa",
+ "tavily",
+ "openrouter",
+ "sambanova",
+ "langfuse",
+ "quick",
+ ]
+ use_cases: "Research on a given query."
+
+# Langfuse project name for LLM tracking
+langfuse_project_name: "deep-research"
+
+# Research parameters for quick research
+parameters:
+ query: "Default research query"
+
+steps:
+ initial_query_decomposition_step:
+ parameters:
+ llm_model: "sambanova/DeepSeek-R1-Distill-Llama-70B"
+ max_sub_questions: 5 # Limit to fewer sub-questions for quick research
+
+ process_sub_question_step:
+ parameters:
+ llm_model_search: "sambanova/Meta-Llama-3.3-70B-Instruct"
+ llm_model_synthesis: "sambanova/DeepSeek-R1-Distill-Llama-70B"
+
+ generate_reflection_step:
+ parameters:
+ llm_model: "sambanova/DeepSeek-R1-Distill-Llama-70B"
+
+ get_research_approval_step:
+ parameters:
+ auto_approve: true # Auto-approve for quick research
+
+ execute_approved_searches_step:
+ parameters:
+ llm_model: "sambanova/Meta-Llama-3.3-70B-Instruct"
+
+# Environment settings
+settings:
+ docker:
+ requirements:
+ - openai>=1.0.0
+ - tavily-python>=0.2.8
+ - PyYAML>=6.0
+ - click>=8.0.0
+ - pydantic>=2.0.0
+ - typing_extensions>=4.0.0
diff --git a/deep_research/configs/rapid_research.yaml b/deep_research/configs/rapid_research.yaml
new file mode 100644
index 00000000..e69982bf
--- /dev/null
+++ b/deep_research/configs/rapid_research.yaml
@@ -0,0 +1,59 @@
+# Deep Research Pipeline Configuration - Rapid Research
+enable_cache: true
+
+# ZenML MCP
+model:
+ name: "deep_research"
+ description: "Parallelized ZenML pipelines for deep research on a given query."
+ tags:
+ [
+ "research",
+ "exa",
+ "tavily",
+ "openrouter",
+ "sambanova",
+ "langfuse",
+ "rapid",
+ ]
+ use_cases: "Research on a given query."
+
+# Langfuse project name for LLM tracking
+langfuse_project_name: "deep-research"
+
+# Research parameters for rapid research
+parameters:
+ query: "Default research query"
+
+steps:
+ initial_query_decomposition_step:
+ parameters:
+ llm_model: "sambanova/DeepSeek-R1-Distill-Llama-70B"
+      max_sub_questions: 5 # Limit to fewer sub-questions for rapid research
+
+ process_sub_question_step:
+ parameters:
+ llm_model_search: "sambanova/Meta-Llama-3.3-70B-Instruct"
+ llm_model_synthesis: "sambanova/DeepSeek-R1-Distill-Llama-70B"
+
+ generate_reflection_step:
+ parameters:
+ llm_model: "sambanova/DeepSeek-R1-Distill-Llama-70B"
+
+ get_research_approval_step:
+ parameters:
+      auto_approve: true # Auto-approve for rapid research
+
+ execute_approved_searches_step:
+ parameters:
+ llm_model: "sambanova/Meta-Llama-3.3-70B-Instruct"
+
+# Environment settings
+settings:
+ docker:
+ requirements:
+ - openai>=1.0.0
+ - tavily-python>=0.2.8
+ - PyYAML>=6.0
+ - click>=8.0.0
+ - pydantic>=2.0.0
+ - typing_extensions>=4.0.0
diff --git a/deep_research/logging_config.py b/deep_research/logging_config.py
new file mode 100644
index 00000000..2b93c3e0
--- /dev/null
+++ b/deep_research/logging_config.py
@@ -0,0 +1,42 @@
+import logging
+import sys
+from typing import Optional
+
+
+def configure_logging(
+ level: int = logging.INFO, log_file: Optional[str] = None
+):
+ """Configure logging for the application.
+
+ Args:
+ level: The log level (default: INFO)
+ log_file: Optional path to a log file
+ """
+ # Create formatter
+ formatter = logging.Formatter(
+ "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
+ )
+
+ # Configure root logger
+ root_logger = logging.getLogger()
+ root_logger.setLevel(level)
+
+ # Remove existing handlers to avoid duplicate logs
+ for handler in root_logger.handlers[:]:
+ root_logger.removeHandler(handler)
+
+ # Console handler
+ console_handler = logging.StreamHandler(sys.stdout)
+ console_handler.setFormatter(formatter)
+ root_logger.addHandler(console_handler)
+
+ # File handler if log_file is provided
+ if log_file:
+ file_handler = logging.FileHandler(log_file)
+ file_handler.setFormatter(formatter)
+ root_logger.addHandler(file_handler)
+
+ # Reduce verbosity for noisy third-party libraries
+ logging.getLogger("LiteLLM").setLevel(logging.WARNING)
+ logging.getLogger("httpx").setLevel(logging.WARNING)
+ logging.getLogger("urllib3").setLevel(logging.WARNING)
diff --git a/deep_research/materializers/__init__.py b/deep_research/materializers/__init__.py
new file mode 100644
index 00000000..7009d557
--- /dev/null
+++ b/deep_research/materializers/__init__.py
@@ -0,0 +1,26 @@
+"""
+Materializers package for the ZenML Deep Research project.
+
+This package contains custom ZenML materializers that handle serialization and
+deserialization of complex data types used in the research pipeline.
+"""
+
+from .analysis_data_materializer import AnalysisDataMaterializer
+from .approval_decision_materializer import ApprovalDecisionMaterializer
+from .final_report_materializer import FinalReportMaterializer
+from .prompt_materializer import PromptMaterializer
+from .query_context_materializer import QueryContextMaterializer
+from .search_data_materializer import SearchDataMaterializer
+from .synthesis_data_materializer import SynthesisDataMaterializer
+from .tracing_metadata_materializer import TracingMetadataMaterializer
+
+__all__ = [
+ "ApprovalDecisionMaterializer",
+ "PromptMaterializer",
+ "TracingMetadataMaterializer",
+ "QueryContextMaterializer",
+ "SearchDataMaterializer",
+ "SynthesisDataMaterializer",
+ "AnalysisDataMaterializer",
+ "FinalReportMaterializer",
+]
diff --git a/deep_research/materializers/analysis_data_materializer.py b/deep_research/materializers/analysis_data_materializer.py
new file mode 100644
index 00000000..8b78aa91
--- /dev/null
+++ b/deep_research/materializers/analysis_data_materializer.py
@@ -0,0 +1,269 @@
+"""Materializer for AnalysisData with viewpoint tension diagrams and reflection insights."""
+
+import os
+from typing import Dict
+
+from utils.css_utils import (
+ get_card_class,
+ get_section_class,
+ get_shared_css_tag,
+)
+from utils.pydantic_models import AnalysisData
+from zenml.enums import ArtifactType, VisualizationType
+from zenml.io import fileio
+from zenml.materializers import PydanticMaterializer
+
+
+class AnalysisDataMaterializer(PydanticMaterializer):
+ """Materializer for AnalysisData with viewpoint and reflection visualization."""
+
+ ASSOCIATED_TYPES = (AnalysisData,)
+ ASSOCIATED_ARTIFACT_TYPE = ArtifactType.DATA
+
+ def save_visualizations(
+ self, data: AnalysisData
+ ) -> Dict[str, VisualizationType]:
+ """Create and save visualizations for the AnalysisData.
+
+ Args:
+ data: The AnalysisData to visualize
+
+ Returns:
+ Dictionary mapping file paths to visualization types
+ """
+ visualization_path = os.path.join(self.uri, "analysis_data.html")
+ html_content = self._generate_visualization_html(data)
+
+ with fileio.open(visualization_path, "w") as f:
+ f.write(html_content)
+
+ return {visualization_path: VisualizationType.HTML}
+
+ def _generate_visualization_html(self, data: AnalysisData) -> str:
+ """Generate HTML visualization for the analysis data.
+
+ Args:
+ data: The AnalysisData to visualize
+
+ Returns:
+ HTML string
+ """
+ # Viewpoint analysis section
+ viewpoint_html = ""
+ if data.viewpoint_analysis:
+ va = data.viewpoint_analysis
+
+ # Points of agreement
+ agreement_html = ""
+ if va.main_points_of_agreement:
+                agreement_html = "<h4>Main Points of Agreement</h4><ul>"
+                for point in va.main_points_of_agreement:
+                    agreement_html += f"<li>{point}</li>"
+                agreement_html += "</ul>"
+
+ # Areas of tension
+ tensions_html = ""
+ if va.areas_of_tension:
+                tensions_html = "<h4>Areas of Tension</h4>"
+                for tension in va.areas_of_tension:
+                    viewpoints_html = ""
+                    for perspective, view in tension.viewpoints.items():
+                        viewpoints_html += f"<div><strong>{perspective}:</strong> {view}</div>"
+
+ # Handle case where code block wasn't closed
+ if in_code_block and code_lines:
+ code_content = "\n".join(code_lines)
+ formatted_lines.append(
+                f"<pre><code>{html.escape(code_content)}</code></pre>"
+ )
+
+ # Wrap list items in ul tags
+ result = []
+ in_list = False
+ for line in formatted_lines:
+            if line.startswith("<li>"):
+                if not in_list:
+                    result.append("<ul>")
+                    in_list = True
+                result.append(line)
+            else:
+                if in_list:
+                    result.append("</ul>")
+                    in_list = False
+                result.append(line)
+
+        if in_list:
+            result.append("</ul>")
+
+ return "\n".join(result)
+
+
+def generate_executive_summary(
+ query_context: QueryContext,
+ synthesis_data: SynthesisData,
+ analysis_data: AnalysisData,
+ executive_summary_prompt: Prompt,
+ llm_model: str = "sambanova/DeepSeek-R1-Distill-Llama-70B",
+ langfuse_project_name: str = "deep-research",
+) -> str:
+ """Generate an executive summary using LLM based on the complete research findings.
+
+ Args:
+ query_context: The query context with main query and sub-questions
+ synthesis_data: The synthesis data with all synthesized information
+ analysis_data: The analysis data with viewpoint analysis
+ executive_summary_prompt: Prompt for generating executive summary
+ llm_model: The model to use for generation
+ langfuse_project_name: Name of the Langfuse project for tracking
+
+ Returns:
+ HTML formatted executive summary
+ """
+ logger.info("Generating executive summary using LLM")
+
+ # Prepare the context
+ summary_input = {
+ "main_query": query_context.main_query,
+ "sub_questions": query_context.sub_questions,
+ "key_findings": {},
+ "viewpoint_analysis": None,
+ }
+
+ # Include key findings from synthesis data
+ # Prefer enhanced info if available
+ info_source = (
+ synthesis_data.enhanced_info
+ if synthesis_data.enhanced_info
+ else synthesis_data.synthesized_info
+ )
+
+ for question in query_context.sub_questions:
+ if question in info_source:
+ info = info_source[question]
+ summary_input["key_findings"][question] = {
+ "answer": info.synthesized_answer,
+ "confidence": info.confidence_level,
+ "gaps": info.information_gaps,
+ }
+
+ # Include viewpoint analysis if available
+ if analysis_data.viewpoint_analysis:
+ va = analysis_data.viewpoint_analysis
+ summary_input["viewpoint_analysis"] = {
+ "agreements": va.main_points_of_agreement,
+ "tensions": len(va.areas_of_tension),
+ "insights": va.integrative_insights,
+ }
+
+ try:
+ # Call LLM to generate executive summary
+ result = run_llm_completion(
+ prompt=json.dumps(summary_input),
+ system_prompt=str(executive_summary_prompt),
+ model=llm_model,
+ temperature=0.7,
+ max_tokens=600,
+ project=langfuse_project_name,
+ tags=["executive_summary_generation"],
+ )
+
+ if result:
+ content = remove_reasoning_from_output(result)
+ # Clean up the HTML
+ content = extract_html_from_content(content)
+ logger.info("Successfully generated LLM-based executive summary")
+ return content
+ else:
+ logger.warning("Failed to generate executive summary via LLM")
+ return generate_fallback_executive_summary(
+ query_context, synthesis_data
+ )
+
+ except Exception as e:
+ logger.error(f"Error generating executive summary: {e}")
+ return generate_fallback_executive_summary(
+ query_context, synthesis_data
+ )
+
+
+def generate_introduction(
+ query_context: QueryContext,
+ introduction_prompt: Prompt,
+ llm_model: str = "sambanova/DeepSeek-R1-Distill-Llama-70B",
+ langfuse_project_name: str = "deep-research",
+) -> str:
+ """Generate an introduction using LLM based on research query and sub-questions.
+
+ Args:
+ query_context: The query context with main query and sub-questions
+ introduction_prompt: Prompt for generating introduction
+ llm_model: The model to use for generation
+ langfuse_project_name: Name of the Langfuse project for tracking
+
+ Returns:
+ HTML formatted introduction
+ """
+ logger.info("Generating introduction using LLM")
+
+ # Prepare the context
+ context = f"Main Research Query: {query_context.main_query}\n\n"
+ context += "Sub-questions being explored:\n"
+ for i, sub_question in enumerate(query_context.sub_questions, 1):
+ context += f"{i}. {sub_question}\n"
+
+ try:
+ # Call LLM to generate introduction
+ result = run_llm_completion(
+ prompt=context,
+ system_prompt=str(introduction_prompt),
+ model=llm_model,
+ temperature=0.7,
+ max_tokens=600,
+ project=langfuse_project_name,
+ tags=["introduction_generation"],
+ )
+
+ if result:
+ content = remove_reasoning_from_output(result)
+ # Clean up the HTML
+ content = extract_html_from_content(content)
+ logger.info("Successfully generated LLM-based introduction")
+ return content
+ else:
+ logger.warning("Failed to generate introduction via LLM")
+ return generate_fallback_introduction(query_context)
+
+ except Exception as e:
+ logger.error(f"Error generating introduction: {e}")
+ return generate_fallback_introduction(query_context)
+
+
+def generate_fallback_executive_summary(
+ query_context: QueryContext, synthesis_data: SynthesisData
+) -> str:
+ """Generate a fallback executive summary when LLM fails."""
+ summary = f"
This report examines the question: {html.escape(query_context.main_query)}
"
+ summary += f"
The research explored {len(query_context.sub_questions)} key dimensions of this topic, "
+ summary += "synthesizing findings from multiple sources to provide a comprehensive analysis.
"
+
+ # Add confidence overview
+ confidence_counts = {"high": 0, "medium": 0, "low": 0}
+ info_source = (
+ synthesis_data.enhanced_info
+ if synthesis_data.enhanced_info
+ else synthesis_data.synthesized_info
+ )
+ for info in info_source.values():
+ level = info.confidence_level.lower()
+ if level in confidence_counts:
+ confidence_counts[level] += 1
+
+ summary += f"
This report addresses the research query: {html.escape(query_context.main_query)}
"
+ intro += f"
The research was conducted by breaking down the main query into {len(query_context.sub_questions)} "
+ intro += (
+ "sub-questions to explore different aspects of the topic in depth. "
+ )
+ intro += "Each sub-question was researched independently, with findings synthesized from various sources.
"
+ return intro
+
+
+def generate_conclusion(
+ query_context: QueryContext,
+ synthesis_data: SynthesisData,
+ analysis_data: AnalysisData,
+ conclusion_generation_prompt: Prompt,
+ llm_model: str = "sambanova/DeepSeek-R1-Distill-Llama-70B",
+ langfuse_project_name: str = "deep-research",
+) -> str:
+ """Generate a comprehensive conclusion using LLM based on all research findings.
+
+ Args:
+ query_context: The query context with main query and sub-questions
+ synthesis_data: The synthesis data with all synthesized information
+ analysis_data: The analysis data with viewpoint analysis
+ conclusion_generation_prompt: Prompt for generating conclusion
+ llm_model: The model to use for conclusion generation
+ langfuse_project_name: Name of the Langfuse project for tracking
+
+ Returns:
+ str: HTML-formatted conclusion content
+ """
+ logger.info("Generating comprehensive conclusion using LLM")
+
+ # Prepare input data for conclusion generation
+ conclusion_input = {
+ "main_query": query_context.main_query,
+ "sub_questions": query_context.sub_questions,
+ "enhanced_info": {},
+ }
+
+ # Include enhanced information for each sub-question
+ info_source = (
+ synthesis_data.enhanced_info
+ if synthesis_data.enhanced_info
+ else synthesis_data.synthesized_info
+ )
+
+ for question in query_context.sub_questions:
+ if question in info_source:
+ info = info_source[question]
+ conclusion_input["enhanced_info"][question] = {
+ "synthesized_answer": info.synthesized_answer,
+ "confidence_level": info.confidence_level,
+ "information_gaps": info.information_gaps,
+ "key_sources": info.key_sources,
+ "improvements": getattr(info, "improvements", []),
+ }
+
+ # Include viewpoint analysis
+ if analysis_data.viewpoint_analysis:
+ va = analysis_data.viewpoint_analysis
+ conclusion_input["viewpoint_analysis"] = {
+ "main_points_of_agreement": va.main_points_of_agreement,
+ "areas_of_tension": [
+ {"topic": t.topic, "viewpoints": t.viewpoints}
+ for t in va.areas_of_tension
+ ],
+ "integrative_insights": va.integrative_insights,
+ }
+
+ # Include reflection metadata if available
+ if analysis_data.reflection_metadata:
+ rm = analysis_data.reflection_metadata
+ conclusion_input["reflection_insights"] = {
+ "improvements_made": rm.improvements_made,
+ "additional_questions_identified": rm.additional_questions_identified,
+ }
+
+ try:
+ # Call LLM to generate conclusion
+ result = run_llm_completion(
+ prompt=json.dumps(conclusion_input),
+ system_prompt=str(conclusion_generation_prompt),
+ model=llm_model,
+ temperature=0.7,
+ max_tokens=800,
+ project=langfuse_project_name,
+ tags=["conclusion_generation"],
+ )
+
+ if result:
+ content = remove_reasoning_from_output(result)
+ # Clean up the HTML
+ content = extract_html_from_content(content)
+ logger.info("Successfully generated LLM-based conclusion")
+ return content
+ else:
+ logger.warning("Failed to generate conclusion via LLM")
+ return generate_fallback_conclusion(query_context, synthesis_data)
+
+ except Exception as e:
+ logger.error(f"Error generating conclusion: {e}")
+ return generate_fallback_conclusion(query_context, synthesis_data)
+
+
+def generate_fallback_conclusion(
+ query_context: QueryContext, synthesis_data: SynthesisData
+) -> str:
+ """Generate a fallback conclusion when LLM fails.
+
+ Args:
+ query_context: The query context with main query and sub-questions
+ synthesis_data: The synthesis data with all synthesized information
+
+ Returns:
+ str: Basic HTML-formatted conclusion
+ """
+ conclusion = f"
This research has explored the question: {html.escape(query_context.main_query)}
"
+ conclusion += f"
Through systematic investigation of {len(query_context.sub_questions)} sub-questions, "
+ conclusion += (
+ "we have gathered insights from multiple sources and perspectives.
"
+ )
+
+ # Add a summary of confidence levels
+ info_source = (
+ synthesis_data.enhanced_info
+ if synthesis_data.enhanced_info
+ else synthesis_data.synthesized_info
+ )
+ high_confidence = sum(
+ 1
+ for info in info_source.values()
+ if info.confidence_level.lower() == "high"
+ )
+
+ if high_confidence > 0:
+ conclusion += f"
The research yielded {high_confidence} high-confidence findings out of "
+ conclusion += f"{len(info_source)} total areas investigated.
"
+
+ conclusion += "
Further research may be beneficial to address remaining information gaps "
+ conclusion += "and explore emerging questions identified during this investigation.
"
+
+ return conclusion
+
+
+def generate_report_from_template(
+ query_context: QueryContext,
+ search_data: SearchData,
+ synthesis_data: SynthesisData,
+ analysis_data: AnalysisData,
+ conclusion_generation_prompt: Prompt,
+ executive_summary_prompt: Prompt,
+ introduction_prompt: Prompt,
+ llm_model: str = "sambanova/DeepSeek-R1-Distill-Llama-70B",
+ langfuse_project_name: str = "deep-research",
+) -> str:
+ """Generate a final HTML report from a static template.
+
+ Instead of using an LLM to generate HTML, this function uses predefined HTML
+ templates and populates them with data from the research artifacts.
+
+ Args:
+ query_context: The query context with main query and sub-questions
+ search_data: The search data (for source information)
+ synthesis_data: The synthesis data with all synthesized information
+ analysis_data: The analysis data with viewpoint analysis
+ conclusion_generation_prompt: Prompt for generating conclusion
+ executive_summary_prompt: Prompt for generating executive summary
+ introduction_prompt: Prompt for generating introduction
+ llm_model: The model to use for conclusion generation
+ langfuse_project_name: Name of the Langfuse project for tracking
+
+ Returns:
+ str: The HTML content of the report
+ """
+ logger.info(
+ f"Generating templated HTML report for query: {query_context.main_query}"
+ )
+
+ # Generate table of contents for sub-questions
+ sub_questions_toc = ""
+ for i, question in enumerate(query_context.sub_questions, 1):
+ safe_id = f"question-{i}"
+        sub_questions_toc += (
+            f'<li><a href="#{safe_id}">{i}. {html.escape(question)}</a></li>\n'
+        )
+
+ # Generate HTML for sub-questions
+ sub_questions_html = ""
+ all_sources = set()
+
+ # Determine which info source to use (merge original with enhanced)
+ # Start with the original synthesized info
+ info_source = synthesis_data.synthesized_info.copy()
+
+ # Override with enhanced info where available
+ if synthesis_data.enhanced_info:
+ info_source.update(synthesis_data.enhanced_info)
+
+ # Debug logging
+ logger.info(
+ f"Synthesis data has enhanced_info: {bool(synthesis_data.enhanced_info)}"
+ )
+ logger.info(
+ f"Synthesis data has synthesized_info: {bool(synthesis_data.synthesized_info)}"
+ )
+ logger.info(f"Info source has {len(info_source)} entries")
+ logger.info(f"Processing {len(query_context.sub_questions)} sub-questions")
+
+ # Log the keys in info_source for debugging
+ if info_source:
+ logger.info(
+ f"Keys in info_source: {list(info_source.keys())[:3]}..."
+ ) # First 3 keys
+ logger.info(
+ f"Sub-questions from query_context: {query_context.sub_questions[:3]}..."
+ ) # First 3
+
+ for i, question in enumerate(query_context.sub_questions, 1):
+ info = info_source.get(question, None)
+
+ # Skip if no information is available
+ if not info:
+ logger.warning(
+ f"No synthesis info found for question {i}: {question}"
+ )
+ continue
+
+ # Process confidence level
+ confidence = info.confidence_level.lower()
+ confidence_upper = info.confidence_level.upper()
+
+ # Process key sources
+ key_sources_html = ""
+ if info.key_sources:
+ all_sources.update(info.key_sources)
+ sources_list = "\n".join(
+ [
+ f'
No external sources were referenced in this research.
"
+ )
+ references_html += "
"
+
+ # Generate dynamic executive summary using LLM
+ logger.info("Generating dynamic executive summary...")
+ executive_summary = generate_executive_summary(
+ query_context,
+ synthesis_data,
+ analysis_data,
+ executive_summary_prompt,
+ llm_model,
+ langfuse_project_name,
+ )
+ logger.info(
+ f"Executive summary generated: {len(executive_summary)} characters"
+ )
+
+ # Generate dynamic introduction using LLM
+ logger.info("Generating dynamic introduction...")
+ introduction_html = generate_introduction(
+ query_context, introduction_prompt, llm_model, langfuse_project_name
+ )
+ logger.info(f"Introduction generated: {len(introduction_html)} characters")
+
+ # Generate comprehensive conclusion using LLM
+ conclusion_html = generate_conclusion(
+ query_context,
+ synthesis_data,
+ analysis_data,
+ conclusion_generation_prompt,
+ llm_model,
+ langfuse_project_name,
+ )
+
+ # Generate complete HTML report
+ html_content = STATIC_HTML_TEMPLATE.format(
+ main_query=html.escape(query_context.main_query),
+ shared_css=get_shared_css_tag(),
+ sub_questions_toc=sub_questions_toc,
+ additional_sections_toc=additional_sections_toc,
+ executive_summary=executive_summary,
+ introduction_html=introduction_html,
+ num_sub_questions=len(query_context.sub_questions),
+ sub_questions_html=sub_questions_html,
+ viewpoint_analysis_html=viewpoint_analysis_html,
+ conclusion_html=conclusion_html,
+ references_html=references_html,
+ )
+
+ return html_content
+
+
+def _generate_fallback_report(
+ query_context: QueryContext,
+ synthesis_data: SynthesisData,
+ analysis_data: AnalysisData,
+) -> str:
+ """Generate a minimal fallback report when the main report generation fails.
+
+ This function creates a simplified HTML report with a consistent structure when
+ the main report generation process encounters an error.
+
+ Args:
+ query_context: The query context with main query and sub-questions
+ synthesis_data: The synthesis data with all synthesized information
+ analysis_data: The analysis data with viewpoint analysis
+
+ Returns:
+ str: A basic HTML report with a standard research report structure
+ """
+ # Create a simple HTML structure with embedded CSS for styling
+ html = f"""
+
+
+
+
+
+ Research Report - {html.escape(query_context.main_query)}
+
+
+
+
Research Report: {html.escape(query_context.main_query)}
+
+
+ Note: This is a simplified version of the report generated due to processing limitations.
+
+
+
+
Introduction
+
This report investigates the research query: {html.escape(query_context.main_query)}
+
The investigation was structured around {len(query_context.sub_questions)} key sub-questions to provide comprehensive coverage of the topic.
+
+
+
+
Research Findings
+"""
+
+ # Add findings for each sub-question
+ info_source = (
+ synthesis_data.enhanced_info
+ if synthesis_data.enhanced_info
+ else synthesis_data.synthesized_info
+ )
+
+ for i, question in enumerate(query_context.sub_questions, 1):
+ if question in info_source:
+ info = info_source[question]
+ confidence_class = info.confidence_level.lower()
+
+ html += f"""
+
+
{i}. {html.escape(question)}
+ Confidence: {info.confidence_level.upper()}
+
{html.escape(info.synthesized_answer)}
+ """
+
+ if info.information_gaps:
+ html += f"
Information Gaps: {html.escape(info.information_gaps)}
"
+
+ html += "
"
+
+ html += """
+
+
+
+
Conclusion
+
This research has provided insights into the various aspects of the main query through systematic investigation.
+
The findings represent a synthesis of available information, with varying levels of confidence across different areas.
+
+
+
+
References
+
Sources were gathered from various search providers and synthesized to create this report.
+
+
+
+"""
+
+    return html_content
+
+
+@step(
+ output_materializers={
+ "final_report": FinalReportMaterializer,
+ }
+)
+def pydantic_final_report_step(
+ query_context: QueryContext,
+ search_data: SearchData,
+ synthesis_data: SynthesisData,
+ analysis_data: AnalysisData,
+ conclusion_generation_prompt: Prompt,
+ executive_summary_prompt: Prompt,
+ introduction_prompt: Prompt,
+ use_static_template: bool = True,
+ llm_model: str = "sambanova/DeepSeek-R1-Distill-Llama-70B",
+ langfuse_project_name: str = "deep-research",
+) -> Tuple[
+ Annotated[FinalReport, "final_report"],
+ Annotated[HTMLString, "report_html"],
+]:
+ """Generate the final research report in HTML format using artifact-based approach.
+
+ This step uses the individual artifacts to generate a final HTML report.
+
+ Args:
+ query_context: The query context with main query and sub-questions
+ search_data: The search data (for source information)
+ synthesis_data: The synthesis data with all synthesized information
+ analysis_data: The analysis data with viewpoint analysis and reflection metadata
+ conclusion_generation_prompt: Prompt for generating conclusions
+ executive_summary_prompt: Prompt for generating executive summary
+ introduction_prompt: Prompt for generating introduction
+ use_static_template: Whether to use a static template instead of LLM generation
+ llm_model: The model to use for report generation with provider prefix
+ langfuse_project_name: Name of the Langfuse project for tracking
+
+ Returns:
+ A tuple containing the FinalReport artifact and the HTML report string
+ """
+ start_time = time.time()
+ logger.info(
+ "Generating final research report using artifact-based approach"
+ )
+
+ if use_static_template:
+ # Use the static HTML template approach
+ logger.info("Using static HTML template for report generation")
+ html_content = generate_report_from_template(
+ query_context,
+ search_data,
+ synthesis_data,
+ analysis_data,
+ conclusion_generation_prompt,
+ executive_summary_prompt,
+ introduction_prompt,
+ llm_model,
+ langfuse_project_name,
+ )
+
+ # Create the FinalReport artifact
+ final_report = FinalReport(
+ report_html=html_content,
+ main_query=query_context.main_query,
+ )
+
+ # Calculate execution time
+ execution_time = time.time() - start_time
+
+ # Calculate report metrics
+ info_source = (
+ synthesis_data.enhanced_info
+ if synthesis_data.enhanced_info
+ else synthesis_data.synthesized_info
+ )
+ confidence_distribution = {"high": 0, "medium": 0, "low": 0}
+ for info in info_source.values():
+ level = info.confidence_level.lower()
+ if level in confidence_distribution:
+ confidence_distribution[level] += 1
+
+ # Count various elements in the report
+ num_sources = len(
+ set(
+ source
+ for info in info_source.values()
+ for source in info.key_sources
+ )
+ )
+ has_viewpoint_analysis = analysis_data.viewpoint_analysis is not None
+ has_reflection_insights = (
+ analysis_data.reflection_metadata is not None
+ and analysis_data.reflection_metadata.improvements_made > 0
+ )
+
+ # Log step metadata
+ log_metadata(
+ metadata={
+ "final_report_generation": {
+ "execution_time_seconds": execution_time,
+ "use_static_template": use_static_template,
+ "llm_model": llm_model,
+ "main_query_length": len(query_context.main_query),
+ "num_sub_questions": len(query_context.sub_questions),
+ "num_synthesized_answers": len(info_source),
+ "has_enhanced_info": bool(synthesis_data.enhanced_info),
+ "confidence_distribution": confidence_distribution,
+ "num_unique_sources": num_sources,
+ "has_viewpoint_analysis": has_viewpoint_analysis,
+ "has_reflection_insights": has_reflection_insights,
+ "report_length_chars": len(html_content),
+ "report_generation_success": True,
+ }
+ }
+ )
+
+ # Log artifact metadata
+ log_metadata(
+ metadata={
+ "final_report_characteristics": {
+ "report_length": len(html_content),
+ "main_query": query_context.main_query,
+ "num_sections": len(query_context.sub_questions)
+ + (1 if has_viewpoint_analysis else 0),
+ "has_executive_summary": True,
+ "has_introduction": True,
+ "has_conclusion": True,
+ }
+ },
+ artifact_name="final_report",
+ infer_artifact=True,
+ )
+
+ # Add tags to the artifact
+ # add_tags(tags=["report", "final", "html"], artifact_name="final_report", infer_artifact=True)
+
+ logger.info(
+ f"Successfully generated final report ({len(html_content)} characters)"
+ )
+ return final_report, HTMLString(html_content)
+
+ else:
+ # Handle non-static template case (future implementation)
+ logger.warning(
+ "Non-static template generation not yet implemented, falling back to static template"
+ )
+ return pydantic_final_report_step(
+ query_context=query_context,
+ search_data=search_data,
+ synthesis_data=synthesis_data,
+ analysis_data=analysis_data,
+ conclusion_generation_prompt=conclusion_generation_prompt,
+ executive_summary_prompt=executive_summary_prompt,
+ introduction_prompt=introduction_prompt,
+ use_static_template=True,
+ llm_model=llm_model,
+ langfuse_project_name=langfuse_project_name,
+ )
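
For orientation, the step above can also be invoked directly as a plain function, which is how the unit tests further down exercise it. The sketch below uses placeholder prompts and empty artifacts; if the LLM calls cannot complete (for example, when no API keys are configured), the step still produces a report via its fallback paths.

```python
from steps.pydantic_final_report_step import pydantic_final_report_step
from utils.pydantic_models import (
    AnalysisData,
    Prompt,
    QueryContext,
    SearchData,
    SynthesisData,
)

# Placeholder inputs -- the real pipeline passes versioned prompts and
# artifacts produced by the upstream steps.
final_report, report_html = pydantic_final_report_step(
    query_context=QueryContext(
        main_query="What is quantum computing?",
        sub_questions=["What are qubits?"],
    ),
    search_data=SearchData(),
    synthesis_data=SynthesisData(),
    analysis_data=AnalysisData(),
    conclusion_generation_prompt=Prompt(
        name="conclusion_generation", content="Generate a conclusion."
    ),
    executive_summary_prompt=Prompt(
        name="executive_summary", content="Generate an executive summary."
    ),
    introduction_prompt=Prompt(
        name="introduction", content="Generate an introduction."
    ),
)
print(f"Report HTML length: {len(final_report.report_html)} characters")
```
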
diff --git a/deep_research/steps/query_decomposition_step.py b/deep_research/steps/query_decomposition_step.py
new file mode 100644
index 00000000..053a8fd4
--- /dev/null
+++ b/deep_research/steps/query_decomposition_step.py
@@ -0,0 +1,186 @@
+import logging
+import time
+from typing import Annotated
+
+from materializers.query_context_materializer import QueryContextMaterializer
+from utils.llm_utils import get_structured_llm_output
+from utils.pydantic_models import Prompt, QueryContext
+from zenml import log_metadata, step
+
+logger = logging.getLogger(__name__)
+
+
+@step(output_materializers=QueryContextMaterializer)
+def initial_query_decomposition_step(
+ main_query: str,
+ query_decomposition_prompt: Prompt,
+ llm_model: str = "sambanova/DeepSeek-R1-Distill-Llama-70B",
+ max_sub_questions: int = 8,
+ langfuse_project_name: str = "deep-research",
+) -> Annotated[QueryContext, "query_context"]:
+ """Break down a complex research query into specific sub-questions.
+
+ Args:
+ main_query: The main research query to decompose
+ query_decomposition_prompt: Prompt for query decomposition
+ llm_model: The reasoning model to use with provider prefix
+ max_sub_questions: Maximum number of sub-questions to generate
+ langfuse_project_name: Project name for tracing
+
+ Returns:
+ QueryContext containing the main query and decomposed sub-questions
+ """
+ start_time = time.time()
+ logger.info(f"Decomposing research query: {main_query}")
+
+ # Get the prompt content
+ system_prompt = str(query_decomposition_prompt)
+
+ try:
+        # Call the LLM to decompose the query into sub-questions
+ updated_system_prompt = (
+ system_prompt
+ + f"\nPlease generate at most {max_sub_questions} sub-questions."
+ )
+ logger.info(
+ f"Calling {llm_model} to decompose query into max {max_sub_questions} sub-questions"
+ )
+
+ # Define fallback questions
+ fallback_questions = [
+ {
+ "sub_question": f"What is {main_query}?",
+ "reasoning": "Basic understanding of the topic",
+ },
+ {
+ "sub_question": f"What are the key aspects of {main_query}?",
+ "reasoning": "Exploring important dimensions",
+ },
+ {
+ "sub_question": f"What are the implications of {main_query}?",
+ "reasoning": "Understanding broader impact",
+ },
+ ]
+
+ # Use utility function to get structured output
+ decomposed_questions = get_structured_llm_output(
+ prompt=main_query,
+ system_prompt=updated_system_prompt,
+ model=llm_model,
+ fallback_response=fallback_questions,
+ project=langfuse_project_name,
+ )
+
+ # Extract just the sub-questions
+ sub_questions = [
+ item.get("sub_question")
+ for item in decomposed_questions
+ if "sub_question" in item
+ ]
+
+ # Limit to max_sub_questions
+ sub_questions = sub_questions[:max_sub_questions]
+
+ logger.info(f"Generated {len(sub_questions)} sub-questions")
+ for i, question in enumerate(sub_questions, 1):
+ logger.info(f" {i}. {question}")
+
+ # Create the QueryContext
+ query_context = QueryContext(
+ main_query=main_query, sub_questions=sub_questions
+ )
+
+ # Log step metadata
+ execution_time = time.time() - start_time
+ log_metadata(
+ metadata={
+ "query_decomposition": {
+ "execution_time_seconds": execution_time,
+ "num_sub_questions": len(sub_questions),
+ "llm_model": llm_model,
+ "max_sub_questions_requested": max_sub_questions,
+ "fallback_used": False,
+ "main_query_length": len(main_query),
+ "sub_questions": sub_questions,
+ }
+ }
+ )
+
+ # Log model metadata for cross-pipeline tracking
+ log_metadata(
+ metadata={
+ "research_scope": {
+ "num_sub_questions": len(sub_questions),
+ }
+ },
+ infer_model=True,
+ )
+
+ # Log artifact metadata for the output query context
+ log_metadata(
+ metadata={
+ "query_context_characteristics": {
+ "main_query": main_query,
+ "num_sub_questions": len(sub_questions),
+ "timestamp": query_context.decomposition_timestamp,
+ }
+ },
+ infer_artifact=True,
+ )
+
+ # Add tags to the artifact
+ # add_tags(tags=["query", "decomposed"], artifact_name="query_context", infer_artifact=True)
+
+ return query_context
+
+ except Exception as e:
+ logger.error(f"Error decomposing query: {e}")
+ # Return fallback questions
+ fallback_questions = [
+ f"What is {main_query}?",
+ f"What are the key aspects of {main_query}?",
+ f"What are the implications of {main_query}?",
+ ]
+ fallback_questions = fallback_questions[:max_sub_questions]
+ logger.info(f"Using {len(fallback_questions)} fallback questions:")
+ for i, question in enumerate(fallback_questions, 1):
+ logger.info(f" {i}. {question}")
+
+ # Create QueryContext with fallback questions
+ query_context = QueryContext(
+ main_query=main_query, sub_questions=fallback_questions
+ )
+
+ # Log metadata for fallback scenario
+ execution_time = time.time() - start_time
+ log_metadata(
+ metadata={
+ "query_decomposition": {
+ "execution_time_seconds": execution_time,
+ "num_sub_questions": len(fallback_questions),
+ "llm_model": llm_model,
+ "max_sub_questions_requested": max_sub_questions,
+ "fallback_used": True,
+ "error_message": str(e),
+ "main_query_length": len(main_query),
+ "sub_questions": fallback_questions,
+ }
+ }
+ )
+
+ # Log model metadata for cross-pipeline tracking
+ log_metadata(
+ metadata={
+ "research_scope": {
+ "num_sub_questions": len(fallback_questions),
+ }
+ },
+ infer_model=True,
+ )
+
+ # Add tags to the artifact
+ # add_tags(
+ # tags=["query", "decomposed", "fallback"], artifact_name="query_context", infer_artifact=True
+ # )
+
+ return query_context
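
As a quick illustration, the decomposition step can be run on its own. The prompt text below is a stand-in rather than the project's shipped `query_decomposition` prompt; the JSON shape it asks for (objects with `sub_question` and `reasoning` keys) mirrors the fallback structure defined above, and if the LLM call fails the step falls back to generic sub-questions.

```python
from steps.query_decomposition_step import initial_query_decomposition_step
from utils.pydantic_models import Prompt

# Stand-in prompt for illustration only.
decomposition_prompt = Prompt(
    name="query_decomposition",
    content=(
        "Break the research query into focused sub-questions. Return a JSON "
        "list of objects with 'sub_question' and 'reasoning' keys."
    ),
)

query_context = initial_query_decomposition_step(
    main_query="How will AI affect software engineering jobs?",
    query_decomposition_prompt=decomposition_prompt,
    max_sub_questions=5,
)

for i, question in enumerate(query_context.sub_questions, 1):
    print(f"{i}. {question}")
```
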
diff --git a/deep_research/tests/__init__.py b/deep_research/tests/__init__.py
new file mode 100644
index 00000000..6206856b
--- /dev/null
+++ b/deep_research/tests/__init__.py
@@ -0,0 +1 @@
+"""Test package for ZenML Deep Research project."""
diff --git a/deep_research/tests/conftest.py b/deep_research/tests/conftest.py
new file mode 100644
index 00000000..b972a5e1
--- /dev/null
+++ b/deep_research/tests/conftest.py
@@ -0,0 +1,11 @@
+"""Test configuration for pytest.
+
+This file sets up the proper Python path for importing modules in tests.
+"""
+
+import os
+import sys
+
+# Add the project root directory to the Python path
+project_root = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))
+sys.path.insert(0, project_root)
diff --git a/deep_research/tests/test_approval_utils.py b/deep_research/tests/test_approval_utils.py
new file mode 100644
index 00000000..f1dd15a5
--- /dev/null
+++ b/deep_research/tests/test_approval_utils.py
@@ -0,0 +1,120 @@
+"""Unit tests for approval utility functions."""
+
+from utils.approval_utils import (
+ calculate_estimated_cost,
+ format_approval_request,
+ format_critique_summary,
+ format_query_list,
+ parse_approval_response,
+)
+
+
+def test_parse_approval_responses():
+ """Test parsing different approval responses."""
+ queries = ["query1", "query2", "query3"]
+
+ # Test approve all
+ decision = parse_approval_response("APPROVE ALL", queries)
+ assert decision.approved == True
+ assert decision.selected_queries == queries
+ assert decision.approval_method == "APPROVE_ALL"
+
+ # Test skip
+ decision = parse_approval_response(
+ "skip", queries
+ ) # Test case insensitive
+ assert decision.approved == False
+ assert decision.selected_queries == []
+ assert decision.approval_method == "SKIP"
+
+ # Test selection
+ decision = parse_approval_response("SELECT 1,3", queries)
+ assert decision.approved == True
+ assert decision.selected_queries == ["query1", "query3"]
+ assert decision.approval_method == "SELECT_SPECIFIC"
+
+ # Test invalid selection
+ decision = parse_approval_response("SELECT invalid", queries)
+ assert decision.approved == False
+ assert decision.approval_method == "PARSE_ERROR"
+
+ # Test out of range indices
+ decision = parse_approval_response("SELECT 1,5,10", queries)
+ assert decision.approved == True
+ assert decision.selected_queries == ["query1"] # Only valid indices
+ assert decision.approval_method == "SELECT_SPECIFIC"
+
+ # Test unknown response
+ decision = parse_approval_response("maybe later", queries)
+ assert decision.approved == False
+ assert decision.approval_method == "UNKNOWN_RESPONSE"
+
+
+def test_format_approval_request():
+ """Test formatting of approval request messages."""
+ message = format_approval_request(
+ main_query="Test query",
+ progress_summary={
+ "completed_count": 5,
+ "avg_confidence": 0.75,
+ "low_confidence_count": 2,
+ },
+ critique_points=[
+ {"issue": "Missing data", "importance": "high"},
+ {"issue": "Minor gap", "importance": "low"},
+ ],
+ proposed_queries=["query1", "query2"],
+ )
+
+ assert "Test query" in message
+ assert "5" in message
+ assert "0.75" in message
+ assert "2 queries" in message
+ assert "approve" in message.lower()
+ assert "reject" in message.lower()
+ assert "Missing data" in message
+
+
+def test_format_critique_summary():
+ """Test critique summary formatting."""
+ # Test with no critiques
+ result = format_critique_summary([])
+ assert result == "No critical issues identified."
+
+ # Test with few critiques
+ critiques = [{"issue": "Issue 1"}, {"issue": "Issue 2"}]
+ result = format_critique_summary(critiques)
+ assert "- Issue 1" in result
+ assert "- Issue 2" in result
+ assert "more issues" not in result
+
+ # Test with many critiques
+ critiques = [{"issue": f"Issue {i}"} for i in range(5)]
+ result = format_critique_summary(critiques)
+ assert "- Issue 0" in result
+ assert "- Issue 1" in result
+ assert "- Issue 2" in result
+ assert "- Issue 3" not in result
+ assert "... and 2 more issues" in result
+
+
+def test_format_query_list():
+ """Test query list formatting."""
+ # Test empty list
+ result = format_query_list([])
+ assert result == "No queries proposed."
+
+ # Test with queries
+ queries = ["Query A", "Query B", "Query C"]
+ result = format_query_list(queries)
+ assert "1. Query A" in result
+ assert "2. Query B" in result
+ assert "3. Query C" in result
+
+
+def test_calculate_estimated_cost():
+ """Test cost estimation."""
+ assert calculate_estimated_cost([]) == 0.0
+ assert calculate_estimated_cost(["q1"]) == 0.01
+ assert calculate_estimated_cost(["q1", "q2", "q3"]) == 0.03
+ assert calculate_estimated_cost(["q1"] * 10) == 0.10
diff --git a/deep_research/tests/test_artifact_models.py b/deep_research/tests/test_artifact_models.py
new file mode 100644
index 00000000..415862b5
--- /dev/null
+++ b/deep_research/tests/test_artifact_models.py
@@ -0,0 +1,210 @@
+"""Tests for the new artifact models."""
+
+import time
+
+import pytest
+from utils.pydantic_models import (
+ AnalysisData,
+ FinalReport,
+ QueryContext,
+ ReflectionMetadata,
+ SearchCostDetail,
+ SearchData,
+ SearchResult,
+ SynthesisData,
+ SynthesizedInfo,
+ ViewpointAnalysis,
+)
+
+
+class TestQueryContext:
+ """Test the QueryContext artifact."""
+
+ def test_query_context_creation(self):
+ """Test creating a QueryContext."""
+ query = QueryContext(
+ main_query="What is quantum computing?",
+ sub_questions=["What are qubits?", "How do quantum gates work?"],
+ )
+
+ assert query.main_query == "What is quantum computing?"
+ assert len(query.sub_questions) == 2
+ assert query.decomposition_timestamp > 0
+
+ def test_query_context_immutable(self):
+ """Test that QueryContext is immutable."""
+ query = QueryContext(main_query="Test query", sub_questions=[])
+
+ # Should raise error when trying to modify
+ with pytest.raises(Exception): # Pydantic will raise validation error
+ query.main_query = "Modified query"
+
+ def test_query_context_defaults(self):
+ """Test QueryContext with defaults."""
+ query = QueryContext(main_query="Test")
+ assert query.sub_questions == []
+ assert query.decomposition_timestamp > 0
+
+
+class TestSearchData:
+ """Test the SearchData artifact."""
+
+ def test_search_data_creation(self):
+ """Test creating SearchData."""
+ search_data = SearchData()
+
+ assert search_data.search_results == {}
+ assert search_data.search_costs == {}
+ assert search_data.search_cost_details == []
+ assert search_data.total_searches == 0
+
+ def test_search_data_with_results(self):
+ """Test SearchData with actual results."""
+ result = SearchResult(
+ url="https://example.com",
+ content="Test content",
+ title="Test Title",
+ )
+
+ cost_detail = SearchCostDetail(
+ provider="exa",
+ query="test query",
+ cost=0.01,
+ timestamp=time.time(),
+ step="process_sub_question",
+ )
+
+ search_data = SearchData(
+ search_results={"Question 1": [result]},
+ search_costs={"exa": 0.01},
+ search_cost_details=[cost_detail],
+ total_searches=1,
+ )
+
+ assert len(search_data.search_results) == 1
+ assert search_data.search_costs["exa"] == 0.01
+ assert len(search_data.search_cost_details) == 1
+ assert search_data.total_searches == 1
+
+ def test_search_data_merge(self):
+ """Test merging SearchData instances."""
+ # Create first instance
+ data1 = SearchData(
+ search_results={
+ "Q1": [SearchResult(url="url1", content="content1")]
+ },
+ search_costs={"exa": 0.01},
+ total_searches=1,
+ )
+
+ # Create second instance
+ data2 = SearchData(
+ search_results={
+ "Q1": [SearchResult(url="url2", content="content2")],
+ "Q2": [SearchResult(url="url3", content="content3")],
+ },
+ search_costs={"exa": 0.02, "tavily": 0.01},
+ total_searches=2,
+ )
+
+ # Merge
+ data1.merge(data2)
+
+ # Check results
+ assert len(data1.search_results["Q1"]) == 2 # Merged Q1 results
+ assert "Q2" in data1.search_results # Added Q2
+ assert data1.search_costs["exa"] == 0.03 # Combined costs
+ assert data1.search_costs["tavily"] == 0.01 # New provider
+ assert data1.total_searches == 3
+
+
+class TestSynthesisData:
+ """Test the SynthesisData artifact."""
+
+ def test_synthesis_data_creation(self):
+ """Test creating SynthesisData."""
+ synthesis = SynthesisData()
+
+ assert synthesis.synthesized_info == {}
+ assert synthesis.enhanced_info == {}
+
+ def test_synthesis_data_with_info(self):
+ """Test SynthesisData with synthesized info."""
+ synth_info = SynthesizedInfo(
+ synthesized_answer="Test answer",
+ key_sources=["source1", "source2"],
+ confidence_level="high",
+ )
+
+ synthesis = SynthesisData(synthesized_info={"Q1": synth_info})
+
+ assert "Q1" in synthesis.synthesized_info
+ assert synthesis.synthesized_info["Q1"].confidence_level == "high"
+
+ def test_synthesis_data_merge(self):
+ """Test merging SynthesisData instances."""
+ info1 = SynthesizedInfo(synthesized_answer="Answer 1")
+ info2 = SynthesizedInfo(synthesized_answer="Answer 2")
+
+ data1 = SynthesisData(synthesized_info={"Q1": info1})
+ data2 = SynthesisData(synthesized_info={"Q2": info2})
+
+ data1.merge(data2)
+
+ assert "Q1" in data1.synthesized_info
+ assert "Q2" in data1.synthesized_info
+
+
+class TestAnalysisData:
+ """Test the AnalysisData artifact."""
+
+ def test_analysis_data_creation(self):
+ """Test creating AnalysisData."""
+ analysis = AnalysisData()
+
+ assert analysis.viewpoint_analysis is None
+ assert analysis.reflection_metadata is None
+
+ def test_analysis_data_with_viewpoint(self):
+ """Test AnalysisData with viewpoint analysis."""
+ viewpoint = ViewpointAnalysis(
+ main_points_of_agreement=["Point 1", "Point 2"],
+ perspective_gaps="Some gaps",
+ )
+
+ analysis = AnalysisData(viewpoint_analysis=viewpoint)
+
+ assert analysis.viewpoint_analysis is not None
+ assert len(analysis.viewpoint_analysis.main_points_of_agreement) == 2
+
+ def test_analysis_data_with_reflection(self):
+ """Test AnalysisData with reflection metadata."""
+ reflection = ReflectionMetadata(
+ critique_summary=["Critique 1"], improvements_made=3.0
+ )
+
+ analysis = AnalysisData(reflection_metadata=reflection)
+
+ assert analysis.reflection_metadata is not None
+ assert analysis.reflection_metadata.improvements_made == 3.0
+
+
+class TestFinalReport:
+ """Test the FinalReport artifact."""
+
+ def test_final_report_creation(self):
+ """Test creating FinalReport."""
+ report = FinalReport()
+
+ assert report.report_html == ""
+ assert report.generated_at > 0
+ assert report.main_query == ""
+
+ def test_final_report_with_content(self):
+ """Test FinalReport with HTML content."""
+ html = "Test Report"
+ report = FinalReport(report_html=html, main_query="What is AI?")
+
+ assert report.report_html == html
+ assert report.main_query == "What is AI?"
+ assert report.generated_at > 0
diff --git a/deep_research/tests/test_prompt_models.py b/deep_research/tests/test_prompt_models.py
new file mode 100644
index 00000000..fcc437d4
--- /dev/null
+++ b/deep_research/tests/test_prompt_models.py
@@ -0,0 +1,110 @@
+"""Unit tests for prompt models and utilities."""
+
+from utils.prompt_models import PromptTemplate
+from utils.pydantic_models import Prompt
+
+
+class TestPromptTemplate:
+ """Test cases for PromptTemplate model."""
+
+ def test_prompt_template_creation(self):
+ """Test creating a prompt template with all fields."""
+ prompt = PromptTemplate(
+ name="test_prompt",
+ content="This is a test prompt",
+ description="A test prompt for unit testing",
+ version="1.0.0",
+ tags=["test", "unit"],
+ )
+
+ assert prompt.name == "test_prompt"
+ assert prompt.content == "This is a test prompt"
+ assert prompt.description == "A test prompt for unit testing"
+ assert prompt.version == "1.0.0"
+ assert prompt.tags == ["test", "unit"]
+
+ def test_prompt_template_minimal(self):
+ """Test creating a prompt template with minimal fields."""
+ prompt = PromptTemplate(
+ name="minimal_prompt", content="Minimal content"
+ )
+
+ assert prompt.name == "minimal_prompt"
+ assert prompt.content == "Minimal content"
+ assert prompt.description == ""
+ assert prompt.version == "1.0.0"
+ assert prompt.tags == []
+
+
+class TestPrompt:
+ """Test cases for the new Prompt model."""
+
+ def test_prompt_creation(self):
+ """Test creating a prompt with all fields."""
+ prompt = Prompt(
+ name="test_prompt",
+ content="This is a test prompt",
+ description="A test prompt for unit testing",
+ version="1.0.0",
+ tags=["test", "unit"],
+ )
+
+ assert prompt.name == "test_prompt"
+ assert prompt.content == "This is a test prompt"
+ assert prompt.description == "A test prompt for unit testing"
+ assert prompt.version == "1.0.0"
+ assert prompt.tags == ["test", "unit"]
+
+ def test_prompt_minimal(self):
+ """Test creating a prompt with minimal fields."""
+ prompt = Prompt(name="minimal_prompt", content="Minimal content")
+
+ assert prompt.name == "minimal_prompt"
+ assert prompt.content == "Minimal content"
+ assert prompt.description == ""
+ assert prompt.version == "1.0.0"
+ assert prompt.tags == []
+
+ def test_prompt_str_conversion(self):
+ """Test converting prompt to string returns content."""
+ prompt = Prompt(
+ name="test_prompt",
+ content="This is the prompt content",
+ description="Test prompt",
+ )
+
+ assert str(prompt) == "This is the prompt content"
+
+ def test_prompt_repr(self):
+ """Test prompt representation."""
+ prompt = Prompt(name="test_prompt", content="Content", version="2.0.0")
+
+ assert repr(prompt) == "Prompt(name='test_prompt', version='2.0.0')"
+
+ def test_prompt_create_factory(self):
+ """Test creating prompt using factory method."""
+ prompt = Prompt.create(
+ content="Factory created prompt",
+ name="factory_prompt",
+ description="Created via factory",
+ version="1.1.0",
+ tags=["factory", "test"],
+ )
+
+ assert prompt.name == "factory_prompt"
+ assert prompt.content == "Factory created prompt"
+ assert prompt.description == "Created via factory"
+ assert prompt.version == "1.1.0"
+ assert prompt.tags == ["factory", "test"]
+
+ def test_prompt_create_factory_minimal(self):
+ """Test creating prompt using factory method with minimal args."""
+ prompt = Prompt.create(
+ content="Minimal factory prompt", name="minimal_factory"
+ )
+
+ assert prompt.name == "minimal_factory"
+ assert prompt.content == "Minimal factory prompt"
+ assert prompt.description == ""
+ assert prompt.version == "1.0.0"
+ assert prompt.tags == []
diff --git a/deep_research/tests/test_pydantic_final_report_step.py b/deep_research/tests/test_pydantic_final_report_step.py
new file mode 100644
index 00000000..b4dcd956
--- /dev/null
+++ b/deep_research/tests/test_pydantic_final_report_step.py
@@ -0,0 +1,265 @@
+"""Tests for the Pydantic-based final report step.
+
+This module contains tests for the Pydantic-based implementation of
+final_report_step, which uses the new Pydantic models and materializers.
+"""
+
+from typing import Dict, List
+
+import pytest
+from steps.pydantic_final_report_step import pydantic_final_report_step
+from utils.pydantic_models import (
+ AnalysisData,
+ FinalReport,
+ Prompt,
+ QueryContext,
+ ReflectionMetadata,
+ SearchData,
+ SearchResult,
+ SynthesisData,
+ SynthesizedInfo,
+ ViewpointAnalysis,
+ ViewpointTension,
+)
+from zenml.types import HTMLString
+
+
+@pytest.fixture
+def sample_artifacts():
+ """Create sample artifacts for testing."""
+ # Create QueryContext
+ query_context = QueryContext(
+ main_query="What are the impacts of climate change?",
+ sub_questions=["Economic impacts", "Environmental impacts"],
+ )
+
+ # Create SearchData
+ search_results: Dict[str, List[SearchResult]] = {
+ "Economic impacts": [
+ SearchResult(
+ url="https://example.com/economy",
+ title="Economic Impacts of Climate Change",
+ snippet="Overview of economic impacts",
+ content="Detailed content about economic impacts of climate change",
+ )
+ ],
+ "Environmental impacts": [
+ SearchResult(
+ url="https://example.com/environment",
+ title="Environmental Impacts",
+ snippet="Environmental impact overview",
+ content="Content about environmental impacts",
+ )
+ ],
+ }
+ search_data = SearchData(search_results=search_results)
+
+ # Create SynthesisData
+ synthesized_info: Dict[str, SynthesizedInfo] = {
+ "Economic impacts": SynthesizedInfo(
+ synthesized_answer="Climate change will have significant economic impacts...",
+ key_sources=["https://example.com/economy"],
+ confidence_level="high",
+ ),
+ "Environmental impacts": SynthesizedInfo(
+ synthesized_answer="Environmental impacts include rising sea levels...",
+ key_sources=["https://example.com/environment"],
+ confidence_level="high",
+ ),
+ }
+ synthesis_data = SynthesisData(
+ synthesized_info=synthesized_info,
+ enhanced_info=synthesized_info, # Same as synthesized for this test
+ )
+
+ # Create AnalysisData
+ viewpoint_analysis = ViewpointAnalysis(
+ main_points_of_agreement=[
+ "Climate change is happening",
+ "Action is needed",
+ ],
+ areas_of_tension=[
+ ViewpointTension(
+ topic="Economic policy",
+ viewpoints={
+ "Progressive": "Support carbon taxes and regulations",
+ "Conservative": "Prefer market-based solutions",
+ },
+ )
+ ],
+ perspective_gaps="Indigenous perspectives are underrepresented",
+ integrative_insights="A balanced approach combining regulations and market incentives may be most effective",
+ )
+
+ reflection_metadata = ReflectionMetadata(
+ critique_summary=["Need more sources for economic impacts"],
+ additional_questions_identified=[
+ "How will climate change affect different regions?"
+ ],
+ searches_performed=[
+ "economic impacts of climate change",
+ "regional climate impacts",
+ ],
+ improvements_made=2.0,
+ )
+
+ analysis_data = AnalysisData(
+ viewpoint_analysis=viewpoint_analysis,
+ reflection_metadata=reflection_metadata,
+ )
+
+ # Create prompts
+ conclusion_prompt = Prompt(
+ name="conclusion_generation",
+ content="Generate a conclusion based on the research findings.",
+ )
+ executive_summary_prompt = Prompt(
+ name="executive_summary", content="Generate an executive summary."
+ )
+ introduction_prompt = Prompt(
+ name="introduction", content="Generate an introduction."
+ )
+
+ return {
+ "query_context": query_context,
+ "search_data": search_data,
+ "synthesis_data": synthesis_data,
+ "analysis_data": analysis_data,
+ "conclusion_generation_prompt": conclusion_prompt,
+ "executive_summary_prompt": executive_summary_prompt,
+ "introduction_prompt": introduction_prompt,
+ }
+
+
+def test_pydantic_final_report_step_returns_tuple():
+ """Test that the step returns a tuple with FinalReport and HTML."""
+ # Create simple artifacts
+ query_context = QueryContext(
+ main_query="What is climate change?",
+ sub_questions=["What causes climate change?"],
+ )
+ search_data = SearchData()
+ synthesis_data = SynthesisData(
+ synthesized_info={
+ "What causes climate change?": SynthesizedInfo(
+ synthesized_answer="Climate change is caused by greenhouse gases.",
+ confidence_level="high",
+ key_sources=["https://example.com/causes"],
+ )
+ }
+ )
+ analysis_data = AnalysisData()
+
+ # Create prompts
+ conclusion_prompt = Prompt(
+ name="conclusion_generation", content="Generate a conclusion."
+ )
+ executive_summary_prompt = Prompt(
+ name="executive_summary", content="Generate summary."
+ )
+ introduction_prompt = Prompt(
+ name="introduction", content="Generate intro."
+ )
+
+ # Run the step
+ result = pydantic_final_report_step(
+ query_context=query_context,
+ search_data=search_data,
+ synthesis_data=synthesis_data,
+ analysis_data=analysis_data,
+ conclusion_generation_prompt=conclusion_prompt,
+ executive_summary_prompt=executive_summary_prompt,
+ introduction_prompt=introduction_prompt,
+ )
+
+ # Assert that result is a tuple with 2 elements
+ assert isinstance(result, tuple)
+ assert len(result) == 2
+
+ # Assert first element is FinalReport
+ assert isinstance(result[0], FinalReport)
+
+ # Assert second element is HTMLString
+ assert isinstance(result[1], HTMLString)
+
+
+def test_pydantic_final_report_step_with_complex_artifacts(sample_artifacts):
+ """Test that the step handles complex artifacts properly."""
+ # Run the step with complex artifacts
+ result = pydantic_final_report_step(
+ query_context=sample_artifacts["query_context"],
+ search_data=sample_artifacts["search_data"],
+ synthesis_data=sample_artifacts["synthesis_data"],
+ analysis_data=sample_artifacts["analysis_data"],
+ conclusion_generation_prompt=sample_artifacts[
+ "conclusion_generation_prompt"
+ ],
+ executive_summary_prompt=sample_artifacts["executive_summary_prompt"],
+ introduction_prompt=sample_artifacts["introduction_prompt"],
+ )
+
+ # Unpack the results
+ final_report, html_report = result
+
+ # Assert FinalReport contains expected data
+ assert final_report.main_query == "What are the impacts of climate change?"
+ assert len(final_report.sub_questions) == 2
+ assert final_report.report_html != ""
+
+ # Assert HTML report contains key elements
+ html_str = str(html_report)
+ assert "Economic impacts" in html_str
+ assert "Environmental impacts" in html_str
+ assert "Viewpoint Analysis" in html_str
+ assert "Progressive" in html_str
+ assert "Conservative" in html_str
+
+
+def test_pydantic_final_report_step_creates_report():
+ """Test that the step properly creates a final report."""
+ # Create artifacts
+ query_context = QueryContext(
+ main_query="What is climate change?",
+ sub_questions=["What causes climate change?"],
+ )
+ search_data = SearchData()
+ synthesis_data = SynthesisData(
+ synthesized_info={
+ "What causes climate change?": SynthesizedInfo(
+ synthesized_answer="Climate change is caused by greenhouse gases.",
+ confidence_level="high",
+ key_sources=["https://example.com/causes"],
+ )
+ }
+ )
+ analysis_data = AnalysisData()
+
+ # Create prompts
+ conclusion_prompt = Prompt(
+ name="conclusion_generation", content="Generate a conclusion."
+ )
+ executive_summary_prompt = Prompt(
+ name="executive_summary", content="Generate summary."
+ )
+ introduction_prompt = Prompt(
+ name="introduction", content="Generate intro."
+ )
+
+ # Run the step
+ final_report, html_report = pydantic_final_report_step(
+ query_context=query_context,
+ search_data=search_data,
+ synthesis_data=synthesis_data,
+ analysis_data=analysis_data,
+ conclusion_generation_prompt=conclusion_prompt,
+ executive_summary_prompt=executive_summary_prompt,
+ introduction_prompt=introduction_prompt,
+ )
+
+ # Verify FinalReport was created with content
+ assert final_report.report_html != ""
+ assert "climate change" in final_report.report_html.lower()
+
+ # Verify HTML report was created
+ assert str(html_report) != ""
+ assert "climate change" in str(html_report).lower()
diff --git a/deep_research/tests/test_pydantic_models.py b/deep_research/tests/test_pydantic_models.py
new file mode 100644
index 00000000..21d25123
--- /dev/null
+++ b/deep_research/tests/test_pydantic_models.py
@@ -0,0 +1,199 @@
+"""Tests for Pydantic model implementations.
+
+This module contains tests for the Pydantic models that validate:
+1. Basic model instantiation
+2. Default values
+3. Serialization and deserialization
+4. Method functionality
+"""
+
+import json
+
+from utils.pydantic_models import (
+ ReflectionMetadata,
+ SearchResult,
+ SynthesizedInfo,
+ ViewpointAnalysis,
+ ViewpointTension,
+)
+
+
+def test_search_result_creation():
+ """Test creating a SearchResult model."""
+ # Create with defaults
+ result = SearchResult()
+ assert result.url == ""
+ assert result.content == ""
+ assert result.title == ""
+ assert result.snippet == ""
+
+ # Create with values
+ result = SearchResult(
+ url="https://example.com",
+ content="Example content",
+ title="Example Title",
+ snippet="This is a snippet",
+ )
+ assert result.url == "https://example.com"
+ assert result.content == "Example content"
+ assert result.title == "Example Title"
+ assert result.snippet == "This is a snippet"
+
+
+def test_search_result_serialization():
+ """Test serializing and deserializing a SearchResult."""
+ result = SearchResult(
+ url="https://example.com",
+ content="Example content",
+ title="Example Title",
+ snippet="This is a snippet",
+ )
+
+ # Serialize to dict
+ result_dict = result.model_dump()
+ assert result_dict["url"] == "https://example.com"
+ assert result_dict["content"] == "Example content"
+
+ # Serialize to JSON
+ result_json = result.model_dump_json()
+ result_dict_from_json = json.loads(result_json)
+ assert result_dict_from_json["url"] == "https://example.com"
+
+ # Deserialize from dict
+ new_result = SearchResult.model_validate(result_dict)
+ assert new_result.url == "https://example.com"
+ assert new_result.content == "Example content"
+
+ # Deserialize from JSON
+ new_result_from_json = SearchResult.model_validate_json(result_json)
+ assert new_result_from_json.url == "https://example.com"
+
+
+def test_viewpoint_tension_model():
+ """Test the ViewpointTension model."""
+ # Empty model
+ tension = ViewpointTension()
+ assert tension.topic == ""
+ assert tension.viewpoints == {}
+
+ # With data
+ tension = ViewpointTension(
+ topic="Climate Change Impacts",
+ viewpoints={
+ "Economic": "Focuses on financial costs and benefits",
+ "Environmental": "Emphasizes ecosystem impacts",
+ },
+ )
+ assert tension.topic == "Climate Change Impacts"
+ assert len(tension.viewpoints) == 2
+ assert "Economic" in tension.viewpoints
+
+ # Serialization
+ tension_dict = tension.model_dump()
+ assert tension_dict["topic"] == "Climate Change Impacts"
+ assert len(tension_dict["viewpoints"]) == 2
+
+ # Deserialization
+ new_tension = ViewpointTension.model_validate(tension_dict)
+ assert new_tension.topic == tension.topic
+ assert new_tension.viewpoints == tension.viewpoints
+
+
+def test_synthesized_info_model():
+ """Test the SynthesizedInfo model."""
+ # Default values
+ info = SynthesizedInfo()
+ assert info.synthesized_answer == ""
+ assert info.key_sources == []
+ assert info.confidence_level == "medium"
+ assert info.information_gaps == ""
+ assert info.improvements == []
+
+ # With values
+ info = SynthesizedInfo(
+ synthesized_answer="This is a synthesized answer",
+ key_sources=["https://source1.com", "https://source2.com"],
+ confidence_level="high",
+ information_gaps="Missing some context",
+ improvements=["Add more detail", "Check more sources"],
+ )
+ assert info.synthesized_answer == "This is a synthesized answer"
+ assert len(info.key_sources) == 2
+ assert info.confidence_level == "high"
+
+ # Serialization and deserialization
+ info_dict = info.model_dump()
+ new_info = SynthesizedInfo.model_validate(info_dict)
+ assert new_info.synthesized_answer == info.synthesized_answer
+ assert new_info.key_sources == info.key_sources
+
+
+def test_viewpoint_analysis_model():
+ """Test the ViewpointAnalysis model."""
+ # Create tensions for the analysis
+ tension1 = ViewpointTension(
+ topic="Economic Impact",
+ viewpoints={
+ "Positive": "Creates jobs",
+ "Negative": "Increases inequality",
+ },
+ )
+ tension2 = ViewpointTension(
+ topic="Environmental Impact",
+ viewpoints={
+ "Positive": "Reduces emissions",
+ "Negative": "Land use changes",
+ },
+ )
+
+ # Create the analysis
+ analysis = ViewpointAnalysis(
+ main_points_of_agreement=[
+ "Need for action",
+ "Technological innovation",
+ ],
+ areas_of_tension=[tension1, tension2],
+ perspective_gaps="Missing indigenous perspectives",
+ integrative_insights="Combined economic and environmental approach needed",
+ )
+
+ assert len(analysis.main_points_of_agreement) == 2
+ assert len(analysis.areas_of_tension) == 2
+ assert analysis.areas_of_tension[0].topic == "Economic Impact"
+
+ # Test serialization
+ analysis_dict = analysis.model_dump()
+ assert len(analysis_dict["areas_of_tension"]) == 2
+ assert analysis_dict["areas_of_tension"][0]["topic"] == "Economic Impact"
+
+ # Test deserialization
+ new_analysis = ViewpointAnalysis.model_validate(analysis_dict)
+ assert len(new_analysis.areas_of_tension) == 2
+ assert new_analysis.areas_of_tension[0].topic == "Economic Impact"
+ assert new_analysis.perspective_gaps == analysis.perspective_gaps
+
+
+def test_reflection_metadata_model():
+ """Test the ReflectionMetadata model."""
+ metadata = ReflectionMetadata(
+ critique_summary=["Need more sources", "Missing detailed analysis"],
+ additional_questions_identified=["What about future trends?"],
+ searches_performed=["future climate trends", "economic impacts"],
+ improvements_made=3,
+ error=None,
+ )
+
+ assert len(metadata.critique_summary) == 2
+ assert len(metadata.additional_questions_identified) == 1
+ assert metadata.improvements_made == 3
+ assert metadata.error is None
+
+ # Serialization
+ metadata_dict = metadata.model_dump()
+ assert len(metadata_dict["critique_summary"]) == 2
+ assert metadata_dict["improvements_made"] == 3
+
+ # Deserialization
+ new_metadata = ReflectionMetadata.model_validate(metadata_dict)
+ assert new_metadata.improvements_made == metadata.improvements_made
+ assert new_metadata.critique_summary == metadata.critique_summary
diff --git a/deep_research/utils/__init__.py b/deep_research/utils/__init__.py
new file mode 100644
index 00000000..395e1d67
--- /dev/null
+++ b/deep_research/utils/__init__.py
@@ -0,0 +1,7 @@
+"""
+Utilities package for the ZenML Deep Research project.
+
+This package contains various utility functions and helpers used throughout the project,
+including data models, LLM interaction utilities, search functionality, and common helper
+functions for text processing and state management.
+"""
diff --git a/deep_research/utils/approval_utils.py b/deep_research/utils/approval_utils.py
new file mode 100644
index 00000000..94cd5a47
--- /dev/null
+++ b/deep_research/utils/approval_utils.py
@@ -0,0 +1,137 @@
+"""Utility functions for the human approval process."""
+
+from typing import Any, Dict, List
+
+from utils.pydantic_models import ApprovalDecision
+
+
+def format_critique_summary(critique_points: List[Dict[str, Any]]) -> str:
+ """Format critique points for display."""
+ if not critique_points:
+ return "No critical issues identified."
+
+ formatted = []
+ for point in critique_points[:3]: # Show top 3
+ issue = point.get("issue", "Unknown issue")
+ formatted.append(f"- {issue}")
+
+ if len(critique_points) > 3:
+ formatted.append(f"- ... and {len(critique_points) - 3} more issues")
+
+ return "\n".join(formatted)
+
+
+def format_query_list(queries: List[str]) -> str:
+ """Format query list for display."""
+ if not queries:
+ return "No queries proposed."
+
+ formatted = []
+ for i, query in enumerate(queries, 1):
+ formatted.append(f"{i}. {query}")
+
+ return "\n".join(formatted)
+
+
+def calculate_estimated_cost(queries: List[str]) -> float:
+ """Calculate estimated cost for additional queries."""
+ # Rough estimate: ~$0.01 per query (including search API + LLM costs)
+ return round(len(queries) * 0.01, 2)
+
+
+def format_approval_request(
+ main_query: str,
+ progress_summary: Dict[str, Any],
+ critique_points: List[Dict[str, Any]],
+ proposed_queries: List[str],
+ timeout: int = 3600,
+) -> str:
+ """Format the approval request message."""
+
+ # High-priority critiques
+ high_priority = [
+ c for c in critique_points if c.get("importance") == "high"
+ ]
+
+ message = f"""📊 **Research Progress Update**
+
+**Main Query:** {main_query}
+
+**Current Status:**
+- Sub-questions analyzed: {progress_summary["completed_count"]}
+- Average confidence: {progress_summary["avg_confidence"]}
+- Low confidence areas: {progress_summary["low_confidence_count"]}
+
+**Key Issues Identified:**
+{format_critique_summary(high_priority or critique_points)}
+
+**Proposed Additional Research** ({len(proposed_queries)} queries):
+{format_query_list(proposed_queries)}
+
+**Estimated Additional Time:** ~{len(proposed_queries) * 2} minutes
+**Estimated Additional Cost:** ~${calculate_estimated_cost(proposed_queries)}
+
+**Response Options:**
+- Reply with `approve`, `yes`, `ok`, or `LGTM` to proceed with all queries
+- Reply with `reject`, `no`, `skip`, or `decline` to finish with current findings
+
+**Timeout:** Response required within {timeout // 60} minutes"""
+
+ return message
+
+
+def parse_approval_response(
+ response: str, proposed_queries: List[str]
+) -> ApprovalDecision:
+ """Parse the approval response from user."""
+
+ response_upper = response.strip().upper()
+
+ if response_upper == "APPROVE ALL":
+ return ApprovalDecision(
+ approved=True,
+ selected_queries=proposed_queries,
+ approval_method="APPROVE_ALL",
+ reviewer_notes=response,
+ )
+
+ elif response_upper == "SKIP":
+ return ApprovalDecision(
+ approved=False,
+ selected_queries=[],
+ approval_method="SKIP",
+ reviewer_notes=response,
+ )
+
+ elif response_upper.startswith("SELECT"):
+ # Parse selection like "SELECT 1,3,5"
+ try:
+ # Extract the part after "SELECT"
+ selection_part = response_upper[6:].strip()
+ indices = [int(x.strip()) - 1 for x in selection_part.split(",")]
+ selected = [
+ proposed_queries[i]
+ for i in indices
+ if 0 <= i < len(proposed_queries)
+ ]
+ return ApprovalDecision(
+ approved=True,
+ selected_queries=selected,
+ approval_method="SELECT_SPECIFIC",
+ reviewer_notes=response,
+ )
+ except Exception as e:
+ return ApprovalDecision(
+ approved=False,
+ selected_queries=[],
+ approval_method="PARSE_ERROR",
+ reviewer_notes=f"Failed to parse: {response} - {str(e)}",
+ )
+
+ else:
+ return ApprovalDecision(
+ approved=False,
+ selected_queries=[],
+ approval_method="UNKNOWN_RESPONSE",
+ reviewer_notes=f"Unknown response: {response}",
+ )
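+
+# Illustrative usage (hypothetical reviewer replies, shown only to document the parser above):
+#
+#     proposed = ["query A", "query B", "query C"]
+#     parse_approval_response("APPROVE ALL", proposed).selected_queries  # all three queries
+#     parse_approval_response("SELECT 1,3", proposed).selected_queries   # ["query A", "query C"]
+#     parse_approval_response("SKIP", proposed).approved                 # False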
diff --git a/deep_research/utils/config_utils.py b/deep_research/utils/config_utils.py
new file mode 100644
index 00000000..3c40fcde
--- /dev/null
+++ b/deep_research/utils/config_utils.py
@@ -0,0 +1,72 @@
+"""Configuration and environment utilities for the Deep Research Agent."""
+
+import logging
+import os
+from typing import Any, Dict
+
+import yaml
+
+logger = logging.getLogger(__name__)
+
+
+def load_pipeline_config(config_path: str) -> Dict[str, Any]:
+ """Load pipeline configuration from YAML file.
+
+ This is used only for pipeline-level configuration, not for step parameters.
+ Step parameters should be defined directly in the step functions.
+
+ Args:
+ config_path: Path to the configuration YAML file
+
+ Returns:
+ Pipeline configuration dictionary
+ """
+ # Get absolute path if relative
+ if not os.path.isabs(config_path):
+ base_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
+ config_path = os.path.join(base_dir, config_path)
+
+ # Load YAML configuration
+ try:
+ with open(config_path, "r") as f:
+ config = yaml.safe_load(f)
+ return config
+ except Exception as e:
+ logger.error(f"Error loading pipeline configuration: {e}")
+ # Return a minimal default configuration in case of loading error
+ return {
+ "pipeline": {
+ "name": "deep_research_pipeline",
+ "enable_cache": True,
+ },
+ "environment": {
+ "docker": {
+ "requirements": [
+ "openai>=1.0.0",
+ "tavily-python>=0.2.8",
+ "PyYAML>=6.0",
+ "click>=8.0.0",
+ "pydantic>=2.0.0",
+ "typing_extensions>=4.0.0",
+ ]
+ }
+ },
+ "resources": {"cpu": 1, "memory": "4Gi"},
+ "timeout": 3600,
+ }
+
+
+def check_required_env_vars(env_vars: list[str]) -> list[str]:
+ """Check if required environment variables are set.
+
+ Args:
+ env_vars: List of environment variable names to check
+
+ Returns:
+ List of missing environment variables
+ """
+ missing_vars = []
+ for var in env_vars:
+ if not os.environ.get(var):
+ missing_vars.append(var)
+ return missing_vars
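+
+# Illustrative usage (the variable names below are examples, not a definitive list):
+#
+#     missing = check_required_env_vars(["OPENROUTER_API_KEY", "TAVILY_API_KEY"])
+#     if missing:
+#         raise RuntimeError(f"Missing required environment variables: {missing}")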
diff --git a/deep_research/utils/css_utils.py b/deep_research/utils/css_utils.py
new file mode 100644
index 00000000..e2c5cc15
--- /dev/null
+++ b/deep_research/utils/css_utils.py
@@ -0,0 +1,267 @@
+"""CSS utility functions for consistent styling across materializers."""
+
+import json
+import os
+from typing import Optional
+
+
+def get_shared_css_path() -> str:
+ """Get the absolute path to the shared CSS file.
+
+ Returns:
+ Absolute path to assets/styles.css
+ """
+ base_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
+ return os.path.join(base_dir, "assets", "styles.css")
+
+
+def get_shared_css_content() -> str:
+ """Read and return the content of the shared CSS file.
+
+ Returns:
+ Content of the shared CSS file
+ """
+ css_path = get_shared_css_path()
+ try:
+ with open(css_path, "r") as f:
+ return f.read()
+ except FileNotFoundError:
+ # Fallback to basic styles if file not found
+ return """
+ body {
+ font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
+ margin: 20px;
+ color: #333;
+ }
+ """
+
+
+def get_shared_css_tag() -> str:
+ """Get the complete style tag with shared CSS content.
+
+ Returns:
+ HTML style tag with shared CSS
+ """
+ css_content = get_shared_css_content()
+ return f""
+
+
+def get_confidence_class(level: str) -> str:
+ """Return appropriate CSS class for confidence level.
+
+ Args:
+ level: Confidence level (high, medium, low)
+
+ Returns:
+ CSS class string
+ """
+ return f"dr-confidence dr-confidence--{level.lower()}"
+
+
+def get_badge_class(badge_type: str) -> str:
+ """Return appropriate CSS class for badges.
+
+ Args:
+ badge_type: Badge type (success, warning, danger, info, primary)
+
+ Returns:
+ CSS class string
+ """
+ return f"dr-badge dr-badge--{badge_type.lower()}"
+
+
+def get_status_class(status: str) -> str:
+ """Return appropriate CSS class for status indicators.
+
+ Args:
+ status: Status type (approved, pending, rejected, etc.)
+
+ Returns:
+ CSS class string
+ """
+ status_map = {
+ "approved": "success",
+ "pending": "warning",
+ "rejected": "danger",
+ "completed": "success",
+ "in_progress": "info",
+ "failed": "danger",
+ }
+ badge_type = status_map.get(status.lower(), "primary")
+ return get_badge_class(badge_type)
+
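+# Illustrative results (derived from the helpers above):
+#
+#     get_status_class("approved")   # "dr-badge dr-badge--success"
+#     get_status_class("pending")    # "dr-badge dr-badge--warning"
+#     get_confidence_class("Medium") # "dr-confidence dr-confidence--medium"
+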
+
+def get_section_class(section_type: Optional[str] = None) -> str:
+ """Return appropriate CSS class for sections.
+
+ Args:
+ section_type: Optional section type (info, warning, success, danger)
+
+ Returns:
+ CSS class string
+ """
+ if section_type:
+ return f"dr-section dr-section--{section_type.lower()}"
+ return "dr-section"
+
+
+def get_card_class(hoverable: bool = True) -> str:
+ """Return appropriate CSS class for cards.
+
+ Args:
+ hoverable: Whether the card should have hover effects
+
+ Returns:
+ CSS class string
+ """
+ classes = ["dr-card"]
+ if not hoverable:
+ classes.append("dr-card--no-hover")
+ return " ".join(classes)
+
+
+def get_table_class(striped: bool = False) -> str:
+ """Return appropriate CSS class for tables.
+
+ Args:
+ striped: Whether the table should have striped rows
+
+ Returns:
+ CSS class string
+ """
+ classes = ["dr-table"]
+ if striped:
+ classes.append("dr-table--striped")
+ return " ".join(classes)
+
+
+def get_button_class(
+ button_type: str = "primary", size: str = "normal"
+) -> str:
+ """Return appropriate CSS class for buttons.
+
+ Args:
+ button_type: Button type (primary, secondary, success)
+ size: Button size (normal, small)
+
+ Returns:
+ CSS class string
+ """
+ classes = ["dr-button"]
+ if button_type != "primary":
+ classes.append(f"dr-button--{button_type}")
+ if size == "small":
+ classes.append("dr-button--small")
+ return " ".join(classes)
+
+
+def get_grid_class(grid_type: str = "cards") -> str:
+ """Return appropriate CSS class for grid layouts.
+
+ Args:
+ grid_type: Grid type (stats, cards, metrics)
+
+ Returns:
+ CSS class string
+ """
+ return f"dr-grid dr-grid--{grid_type}"
+
+
+def wrap_with_container(content: str, wide: bool = False) -> str:
+ """Wrap content with container div.
+
+ Args:
+ content: HTML content to wrap
+ wide: Whether to use wide container
+
+ Returns:
+ Wrapped HTML content
+ """
+ container_class = (
+ "dr-container dr-container--wide" if wide else "dr-container"
+ )
+    return f'<div class="{container_class}">{content}</div>'
+
+
+def create_stat_card(value: str, label: str, format_value: bool = True) -> str:
+ """Create a stat card HTML.
+
+ Args:
+ value: The statistic value
+ label: The label for the statistic
+ format_value: Whether to wrap value in stat-value div
+
+ Returns:
+ HTML for stat card
+ """
+    value_html = (
+        f'<div class="dr-stat-value">{value}</div>' if format_value else value
+    )
+    return f"""
+    <div class="dr-stat-card">
+        {value_html}
+        <div class="dr-stat-label">{label}</div>
+    </div>
+    """
+
+
+def extract_html_from_content(content: str) -> str:
+ """Attempt to extract HTML content from a response that might be wrapped in other formats.
+
+ Args:
+ content: The content to extract HTML from
+
+ Returns:
+ The extracted HTML, or a basic fallback if extraction fails
+ """
+ if not content:
+ return ""
+
+    # Try to find HTML between <html> and </html> tags
+    if "<html" in content and "</html>" in content:
+        start = content.find("<html")
+        end = content.find("</html>") + 7  # Include the closing tag
+        return content[start:end]
+
+    # Try to find div class="research-report"
+    if '<div class="research-report"' in content and "</div>" in content:
+        start = content.find('<div class="research-report"')
+        last_div = content.rfind("</div>")
+        if last_div > start:
+            return content[start : last_div + 6]  # Include the closing tag
+
+ # Look for code blocks
+ if "```html" in content and "```" in content:
+ start = content.find("```html") + 7
+ end = content.find("```", start)
+ if end > start:
+ return content[start:end].strip()
+
+ # Look for JSON with an "html" field
+ try:
+ parsed = json.loads(content)
+ if isinstance(parsed, dict) and "html" in parsed:
+ return parsed["html"]
+    except (json.JSONDecodeError, TypeError):
+ pass
+
+ # If all extraction attempts fail, return the original content
+ return content
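+
+# Illustrative usage (the wrapped response below is a made-up example):
+#
+#     raw = 'Here is the report:\n```html\n<h1>Report</h1>\n```'
+#     extract_html_from_content(raw)  # returns "<h1>Report</h1>"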
diff --git a/deep_research/utils/llm_utils.py b/deep_research/utils/llm_utils.py
new file mode 100644
index 00000000..54dd27ee
--- /dev/null
+++ b/deep_research/utils/llm_utils.py
@@ -0,0 +1,458 @@
+import contextlib
+import json
+import logging
+from json.decoder import JSONDecodeError
+from typing import Any, Dict, List, Optional
+
+import litellm
+from litellm import completion
+from utils.prompts import SYNTHESIS_PROMPT
+from zenml import get_step_context
+
+logger = logging.getLogger(__name__)
+
+# This module uses litellm for all LLM interactions
+# Models are specified with a provider prefix (e.g., "sambanova/DeepSeek-R1-Distill-Llama-70B")
+# ALL model names require a provider prefix (e.g., "sambanova/", "openai/", "anthropic/")
+
+litellm.callbacks = ["langfuse"]
+
+
+def remove_reasoning_from_output(output: str) -> str:
+ """Remove the reasoning portion from LLM output.
+
+ Args:
+ output: Raw output from LLM that may contain reasoning
+
+ Returns:
+ Cleaned output without the reasoning section
+ """
+ if not output:
+ return ""
+
+ if "" in output:
+ return output.split("")[-1].strip()
+ return output.strip()
+
+
+def clean_json_tags(text: str) -> str:
+ """Clean JSON markdown tags from text.
+
+ Args:
+ text: Text with potential JSON markdown tags
+
+ Returns:
+ Cleaned text without JSON markdown tags
+ """
+ if not text:
+ return ""
+
+ cleaned = text.replace("```json\n", "").replace("\n```", "")
+ cleaned = cleaned.replace("```json", "").replace("```", "")
+ return cleaned
+
+
+def clean_markdown_tags(text: str) -> str:
+ """Clean Markdown tags from text.
+
+ Args:
+ text: Text with potential markdown tags
+
+ Returns:
+ Cleaned text without markdown tags
+ """
+ if not text:
+ return ""
+
+ cleaned = text.replace("```markdown\n", "").replace("\n```", "")
+ cleaned = cleaned.replace("```markdown", "").replace("```", "")
+ return cleaned
+
+
+def safe_json_loads(json_str: Optional[str]) -> Dict[str, Any]:
+ """Safely parse JSON string.
+
+ Args:
+ json_str: JSON string to parse, can be None.
+
+ Returns:
+ Dict[str, Any]: Parsed JSON as dictionary or empty dict if parsing fails or input is None.
+ """
+ if json_str is None:
+ # Optionally, log a warning here if None input is unexpected for certain call sites
+ # logger.warning("safe_json_loads received None input.")
+ return {}
+ try:
+ return json.loads(json_str)
+ except (
+ JSONDecodeError,
+ TypeError,
+ ): # Catch TypeError if json_str is not a valid type for json.loads
+ # Optionally, log the error and the problematic string (or its beginning)
+ # logger.warning(f"Failed to decode JSON string: '{str(json_str)[:200]}...'", exc_info=True)
+ return {}
+
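+# Illustrative behaviour (examples only, not exhaustive):
+#
+#     safe_json_loads('{"confidence_level": "high"}')  # {'confidence_level': 'high'}
+#     safe_json_loads("not json")                      # {}
+#     safe_json_loads(None)                            # {}
+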
+
+def run_llm_completion(
+ prompt: str,
+ system_prompt: str,
+ model: str = "openrouter/google/gemini-2.0-flash-lite-001",
+ clean_output: bool = True,
+ max_tokens: int = 2000, # Increased default token limit
+ temperature: float = 0.2,
+ top_p: float = 0.9,
+ project: str = "deep-research",
+ tags: Optional[List[str]] = None,
+) -> str:
+ """Run an LLM completion with standard error handling and output cleaning.
+
+ Uses litellm for model inference.
+
+ Args:
+ prompt: User prompt for the LLM
+ system_prompt: System prompt for the LLM
+ model: Model to use for completion (with provider prefix)
+ clean_output: Whether to clean reasoning and JSON tags from output. When True,
+            this removes any reasoning sections marked with <think> tags and strips JSON
+ code block markers.
+ max_tokens: Maximum tokens to generate
+ temperature: Sampling temperature
+ top_p: Top-p sampling value
+ project: Langfuse project name for LLM tracking
+ tags: Optional list of tags for Langfuse tracking. If provided, also converted to trace_metadata format.
+
+ Returns:
+ str: Processed LLM output with optional cleaning applied
+ """
+ try:
+ # Ensure model name has provider prefix
+ if not any(
+ model.startswith(prefix + "/")
+ for prefix in [
+ "sambanova",
+ "openai",
+ "anthropic",
+ "meta",
+ "google",
+ "aws",
+ "openrouter",
+ ]
+ ):
+ # Raise an error if no provider prefix is specified
+ error_msg = f"Model '{model}' does not have a provider prefix. Please specify provider (e.g., 'sambanova/{model}')"
+ logger.error(error_msg)
+ raise ValueError(error_msg)
+
+ # Get pipeline run name and id for trace_name and trace_id if running in a step
+ trace_name = None
+ trace_id = None
+ with contextlib.suppress(RuntimeError):
+ context = get_step_context()
+ trace_name = context.pipeline_run.name
+ trace_id = str(context.pipeline_run.id)
+ # Build metadata dict
+ metadata = {"project": project}
+ if tags is not None:
+ metadata["tags"] = tags
+ # Convert tags to trace_metadata format
+ metadata["trace_metadata"] = {tag: True for tag in tags}
+ if trace_name:
+ metadata["trace_name"] = trace_name
+ if trace_id:
+ metadata["trace_id"] = trace_id
+
+ response = completion(
+ model=model,
+ messages=[
+ {"role": "system", "content": system_prompt},
+ {"role": "user", "content": prompt},
+ ],
+ max_tokens=max_tokens,
+ temperature=temperature,
+ top_p=top_p,
+ metadata=metadata,
+ )
+
+ # Defensive access to content
+ content = None
+ if response and response.choices and len(response.choices) > 0:
+ choice = response.choices[0]
+ if choice and choice.message:
+ content = choice.message.content
+
+ if content is None:
+ logger.warning("LLM response content is missing or empty.")
+ return ""
+
+ if clean_output:
+ content = remove_reasoning_from_output(content)
+ content = clean_json_tags(content)
+
+ return content
+ except Exception as e:
+ logger.error(f"Error in LLM completion: {e}")
+ return ""
+
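+# Illustrative call (the model name must carry a provider prefix, as enforced above;
+# the prompt strings are placeholders):
+#
+#     answer = run_llm_completion(
+#         prompt="Summarize the key differences between LLMOps and MLOps.",
+#         system_prompt="You are a Deep Research assistant.",
+#         model="openrouter/google/gemini-2.0-flash-lite-001",
+#         tags=["example_call"],
+#     )
+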
+
+def get_structured_llm_output(
+ prompt: str,
+ system_prompt: str,
+ model: str = "openrouter/google/gemini-2.0-flash-lite-001",
+ fallback_response: Optional[Dict[str, Any]] = None,
+ max_tokens: int = 2000, # Increased default token limit for structured outputs
+ temperature: float = 0.2,
+ top_p: float = 0.9,
+ project: str = "deep-research",
+ tags: Optional[List[str]] = None,
+) -> Dict[str, Any]:
+ """Get structured JSON output from an LLM with error handling.
+
+ Uses litellm for model inference.
+
+ Args:
+ prompt: User prompt for the LLM
+ system_prompt: System prompt for the LLM
+ model: Model to use for completion (with provider prefix)
+ fallback_response: Fallback response if parsing fails
+ max_tokens: Maximum tokens to generate
+ temperature: Sampling temperature
+ top_p: Top-p sampling value
+ project: Langfuse project name for LLM tracking
+ tags: Optional list of tags for Langfuse tracking. Defaults to ["structured_llm_output"] if None.
+
+ Returns:
+ Parsed JSON response or fallback
+ """
+ try:
+ # Use provided tags or default to ["structured_llm_output"]
+ if tags is None:
+ tags = ["structured_llm_output"]
+
+ content = run_llm_completion(
+ prompt=prompt,
+ system_prompt=system_prompt,
+ model=model,
+ clean_output=True,
+ max_tokens=max_tokens,
+ temperature=temperature,
+ top_p=top_p,
+ project=project,
+ tags=tags,
+ )
+
+ if not content:
+ logger.warning("Empty content returned from LLM")
+ return fallback_response if fallback_response is not None else {}
+
+ result = safe_json_loads(content)
+
+ if not result and fallback_response is not None:
+ return fallback_response
+
+ return result
+ except Exception as e:
+ logger.error(f"Error processing structured LLM output: {e}")
+ return fallback_response if fallback_response is not None else {}
+
+
+def is_text_relevant(text1: str, text2: str, min_word_length: int = 4) -> bool:
+ """Determine if two pieces of text are relevant to each other.
+
+ Relevance is determined by checking if one text is contained within the other,
+ or if they share significant words (words longer than min_word_length).
+ This is a simple heuristic approach that checks for:
+ 1. Complete containment (one text string inside the other)
+ 2. Shared significant words (words longer than min_word_length)
+
+ Args:
+ text1: First text to compare
+ text2: Second text to compare
+ min_word_length: Minimum length of words to check for shared content
+
+ Returns:
+ bool: True if the texts are deemed relevant to each other based on the criteria
+ """
+ if not text1 or not text2:
+ return False
+
+ return (
+ text1.lower() in text2.lower()
+ or text2.lower() in text1.lower()
+ or any(
+ word
+ for word in text1.lower().split()
+ if len(word) > min_word_length and word in text2.lower()
+ )
+ )
+
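+# Illustrative behaviour of the heuristic (made-up strings):
+#
+#     is_text_relevant("LLMOps tooling", "tooling for LLMOps pipelines")  # True (shares "llmops")
+#     is_text_relevant("LLMOps", "weather forecast")                      # False
+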
+
+def find_most_relevant_string(
+ target: str,
+ options: List[str],
+ model: Optional[str] = "openrouter/google/gemini-2.0-flash-lite-001",
+ project: str = "deep-research",
+ tags: Optional[List[str]] = None,
+) -> Optional[str]:
+ """Find the most relevant string from a list of options using simple text matching.
+
+ If model is provided, uses litellm to determine relevance.
+
+ Args:
+ target: The target string to find relevance for
+ options: List of string options to check against
+ model: Model to use for matching (with provider prefix)
+ project: Langfuse project name for LLM tracking
+ tags: Optional list of tags for Langfuse tracking. Defaults to ["find_most_relevant_string"] if None.
+
+ Returns:
+ The most relevant string, or None if no relevant options
+ """
+ if not options:
+ return None
+
+ if len(options) == 1:
+ return options[0]
+
+ # If model is provided, use litellm for more accurate matching
+ if model:
+ try:
+ # Ensure model name has provider prefix
+ if not any(
+ model.startswith(prefix + "/")
+ for prefix in [
+ "sambanova",
+ "openai",
+ "anthropic",
+ "meta",
+ "google",
+ "aws",
+ "openrouter",
+ ]
+ ):
+ # Raise an error if no provider prefix is specified
+ error_msg = f"Model '{model}' does not have a provider prefix. Please specify provider (e.g., 'sambanova/{model}')"
+ logger.error(error_msg)
+ raise ValueError(error_msg)
+
+ system_prompt = "You are a research assistant."
+ prompt = f"""Given the text: "{target}"
+Which of the following options is most relevant to this text?
+{options}
+
+Respond with only the exact text of the most relevant option."""
+
+ # Get pipeline run name and id for trace_name and trace_id if running in a step
+ trace_name = None
+ trace_id = None
+ try:
+ context = get_step_context()
+ trace_name = context.pipeline_run.name
+ trace_id = str(context.pipeline_run.id)
+ except RuntimeError:
+ # Not running in a step context
+ pass
+
+ # Use provided tags or default to ["find_most_relevant_string"]
+ if tags is None:
+ tags = ["find_most_relevant_string"]
+
+ # Build metadata dict
+ metadata = {"project": project, "tags": tags}
+ # Convert tags to trace_metadata format
+ metadata["trace_metadata"] = {tag: True for tag in tags}
+ if trace_name:
+ metadata["trace_name"] = trace_name
+ if trace_id:
+ metadata["trace_id"] = trace_id
+
+ response = completion(
+ model=model,
+ messages=[
+ {"role": "system", "content": system_prompt},
+ {"role": "user", "content": prompt},
+ ],
+ max_tokens=100,
+ temperature=0.2,
+ metadata=metadata,
+ )
+
+ answer = response.choices[0].message.content.strip()
+
+ # Check if the answer is one of the options
+ if answer in options:
+ return answer
+
+ # If not an exact match, find the closest one
+ for option in options:
+ if option in answer or answer in option:
+ return option
+
+ except Exception as e:
+ logger.error(f"Error finding relevant string with LLM: {e}")
+
+ # Simple relevance check - find exact matches first
+ for option in options:
+ if target.lower() == option.lower():
+ return option
+
+ # Then check partial matches
+ for option in options:
+ if is_text_relevant(target, option):
+ return option
+
+ # Return the first option as a fallback
+ return options[0]
+
+
+def synthesize_information(
+ synthesis_input: Dict[str, Any],
+ model: str = "openrouter/google/gemini-2.0-flash-lite-001",
+ system_prompt: Optional[str] = None,
+ project: str = "deep-research",
+ tags: Optional[List[str]] = None,
+) -> Dict[str, Any]:
+ """Synthesize information from search results for a sub-question.
+
+ Uses litellm for model inference.
+
+ Args:
+ synthesis_input: Dictionary with sub-question, search results, and sources
+ model: Model to use (with provider prefix)
+ system_prompt: System prompt for the LLM
+ project: Langfuse project name for LLM tracking
+ tags: Optional list of tags for Langfuse tracking. Defaults to ["information_synthesis"] if None.
+
+ Returns:
+ Dictionary with synthesized information
+ """
+ if system_prompt is None:
+ system_prompt = SYNTHESIS_PROMPT
+
+ sub_question_for_log = synthesis_input.get(
+ "sub_question", "unknown question"
+ )
+
+ # Define the fallback response
+ fallback_response = {
+ "synthesized_answer": f"Synthesis failed for '{sub_question_for_log}'.",
+ "key_sources": synthesis_input.get("sources", [])[:1],
+ "confidence_level": "low",
+ "information_gaps": "An error occurred during the synthesis process.",
+ }
+
+ # Use provided tags or default to ["information_synthesis"]
+ if tags is None:
+ tags = ["information_synthesis"]
+
+ # Use the utility function to get structured output
+ result = get_structured_llm_output(
+ prompt=json.dumps(synthesis_input),
+ system_prompt=system_prompt,
+ model=model,
+ fallback_response=fallback_response,
+ max_tokens=3000, # Increased for more detailed synthesis
+ project=project,
+ tags=tags,
+ )
+
+ return result
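+
+# Illustrative input shape (the "sub_question" and "sources" keys are read above;
+# the search-results key name and all values are assumed for this example):
+#
+#     result = synthesize_information({
+#         "sub_question": "What are the core components of LLMOps?",
+#         "search_results": ["...snippet 1...", "...snippet 2..."],
+#         "sources": ["https://example.com/article"],
+#     })
+#     result.get("confidence_level")  # e.g. "medium"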
diff --git a/deep_research/utils/prompt_models.py b/deep_research/utils/prompt_models.py
new file mode 100644
index 00000000..00e08157
--- /dev/null
+++ b/deep_research/utils/prompt_models.py
@@ -0,0 +1,27 @@
+"""Pydantic models for prompt tracking and management.
+
+This module contains models for tracking prompts as artifacts
+in the ZenML pipeline, enabling better observability and version control.
+"""
+
+from pydantic import BaseModel, Field
+
+
+class PromptTemplate(BaseModel):
+ """Represents a single prompt template with metadata."""
+
+ name: str = Field(..., description="Unique identifier for the prompt")
+ content: str = Field(..., description="The actual prompt template content")
+ description: str = Field(
+ "", description="Human-readable description of what this prompt does"
+ )
+ version: str = Field("1.0.0", description="Version of the prompt template")
+ tags: list[str] = Field(
+ default_factory=list, description="Tags for categorizing prompts"
+ )
+
+ model_config = {
+ "extra": "ignore",
+ "frozen": False,
+ "validate_assignment": True,
+ }
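+
+# Illustrative instantiation (field values are placeholders):
+#
+#     template = PromptTemplate(
+#         name="synthesis_prompt",
+#         content="You are a Deep Research assistant...",
+#         description="Synthesizes search results into an answer",
+#         tags=["synthesis"],
+#     )
+#     template.version  # "1.0.0"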
diff --git a/deep_research/utils/prompts.py b/deep_research/utils/prompts.py
new file mode 100644
index 00000000..ced42f6f
--- /dev/null
+++ b/deep_research/utils/prompts.py
@@ -0,0 +1,1435 @@
+"""
+Centralized collection of prompts used throughout the deep research pipeline.
+
+This module contains all system prompts used by LLM calls in various steps of the
+research pipeline to ensure consistency and make prompt management easier.
+"""
+
+# Search query generation prompt
+# Used to generate effective search queries from sub-questions
+DEFAULT_SEARCH_QUERY_PROMPT = """
+You are a Deep Research assistant. Given a specific research sub-question, your task is to formulate an effective search
+query that will help find relevant information to answer the question.
+
+A good search query should:
+1. Extract the key concepts from the sub-question
+2. Use precise, specific terminology
+3. Exclude unnecessary words or context
+4. Include alternative terms or synonyms when helpful
+5. Be concise yet comprehensive enough to find relevant results
+
+Format the output in json with the following json schema definition:
+
+
+
+Make sure that the output is a json object with an output json schema defined above.
+Only return the json object, no explanation or additional text.
+"""
+
+# Query decomposition prompt
+# Used to break down complex research queries into specific sub-questions
+QUERY_DECOMPOSITION_PROMPT = """
+You are a Deep Research assistant specializing in research design. You will be given a MAIN RESEARCH QUERY that needs to be explored comprehensively. Your task is to create diverse, insightful sub-questions that explore different dimensions of the topic.
+
+IMPORTANT: The main query should be interpreted as a single research question, not as a noun phrase. For example:
+- If the query is "Is LLMOps a subset of MLOps?", create questions ABOUT LLMOps and MLOps, not questions like "What is 'Is LLMOps a subset of MLOps?'"
+- Focus on the concepts, relationships, and implications within the query
+
+Create sub-questions that explore these DIFFERENT DIMENSIONS:
+
+1. **Definitional/Conceptual**: Define key terms and establish conceptual boundaries
+ Example: "What are the core components and characteristics of LLMOps?"
+
+2. **Comparative/Relational**: Compare and contrast the concepts mentioned
+ Example: "How do the workflows and tooling of LLMOps differ from traditional MLOps?"
+
+3. **Historical/Evolutionary**: Trace development and emergence
+ Example: "How did LLMOps emerge from MLOps practices?"
+
+4. **Structural/Technical**: Examine technical architecture and implementation
+ Example: "What specific tools and platforms are unique to LLMOps?"
+
+5. **Practical/Use Cases**: Explore real-world applications
+ Example: "What are the key use cases that require LLMOps but not traditional MLOps?"
+
+6. **Stakeholder/Industry**: Consider different perspectives and adoption
+ Example: "How are different industries adopting LLMOps vs MLOps?"
+
+7. **Challenges/Limitations**: Identify problems and constraints
+ Example: "What unique challenges does LLMOps face that MLOps doesn't?"
+
+8. **Future/Trends**: Look at emerging developments
+ Example: "How is the relationship between LLMOps and MLOps expected to evolve?"
+
+QUALITY GUIDELINES:
+- Each sub-question must explore a DIFFERENT dimension - no repetitive variations
+- Questions should be specific, concrete, and investigable
+- Mix descriptive ("what/who") with analytical ("why/how") questions
+- Ensure questions build toward answering the main query comprehensively
+- Frame questions to elicit detailed, nuanced responses
+- Consider technical, business, organizational, and strategic aspects
+
+Format the output in json with the following json schema definition:
+
+
+
+Make sure that the output is a json object with an output json schema defined above.
+Only return the json object, no explanation or additional text.
+"""
+
+# Synthesis prompt for individual sub-questions
+# Used to synthesize search results into comprehensive answers for sub-questions
+SYNTHESIS_PROMPT = """
+You are a Deep Research assistant specializing in information synthesis. Given a sub-question and search results, your task is to synthesize the information
+into a comprehensive, accurate, and well-structured answer.
+
+Your synthesis should:
+1. Begin with a direct, concise answer to the sub-question in the first paragraph
+2. Provide detailed evidence and explanation in subsequent paragraphs (at least 3-5 paragraphs total)
+3. Integrate information from multiple sources, citing them within your answer
+4. Acknowledge any conflicting information or contrasting viewpoints you encounter
+5. Use data, statistics, examples, and quotations when available to strengthen your answer
+6. Organize information logically with a clear flow between concepts
+7. Identify key sources that provided the most valuable information (at least 2-3 sources)
+8. Explicitly acknowledge information gaps where the search results were incomplete
+9. Write in plain text format - do NOT use markdown formatting, bullet points, or special characters
+
+Confidence level criteria:
+- HIGH: Multiple high-quality sources provide consistent information, comprehensive coverage of the topic, and few information gaps
+- MEDIUM: Decent sources with some consistency, but notable information gaps or some conflicting information
+- LOW: Limited sources, major information gaps, significant contradictions, or only tangentially relevant information
+
+Information gaps should specifically identify:
+1. Aspects of the question that weren't addressed in the search results
+2. Areas where more detailed or up-to-date information would be valuable
+3. Perspectives or data sources that would complement the existing information
+
+Format the output in json with the following json schema definition:
+
+
+
+Make sure that the output is a json object with an output json schema defined above.
+Only return the json object, no explanation or additional text.
+"""
+
+# Viewpoint analysis prompt for cross-perspective examination
+# Used to analyze synthesized answers across different perspectives and viewpoints
+VIEWPOINT_ANALYSIS_PROMPT = """
+You are a Deep Research assistant specializing in multi-perspective analysis. You will be given a set of synthesized answers
+to sub-questions related to a main research query. Your task is to perform a thorough, nuanced analysis of how different
+perspectives would interpret this information.
+
+Think deeply about the following viewpoint categories and how they would approach the information differently:
+- Scientific: Evidence-based, empirical approach focused on data, research findings, and methodological rigor
+- Political: Power dynamics, governance structures, policy implications, and ideological frameworks
+- Economic: Resource allocation, financial impacts, market dynamics, and incentive structures
+- Social: Cultural norms, community impacts, group dynamics, and public welfare
+- Ethical: Moral principles, values considerations, rights and responsibilities, and normative judgments
+- Historical: Long-term patterns, precedents, contextual development, and evolutionary change
+
+For each synthesized answer, analyze how these different perspectives would interpret the information by:
+
+1. Identifying 5-8 main points of agreement where multiple perspectives align (with specific examples)
+2. Analyzing at least 3-5 areas of tension between perspectives with:
+ - A clear topic title for each tension point
+ - Contrasting interpretations from at least 2-3 different viewpoint categories per tension
+ - Specific examples or evidence showing why these perspectives differ
+ - The nuanced positions of each perspective, not just simplified oppositions
+
+3. Thoroughly examining perspective gaps by identifying:
+ - Which perspectives are underrepresented or missing in the current research
+ - How including these missing perspectives would enrich understanding
+ - Specific questions or dimensions that remain unexplored
+ - Write in plain text format - do NOT use markdown formatting, bullet points, or special characters
+
+4. Developing integrative insights that:
+ - Synthesize across multiple perspectives to form a more complete understanding
+ - Highlight how seemingly contradictory viewpoints can complement each other
+ - Suggest frameworks for reconciling tensions or finding middle-ground approaches
+ - Identify actionable takeaways that incorporate multiple perspectives
+ - Write in plain text format - do NOT use markdown formatting, bullet points, or special characters
+
+Format the output in json with the following json schema definition:
+
+
+
+Make sure that the output is a json object with an output json schema defined above.
+Only return the json object, no explanation or additional text.
+"""
+
+# Reflection prompt for self-critique and improvement
+# Used to evaluate the research and identify gaps, biases, and areas for improvement
+REFLECTION_PROMPT = """
+You are a Deep Research assistant with the ability to critique and improve your own research. You will be given:
+1. The main research query
+2. The sub-questions explored so far
+3. The synthesized information for each sub-question
+4. Any viewpoint analysis performed
+
+Your task is to critically evaluate this research and identify:
+1. Areas where the research is incomplete or has gaps
+2. Questions that are important but not yet answered
+3. Aspects where additional evidence or depth would significantly improve the research
+4. Potential biases or limitations in the current findings
+
+Be constructively critical and identify the most important improvements that would substantially enhance the research.
+
+Format the output in json with the following json schema definition:
+
+
+
+Make sure that the output is a json object with an output json schema defined above.
+Only return the json object, no explanation or additional text.
+"""
+
+# Additional synthesis prompt for incorporating new information
+# Used to enhance original synthesis with new information and address critique points
+ADDITIONAL_SYNTHESIS_PROMPT = """
+You are a Deep Research assistant. You will be given:
+1. The original synthesized information on a research topic
+2. New information from additional research
+3. A critique of the original synthesis
+
+Your task is to enhance the original synthesis by incorporating the new information and addressing the critique.
+The updated synthesis should:
+1. Integrate new information seamlessly
+2. Address gaps identified in the critique
+3. Maintain a balanced, comprehensive, and accurate representation
+4. Preserve the strengths of the original synthesis
+5. Write in plain text format - do NOT use markdown formatting, bullet points, or special characters
+
+Format the output in json with the following json schema definition:
+
+
+
+Make sure that the output is a json object with an output json schema defined above.
+Only return the json object, no explanation or additional text.
+"""
+
+# Final report generation prompt
+# Used to compile a comprehensive HTML research report from all synthesized information
+REPORT_GENERATION_PROMPT = """
+You are a Deep Research assistant responsible for compiling an in-depth, comprehensive research report. You will be given:
+1. The original research query
+2. The sub-questions that were explored
+3. Synthesized information for each sub-question
+4. Viewpoint analysis comparing different perspectives (if available)
+5. Reflection metadata highlighting improvements and limitations
+
+Your task is to create a well-structured, coherent, professional-quality research report with the following features:
+
+EXECUTIVE SUMMARY (250-400 words):
+- Begin with a compelling, substantive executive summary that provides genuine insight
+- Highlight 3-5 key findings or insights that represent the most important discoveries
+- Include brief mention of methodology and limitations
+- Make the summary self-contained so it can be read independently of the full report
+- End with 1-2 sentences on broader implications or applications of the research
+
+INTRODUCTION (200-300 words):
+- Provide relevant background context on the main research query
+- Explain why this topic is significant or worth investigating
+- Outline the methodological approach used (sub-questions, search strategy, synthesis)
+- Preview the overall structure of the report
+
+SUB-QUESTION SECTIONS:
+- For each sub-question, create a dedicated section with:
+ * A descriptive section title (not just repeating the sub-question)
+ * A brief (1 paragraph) overview of key findings for this sub-question
+ * A "Key Findings" box highlighting 3-4 important discoveries for scannable reading
+ * The detailed, synthesized answer with appropriate paragraph breaks, lists, and formatting
+ * Proper citation of sources within the text (e.g., "According to [Source Name]...")
+ * Clear confidence indicator with appropriate styling
+ * Information gaps clearly identified in their own subsection
+ * Complete list of key sources used
+
+VIEWPOINT ANALYSIS SECTION (if available):
+- Create a detailed section that:
+ * Explains the purpose and value of multi-perspective analysis
+ * Presents points of agreement as actionable insights, not just observations
+ * Structures tension areas with clear topic headings and balanced presentation of viewpoints
+ * Uses visual elements (different background colors, icons) to distinguish different perspectives
+ * Integrates perspective gaps and insights into a cohesive narrative
+
+CONCLUSION (300-400 words):
+- Synthesize the overall findings, not just summarizing each section
+- Connect insights from different sub-questions to form higher-level understanding
+- Address the main research query directly with evidence-based conclusions
+- Acknowledge remaining uncertainties and suggestions for further research
+- End with implications or applications of the research findings
+
+OVERALL QUALITY REQUIREMENTS:
+1. Create visually scannable content with clear headings, bullet points, and short paragraphs
+2. Use semantic HTML (h1, h2, h3, p, blockquote, etc.) to create proper document structure
+3. Include a comprehensive table of contents with anchor links to all major sections
+4. Format all sources consistently in the references section with proper linking when available
+5. Use tables, lists, and blockquotes to improve readability and highlight important information
+6. Apply appropriate styling for different confidence levels (high, medium, low)
+7. Ensure proper HTML nesting and structure throughout the document
+8. Balance sufficient detail with clarity and conciseness
+9. Make all text directly actionable and insight-driven, not just descriptive
+
+The report should be formatted in HTML with appropriate headings, paragraphs, citations, and formatting.
+Use semantic HTML (h1, h2, h3, p, blockquote, etc.) to create a structured document.
+Include a table of contents at the beginning with anchor links to each section.
+For citations, use a consistent format and collect them in a references section at the end.
+
+Include this exact CSS stylesheet in your HTML to ensure consistent styling (do not modify it):
+
+```css
+
+```
+
+The HTML structure should follow this pattern:
+
+```html
+<!DOCTYPE html>
+<html>
+<head>
+    <meta charset="UTF-8">
+    <style>
+    [CSS STYLESHEET GOES HERE]
+    </style>
+</head>
+<body>
+    <!-- Table of contents, report sections, and references -->
+</body>
+</html>
+```
+
+Special instructions:
+1. For each sub-question, display the confidence level with appropriate styling (confidence-high, confidence-medium, or confidence-low)
+2. Extract 2-3 key findings from each answer to create the key-findings box
+3. Format all sources consistently in the references section
+4. Use tables, lists, and blockquotes where appropriate to improve readability
+5. Use the notice classes (info, warning) to highlight important information or limitations
+6. Ensure all sections have proper ID attributes for the table of contents links
+
+Return only the complete HTML code for the report, with no explanations or additional text.
+"""
+
+
+# Executive Summary generation prompt
+# Used to create a compelling, insight-driven executive summary
+EXECUTIVE_SUMMARY_GENERATION_PROMPT = """
+You are a Deep Research assistant specializing in creating executive summaries. Given comprehensive research findings, your task is to create a compelling executive summary that captures the essence of the research and its key insights.
+
+Your executive summary should:
+
+1. **Opening Statement (1-2 sentences):**
+ - Start with a powerful, direct answer to the main research question
+ - Make it clear and definitive based on the evidence gathered
+
+2. **Key Findings (3-5 bullet points):**
+ - Extract the MOST IMPORTANT discoveries from across all sub-questions
+ - Focus on insights that are surprising, actionable, or paradigm-shifting
+ - Each finding should be specific and evidence-based, not generic
+ - Prioritize findings that directly address the main query
+
+3. **Critical Insights (2-3 sentences):**
+ - Synthesize patterns or themes that emerged across multiple sub-questions
+ - Highlight any unexpected discoveries or counter-intuitive findings
+ - Connect disparate findings to reveal higher-level understanding
+
+4. **Implications (2-3 sentences):**
+ - What do these findings mean for practitioners/stakeholders?
+ - What actions or decisions can be made based on this research?
+ - Why should the reader care about these findings?
+
+5. **Confidence and Limitations (1-2 sentences):**
+ - Briefly acknowledge the overall confidence level of the findings
+ - Note any significant gaps or areas requiring further investigation
+
+IMPORTANT GUIDELINES:
+- Be CONCISE but INSIGHTFUL - every sentence should add value
+- Use active voice and strong, definitive language where evidence supports it
+- Avoid generic statements - be specific to the actual research findings
+- Lead with the most important information
+- Make it self-contained - reader should understand key findings without reading the full report
+- Target length: 250-400 words
+
+Format as well-structured HTML paragraphs using <p> tags and <ul>/<li> for bullet points.
+"""
+
+# Introduction generation prompt
+# Used to create a contextual, engaging introduction
+INTRODUCTION_GENERATION_PROMPT = """
+You are a Deep Research assistant specializing in creating engaging introductions. Given a research query and the sub-questions explored, your task is to create an introduction that provides context and sets up the reader's expectations.
+
+Your introduction should:
+
+1. **Context and Relevance (2-3 sentences):**
+ - Why is this research question important NOW?
+ - What makes this topic significant or worth investigating?
+ - Connect to current trends, debates, or challenges in the field
+
+2. **Scope and Approach (2-3 sentences):**
+ - What specific aspects of the topic does this research explore?
+ - Briefly mention the key dimensions covered (based on sub-questions)
+ - Explain the systematic approach without being too technical
+
+3. **What to Expect (2-3 sentences):**
+ - Preview the structure of the report
+ - Hint at some of the interesting findings or tensions discovered
+ - Set expectations about the depth and breadth of analysis
+
+IMPORTANT GUIDELINES:
+- Make it engaging - hook the reader's interest from the start
+- Provide real context, not generic statements
+- Connect to why this matters for the reader
+- Keep it concise but informative (200-300 words)
+- Use active voice and clear language
+- Build anticipation for the findings without giving everything away
+
+Format as well-structured HTML paragraphs using <p> tags. Do NOT include any headings or section titles.
+"""
+
+# Conclusion generation prompt
+# Used to synthesize all research findings into a comprehensive conclusion
+CONCLUSION_GENERATION_PROMPT = """
+You are a Deep Research assistant specializing in synthesizing comprehensive research conclusions. Given all the research findings from a deep research study, your task is to create a thoughtful, evidence-based conclusion that ties together the overall findings.
+
+Your conclusion should:
+
+1. **Synthesis and Integration (150-200 words):**
+ - Connect insights from different sub-questions to form a higher-level understanding
+ - Identify overarching themes and patterns that emerge from the research
+ - Highlight how different findings relate to and support each other
+ - Avoid simply summarizing each section separately
+
+2. **Direct Response to Main Query (100-150 words):**
+ - Address the original research question directly with evidence-based conclusions
+ - State what the research definitively established vs. what remains uncertain
+ - Provide a clear, actionable answer based on the synthesized evidence
+
+3. **Limitations and Future Directions (100-120 words):**
+ - Acknowledge remaining uncertainties and information gaps across all sections
+ - Suggest specific areas where additional research would be most valuable
+ - Identify what types of evidence or perspectives would strengthen the findings
+
+4. **Implications and Applications (80-100 words):**
+ - Explain the practical significance of the research findings
+ - Suggest how the insights might be applied or what they mean for stakeholders
+ - Connect findings to broader contexts or implications
+
+Format your output as a well-structured conclusion section in HTML format with appropriate paragraph breaks and formatting. Use <p> tags for paragraphs and organize the content logically with clear transitions between the different aspects outlined above.
+
+IMPORTANT: Do NOT include any headings like "Conclusion", <h2>, or <h3> tags - the section already has a heading. Start directly with the conclusion content in paragraph form. Just create flowing, well-structured paragraphs that cover all four aspects naturally.
+
+Ensure the conclusion feels cohesive and draws meaningful connections between findings rather than just listing them sequentially.
+"""
+
+# Static HTML template for direct report generation without LLM
+STATIC_HTML_TEMPLATE = """
+<!DOCTYPE html>
+<html>
+<head>
+    <meta charset="UTF-8">
+    <title>Research Report: {main_query}</title>
+    <style>
+    {shared_css}
+    </style>
+</head>
+<body>
+