
feat: LLM-friendly output types with pagination #24

Open

mareurs wants to merge 10 commits into assafelovic:master from mareurs:feat/llm-friendly-output-types

Conversation


mareurs commented Feb 26, 2026

Summary

Adds an output_type parameter to deep_research, quick_search, and write_report tools, letting consuming LLMs choose the right output verbosity for their task. Also adds section-based pagination for large reports via a new get_report_section tool.

  • 5 output types: summary (~200 words, bullet points), briefing (~600 words, executive style), report (~1200 words, paginated), deep_report (~2500 words, paginated), raw_context (paginated research snippets)
  • Section-based pagination: Reports split by ## headers, only TOC + first section returned initially, LLM fetches more sections on demand
  • "Research once, render many": write_report can re-render an existing research session in a different format without re-researching
  • quick_search output_type: raw (default, returns search snippets) or summary (uses GPT-Researcher's aggregated_summary)
  • Configurable presets: Output type prompts and word limits loaded from a config file (volume-mountable)
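
The pagination flow in the second bullet could be sketched as follows; the field names and helper are my assumptions for illustration, not the PR's actual response schema:

```python
def build_paginated_response(sections: list[dict]) -> dict:
    """Hypothetical shape of a paginated report response: a table of
    contents plus only the first section, with a hint for fetching more."""
    toc = [{"section": s["index"], "title": s["title"],
            "word_count": s["word_count"]} for s in sections]
    return {
        "toc": toc,
        "section": sections[0],  # only the first section is returned inline
        "remaining_sections": len(sections) - 1,
        "hint": "Call get_report_section(research_id, section=N) for more.",
    }
```

The consuming LLM sees the TOC up front, so it can decide which sections are worth pulling into context instead of receiving the whole report at once.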

Problem

When an LLM calls deep_research, the full report (often 2000-5000+ words) gets dumped into the LLM's context window. For many use cases (answering a quick question, comparing options), a 200-word summary would suffice. There was no way for the consuming LLM to control output verbosity.

Solution

The LLM now specifies output_type at query time. The MCP system prompt teaches it when to use each type:

| Type | Words | When to use |
| --- | --- | --- |
| `summary` | ~200 | Quick factual questions, comparisons |
| `briefing` | ~600 | Decision support, overviews |
| `report` | ~1200 | Detailed analysis (paginated) |
| `deep_report` | ~2500 | Comprehensive coverage (paginated) |
| `raw_context` | variable | When the LLM wants to synthesize itself |

New files

| File | Purpose |
| --- | --- |
| `presets.py` | Loads output type presets (prompt + word limits), validates types |
| `tests/test_pagination.py` | 9 tests for section parsing and context chunking |

Changed files

| File | Changes |
| --- | --- |
| `server.py` | `output_type` param on 3 tools, new `get_report_section` tool, pagination logic, env var save/restore |
| `utils.py` | `parse_report_sections()`, `chunk_context()`, updated `create_research_prompt()` with output type guidance |
| `Dockerfile` | Added Chromium for the nodriver scraper |
| `requirements.txt` | Added nodriver, beautifulsoup4 |

Test plan

  • pytest tests/test_pagination.py — 9 tests pass
  • Container builds successfully
  • All 6 MCP tools register with correct schemas
  • deep_research(query, output_type="summary") → ~200 word bullet points
  • deep_research(query, output_type="briefing") → ~600 word executive briefing
  • write_report(research_id, output_type="report") → paginated TOC + sections
  • get_report_section(research_id, section=3) → fetches individual section
  • quick_search(query, output_type="summary") → aggregated summary
  • TOTAL_WORDS env var saved/restored between requests (no leaking)

Note on presets config

The presets file (presets.py) can load from a volume-mounted config at /app/config/output_presets.py, making prompt tuning possible without rebuilding the container. Falls back to sensible built-in defaults if no config is mounted.
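
A minimal sketch of the fallback behavior described above; the built-in values and the config format (a Python file defining a `PRESETS` dict) are assumptions for illustration:

```python
import os

# Assumed built-in defaults; the real presets.py may define more fields.
BUILTIN_PRESETS = {
    "summary": {"prompt": "Answer in ~200 words of bullet points.",
                "total_words": 200},
    "briefing": {"prompt": "Write a ~600 word executive briefing.",
                 "total_words": 600},
}

CONFIG_PATH = "/app/config/output_presets.py"

def load_presets(path: str = CONFIG_PATH) -> dict:
    """Load presets from a volume-mounted config file if present,
    otherwise fall back to the built-in defaults."""
    if os.path.exists(path):
        namespace: dict = {}
        with open(path) as f:
            exec(f.read(), namespace)  # config file defines a PRESETS dict
        return namespace.get("PRESETS", BUILTIN_PRESETS)
    return BUILTIN_PRESETS
```

Because the mount is optional, the container works out of the box, and operators can tune prompts by editing the mounted file and restarting, with no image rebuild.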

🤖 Generated with Claude Code

mareurs and others added 10 commits February 20, 2026 11:33
…driver

- deep_research tool: use report_type="deep" for recursive multi-level research
  with configurable breadth, depth, and concurrency parameters
- Override SCRAPER to "bs" during deep_research to avoid nodriver browser pool
  deadlocks from concurrent sub-researchers (restore original in finally block)
- Dockerfile: install Chromium for nodriver scraper (used by quick_search)
- requirements.txt: add zendriver and ddgs dependencies

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add parse_report_sections() to split markdown reports by ## headers
and chunk_context() to group research snippets by word count limit.
Both return structured dicts with index, content, and word_count
for downstream pagination support.
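
A rough sketch of what such helpers might look like, given the commit's description of their inputs and return shape (the real `utils.py` implementation may differ):

```python
def _section(index: int, title: str, lines: list[str]) -> dict:
    content = "\n".join(lines).strip()
    return {"index": index, "title": title,
            "content": content, "word_count": len(content.split())}

def parse_report_sections(report: str) -> list[dict]:
    """Split a markdown report on '## ' headers into indexed sections.
    Any preamble before the first header becomes an 'Introduction' section."""
    sections: list[dict] = []
    title, lines, seen_header = "Introduction", [], False
    for line in report.splitlines():
        if line.startswith("## "):
            if seen_header or any(l.strip() for l in lines):
                sections.append(_section(len(sections), title, lines))
            title, lines, seen_header = line[3:].strip(), [], True
        else:
            lines.append(line)
    sections.append(_section(len(sections), title, lines))
    return sections

def chunk_context(snippets: list[str], max_words: int = 800) -> list[dict]:
    """Group research snippets into chunks of at most max_words each."""
    chunks: list[dict] = []
    current: list[str] = []
    words = 0
    for snippet in snippets:
        n = len(snippet.split())
        if current and words + n > max_words:
            chunks.append({"index": len(chunks),
                           "content": "\n\n".join(current), "word_count": words})
            current, words = [], 0
        current.append(snippet)
        words += n
    if current:
        chunks.append({"index": len(chunks),
                       "content": "\n\n".join(current), "word_count": words})
    return chunks
```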

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add output_type parameter (summary, briefing, report, deep_report,
raw_context) to deep_research tool. Compact types (summary, briefing)
return full content inline. Paginated types (report, deep_report,
raw_context) return a table of contents and first section, with
remaining sections accessible via get_report_section.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the generic research prompt with one that teaches the consuming
LLM about the five output types (summary, briefing, report, deep_report,
raw_context), includes a decision guide for choosing between them, and
documents the pagination workflow for larger report types.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Save/restore TOTAL_WORDS around apply_preset calls in both deep_research
  and write_report to prevent stale env values across subsequent calls
- Use explicit `x if x else ""` instead of `x or ""` for custom_prompt
- Add sources field to both compact and paginated responses in write_report
  for consistency with deep_research responses

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
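The save/restore pattern in that commit is a classic try/finally around an environment variable; one way to package it (a sketch, not the PR's actual code) is a context manager:

```python
import os
from contextlib import contextmanager

@contextmanager
def env_override(name: str, value: str):
    """Temporarily set an env var, restoring its prior value (or absence)
    afterwards so it cannot leak into subsequent requests."""
    sentinel = object()
    old = os.environ.get(name, sentinel)
    os.environ[name] = value
    try:
        yield
    finally:
        if old is sentinel:
            os.environ.pop(name, None)
        else:
            os.environ[name] = old

# Usage: wrap the preset application for one call, e.g.
# with env_override("TOTAL_WORDS", "600"):
#     ...run the research/report generation...
```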
Reduce assertion from <= 1200 to <= 1000 to match the actual
max_words parameter, catching regressions more precisely.
Supports "raw" (default, search snippets) and "summary" (LLM-synthesized).
Uses GPT-Researcher's existing aggregated_summary support.
Addresses assafelovic/gpt-researcher#1603.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
