feat: LLM-friendly output types with pagination #24
Open
mareurs wants to merge 10 commits into assafelovic:master from
Conversation
…driver

- deep_research tool: use report_type="deep" for recursive multi-level research with configurable breadth, depth, and concurrency parameters
- Override SCRAPER to "bs" during deep_research to avoid nodriver browser-pool deadlocks from concurrent sub-researchers (restore the original in a finally block)
- Dockerfile: install Chromium for the nodriver scraper (used by quick_search)
- requirements.txt: add zendriver and ddgs dependencies

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add parse_report_sections() to split markdown reports by ## headers and chunk_context() to group research snippets by word count limit. Both return structured dicts with index, content, and word_count for downstream pagination support. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
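The two helpers described in this commit could look roughly like the sketch below. This is a minimal illustration of the splitting and chunking behavior, not the actual code in utils.py; the real signatures and edge-case handling may differ.

```python
import re


def parse_report_sections(report: str) -> list[dict]:
    """Split a markdown report on '## ' headers into indexed sections."""
    # Split at line starts that begin a '## ' header, keeping each header
    # attached to the text that follows it. Any preamble before the first
    # header becomes section 0.
    parts = [p for p in re.split(r"(?m)^(?=## )", report) if p.strip()]
    return [
        {"index": i, "content": part.strip(), "word_count": len(part.split())}
        for i, part in enumerate(parts)
    ]


def chunk_context(snippets: list[str], max_words: int = 1000) -> list[dict]:
    """Group research snippets into chunks bounded by a word-count limit."""
    chunks, current, words = [], [], 0
    for snippet in snippets:
        n = len(snippet.split())
        # Flush the current chunk before it would exceed the limit.
        if current and words + n > max_words:
            chunks.append({"index": len(chunks),
                           "content": "\n\n".join(current),
                           "word_count": words})
            current, words = [], 0
        current.append(snippet)
        words += n
    if current:
        chunks.append({"index": len(chunks),
                       "content": "\n\n".join(current),
                       "word_count": words})
    return chunks
```

Both helpers return the same `{index, content, word_count}` dict shape, so downstream pagination code can treat report sections and context chunks uniformly.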
Add output_type parameter (summary, briefing, report, deep_report, raw_context) to deep_research tool. Compact types (summary, briefing) return full content inline. Paginated types (report, deep_report, raw_context) return a table of contents and first section, with remaining sections accessible via get_report_section. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
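The compact-versus-paginated split described above might produce response payloads along these lines. The field names here are purely illustrative assumptions, not taken from the PR's actual server.py:

```python
# Illustrative response shapes only; the real field names may differ.

# Compact types (summary, briefing): full content returned inline.
compact_response = {
    "research_id": "abc123",                 # handle for follow-up calls
    "output_type": "summary",
    "content": "- point one\n- point two",   # full text, no pagination
    "sources": ["https://example.com"],
}

# Paginated types (report, deep_report, raw_context): TOC + first section.
paginated_response = {
    "research_id": "abc123",
    "output_type": "report",
    "toc": ["Introduction", "Findings", "Conclusion"],  # one entry per ## section
    "section": {"index": 0, "content": "## Introduction\n...", "word_count": 120},
    "remaining_sections": 2,   # fetched via get_report_section(research_id, section=n)
    "sources": ["https://example.com"],
}
```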
Replace the generic research prompt with one that teaches the consuming LLM about the five output types (summary, briefing, report, deep_report, raw_context), includes a decision guide for choosing between them, and documents the pagination workflow for larger report types. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Save/restore TOTAL_WORDS around apply_preset calls in both deep_research and write_report to prevent stale env values across subsequent calls
- Use explicit `x if x else ""` instead of `x or ""` for custom_prompt
- Add sources field to both compact and paginated responses in write_report for consistency with deep_research responses

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
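The save/restore pattern in the first bullet can be sketched as a small context manager. This is an illustration of the technique, not the PR's actual code, and `apply_preset` is stood in for by a plain environment assignment:

```python
import os
from contextlib import contextmanager


@contextmanager
def preserve_env(name: str):
    """Save an env var and restore it on exit, even if the body raises."""
    saved = os.environ.get(name)
    try:
        yield
    finally:
        if saved is None:
            os.environ.pop(name, None)  # var was unset before: unset it again
        else:
            os.environ[name] = saved


# Usage sketch: the preset may set TOTAL_WORDS for this call only.
with preserve_env("TOTAL_WORDS"):
    os.environ["TOTAL_WORDS"] = "1200"  # stand-in for apply_preset(...)
# TOTAL_WORDS is back to its previous value (or unset) here, so a later
# call with a different output_type cannot pick up a stale limit.
```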
Reduce assertion from <= 1200 to <= 1000 to match the actual max_words parameter, catching regressions more precisely.
Supports "raw" (default, search snippets) and "summary" (LLM-synthesized). Uses GPT-Researcher's existing aggregated_summary support. Addresses assafelovic/gpt-researcher#1603. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Adds an `output_type` parameter to the `deep_research`, `quick_search`, and `write_report` tools, letting consuming LLMs choose the right output verbosity for their task. Also adds section-based pagination for large reports via a new `get_report_section` tool.

- Output types: `summary` (~200 words, bullet points), `briefing` (~600 words, executive style), `report` (~1200 words, paginated), `deep_report` (~2500 words, paginated), `raw_context` (paginated research snippets)
- Pagination: reports are split by `##` headers; only the TOC plus the first section is returned initially, and the LLM fetches more sections on demand
- `write_report` can re-render an existing research session in a different format without re-researching
- `quick_search` output types: `raw` (default, returns search snippets) or `summary` (uses GPT-Researcher's `aggregated_summary`)

Problem
When an LLM calls `deep_research`, the full report (often 2,000-5,000+ words) gets dumped into the LLM's context window. For many use cases (answering a quick question, comparing options), a 200-word summary would suffice. There was no way for the consuming LLM to control output verbosity.

Solution
The LLM now specifies `output_type` at query time. The MCP system prompt teaches it when to use each type: `summary`, `briefing`, `report`, `deep_report`, `raw_context`.

New files
- `presets.py`
- `tests/test_pagination.py`

Changed files
- `server.py`: `output_type` param on 3 tools, new `get_report_section` tool, pagination logic, env var save/restore
- `utils.py`: `parse_report_sections()`, `chunk_context()`, updated `create_research_prompt()` with output type guidance
- `Dockerfile`
- `requirements.txt`

Test plan
- `pytest tests/test_pagination.py` — 9 tests pass
- `deep_research(query, output_type="summary")` → ~200-word bullet points
- `deep_research(query, output_type="briefing")` → ~600-word executive briefing
- `write_report(research_id, output_type="report")` → paginated TOC + sections
- `get_report_section(research_id, section=3)` → fetches an individual section
- `quick_search(query, output_type="summary")` → aggregated summary

Note on presets config
The presets file (`presets.py`) can load from a volume-mounted config at `/app/config/output_presets.py`, making prompt tuning possible without rebuilding the container. It falls back to sensible built-in defaults if no config is mounted.

🤖 Generated with Claude Code
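One way the mounted-config fallback could be wired up is sketched below. The names `load_presets`, `PRESETS`, and the default values are assumptions for illustration, not the file's actual API:

```python
import importlib.util
from pathlib import Path

# Built-in defaults (illustrative values, not the project's actual presets).
DEFAULT_PRESETS = {
    "summary": {"total_words": 200},
    "briefing": {"total_words": 600},
}


def load_presets(config_path: str = "/app/config/output_presets.py") -> dict:
    """Load PRESETS from a volume-mounted module, else fall back to defaults."""
    path = Path(config_path)
    if path.is_file():
        # Import the mounted file as a throwaway module and read its PRESETS.
        spec = importlib.util.spec_from_file_location("output_presets", path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return getattr(module, "PRESETS", DEFAULT_PRESETS)
    return DEFAULT_PRESETS
```

Loading by file path (rather than `import output_presets`) keeps the mounted config off `sys.path`, so a missing or malformed mount degrades to the built-in defaults instead of breaking container startup.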