Open
Labels
enhancement (New feature or request)
Description
What
Add a WikipediaFetcher that matches en.wikipedia.org/wiki/{title} URLs (and other language editions), returning clean article content via the MediaWiki API.
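A minimal sketch of the URL-matching step, assuming a Python fetcher. The function name `match_wikipedia_url` is hypothetical. It only accepts desktop hosts of the form `{lang}.wikipedia.org` with a `/wiki/` path, so mobile hosts (`en.m.wikipedia.org`) and `/w/index.php?title=...` links are not matched. Non-article namespaces such as `Talk:` or `Special:` are not filtered either.

```python
import re
from typing import Optional, Tuple
from urllib.parse import unquote, urlparse

# Language subdomains such as "en", "de", "simple" or "zh-yue". The pattern rejects
# dotted hosts, so mobile URLs (en.m.wikipedia.org) are deliberately not matched.
_LANG_RE = re.compile(r"^[a-z][a-z0-9-]*$")


def match_wikipedia_url(url: str) -> Optional[Tuple[str, str]]:
    """Return (lang, title) for a Wikipedia article URL, or None if it doesn't match."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return None
    host = (parsed.hostname or "").lower()
    suffix = ".wikipedia.org"
    if not host.endswith(suffix):
        return None
    lang = host[: -len(suffix)]
    # www.wikipedia.org is the multilingual portal, not a language edition.
    if lang == "www" or not _LANG_RE.match(lang):
        return None
    # The query string and #fragment are already split off by urlparse.
    if not parsed.path.startswith("/wiki/"):
        return None
    title = unquote(parsed.path[len("/wiki/"):])
    return (lang, title) if title else None
```

The title is returned with underscores intact, which is the form the REST endpoints accept in their `{title}` path segment.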
Why
Agents doing research and fact-checking frequently land on Wikipedia. The current DefaultFetcher returns the full page with edit links, references sections, navigation boxes, and other wiki-specific chrome. The MediaWiki API provides clean extract text and structured metadata.
Requirements
- Match: `https://{lang}.wikipedia.org/wiki/{title}` (all language editions)
- Fetch the summary via the API: `https://{lang}.wikipedia.org/api/rest_v1/page/summary/{title}`
- Optionally fetch full content via: `https://{lang}.wikipedia.org/api/rest_v1/page/html/{title}`
- Return: title, extract/summary, infobox data (if parseable), key sections, categories
- Strip: edit links, reference numbers, navigation boxes, disambiguation notices
- Format field: `"wikipedia"`
- Support redirect resolution
Design Notes
- The Wikimedia REST API (`/api/rest_v1/`) is well documented and has generous rate limits
- The summary endpoint returns a concise extract, which is often sufficient for agent needs
- The full HTML endpoint's output can be converted with the existing `html_to_markdown`, plus wiki-specific cleanup
- Infobox extraction is complex and could be a stretch goal
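A sketch of the wiki-specific cleanup pass that could run before `html_to_markdown`, using only the standard-library `html.parser`. It drops whole elements whose `class` attribute contains one of the listed names, plus `script` and `style`. The class names (`mw-editsection`, `reference`, `navbox`, `hatnote`, `mw-references-wrap`) are assumptions based on common Wikipedia markup and should be checked against real `/page/html` responses. The depth counting assumes well-formed markup with explicit closing tags. `strip_wiki_chrome` is a hypothetical name, and the call into `html_to_markdown` is left out because its signature is not given here.

```python
from html import escape
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Assumed class names for wiki chrome; verify against real responses.
STRIP_CLASSES = {"mw-editsection", "reference", "navbox", "hatnote", "mw-references-wrap"}
STRIP_TAGS = {"script", "style"}
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
Attrs = List[Tuple[str, Optional[str]]]


class WikiChromeStripper(HTMLParser):
    """Re-serializes HTML, dropping whole elements that look like wiki chrome."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: List[str] = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    @staticmethod
    def _is_chrome(tag: str, attrs: Attrs) -> bool:
        if tag in STRIP_TAGS:
            return True
        classes = (dict(attrs).get("class") or "").split()
        return any(c in STRIP_CLASSES for c in classes)

    @staticmethod
    def _render(tag: str, attrs: Attrs, self_closing: bool = False) -> str:
        parts = [tag] + [k if v is None else f'{k}="{escape(v)}"' for k, v in attrs]
        return "<" + " ".join(parts) + (" />" if self_closing else ">")

    def handle_starttag(self, tag: str, attrs: Attrs) -> None:
        if self.skip_depth:
            if tag not in VOID_TAGS:  # void elements never get an end tag
                self.skip_depth += 1
        elif self._is_chrome(tag, attrs):
            if tag not in VOID_TAGS:
                self.skip_depth = 1
        else:
            self.out.append(self._render(tag, attrs))

    def handle_startendtag(self, tag: str, attrs: Attrs) -> None:
        if not self.skip_depth and not self._is_chrome(tag, attrs):
            self.out.append(self._render(tag, attrs, self_closing=True))

    def handle_endtag(self, tag: str) -> None:
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data: str) -> None:
        if not self.skip_depth:
            # Entities were decoded by the parser, so re-escape text on output.
            self.out.append(escape(data, quote=False))


def strip_wiki_chrome(html: str) -> str:
    parser = WikiChromeStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Removing chrome at the HTML level, before conversion, avoids having to pattern-match leftovers such as `[edit]` or `[1]` in the Markdown afterwards.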
Tier
2 — High-frequency agent need