Release v0.2.0

Released by github-actions on 28 Mar 00:06 · commit e030b91

Highlights

  • Pluggable fetchers for GitHub, Wikipedia, YouTube, ArXiv, StackOverflow, HackerNews, RSS, package registries, docs sites, and Twitter
  • Batch fetching for concurrent multi-URL requests
  • Content-focused extraction with boilerplate stripping and structured metadata
  • Conditional fetching with ETag and If-Modified-Since support
  • Improved HTML-to-Markdown conversion quality
  • Content quality signals: word count, redirect chain, paywall detection
  • Optional Web Bot Authentication support
  • Hardened outbound fetch policy with proxy isolation and SSRF mitigations
  • Live integration test suite behind feature flag
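
The conditional-fetching highlight builds on standard HTTP validators. As a rough illustration only (this is not fetchkit's actual API; `Validators`, `Outcome`, `conditional_headers`, and `classify` are hypothetical names invented for this sketch), a client could turn cached validators into request headers and treat a 304 response as "reuse the cached copy":

```rust
/// Validators remembered from an earlier response (hypothetical type).
#[derive(Debug, Default)]
struct Validators {
    etag: Option<String>,
    last_modified: Option<String>,
}

/// What the caller should do with a response (hypothetical type).
#[derive(Debug, PartialEq)]
enum Outcome {
    /// 304: the cached body is still valid.
    NotModified,
    /// 2xx: a new body was returned and should replace the cache entry.
    Modified,
    /// Anything else is passed through with its status code.
    Other(u16),
}

/// Build the conditional request headers for whichever validators are cached.
fn conditional_headers(v: &Validators) -> Vec<(&'static str, String)> {
    let mut headers = Vec::new();
    if let Some(etag) = &v.etag {
        headers.push(("If-None-Match", etag.clone()));
    }
    if let Some(date) = &v.last_modified {
        headers.push(("If-Modified-Since", date.clone()));
    }
    headers
}

/// Map an HTTP status code to the action the caller should take.
fn classify(status: u16) -> Outcome {
    match status {
        304 => Outcome::NotModified,
        200..=299 => Outcome::Modified,
        other => Outcome::Other(other),
    }
}
```

With no cached validators the header list is empty, so the first request for a URL is an ordinary unconditional fetch.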

Breaking Changes

  • Ambient proxy environment variables are now ignored by default; set them explicitly if needed

What's Changed

  • test(fetchers): add live integration tests behind feature flag (#84)
  • chore: periodic maintenance — deps update and spec sync (#83)
  • feat(fetch): add content quality signals (word_count, redirect_chain, is_paywall) (#82)
  • feat(client): add batch_fetch for concurrent multi-URL fetching (#81)
  • feat(fetch): add conditional fetching with ETag and If-Modified-Since (#80)
  • feat(convert): improve HTML-to-Markdown conversion quality (#79)
  • feat(convert): add content-focused extraction with boilerplate stripping (#78)
  • feat(convert): add structured metadata extraction from HTML pages (#77)
  • feat(fetchers): add RSSFeedFetcher for structured feed parsing (#70)
  • feat(fetchers): add HackerNewsFetcher for structured thread extraction (#69)
  • feat(fetchers): add ArXivFetcher for paper metadata and abstract (#68)
  • feat(fetchers): add YouTubeFetcher for video metadata extraction (#67)
  • feat(fetchers): add WikipediaFetcher for article extraction (#66)
  • feat(fetchers): add PackageRegistryFetcher for PyPI, crates.io, npm (#65)
  • feat(fetchers): add StackOverflowFetcher for clean Q&A extraction (#64)
  • feat(fetchers): add DocsSiteFetcher with llms.txt support (#63)
  • feat(fetchers): add GitHubCodeFetcher for source file fetching (#62)
  • feat(fetchers): add GitHubIssueFetcher for structured issue/PR fetching (#61)
  • feat: add process-issues skill for e2e GitHub issue resolution (#60)
  • feat: add optional Web Bot Authentication support (#49)
  • feat(fetchers): add TwitterFetcher for tweet URL handling (#47)
  • feat: skip HTML conversion for non-HTML responses (#48)
  • chore(deps): update workspace dependencies and fix flaky proxy tests (#46)
  • feat(toolkit): align fetchkit with toolkit library contract (#45)
  • fix(security): harden outbound fetch policy (#43)
  • docs: clarify latest-main requirement for worktrees (#44)
  • fix(security): isolate proxy env in shared runtimes (#42)
  • fix(security): block IPv4-compatible and 6to4 IPv6 addresses in SSRF protection (#41)
  • fix(security): sanitize reqwest error messages to prevent hostname leakage (#40)
  • fix: resolve threat model issues (#37)
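
To make the SSRF entry in #41 concrete: IPv4-compatible (`::a.b.c.d`) and 6to4 (`2002::/16`) IPv6 addresses both carry an embedded IPv4 address, so an internal target can hide inside an address that looks like public IPv6. The sketch below shows one way such a check might look using only the Rust standard library. It is an assumption-laden illustration, not the code from that PR; `embedded_ipv4`, `is_internal_v4`, and `is_blocked` are hypothetical names, and the set of blocked ranges is deliberately minimal.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

/// Extract an IPv4 address embedded in an IPv6 address, if there is one.
fn embedded_ipv4(v6: &Ipv6Addr) -> Option<Ipv4Addr> {
    let s = v6.segments();
    let o = v6.octets();
    let leading_zero = |n: usize| s[..n].iter().all(|&x| x == 0);

    // IPv4-mapped: ::ffff:a.b.c.d
    if leading_zero(5) && s[5] == 0xffff {
        return Some(Ipv4Addr::new(o[12], o[13], o[14], o[15]));
    }
    // IPv4-compatible (deprecated): ::a.b.c.d, excluding :: and ::1
    if leading_zero(6) && !(s[6] == 0 && s[7] <= 1) {
        return Some(Ipv4Addr::new(o[12], o[13], o[14], o[15]));
    }
    // 6to4: 2002:AABB:CCDD::/48 embeds the IPv4 address AA.BB.CC.DD
    if s[0] == 0x2002 {
        return Some(Ipv4Addr::new(o[2], o[3], o[4], o[5]));
    }
    None
}

/// Minimal set of IPv4 ranges treated as internal in this sketch.
fn is_internal_v4(ip: &Ipv4Addr) -> bool {
    ip.is_loopback()
        || ip.is_private()
        || ip.is_link_local()
        || ip.is_unspecified()
        || ip.is_broadcast()
}

/// Decide whether an outbound request to `ip` should be refused.
fn is_blocked(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => is_internal_v4(&v4),
        IpAddr::V6(v6) => {
            if v6.is_loopback() || v6.is_unspecified() {
                return true;
            }
            let first = v6.segments()[0];
            // Unique local (fc00::/7) and link-local (fe80::/10) ranges.
            if (first & 0xfe00) == 0xfc00 || (first & 0xffc0) == 0xfe80 {
                return true;
            }
            // Apply the IPv4 rules to any embedded IPv4 address.
            embedded_ipv4(&v6).map_or(false, |v4| is_internal_v4(&v4))
        }
    }
}
```

In this sketch `::127.0.0.1` and `2002:0a00:0001::` are both refused because their embedded IPv4 addresses are loopback and private respectively, while a 6to4 address wrapping a public IPv4 address is allowed through.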

Full Changelog: v0.1.3...v0.2.0