add web_scraper.py#44

Merged
moonpyt merged 3 commits into XSpoonAi:main from helloissariel:fix/api-call
Dec 4, 2025

@helloissariel
Contributor

Summary

This PR introduces a new, reusable WebScraperTool to the spoon-toolkit library.

Previously, examples such as x402_agent_demo.py relied on an ad-hoc, primitive HTTP getter (HttpProbeTool) that returned raw HTML. The new tool is designed specifically for LLM agents and is intended for production use: it converts HTML to Markdown, cleans the content, and is compatible with the x402 protocol.

Key Features

  • LLM-Friendly Output: Uses beautifulsoup4 and markdownify to convert raw HTML into clean Markdown, significantly reducing token usage and noise.
  • Smart Cleaning: Automatically strips <script>, <style>, <iframe>, and heuristic-based ad elements.
  • x402 Native Support: Gracefully handles 402 Payment Required responses by returning the payment headers (instead of raising exceptions), enabling agents to detect and pay for content seamlessly.
  • Safety Mechanisms: Includes a "Smart Mode" to truncate oversized pages (~100k tokens) to prevent context window overflow.
  • Async I/O: Built on httpx for non-blocking performance.
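The cleaning, truncation, and 402 handling described above can be sketched roughly as follows. This is a minimal illustration, not the code added in this PR: it uses only the standard library's `html.parser` as a stand-in for beautifulsoup4 and markdownify, it emits plain text rather than Markdown, and the names `clean_html`, `summarize_response`, `MAX_CHARS`, and the `X-Example` header are hypothetical. The 4-characters-per-token ratio is also an assumption.

```python
from html.parser import HTMLParser

# Tags whose contents are dropped entirely (the set named in the feature list).
NOISE_TAGS = {"script", "style", "iframe"}
# Hypothetical budget: ~100k tokens, assuming roughly 4 characters per token.
MAX_CHARS = 100_000 * 4


class _TextExtractor(HTMLParser):
    """Collects visible text while skipping anything inside NOISE_TAGS."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def clean_html(html: str, max_chars: int = MAX_CHARS) -> str:
    """Strip noisy elements and truncate oversized output."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = "\n".join(parser.parts)
    if len(text) > max_chars:
        text = text[:max_chars] + "\n[truncated]"
    return text


def summarize_response(status_code: int, headers: dict, body: str) -> dict:
    """Return payment headers on 402 instead of raising; otherwise clean the body."""
    if status_code == 402:
        return {"status": 402, "payment_required": True, "headers": dict(headers)}
    return {
        "status": status_code,
        "payment_required": False,
        "content": clean_html(body),
    }
```

Returning a plain dictionary for the 402 case lets the calling agent branch on `payment_required` without wrapping the request in exception handling.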

Changes

  • New Tool: Added spoon_toolkits/web/web_scraper.py.
  • Exports: Exposed via spoon_toolkits/web/__init__.py.
  • Dependencies: Added beautifulsoup4 and markdownify to pyproject.toml / requirements.txt.
  • Refactor (Optional): Replaced the ad-hoc HttpProbeTool in examples/x402_agent_demo.py with this new standard tool.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

# Crypto PowerData dependencies
ccxt>=4.0.0
numpy>=1.20.0
TA-Lib>=0.4.25
asyncio-throttle>=1.0.0

P1: Restore neo3 dependency for Neo tools

Installing from requirements.txt no longer installs neo3>=1.0.0 (the entry was removed in this change), but the Neo modules still import neo3.api and neo3.core (spoon_toolkits/crypto/neo/neo_provider.py, lines 22–25). Any environment built from this requirements file will raise ModuleNotFoundError (No module named 'neo3') as soon as a Neo tool is imported, so the dependency needs to be re-listed.
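A check along the following lines could flag this kind of regression before merge. It is only a sketch: `listed_packages` and `missing_packages` are hypothetical helpers, the parsing covers simple `name>=version` lines rather than the full requirements file syntax, and the list of required names is supplied by hand.

```python
import re


def listed_packages(requirements_text: str) -> set:
    """Collect lowercase package names from simple requirements lines."""
    names = set()
    for line in requirements_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # The name runs up to the first version specifier, bracket, or space.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(match.group(0).lower())
    return names


def missing_packages(requirements_text: str, required: list) -> list:
    """Return the required names that the requirements text does not list."""
    listed = listed_packages(requirements_text)
    return sorted(name for name in required if name.lower() not in listed)


# The excerpt quoted in the review comment above.
excerpt = """\
# Crypto PowerData dependencies
ccxt>=4.0.0
numpy>=1.20.0
TA-Lib>=0.4.25
asyncio-throttle>=1.0.0
"""
print(missing_packages(excerpt, ["neo3", "ccxt"]))
```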

@moonpyt moonpyt merged commit 9855a98 into XSpoonAi:main Dec 4, 2025
1 check passed