🕷️ Chuscraper

Stealth-focused Web & Mobile automation framework powered by CDP and ADB
You Only Scrape Once — data extraction made smarter, faster, and more resilient.

🚀 What is Chuscraper?

Chuscraper is a Python library for web and mobile scraping. It drives Chromium through CDP (Chrome DevTools Protocol) and native Android apps through ADB (Android Debug Bridge) to extract structured data, interact with pages and screens, and automate workflows, with a heavy focus on anti-detection and stealth.

It turns standard Chromium instances into stealth agents designed to bypass bot verification systems such as Cloudflare, Akamai, and DataDome, and it can also control native Android apps for data extraction.

🌟 Key Features

🕷️ Universal Crawler (New!)

Turn entire websites into LLM-ready data with a single command.

Sitemap & BFS: Supports both sitemap-based (fast) and BFS (deep) crawling strategies.
Streaming: Stream extracted data directly to your database instead of buffering the whole crawl in memory.
Multi-Format: Extract Markdown, HTML, and Text simultaneously.
Robust: Handles redirects, SPA link discovery, and concurrency automatically.
AI Extraction: Integrate OpenAI/LLMs to extract structured JSON data from any page using natural language prompts.
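
To make the BFS strategy concrete, here is a minimal, library-independent sketch of a breadth-first crawl restricted to one host. It is not Chuscraper's crawler: `bfs_crawl`, `fetch_links`, and the in-memory `SITE` map are hypothetical names used only for illustration, and a real crawl would replace `fetch_links` with a page fetch.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urldefrag, urljoin, urlsplit


def bfs_crawl(
    start_url: str,
    fetch_links: Callable[[str], Iterable[str]],
    max_pages: int = 100,
) -> list:
    """Visit same-host pages breadth-first and return them in visit order."""
    host = urlsplit(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    visited = []

    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for href in fetch_links(url):
            # Resolve relative links and drop #fragments so each page is queued once.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlsplit(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Stand-in for real page fetches: a tiny in-memory site (hypothetical data).
SITE = {
    "https://example.com/": ["/a", "/b", "https://other.example/x"],
    "https://example.com/a": ["/c", "/b#top"],
    "https://example.com/b": ["/"],
    "https://example.com/c": [],
}


def site_links(url: str) -> list:
    return SITE.get(url, [])
```

Calling `bfs_crawl("https://example.com/", site_links)` visits the start page first, then its direct links, then pages two clicks away. A sitemap-based crawl skips link discovery entirely and reads the URL list from `sitemap.xml`, which is why it tends to be faster.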

📱 Native Mobile App Scraping

Chuscraper now supports scraping native Android apps using ADB:

UI Automation: Tap, swipe, and type on any connected Android device (Real or Emulator).
XML Dumping: Extract the full UI hierarchy as XML to find elements by text, resource-id, or content-desc.
Background Execution: Run scripts without touching the device.
Zero-Setup: Just enable USB Debugging and connect. No Appium server required.
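
The XML Dumping idea can be sketched with the standard library alone. The example below assumes the dump follows Android's uiautomator layout (nested `node` elements with `text`, `resource-id`, `content-desc`, and `bounds` attributes); `SAMPLE_DUMP`, `find_nodes`, and `center` are hypothetical names for illustration and are not Chuscraper's API.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written sample in the uiautomator dump layout (assumed format).
SAMPLE_DUMP = """<hierarchy rotation="0">
  <node index="0" text="" resource-id="" class="android.widget.FrameLayout" content-desc="" bounds="[0,0][1080,1920]">
    <node index="0" text="Enter destination" resource-id="com.hotel.app:id/city" class="android.widget.EditText" content-desc="" bounds="[40,200][1040,320]" />
    <node index="1" text="Search" resource-id="com.hotel.app:id/search_btn" class="android.widget.Button" content-desc="Search hotels" bounds="[40,360][1040,480]" />
  </node>
</hierarchy>"""

BOUNDS_RE = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")


def find_nodes(xml_text, **attrs):
    """Return every node whose attributes equal all of the given values.

    Keyword names use underscores (resource_id) and are mapped to the
    XML attribute names (resource-id).
    """
    wanted = {key.replace("_", "-"): value for key, value in attrs.items()}
    root = ET.fromstring(xml_text)
    return [
        node
        for node in root.iter("node")
        if all(node.get(key) == value for key, value in wanted.items())
    ]


def center(node):
    """Return the (x, y) midpoint of a node's bounds, e.g. as a tap target."""
    match = BOUNDS_RE.fullmatch(node.get("bounds", ""))
    if match is None:
        raise ValueError("node has no parsable bounds attribute")
    left, top, right, bottom = map(int, match.groups())
    return (left + right) // 2, (top + bottom) // 2
```

With this shape, finding an element by `text`, `resource_id`, or `content_desc` is a plain attribute match, and the midpoint of `bounds` gives coordinates that a tap could target.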

🕵️‍♂️ Dynamic Stealth & Fingerprinting (New!)

Chuscraper now includes an advanced Auto-Update and Fingerprint Rotation engine:

Auto-Update Chrome Version: Automatically detects your installed Chrome version and updates the User-Agent to match. No manual updates required!
Fingerprint Rotation: Randomizes hardware fingerprints (RAM, CPU, Screen Resolution) per session while strictly adhering to your host OS (Windows, macOS, Linux) to prevent OS mismatch detection.
Client Hints Sync: Automatically patches navigator.userAgentData to match the User-Agent string.
Advanced Stealth Patches: Six core JavaScript patches covering WebDriver, the Chrome runtime, Canvas/WebGL noise, and iframe leaks.
Modern Timezones: Automatically syncs browser timezone with IP location using modern IANA names.
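
The rotation rule described above (vary the hardware, keep the OS and browser version fixed) can be sketched as follows. This is an illustration under stated assumptions, not Chuscraper's engine: `OS_PROFILES`, `SCREENS`, `build_profile`, and the value pools are hypothetical.

```python
import platform
import random
from typing import Optional

# Platform tokens as they appear in Chrome's User-Agent and client hints.
OS_PROFILES = {
    "Windows": {"ua_platform": "Windows NT 10.0; Win64; x64", "ch_platform": "Windows"},
    "Darwin": {"ua_platform": "Macintosh; Intel Mac OS X 10_15_7", "ch_platform": "macOS"},
    "Linux": {"ua_platform": "X11; Linux x86_64", "ch_platform": "Linux"},
}

# Assumed pool of common screen sizes; a real engine would pick its own.
SCREENS = [(1366, 768), (1536, 864), (1920, 1080), (2560, 1440)]


def build_profile(chrome_version: str, rng: random.Random, system: Optional[str] = None) -> dict:
    """Build one session profile: fixed OS and Chrome major, random hardware."""
    # The OS always comes from the host so the fingerprint never contradicts it.
    os_profile = OS_PROFILES.get(system or platform.system(), OS_PROFILES["Linux"])
    major = chrome_version.split(".")[0]
    # Chrome's reduced User-Agent exposes only the major version.
    user_agent = (
        f"Mozilla/5.0 ({os_profile['ua_platform']}) AppleWebKit/537.36 "
        f"(KHTML, like Gecko) Chrome/{major}.0.0.0 Safari/537.36"
    )
    width, height = rng.choice(SCREENS)
    return {
        "user_agent": user_agent,
        # Client hints are derived from the same values, so they cannot drift apart.
        "client_hints": {"platform": os_profile["ch_platform"], "chrome_major": major},
        "device_memory_gb": rng.choice([4, 8]),
        "hardware_concurrency": rng.choice([4, 8, 12, 16]),
        "screen": {"width": width, "height": height},
    }
```

Two sessions built with different `random.Random` seeds share the same User-Agent and platform but can differ in memory, core count, and screen size, which is the property that avoids an OS mismatch.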

⚡ Async + Fast

Built on async CDP, with low overhead and no heavy browser bundles.

🔄 Advanced Selector & Extraction Engine (New!)

Chuscraper now includes a high-performance parsing engine:

Adaptive Selectors: Save and automatically relocate elements even if the DOM structure changes.
AI-Ready Extraction: One-click conversion of pages or elements to clean Markdown or normalized Text.
CSS & XPath Support: Unified API for high-speed selection.
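
One way to picture adaptive selectors: store a small signature of the element when it is first found, then score candidates against that signature after the DOM changes. The sketch below is an assumption about how such relocation could work, not Chuscraper's algorithm; `ElementSignature`, `similarity`, `relocate`, and the weights are hypothetical.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import List, Optional


@dataclass
class ElementSignature:
    """Metadata saved for an element so it can be found again later."""
    tag: str
    text: str
    attrs: dict = field(default_factory=dict)


def similarity(saved: ElementSignature, candidate: ElementSignature) -> float:
    """Score a candidate in [0, 1] using tag, text, and attribute overlap."""
    tag_score = 1.0 if saved.tag == candidate.tag else 0.0
    text_score = SequenceMatcher(None, saved.text, candidate.text).ratio()
    keys = set(saved.attrs) | set(candidate.attrs)
    if keys:
        same = sum(saved.attrs.get(k) == candidate.attrs.get(k) for k in keys)
        attr_score = same / len(keys)
    else:
        attr_score = 1.0
    # Arbitrary illustrative weights: text matters most, then attributes, then tag.
    return 0.2 * tag_score + 0.5 * text_score + 0.3 * attr_score


def relocate(
    saved: ElementSignature,
    candidates: List[ElementSignature],
    threshold: float = 0.6,
) -> Optional[ElementSignature]:
    """Return the best-scoring candidate, or None if nothing is close enough."""
    best = max(candidates, key=lambda c: similarity(saved, c), default=None)
    if best is None or similarity(saved, best) < threshold:
        return None
    return best
```

In this sketch an element whose tag and class were renamed is still recovered as long as its text and a stable attribute survive, while unrelated elements fall below the threshold.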

🛠️ Hidden Gems (Undocumented Functions)

Chuscraper has several advanced functions that are often missed:

select_text(selector): Quickly get the inner text of an element in one line.
save_snapshot(filename): Save a full MHTML snapshot of the current page.
to_markdown() / to_text(): Convert any live Element directly to Markdown or plain text.
wait_for_ready_state(state): Wait specifically for loading, interactive, or complete document states.
mouse_drag(destination): Perform native drag-and-drop operations with human-like movement.
print_to_pdf(filename): Export the current page as a professional PDF.
get_all_urls(): Extract every link, image, and asset URL from the page in one call.
scroll_down(amount=25): Smoothly scroll down by a percentage of the page height.
human_click(selector) / human_type(selector, text): High-level aliases for ultra-realistic human behavior.
submit(selector): Submit a form, or click an individual submit button, in a single call.
activate() / bring_to_front(): Bring a background tab to the front for interaction.
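
As a rough picture of what a call like get_all_urls() collects, the standard-library sketch below gathers every `href` and `src` from static HTML and resolves them against a base URL. It is only an approximation: Chuscraper works on the live page in the browser, and `UrlCollector` and `collect_urls` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class UrlCollector(HTMLParser):
    """Collect href/src attribute values as absolute URLs, in document order."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                url = urljoin(self.base_url, value)
                if url not in self.urls:  # keep the first occurrence only
                    self.urls.append(url)


def collect_urls(html: str, base_url: str) -> list:
    parser = UrlCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls
```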

🔄 Flexible Outputs

Supports JSON, CSV, Markdown, Excel, Pydantic, and more.
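
Chuscraper's own export helpers are not shown in this README, so the sketch below uses only the standard library to show what JSON and CSV output of the same records looks like. `to_json`, `to_csv`, and the sample records are hypothetical.

```python
import csv
import io
import json


def to_json(records: list) -> str:
    """Serialize records as pretty-printed JSON, keeping non-ASCII text readable."""
    return json.dumps(records, indent=2, ensure_ascii=False)


def to_csv(records: list) -> str:
    """Serialize records as CSV, using the first record's keys as the header."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical scraped rows.
RECORDS = [
    {"hotel": "Sea View", "price": 4200},
    {"hotel": "Palm, Goa", "price": 3100},
]
```

Note that the CSV writer quotes the value containing a comma, so the second row round-trips without breaking the column layout.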

📦 Installation

pip install chuscraper

Tip

Install inside a virtual environment to avoid dependency conflicts.

Example: Advanced Mode (Elite Stealth + Human Interaction)

import asyncio
import chuscraper as zd

async def main():
    # 1. Launch with all-in-one start() helper
    async with await zd.start(
        headless=False,
        stealth=True,
        lang="en-US",
        retry_enabled=True
    ) as browser:
        page = browser.main_tab
        await page.goto("https://github.com/login")

        # 2. Use Ultra-Realistic Human Interactions
        # Automatically retries if element is loading/stale
        await page.human_type("#login_field", "jules_bot")
        await page.human_type("#password", "SecurePass123!")

        # 3. One-Click Form Submission
        await page.submit("form")

        # 4. Extract with Adaptive Selectors
        # 'adaptive=True' saves element metadata for resilient relocation
        results = await page.select_all(".repository-item", adaptive=True)
        
        for item in results:
            # 5. Get clean Markdown for LLMs instantly
            print(await item.to_markdown())

if __name__ == "__main__":
    asyncio.run(main())

Note

Chuscraper automatically cleans up Chrome processes and manages the local proxy lifecycle.

⚙️ Configuration Switches (Parameters)

Chuscraper is configured through keyword arguments to zd.start(). The available switches are:

🛠️ Core Switches

| Switch | Description | Default |
| --- | --- | --- |
| `headless` | Run without a visible window (`True`/`False`) | `False` |
| `stealth` | Master switch for advanced anti-detection (system fingerprints + JS bypasses) | `False` |
| `stealth_domain` | The domain used for cookie storage/loading in stealth mode | `""` |
| `user_data_dir` | Path to save/load the browser profile (keeps logins/cookies) | `Temp` |
| `proxy` | Proxy URL (e.g. `http://user:pass@host:port`) | `None` |

🚀 Advanced Switches

| Switch | Description | Default |
| --- | --- | --- |
| `browser_executable_path` | Custom path to Chrome/Brave binary (auto-detect if omitted) | Auto |
| `browser` | Browser selection: `"auto"`, `"chrome"`, `"brave"` | `"auto"` |
| `browser_args` | List of extra Chromium arguments | `[]` |
| `sandbox` | Set `False` for Linux/Docker/root environments | `True` |
| `lang` | Browser locale/language (e.g., `en-US`, `hi-IN`) | `en-US` |
| `user_agent` | Manually override User-Agent (not recommended with `stealth=True`) | Auto |
| `disable_webrtc` | Prevent IP leaks via WebRTC | `True` |
| `disable_webgl` | Disable WebGL (can reduce detection surface in some setups) | `False` |
| `timezone` | Force timezone (IANA format, e.g. `Asia/Kolkata`) | Auto/None |
| `stealth_options` | Dict for fine-grained stealth patches | Built-in defaults |
| `retry_enabled` | Enable retry helpers for unstable workflows | `False` |
| `retry_timeout` | Retry timeout in seconds | `10.0` |
| `retry_count` | Number of retries | `3` |
| `browser_connection_timeout` | Wait between connection attempts, in seconds | `0.25` |
| `browser_connection_max_tries` | Maximum browser connection attempts | `10` |
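
As an illustration of combining these switches, the sketch below assembles a keyword-argument dict for a headless Docker run and sanity-checks the proxy URL before use. The switch names come from the tables above; the helper `docker_start_options`, the chosen values, and the validation rule are assumptions, not part of Chuscraper.

```python
from typing import Optional
from urllib.parse import urlsplit


def docker_start_options(proxy: Optional[str] = None) -> dict:
    """Return start() keyword arguments suited to a container (hypothetical helper)."""
    options = {
        "headless": True,        # no display inside the container
        "stealth": True,         # master anti-detection switch
        "sandbox": False,        # documented setting for Linux/Docker/root
        "lang": "en-US",
        "retry_enabled": True,
        "retry_count": 3,
        "retry_timeout": 10.0,
    }
    if proxy is not None:
        parts = urlsplit(proxy)
        # Expect the documented shape: scheme://[user:pass@]host:port
        if not parts.scheme or not parts.hostname or parts.port is None:
            raise ValueError(f"proxy must look like http://user:pass@host:port, got {proxy!r}")
        options["proxy"] = proxy
    return options
```

The result would then be unpacked into the call from the example above, for instance `zd.start(**docker_start_options("http://user:pass@proxy.example.com:8080"))`.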

🕵️‍♂️ Granular Stealth Options

When stealth=True, you can fine-tune specific patches by passing a stealth_options dict:

await zd.start(stealth=True, stealth_options={
    "patch_webdriver": True,  # Hide WebDriver
    "patch_webgl": True,      # Spoof Graphics Card
    "patch_canvas": True,     # Add Canvas Noise
    "patch_audio": False      # Disable Audio Fingerprinting noise
})
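
A partial `stealth_options` dict only makes sense if unspecified keys fall back to defaults. The sketch below shows one plausible merge rule. The README does not list the built-in defaults, so `ASSUMED_DEFAULTS` (all patches on) and `resolve_stealth_options` are assumptions for illustration only.

```python
from typing import Optional

# Assumption: every patch is enabled unless overridden. The real defaults may differ.
ASSUMED_DEFAULTS = {
    "patch_webdriver": True,
    "patch_webgl": True,
    "patch_canvas": True,
    "patch_audio": True,
}


def resolve_stealth_options(overrides: Optional[dict] = None) -> dict:
    """Merge user overrides over the assumed defaults, rejecting unknown keys."""
    overrides = overrides or {}
    unknown = set(overrides) - set(ASSUMED_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown stealth option(s): {sorted(unknown)}")
    return {**ASSUMED_DEFAULTS, **overrides}
```

Rejecting unknown keys catches typos such as `patch_webgI` early instead of silently leaving a patch at its default.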

📱 Mobile Scraping Example

Scrape data from any native Android app (e.g., Hotel/Flight apps):

import asyncio
from chuscraper.mobile import MobileDevice

async def main():
    # Connect to first available device
    device = await MobileDevice().connect()

    # Example: Searching for hotels
    city_input = await device.find_element(text="Enter destination")
    if city_input:
        await city_input.type("Goa")

    search_btn = await device.find_element(resource_id="com.hotel.app:id/search_btn")
    if search_btn:
        await search_btn.click()

    # Extract prices
    prices = await device.find_elements(resource_id="com.hotel.app:id/price_text")
    for price in prices:
        print(price.get_text())

if __name__ == "__main__":
    asyncio.run(main())
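
For readers curious about what happens at the ADB level, the sketch below builds the standard `adb` commands for tapping, swiping, typing, and dumping the UI hierarchy. Whether Chuscraper issues exactly these commands is an assumption; the helper names are hypothetical, while the `adb shell input` and `uiautomator dump` commands themselves are standard Android tooling.

```python
import subprocess
from typing import List, Optional


def adb(*args: str, serial: Optional[str] = None) -> List[str]:
    """Build an adb command line, optionally targeting one device by serial."""
    base = ["adb"]
    if serial:
        base += ["-s", serial]
    return base + list(args)


def tap(x: int, y: int, serial: Optional[str] = None) -> List[str]:
    return adb("shell", "input", "tap", str(x), str(y), serial=serial)


def swipe(x1: int, y1: int, x2: int, y2: int, duration_ms: int = 300,
          serial: Optional[str] = None) -> List[str]:
    return adb("shell", "input", "swipe", str(x1), str(y1), str(x2), str(y2),
               str(duration_ms), serial=serial)


def type_text(text: str, serial: Optional[str] = None) -> List[str]:
    # `input text` does not accept literal spaces; %s stands in for a space.
    return adb("shell", "input", "text", text.replace(" ", "%s"), serial=serial)


def dump_ui(remote_path: str = "/sdcard/window_dump.xml",
            serial: Optional[str] = None) -> List[str]:
    return adb("shell", "uiautomator", "dump", remote_path, serial=serial)


def run(command: List[str]) -> str:
    """Execute a built command and return stdout (requires adb and a device)."""
    return subprocess.run(command, check=True, capture_output=True, text=True).stdout
```

Keeping command construction separate from execution makes the commands easy to log and test without a connected device.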

🛡️ Stealth & Anti-Detection Proof

We don't just claim to be stealthy; we prove it. Below are the results from top anti-bot detection suites, all passed with 100% "Human" status.

👉 View Full Visual Proofs & Screenshots Here

| Detection Suite | Result | Status |
| --- | --- | --- |
| SannySoft | No WebDriver detected | ✅ Pass |
| BrowserScan | 100% Trust Score | ✅ Pass |
| PixelScan | Consistent Fingerprint | ✅ Pass |
| IPHey | Software Clean (Green) | ✅ Pass |
| CreepJS | 0% Stealth / 0% Headless | ✅ Pass |
| Fingerprint.com | No Bot Detected | ✅ Pass |

🌍 Real-World Protection Bypass

We tested Chuscraper against live websites protected by major security providers:

| Provider | Target | Result |
| --- | --- | --- |
| Cloudflare | Turnstile Demo | ✅ Solved Automatically |
| DataDome | Antoine Vastel Research | ✅ Accessed |
| Akamai | Nike Product Page | ✅ Bypassed |

📖 Documentation

Full technical guides are available in the docs/ folder.

Translations (Chinese, Japanese, etc.) coming soon.

💖 Support & Sponsorship

Chuscraper is an open-source project maintained by Toufiq Qureshi. If the library has helped you or your business, please consider supporting its development:

GitHub Sponsors: Sponsor me on GitHub
Corporate Sponsorship: If you are a Proxy Provider or Data Company, we offer featured placement in our documentation. Contact us for partnership opportunities.
Custom Scraping Solutions: Need a private, high-performance scraper? We offer professional consulting.

🛠️ Contributing

Want to contribute? Open an issue or send a pull request; contributors of all levels are welcome. Please follow the guidelines in CONTRIBUTING.md.

📜 License

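Chuscraper is licensed under the AGPL-3.0 License. Under the AGPL, software that incorporates Chuscraper and is distributed, or offered to users over a network, must make its source code available under the same license, which protects the community and your freedom.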

Made with ❤️ by Toufiq Qureshi
