Stealth-focused Web & Mobile automation framework powered by CDP and ADB
You Only Scrape Once: data extraction made smarter, faster, and more resilient.
Chuscraper is a Python web & mobile scraping library that uses CDP (Chrome DevTools Protocol) for web and ADB (Android Debug Bridge) for mobile apps. It extracts structured data, interacts with pages/screens, and automates workflows, with a heavy focus on Anti-Detection and Stealth.
It converts standard Chromium instances into undetectable agents that can bypass bot verification systems like Cloudflare, Akamai, and Datadome, while also allowing control of native Android apps for data extraction.
Turn entire websites into LLM-ready data with a single command.
- Sitemap & BFS: Supports both sitemap-based (fast) and BFS (deep) crawling strategies.
- Streaming: Stream extracted data directly to your database without memory limits.
- Multi-Format: Extract Markdown, HTML, and Text simultaneously.
- Robust: Handles redirects, SPA link discovery, and concurrency automatically.
- AI Extraction: Integrate OpenAI/LLMs to extract structured JSON data from any page using natural language prompts.
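The BFS strategy and the streaming behavior listed above can be illustrated with a small, library-independent sketch. It does not use Chuscraper's crawler API, which this section does not show; `SITE`, `fetch_links`, and `bfs_crawl` are hypothetical names, and the in-memory dictionary stands in for real page fetching.

```python
from collections import deque
from typing import Dict, Iterator, List, Tuple

# Hypothetical in-memory "website": URL -> links found on that page.
SITE: Dict[str, List[str]] = {
    "/": ["/docs", "/blog"],
    "/docs": ["/docs/install", "/"],
    "/blog": ["/blog/post-1", "/docs"],
    "/docs/install": [],
    "/blog/post-1": ["/"],
}


def fetch_links(url: str) -> List[str]:
    """Stand-in for fetching a page and discovering its links."""
    return SITE.get(url, [])


def bfs_crawl(start: str, max_depth: int = 2) -> Iterator[Tuple[str, int]]:
    """Yield (url, depth) in breadth-first order, one page at a time."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        # Yielding here lets the caller store each page immediately
        # instead of collecting the whole crawl in memory.
        yield url, depth
        if depth >= max_depth:
            continue
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))


if __name__ == "__main__":
    for url, depth in bfs_crawl("/"):
        print(depth, url)
```

Because `bfs_crawl` is a generator, a consumer could write each result to a database as it arrives, which is one plausible way to keep memory use flat on large sites.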
Chuscraper now supports scraping native Android apps using ADB:
- UI Automation: Tap, swipe, and type on any connected Android device (Real or Emulator).
- XML Dumping: Extract the full UI hierarchy as XML to find elements by text, resource-id, or content-desc.
- Background Execution: Run scripts without touching the device.
- Zero-Setup: Just enable USB Debugging and connect. No Appium server required.
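To make the XML dumping idea concrete, here is a minimal sketch that searches a UI hierarchy by text, resource-id, or content-desc and computes a tap point from an element's bounds. It uses only the standard library rather than Chuscraper's own classes; the sample follows the attribute layout of `adb shell uiautomator dump` output, and `find_nodes`, `center`, and the `com.hotel.app:id/city` id are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Trimmed sample in the shape produced by `adb shell uiautomator dump`.
SAMPLE_DUMP = """<hierarchy rotation="0">
  <node class="android.widget.FrameLayout" text="" resource-id="" content-desc="" bounds="[0,0][1080,1920]">
    <node class="android.widget.EditText" text="Enter destination" resource-id="com.hotel.app:id/city" content-desc="" bounds="[40,200][1040,320]" />
    <node class="android.widget.Button" text="Search" resource-id="com.hotel.app:id/search_btn" content-desc="Search hotels" bounds="[40,360][1040,480]" />
  </node>
</hierarchy>"""

BOUNDS_RE = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")


def find_nodes(
    xml_text: str,
    *,
    text: Optional[str] = None,
    resource_id: Optional[str] = None,
    content_desc: Optional[str] = None,
) -> List[ET.Element]:
    """Return every <node> whose attributes equal all the given filters."""
    wanted = {"text": text, "resource-id": resource_id, "content-desc": content_desc}
    wanted = {key: value for key, value in wanted.items() if value is not None}
    root = ET.fromstring(xml_text)
    return [
        node
        for node in root.iter("node")
        if all(node.get(key) == value for key, value in wanted.items())
    ]


def center(node: ET.Element) -> Tuple[int, int]:
    """Compute the midpoint of a node's bounds, e.g. for `adb shell input tap x y`."""
    match = BOUNDS_RE.fullmatch(node.get("bounds", ""))
    if match is None:
        raise ValueError("node has no parsable bounds attribute")
    x1, y1, x2, y2 = map(int, match.groups())
    return (x1 + x2) // 2, (y1 + y2) // 2


if __name__ == "__main__":
    button = find_nodes(SAMPLE_DUMP, resource_id="com.hotel.app:id/search_btn")[0]
    print(center(button))
```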
Chuscraper now includes an advanced Auto-Update and Fingerprint Rotation engine:
- Auto-Update Chrome Version: Automatically detects your installed Chrome version and updates the User-Agent to match. No manual updates required!
- Fingerprint Rotation: Randomizes hardware fingerprints (RAM, CPU, Screen Resolution) per session while strictly adhering to your host OS (Windows, macOS, Linux) to prevent OS mismatch detection.
- Client Hints Sync: Automatically patches navigator.userAgentData to match the User-Agent string.
- Advanced Stealth Patches: 6 core JS bypasses for WebDriver, Chrome Runtime, Canvas/WebGL noise, and iFrame leaks.
- Modern Timezones: Automatically syncs browser timezone with IP location using modern IANA names.
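The idea of rotating hardware values while staying consistent with the host OS can be sketched as follows. This is an illustration under stated assumptions, not Chuscraper's engine: `make_fingerprint`, the `Fingerprint` dataclass, and the value pools are hypothetical, and the Chrome major version is passed in rather than detected.

```python
import platform
import random
from dataclasses import dataclass
from typing import Optional, Tuple

# Platform tokens as they appear in Chrome's User-Agent string,
# keyed by the value returned from platform.system().
UA_PLATFORM_TOKENS = {
    "Windows": "Windows NT 10.0; Win64; x64",
    "Darwin": "Macintosh; Intel Mac OS X 10_15_7",
    "Linux": "X11; Linux x86_64",
}

# Hypothetical value pools, for illustration only.
MEMORY_GB = (4, 8)
CPU_CORES = (4, 8, 12, 16)
SCREENS = ((1366, 768), (1920, 1080), (2560, 1440))


@dataclass(frozen=True)
class Fingerprint:
    user_agent: str
    device_memory_gb: int
    cpu_cores: int
    screen: Tuple[int, int]


def make_fingerprint(
    chrome_major: int,
    host_os: Optional[str] = None,
    rng: Optional[random.Random] = None,
) -> Fingerprint:
    """Randomize hardware values but keep the OS token tied to the host."""
    host_os = host_os or platform.system()
    rng = rng or random.Random()
    token = UA_PLATFORM_TOKENS.get(host_os, UA_PLATFORM_TOKENS["Linux"])
    user_agent = (
        f"Mozilla/5.0 ({token}) AppleWebKit/537.36 (KHTML, like Gecko) "
        f"Chrome/{chrome_major}.0.0.0 Safari/537.36"
    )
    return Fingerprint(
        user_agent=user_agent,
        device_memory_gb=rng.choice(MEMORY_GB),
        cpu_cores=rng.choice(CPU_CORES),
        screen=rng.choice(SCREENS),
    )


if __name__ == "__main__":
    print(make_fingerprint(chrome_major=120))
```

Only the hardware fields vary between sessions here; the OS token is never randomized, which mirrors the stated goal of avoiding an OS mismatch between the User-Agent and the real host.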
Built on async CDP, low overhead, no heavy browser bundles.
Chuscraper now includes a high-performance parsing engine:
- Adaptive Selectors: Save and automatically relocate elements even if the DOM structure changes.
- AI-Ready Extraction: One-click conversion of pages or elements to clean Markdown or normalized Text.
- CSS & XPath Support: Unified API for high-speed selection.
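One way to picture adaptive selectors is as a saved element signature that is matched against candidates when the original selector stops working. The sketch below is an assumption about the general technique, not Chuscraper's actual algorithm; `ElementInfo`, `similarity`, `relocate`, and the equal weighting of the three scores are all hypothetical.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ElementInfo:
    """Hypothetical signature saved for an element: tag, text, attributes."""
    tag: str
    text: str
    attrs: Dict[str, str] = field(default_factory=dict)


def similarity(saved: ElementInfo, candidate: ElementInfo) -> float:
    """Average of tag equality, text similarity, and attribute overlap (0..1)."""
    tag_score = 1.0 if saved.tag == candidate.tag else 0.0
    text_score = SequenceMatcher(None, saved.text, candidate.text).ratio()
    saved_attrs = set(saved.attrs.items())
    cand_attrs = set(candidate.attrs.items())
    union = saved_attrs | cand_attrs
    attr_score = len(saved_attrs & cand_attrs) / len(union) if union else 1.0
    return (tag_score + text_score + attr_score) / 3


def relocate(
    saved: ElementInfo,
    candidates: List[ElementInfo],
    threshold: float = 0.5,
) -> Optional[ElementInfo]:
    """Return the best-scoring candidate, or None if nothing is close enough."""
    best = max(candidates, key=lambda c: similarity(saved, c), default=None)
    if best is None or similarity(saved, best) < threshold:
        return None
    return best


if __name__ == "__main__":
    saved = ElementInfo("a", "Add to cart", {"class": "btn buy", "id": "buy"})
    new_dom = [
        ElementInfo("a", "Add to cart", {"class": "button primary"}),
        ElementInfo("a", "Home", {"class": "nav"}),
        ElementInfo("div", "Add to wishlist"),
    ]
    print(relocate(saved, new_dom))
```

In this example the first candidate is selected even though its class and id changed, because the tag and text still match exactly.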
🛠️ Hidden Gems (Undocumented Functions)
Chuscraper has several advanced functions that are often missed:
- select_text(selector): Quickly get the inner text of an element in one line.
- save_snapshot(filename): Save a full MHTML snapshot of the current page.
- to_markdown() / to_text(): Convert any live Element directly to Markdown or plain text.
- wait_for_ready_state(state): Wait specifically for loading, interactive, or complete document states.
- mouse_drag(destination): Perform native drag-and-drop operations with human-like movement.
- print_to_pdf(filename): Export the current page as a professional PDF.
- get_all_urls(): Extract every link, image, and asset URL from the page in one call.
- scroll_down(amount=25): Smoothly scroll down by a percentage of the page height.
- human_click(selector) / human_type(selector, text): High-level aliases for ultra-realistic human behavior.
- submit(selector): One-click form submission for forms or individual buttons.
- activate() / bring_to_front(): Bring a background tab to the front for interaction.
Supports JSON, CSV, Markdown, Excel, Pydantic, and more.
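As a rough illustration of what exporting the same records to several formats involves, the following sketch serializes a list of dictionaries to JSON, CSV, and a Markdown table with the standard library. It does not use Chuscraper's export API, which is not shown here; `ROWS`, `to_json`, `to_csv`, and `to_markdown_table` are hypothetical names.

```python
import csv
import io
import json
from typing import Dict, List

# Hypothetical extracted records.
ROWS: List[Dict[str, object]] = [
    {"name": "Hotel A", "price": 120},
    {"name": "Hotel B", "price": 95},
]


def to_json(rows: List[Dict[str, object]]) -> str:
    """Serialize rows as pretty-printed JSON."""
    return json.dumps(rows, ensure_ascii=False, indent=2)


def to_csv(rows: List[Dict[str, object]]) -> str:
    """Serialize rows as CSV, using the first row's keys as the header."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_markdown_table(rows: List[Dict[str, object]]) -> str:
    """Serialize rows as a simple Markdown table."""
    if not rows:
        return ""
    headers = list(rows[0].keys())
    lines = ["| " + " | ".join(headers) + " |", "|" + "---|" * len(headers)]
    for row in rows:
        lines.append("| " + " | ".join(str(row[h]) for h in headers) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_json(ROWS))
    print(to_csv(ROWS))
    print(to_markdown_table(ROWS))
```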
    pip install chuscraper

Tip: Use within a virtual environment to avoid conflicts.
    import asyncio
    import chuscraper as zd


    async def main():
        # 1. Launch with all-in-one start() helper
        async with await zd.start(
            headless=False,
            stealth=True,
            lang="en-US",
            retry_enabled=True
        ) as browser:
            page = browser.main_tab
            await page.goto("https://github.com/login")

            # 2. Use Ultra-Realistic Human Interactions
            # Automatically retries if element is loading/stale
            await page.human_type("#login_field", "jules_bot")
            await page.human_type("#password", "SecurePass123!")

            # 3. One-Click Form Submission
            await page.submit("form")

            # 4. Extract with Adaptive Selectors
            # 'adaptive=True' saves element metadata for resilient relocation
            results = await page.select_all(".repository-item", adaptive=True)
            for item in results:
                # 5. Get clean Markdown for LLMs instantly
                print(await item.to_markdown())


    if __name__ == "__main__":
        asyncio.run(main())

Note: chuscraper automatically handles Chrome process cleanup and Local Proxy lifecycle.
Chuscraper gives you full control via zd.start(). Here are the powerful switches you can use:
| Switch | Description | Default |
|---|---|---|
| headless | Run without a visible window (True/False) | False |
| stealth | Master Switch for advanced anti-detection (System Fingerprints + JS Bypasses) | False |
| stealth_domain | The domain used for cookie storage/loading in stealth mode | "" |
| user_data_dir | Path to save/load browser profile (keep logins/cookies) | Temp |
| proxy | Proxy URL (e.g. http://user:pass@host:port) | None |

| Switch | Description | Default |
|---|---|---|
| browser_executable_path | Custom path to Chrome/Brave binary (auto-detect if omitted) | Auto |
| browser | Browser selection: "auto", "chrome", "brave" | "auto" |
| browser_args | Extra Chromium args list | [] |
| sandbox | Set False for Linux/Docker/root environments | True |
| lang | Browser locale/language (e.g., en-US, hi-IN) | en-US |
| user_agent | Manually override User-Agent (not recommended with stealth=True) | Auto |
| disable_webrtc | Prevent IP leaks via WebRTC | True |
| disable_webgl | Disable WebGL (can reduce detection surface in some setups) | False |
| timezone | Force timezone (IANA format, e.g. Asia/Kolkata) | Auto/None |
| stealth_options | Dict for fine-grained stealth patches | Built-in defaults |
| retry_enabled | Enable retry helpers for unstable workflows | False |
| retry_timeout | Retry timeout seconds | 10.0 |
| retry_count | Retry count | 3 |
| browser_connection_timeout | Wait between connection attempts | 0.25 |
| browser_connection_max_tries | Browser connection retries | 10 |
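The tables do not spell out how retry_count and retry_timeout interact, so the sketch below shows one plausible reading: each attempt is capped at retry_timeout seconds and the operation is tried at most retry_count times. `with_retries`, its `delay` parameter, and `demo` are hypothetical and are not part of Chuscraper.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def with_retries(
    operation: Callable[[], Awaitable[T]],
    retry_count: int = 3,
    retry_timeout: float = 10.0,
    delay: float = 0.01,
) -> T:
    """Try `operation` up to retry_count times, capping each attempt
    at retry_timeout seconds (assumed semantics, see lead-in)."""
    last_error: Optional[BaseException] = None
    for attempt in range(1, retry_count + 1):
        try:
            return await asyncio.wait_for(operation(), timeout=retry_timeout)
        except Exception as exc:  # includes the timeout raised by wait_for
            last_error = exc
            if attempt < retry_count:
                await asyncio.sleep(delay)
    raise RuntimeError(f"operation failed after {retry_count} attempts") from last_error


async def demo():
    """Simulate an element that becomes available on the third attempt."""
    calls = {"n": 0}

    async def flaky_click() -> str:
        calls["n"] += 1
        if calls["n"] < 3:
            raise LookupError("element not ready")
        return "clicked"

    result = await with_retries(flaky_click, retry_count=3, retry_timeout=1.0)
    return result, calls["n"]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```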
When stealth=True, you can fine-tune specific patches by passing a stealth_options dict:
    await zd.start(stealth=True, stealth_options={
        "patch_webdriver": True,  # Hide WebDriver
        "patch_webgl": True,      # Spoof Graphics Card
        "patch_canvas": True,     # Add Canvas Noise
        "patch_audio": False      # Disable Audio Fingerprinting noise
    })

Scrape data from any native Android app (e.g., Hotel/Flight apps):
    import asyncio
    from chuscraper.mobile import MobileDevice


    async def main():
        # Connect to first available device
        device = await MobileDevice().connect()

        # Example: Searching for hotels
        city_input = await device.find_element(text="Enter destination")
        if city_input:
            await city_input.type("Goa")

        search_btn = await device.find_element(resource_id="com.hotel.app:id/search_btn")
        if search_btn:
            await search_btn.click()

        # Extract prices
        prices = await device.find_elements(resource_id="com.hotel.app:id/price_text")
        for price in prices:
            print(price.get_text())


    if __name__ == "__main__":
        asyncio.run(main())

We don't just claim to be stealthy; we prove it. Below are the results from top anti-bot detection suites, all passed with 100% "Human" status.
View Full Visual Proofs & Screenshots Here
| Detection Suite | Result | Status |
|---|---|---|
| SannySoft | No WebDriver detected | ✅ Pass |
| BrowserScan | 100% Trust Score | ✅ Pass |
| PixelScan | Consistent Fingerprint | ✅ Pass |
| IPHey | Software Clean (Green) | ✅ Pass |
| CreepJS | 0% Stealth / 0% Headless | ✅ Pass |
| Fingerprint.com | No Bot Detected | ✅ Pass |
We tested chuscraper against live websites protected by major security providers:
| Provider | Target | Result |
|---|---|---|
| Cloudflare | Turnstile Demo | ✅ Solved Automatically |
| DataDome | Antoine Vastel Research | ✅ Accessed |
| Akamai | Nike Product Page | ✅ Bypassed |
Full technical guides are available in the docs/ folder.
Translations (Chinese, Japanese, etc.) coming soon.
chuscraper is an open-source project maintained by [Toufiq Qureshi]. If the library has helped you or your business, please consider supporting its development:
- GitHub Sponsors: Sponsor me on GitHub
- Corporate Sponsorship: If you are a Proxy Provider or Data Company, we offer featured placement in our documentation. Contact us for partnership opportunities.
- Custom Scraping Solutions: Need a private, high-performance scraper? We offer professional consulting.
Want to contribute? Open an issue or send a pull request; all levels welcome! Please follow the CONTRIBUTING.md guidelines.
Chuscraper is licensed under the AGPL-3.0 License. Software that incorporates Chuscraper and is distributed, or offered to users over a network, must generally make its source code available under the same license, protecting the community and your freedom.
Made with ❤️ by [Toufiq Qureshi]
