---
sidebar_position: 5
---
This guide focuses on stable, long-running scraping/automation workloads.
- Pin the version in your `requirements.txt`
- Use the `async with` lifecycle for guaranteed cleanup
- Enable `production_ready=True` for safer connection/retry behavior
- Enable `stealth=True` for protected sites
- Add `humanize=True` when facing behavioral bot checks
- Use a robust proxy rotation and backoff strategy
```python
import asyncio

import chuscraper as cs


async def run_job(url: str):
    # The async with block guarantees the browser is closed even on errors.
    async with await cs.start(
        headless=True,
        stealth=True,
        humanize=True,
        production_ready=True,
        retry_enabled=True,
        retry_count=5,
        retry_timeout=20.0,
        browser_connection_timeout=0.5,
        browser_connection_max_tries=20,
        disable_webrtc=True,
    ) as browser:
        tab = await browser.get(url)
        await tab.wait(2)
        return await tab.title()


async def main():
    title = await run_job("https://example.com")
    print(title)


if __name__ == "__main__":
    asyncio.run(main())
```

Use this advanced template directly and customize the proxy/challenge logic:
`examples/production_advanced_site_template.py`
- Wrap navigation and extraction in retry blocks.
- Log the URL, proxy, status code, and exception details.
- On repeated block pages, rotate identity (proxy, and optionally profile).
- Keep per-target throttling to avoid burst fingerprints.
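The retry, logging, and rotation advice above can be sketched as one generic wrapper. This is an illustrative pattern, not part of the chuscraper API: `fetch`, `with_retries`, and the `PROXIES` pool are hypothetical names, and the backoff constants are placeholders to tune per target.

```python
import asyncio
import itertools
import logging
import random

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("scrape")

# Illustrative proxy pool; replace with your real rotation source.
PROXIES = itertools.cycle(["http://proxy-a:8080", "http://proxy-b:8080"])


async def with_retries(fetch, url, max_attempts=5, base_delay=1.0):
    """Run `await fetch(url, proxy)` with logging, backoff, and identity rotation."""
    proxy = next(PROXIES)
    for attempt in range(1, max_attempts + 1):
        try:
            return await fetch(url, proxy)
        except Exception as exc:
            # Log everything needed to diagnose the failure later.
            log.warning("url=%s proxy=%s attempt=%d error=%r",
                        url, proxy, attempt, exc)
            if attempt == max_attempts:
                raise
            proxy = next(PROXIES)  # rotate identity before the next try
            # Exponential backoff with jitter to avoid burst fingerprints.
            await asyncio.sleep(base_delay * 2 ** (attempt - 1)
                                + random.uniform(0, base_delay))
```

You would pass a coroutine like the `run_job` above (extended to accept a proxy argument) as `fetch`; block-page detection can raise an exception to trigger rotation.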
- `headless=True` is generally faster; `headless=False` may perform better on very strict anti-bot pages.
- Disable unnecessary resources with request interception if your workflow allows.
- Persist structured logs for each run.
- Track block/challenge rate and success rate by domain.
- Keep a canary target (a known easy page) to separate infrastructure failures from target-site blocking.
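One minimal way to track these per-domain rates is a small counter object; `DomainStats` is an illustrative sketch, not a library feature:

```python
from collections import defaultdict
from urllib.parse import urlparse


class DomainStats:
    """Track success vs. block/challenge outcomes per target domain."""

    def __init__(self):
        self.counts = defaultdict(lambda: {"success": 0, "blocked": 0})

    def record(self, url, blocked):
        domain = urlparse(url).netloc
        self.counts[domain]["blocked" if blocked else "success"] += 1

    def block_rate(self, domain):
        c = self.counts[domain]
        total = c["success"] + c["blocked"]
        return c["blocked"] / total if total else 0.0


stats = DomainStats()
stats.record("https://example.com/page", blocked=False)
stats.record("https://example.com/other", blocked=True)
print(stats.block_rate("example.com"))  # 0.5
```

Recording the canary target in the same tracker makes the distinction concrete: if the canary's rate degrades alongside a target's, suspect your infrastructure rather than target-site blocking.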