
scrapy_cffi

An asyncio-style web scraping framework inspired by Scrapy, powered by curl_cffi.

scrapy_cffi is a lightweight Python crawler framework that mimics the Scrapy architecture while replacing Twisted with curl_cffi as the underlying HTTP/WebSocket client.

It is designed to be efficient, modular, and suitable for both simple tasks and large-scale distributed crawlers.


✨ Features

  • Scrapy-style architecture: spiders, items, interceptors, pipelines, signals

  • Fully asyncio-based engine for maximum concurrency

  • HTTP & WebSocket support: built-in asynchronous clients

  • Flexible DB integration: Redis, MySQL, MongoDB with async retry & reconnect

  • Message queue support: RabbitMQ & Kafka

  • Configurable deployment: settings system supporting .env files, single-instance, cluster mode, and sentinel mode

  • Lightweight middleware & interceptor system for easy extensions

  • High-performance C-extension hooks for CPU-intensive tasks

  • Redis-compatible scheduler (optional) for distributed crawling

  • Designed for high-concurrency, high-availability crawling
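The "fully asyncio-based engine" in the list above follows the familiar producer/consumer pattern: a scheduler queue feeds a pool of concurrent workers. The stdlib-only sketch below illustrates that pattern in general; it is not scrapy_cffi's actual engine code, and all names in it are illustrative.

```python
import asyncio

# Illustrative sketch of a Scrapy-style asyncio engine loop.
# A scheduler queue feeds concurrent workers; a real engine would
# fetch each URL via curl_cffi, but we just record it so the
# sketch stays self-contained.

async def worker(queue: asyncio.Queue, results: list) -> None:
    while True:
        url = await queue.get()
        results.append(url)  # stand-in for "download and parse"
        queue.task_done()

async def crawl(start_urls: list, concurrency: int = 4) -> list:
    queue: asyncio.Queue = asyncio.Queue()
    results: list = []
    for url in start_urls:
        queue.put_nowait(url)
    workers = [
        asyncio.create_task(worker(queue, results))
        for _ in range(concurrency)
    ]
    await queue.join()   # wait until every queued request is processed
    for w in workers:
        w.cancel()       # persistent workers are cancelled at shutdown
    return results

# asyncio.run(crawl(["https://example.com/a", "https://example.com/b"]))
```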


📦 Installation

From PyPI

```shell
pip install scrapy_cffi
```

From source (unstable)

```shell
git clone https://github.com/aFunnyStrange/scrapy_cffi.git
cd scrapy_cffi
pip install -e .
```

🚀 Quick Start

```shell
scrapy-cffi startproject <project_name>
cd <project_name>
scrapy-cffi genspider <spider_name> <domain>
python runner.py
```

Notes:

The CLI command is scrapy_cffi in versions ≤ 0.1.4; it was renamed to scrapy-cffi in later versions for improved usability.

Starting from scrapy-cffi >= 0.2.5, RedisScheduler and RabbitMqScheduler no longer automatically terminate when the queue is empty. For finite/terminable spiders, use SCHEDULER_LOOP_END to specify the number of scheduler loops before automatic exit. For continuous-listening spiders (RedisSpider, RabbitMqSpider, or custom persistent spiders), leave SCHEDULER_LOOP_END as None. This change only affects automatic termination; task scheduling remains fully functional.
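The SCHEDULER_LOOP_END behaviour described above can be expressed in settings roughly as follows (a sketch: only the setting name and its None semantics come from the note; the numeric value is illustrative):

```python
# settings.py (sketch)

# Finite/terminable spider: let the scheduler exit automatically
# after the given number of scheduler loops.
SCHEDULER_LOOP_END = 3

# Continuous-listening spider (RedisSpider, RabbitMqSpider, or a
# custom persistent spider): never auto-terminate.
# SCHEDULER_LOOP_END = None
```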


⚙️ Settings & Deployment

scrapy_cffi now fully supports a flexible settings system:

  • Load configuration from Python files or .env files

  • Choose between single-instance, cluster, or sentinel mode

  • Configure databases, message queues, and concurrency limits in one place

  • Seamless integration with async Redis/MySQL/MongoDB managers

Example settings.py snippet:

```python
settings.REDIS_INFO.MODE = "sentinel"
settings.REDIS_INFO.SENTINELS = [
    ("<sentinel_host1>", <sentinel_port1>),  # ports are integers
    ("<sentinel_host2>", <sentinel_port2>),
    ("<sentinel_host3>", <sentinel_port3>),
]
settings.REDIS_INFO.MASTER_NAME = "<master_name>"
settings.REDIS_INFO.SENTINEL_OVERRIDE_MASTER = ("<master_host>", <master_port>)
```
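The .env-file support mentioned above can be pictured with a minimal stdlib-only loader. This is a sketch of the general technique (parse KEY=VALUE lines, let real environment variables override file values), not scrapy_cffi's actual implementation; the function names are hypothetical.

```python
import os

def load_env(path: str) -> dict:
    """Parse simple KEY=VALUE lines from a .env file, skipping
    blank lines and # comments, stripping surrounding quotes."""
    values = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values

def setting(env: dict, key: str, default=None):
    """Process environment variables take precedence over the file."""
    return os.environ.get(key, env.get(key, default))
```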

📖 Documentation

Full technical documentation and module-level guides are available in the docs/ directory.


📄 License

BSD 3-Clause License. See LICENSE for details.


🛠 Community Highlights

Inspired by the challenges of async Python crawling:

  • Blocking requests and slow DB integration

  • Complex deployment for distributed crawlers

  • Need for fully concurrent HTTP & WebSocket requests

scrapy_cffi addresses these with a modular, high-performance framework that is async-first, extensible, and deployment-ready.
