A comprehensive Claude skill for web crawling and data extraction using Crawl4AI. This skill enables Claude to scrape websites, extract structured data, handle JavaScript-heavy pages, crawl multiple URLs, and build automated web data pipelines.
- Web Crawling: Extract content from any website with full JavaScript support
- Data Extraction: Schema-based CSS extraction (LLM-free) and LLM-based extraction
- Markdown Generation: Clean, well-formatted markdown output optimized for LLM consumption
- Content Filtering: Relevance-based filtering using BM25 and quality-based pruning
- Session Management: Persistent sessions for authenticated crawling
- Batch Processing: Concurrent multi-URL crawling
- CLI & SDK: Both command-line interface and Python SDK support
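The batch-processing feature follows the standard bounded-concurrency pattern in asyncio. A minimal sketch of that pattern, with a stub `fetch` standing in for the real crawler call (the actual SDK entry point, `AsyncWebCrawler.arun`, appears later in this README):

```python
import asyncio

async def fetch(url: str) -> str:
    # Stub: in the real skill this would be an AsyncWebCrawler.arun(url) call
    await asyncio.sleep(0)
    return f"# markdown for {url}"

async def crawl_all(urls, max_concurrent: int = 5):
    # Semaphore caps how many crawls run at once so large URL
    # lists don't open unbounded browser sessions
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(url):
        async with sem:
            return await fetch(url)

    return await asyncio.gather(*(bounded(u) for u in urls))

results = asyncio.run(crawl_all(["https://a.com", "https://b.com"]))
```

Results come back in input order because `asyncio.gather` preserves ordering regardless of completion time.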
- Download or clone this repository
- Create a ZIP file of the `crawl4ai` directory:
  ```bash
  cd crawl4ai-skill
  zip -r crawl4ai.zip crawl4ai/
  ```
- In Claude Desktop, go to Settings → Developer → Import Skill
- Select the `crawl4ai.zip` file
```bash
git clone https://github.com/brettdavies/crawl4ai-skill.git
cd crawl4ai-skill
```

Then add the skill directory to Claude's skills folder or import via Claude Desktop.
This skill requires the Crawl4AI Python library:

```bash
pip install crawl4ai
crawl4ai-setup

# Verify installation
crawl4ai-doctor
```

```bash
# Basic crawling - returns markdown
crwl https://example.com

# Get markdown output
crwl https://example.com -o markdown

# JSON output with cache bypass
crwl https://example.com -o json -v --bypass-cache
```

```python
import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun("https://example.com")
        print(result.markdown[:500])

asyncio.run(main())
```

- SKILL.md - Complete skill documentation with examples
- CLI Guide - Command-line interface reference
- SDK Guide - Python SDK quick reference
- Complete SDK Reference - Full API documentation (5900+ lines)
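The BM25 content filtering listed in the features ranks page segments by relevance to a query before markdown generation. A simplified, self-contained illustration of the scoring idea; this sketches the classic BM25 formula, not Crawl4AI's implementation:

```python
import math
from collections import Counter

def bm25_score(query_terms, doc, corpus, k1=1.5, b=0.75):
    """Score one tokenized document against query terms (classic BM25)."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    tf = Counter(doc)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)          # document frequency
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)   # smoothed IDF
        f = tf[term]                                      # term frequency in doc
        # Length-normalized term contribution
        score += idf * (f * (k1 + 1)) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score

# Tokenized chunks from a crawled page; the query favors install instructions
chunks = [
    ["install", "crawl4ai", "with", "pip"],
    ["latest", "news", "and", "sports"],
    ["pip", "install", "then", "run", "setup"],
]
scores = [bm25_score(["pip", "install"], c, chunks) for c in chunks]
```

Chunks that share no terms with the query score zero and can be pruned, which is what makes the `markdown-fit` output shorter and more relevant than the raw page.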
```bash
crwl https://docs.example.com -o markdown > docs.md
```

```bash
# Generate schema once (uses LLM)
python crawl4ai/scripts/extraction_pipeline.py --generate-schema https://shop.com "extract products"

# Use schema for extraction (no LLM costs)
crwl https://shop.com -e extract_css.yml -s product_schema.json -o json
```

```bash
# Multiple sources with filtering
for url in news1.com news2.com news3.com; do
  crwl "https://$url" -f filter_bm25.yml -o markdown-fit
done
```

The skill includes helper scripts in `crawl4ai/scripts/`:
- basic_crawler.py - Simple markdown extraction
- batch_crawler.py - Multi-URL processing
- extraction_pipeline.py - Schema generation and extraction
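Schema-based extraction works from a plain JSON schema mapping CSS selectors to output fields. A hypothetical sketch of the shape such a `product_schema.json` might take; the selectors and field names here are illustrative assumptions, not taken from this repository:

```python
import json

# Illustrative schema in the base-selector/fields shape used for
# CSS extraction: one baseSelector match per record, with each
# field's selector resolved relative to that element.
product_schema = {
    "name": "Products",
    "baseSelector": "div.product-card",   # hypothetical selector
    "fields": [
        {"name": "title", "selector": "h2.title", "type": "text"},
        {"name": "price", "selector": "span.price", "type": "text"},
        {"name": "url", "selector": "a", "type": "attribute", "attribute": "href"},
    ],
}

# The schema is plain JSON, so it is generated once (with an LLM)
# and then reused across runs with no further LLM calls.
schema_json = json.dumps(product_schema, indent=2)
```

This is what makes the pipeline "LLM-free" after the first run: extraction itself is pure selector matching against the saved schema.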
Run the test suite to verify the skill works correctly:

```bash
cd crawl4ai/tests
python run_all_tests.py
```

This skill is available on Claude Skills marketplaces.
MIT License - see LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
For issues, questions, or feature requests, please open an issue on the GitHub repository.
See CHANGELOG.md for version history and updates.