Advanced asynchronous web scraping with LLM-powered content extraction and a Streamlit UI
Crawl Agent 4AI scrapes websites efficiently, handles dynamic (JavaScript-rendered) content, and respects robots.txt. It offers two extraction modes: LLM-based extraction for AI-powered content understanding, and structured extraction that uses CSS selectors to pick out precise data. Scraping runs asynchronously so that network operations do not block each other, and a Streamlit UI lets you start scrapes and monitor their progress.
- 🚀 Async Crawling - Fast, non-blocking operations with asyncio
- 🤖 LLM Extraction - AI-powered content understanding and extraction
- 🎯 Structured Extraction - Precise CSS-based data selection
- 🎨 Modern UI - Clean Streamlit interface for easy operation
- 🛡️ Smart Protection - Automatic robots.txt validation
- ⚡ Dynamic Content - Handles JavaScript-rendered pages
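To illustrate what "fast, non-blocking operations with asyncio" means in practice, here is a minimal sketch of concurrent crawling with a cap on simultaneous requests. It is not the project's code: `fetch`, `crawl_all`, and `max_concurrency` are hypothetical names, and `fetch` only simulates network latency with `asyncio.sleep` instead of loading a real page.

```python
import asyncio
import time


async def fetch(url: str) -> str:
    """Hypothetical fetcher: simulates 0.1 s of network latency instead of loading a page."""
    await asyncio.sleep(0.1)  # yields to the event loop, so other fetches can progress
    return f"<html>{url}</html>"


async def crawl_all(urls: list[str], max_concurrency: int = 3) -> list[str]:
    """Fetch every URL concurrently, with at most `max_concurrency` requests in flight."""
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(url: str) -> str:
        async with sem:  # wait for a free slot before fetching
            return await fetch(url)

    # gather() runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(bounded(u) for u in urls))


if __name__ == "__main__":
    start = time.perf_counter()
    pages = asyncio.run(crawl_all([f"https://example.com/{i}" for i in range(6)]))
    print(f"{len(pages)} pages in {time.perf_counter() - start:.2f} s")
```

With six URLs and `max_concurrency=3`, the simulated fetches run in two waves, so the batch takes roughly 0.2 s rather than the 0.6 s a sequential loop would need. The semaphore keeps the crawler from flooding a single site with requests.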
# Install
1. `git clone https://github.com/arben-adm/crawl-agent-4ai.git`
2. `cd crawl-agent-4ai`
3. `pip install -r requirements.txt`
4. Run `crawl4ai-setup`.
5. Confirm that everything is working with `crawl4ai-doctor`.
# Run
Start the app with `streamlit run app/main.py`, then:

1. Enter the target URL and select a mode:
   - LLM Mode: AI-powered content extraction
   - Structured Mode: precise CSS-based extraction
2. Configure the advanced settings if needed:
   - Dynamic content wait time
   - Hidden elements extraction
   - Custom extraction rules
3. Click "Start Scraping" and view the results in the tabs.
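Structured Mode selects data with CSS selectors. The sketch below shows the underlying idea using only the standard library: `SimpleSelectorExtractor` is a hypothetical helper built on `html.parser` that understands just `tag` and `tag.class` selectors and returns the whitespace-normalized text of each matching element. It is an illustration under those assumptions, not the app's extraction engine; a real CSS engine supports far richer selectors.

```python
from html.parser import HTMLParser


class SimpleSelectorExtractor(HTMLParser):
    """Hypothetical helper: collect the text of elements matching `tag` or `tag.class`."""

    def __init__(self, selector: str) -> None:
        super().__init__()
        tag, _, cls = selector.partition(".")
        self.tag = tag
        self.cls = cls or None          # None means "any class"
        self.depth = 0                  # > 0 while inside a matching element
        self.buffer: list[str] = []     # text chunks of the current match
        self.results: list[str] = []    # finished matches

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == self.tag:
                self.depth += 1  # nested element with the same tag name
            return
        if tag != self.tag:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.cls is None or self.cls in classes:
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                # Collapse runs of whitespace into single spaces.
                self.results.append(" ".join("".join(self.buffer).split()))

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def extract(html: str, selector: str) -> list[str]:
    """Return the text of every element in `html` that matches `selector`."""
    parser = SimpleSelectorExtractor(selector)
    parser.feed(html)
    parser.close()
    return parser.results


if __name__ == "__main__":
    page = '<h2 class="title">First  post</h2><h2>Ad</h2><h2 class="title">Second <b>post</b></h2>'
    print(extract(page, "h2.title"))  # ['First post', 'Second post']
```

Text inside child elements (the `<b>` above) is included in the parent's match, which is usually what you want when pulling headings or prices out of a page.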
# Settings

| Setting | Description | Default |
|---|---|---|
| Dynamic Wait | Time to wait for JS content | 5s |
| Process Dynamic | Handle JS-rendered content | true |
| Extract Hidden | Include hidden elements | true |
| LLM Instructions | Custom extraction rules | "Extract all text" |
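If you drive scrapes from your own code, it can help to keep the table's defaults in one place. The dataclass below is a hypothetical sketch: the class name `ScrapeSettings` and its field names are assumptions rather than the app's actual configuration object, and only the default values come from the table.

```python
from dataclasses import dataclass


@dataclass
class ScrapeSettings:
    """Hypothetical settings object; only the defaults come from the settings table."""

    dynamic_wait_s: float = 5.0                 # Dynamic Wait: seconds to wait for JS content
    process_dynamic: bool = True                # Process Dynamic: handle JS-rendered content
    extract_hidden: bool = True                 # Extract Hidden: include hidden elements
    llm_instructions: str = "Extract all text"  # LLM Instructions: custom extraction rules

    def __post_init__(self) -> None:
        # Reject values that cannot be a wait time.
        if self.dynamic_wait_s < 0:
            raise ValueError("dynamic_wait_s must be non-negative")


if __name__ == "__main__":
    # Override only what differs from the defaults.
    settings = ScrapeSettings(dynamic_wait_s=10, llm_instructions="Extract product names and prices")
    print(settings)
```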
# Troubleshooting

- Scraping Not Allowed: if the website disallows crawling in its robots.txt, the app shows an error. Check the URL or try another site.
- Errors During Scraping: make sure you have a stable internet connection and that all dependencies are installed; `crawl4ai-doctor` can confirm that the setup is working.
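The robots.txt check behind the "Scraping Not Allowed" error can be expressed with Python's standard-library parser. This is a sketch of the technique, not necessarily how the app implements it; `is_allowed` and `ROBOTS` are hypothetical names, and the robots.txt content is parsed from a string so the example needs no network access.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text we already have; no network access
    return parser.can_fetch(user_agent, url)


ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    print(is_allowed(ROBOTS, "https://example.com/blog/post"))     # True
    print(is_allowed(ROBOTS, "https://example.com/private/data"))  # False
```

Against a live site you would instead call `set_url("https://example.com/robots.txt")` and `read()` on the parser before `can_fetch()`. A scraper following this approach refuses to fetch any URL for which the check returns `False`.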
# Contributing

- Fork the repo
- Create a feature branch (`git checkout -b feature/amazing`)
- Commit your changes (`git commit -am 'Add something amazing'`)
- Push the branch (`git push origin feature/amazing`)
- Open a Pull Request
# License

MIT © Arben Ademi