
Crawl Agent 4AI 🕷️

Advanced asynchronous web scraping with LLM-powered content extraction and Streamlit UI

Introduction

Crawl Agent 4AI is designed to efficiently scrape websites while handling dynamic content and respecting robots.txt. With both LLM-based and structured extraction modes, it caters to different scraping needs. The project uses asynchronous functions for improved performance and leverages Streamlit to provide an interactive UI for initiating and monitoring scrapes.
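The concurrency model mentioned above can be sketched with plain `asyncio`; the `scrape` coroutine below is a hypothetical stand-in for the project's real crawler call, shown only to illustrate how multiple pages are fetched without blocking:

```python
import asyncio

async def scrape(url: str) -> str:
    # Hypothetical stand-in for the real crawler call; a production
    # version would fetch and parse the page here.
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"content of {url}"

async def scrape_all(urls: list[str]) -> list[str]:
    # Launch all scrapes concurrently instead of one after another.
    return await asyncio.gather(*(scrape(u) for u in urls))

results = asyncio.run(scrape_all(["https://example.com/a", "https://example.com/b"]))
print(results)
```

Because the coroutines are scheduled together with `asyncio.gather`, total wall time is bounded by the slowest page rather than the sum of all requests.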

✨ Key Features

  • 🚀 Async Crawling - Fast, non-blocking operations with asyncio
  • 🤖 LLM Extraction - AI-powered content understanding and extraction
  • 🎯 Structured Extraction - Precise CSS-based data selection
  • 🎨 Modern UI - Clean Streamlit interface for easy operation
  • 🛡️ Smart Protection - Automatic robots.txt validation
  • ⚡ Dynamic Content - Handles JavaScript-rendered pages
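The robots.txt validation behind Smart Protection can be illustrated with Python's standard-library `urllib.robotparser`. This is a minimal sketch, not the app's exact code:

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Check whether `url` may be crawled under the given robots.txt rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

rules = "User-agent: *\nDisallow: /private/\n"
print(is_allowed(rules, "https://example.com/public/page"))   # True
print(is_allowed(rules, "https://example.com/private/page"))  # False
```

In practice the rules are fetched from the target site's `/robots.txt` before any crawl is started.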

🚀 Quick Start

```bash
# Install
git clone https://github.com/arben-adm/crawl-agent-4ai.git
cd crawl-agent-4ai
pip install -r requirements.txt
crawl4ai-setup

# Confirm everything is working
crawl4ai-doctor

# Run
streamlit run app/main.py
```

💡 Usage

  1. Start the app with `streamlit run app/main.py`
  2. Enter the target URL and select a mode:
    • LLM Mode: AI-powered content extraction
    • Structured Mode: CSS-based precise extraction
  3. Configure advanced settings if needed:
    • Dynamic content wait time
    • Hidden elements extraction
    • Custom extraction rules
  4. Click "Start Scraping" and view the results in tabs
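Structured mode's CSS-class-based selection can be approximated with the standard-library `html.parser`. The class below is an illustrative simplification (it ignores void elements such as `<br>`), not the app's implementation:

```python
from html.parser import HTMLParser

class ClassTextExtractor(HTMLParser):
    """Collect the text inside elements carrying a given class attribute:
    a simplified stand-in for CSS-based structured extraction."""

    def __init__(self, cls: str):
        super().__init__()
        self.cls = cls
        self.depth = 0      # >0 while inside a matching element
        self.results = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1  # track nesting inside a matched element
        elif self.cls in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.results.append("")

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.results[-1] += data

html = '<div class="title">Hello</div><p class="body">World</p>'
p = ClassTextExtractor("title")
p.feed(html)
print(p.results)  # ['Hello']
```

A real structured-mode run applies selectors like these across the fetched page and returns the matched fields as structured data.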

⚙️ Configuration

| Setting          | Description                 | Default              |
|------------------|-----------------------------|----------------------|
| Dynamic Wait     | Time to wait for JS content | 5 s                  |
| Process Dynamic  | Handle JS-rendered content  | `true`               |
| Extract Hidden   | Include hidden elements     | `true`               |
| LLM Instructions | Custom extraction rules     | `"Extract all text"` |
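These defaults could be modeled as a simple dataclass; the field names here are hypothetical, chosen only to mirror the settings table above:

```python
from dataclasses import dataclass

@dataclass
class ScrapeSettings:
    dynamic_wait: float = 5.0                    # seconds to wait for JS content
    process_dynamic: bool = True                 # handle JS-rendered content
    extract_hidden: bool = True                  # include hidden elements
    llm_instructions: str = "Extract all text"   # custom extraction rules

defaults = ScrapeSettings()
print(defaults)
```

Keeping the defaults in one place like this makes it easy to override individual settings per run, e.g. `ScrapeSettings(dynamic_wait=10.0)`.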

🔧 Troubleshooting

  • Scraping Not Allowed: If the website disallows crawling via robots.txt, the app will show an error. Check the URL or try another site.
  • Errors During Scraping: Ensure you have a stable internet connection and that all dependencies are installed.

🤝 Contributing

  1. Fork the repo
  2. Create feature branch (git checkout -b feature/amazing)
  3. Commit changes (git commit -am 'Add something amazing')
  4. Push branch (git push origin feature/amazing)
  5. Open a Pull Request

📝 License

MIT © Arben Ademi
