Commit 981900b

docs: finalize README with user-provided template
1 parent 6e1e569 commit 981900b

1 file changed: +78 -99 lines changed

README.md

Lines changed: 78 additions & 99 deletions
@@ -1,150 +1,129 @@
-🚀 **Looking for an even faster and simpler way to scrape at scale (only 5 lines of code)?** Check out our enhanced version at [**Chuscraper.com**](https://github.com/ToufiqQureshi/chuscraper)! 🚀
+<p align="center">
+<img src="https://i.ibb.co/HLyG7BBK/Chat-GPT-Image-Feb-16-2026-11-13-14-AM.png" alt="Chuscraper Logo" width="180" />
+</p>

----
+<h1 align="center">🕷️ Chuscraper</h1>
+<p align="center">
+<strong>LLM + CDP powered undetectable web scraping & automation framework</strong><br/>
+You Only Scrape Once — data extraction made smarter, faster, and stealthier.
+</p>

-# 🕷️ Chuscraper: You Only Scrape Once
+<p align="center">
+<a href="https://pypi.org/project/chuscraper/"><img src="https://static.pepy.tech/personalized-badge/chuscraper?period=total&units=INTERNATIONAL_SYSTEM&left_color=BLACK&right_color=GREEN&left_text=downloads"/></a>
+<a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-yellow.svg?style=for-the-badge"/></a>
+<a href="https://github.com/ToufiqQureshi/chuscraper"><img src="https://img.shields.io/badge/GitHub-Trending-blue?style=for-the-badge&logo=github"/></a>
+</p>

-[English](README.md) | [中文](docs/chinese.md) | [日本語](docs/japanese.md)
-| [한국어](docs/korean.md)
-| [Русский](docs/russian.md) | [Türkçe](docs/turkish.md)
-| [Deutsch](docs/german.md)
-| [Español](docs/spanish.md)
-| [français](docs/french.md)
-| [Português](docs/portuguese.md)
+---

-[![PyPI Downloads](https://static.pepy.tech/personalized-badge/chuscraper?period=total&units=INTERNATIONAL_SYSTEM&left_color=BLACK&right_color=GREEN&left_text=downloads)](https://pepy.tech/projects/chuscraper)
-[![linting: pylint](https://img.shields.io/badge/linting-pylint-yellowgreen?style=for-the-badge)](https://github.com/ToufiqQureshi/chuscraper)
-[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=for-the-badge)](https://opensource.org/licenses/MIT)
+## 🚀 What is Chuscraper?
+Chuscraper is a Python web scraping & automation library that uses **CDP (Chrome DevTools Protocol)** and **LLMs** to extract structured data, interact with pages, and automate workflows — all while staying *stealthy and undetected*.

-[![Chuscraper Banner](docs/assets/logo_pro.svg)](https://github.com/ToufiqQureshi/chuscraper)
+With AI-powered extraction, you tell it *what* to extract — it figures out *how*.

-<p align="center">
-<a href="https://github.com/ToufiqQureshi/chuscraper" target="_blank"><img src="https://img.shields.io/badge/GitHub-Trending-blue?style=for-the-badge&logo=github" alt="Chuscraper | Trending" style="width: 250px; height: 55px;" width="250" height="55"/></a>
-</p>
+---

-[Chuscraper](https://github.com/ToufiqQureshi/chuscraper) is a *web scraping* python library that uses LLM and direct CDP logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.).
+## 🌟 Features

-Just say which information you want to extract and the library will do it for you!
+### 🕵️‍♂️ Stealth & Anti-Detection
+- Hides `navigator.webdriver`, user agent rotation
+- Canvas/WebGL noise + hardware spoofing
+- Timezone & geolocation spoofing

-<p align="center">
-<img src="docs/assets/official_logo.png" alt="Chuscraper Logo" width="400">
-</p>
+### 🤖 AI-Driven Data Extraction
+- **Semantic extraction** using LLMs
+- Converts HTML into structured JSON/Pydantic
+
+### 🧠 Autonomous Navigation
+- Intelligent pilot (`ai_pilot`) that clicks/types until goal achieved

+### ⚡ Async + Fast
+Built on async CDP, low overhead, no heavy browser bundles.

-## 🚀 Integrations
-Chuscraper offers seamless integration with popular frameworks and tools to enhance your scraping capabilities. Whether you're building with Python, using LLM frameworks, or working with AI agents, we've got you covered with our comprehensive integration options.
+### 🔄 Flexible Outputs
+Supports JSON, CSV, Markdown, Excel, Pydantic, and more.

-**Integrations**:
-- **Providers**: OpenAI, Gemini (Native), Anthropic, Ollama
-- **LLM Frameworks**: Langchain, Llama Index, Crew.ai, Agno
-- **Output Protocols**: Pydantic, JSON, CSV, Markdown, Excel
-- **Stealth**: Built-in Canvas/WebGL noise, Hardware spoofing, UA rotation.
+### 🌐 Integrations
+- LLM Providers: OpenAI, Gemini, Anthropic, Ollama
+- Frameworks: LangChain, LlamaIndex, Agno, Crew.ai

-## 🚀 Quick install
+---

-The reference page for Chuscraper is available on the official page of PyPI: [pypi](https://pypi.org/project/chuscraper/).
+## 📦 Installation

 ```bash
 pip install chuscraper

-# FOR AI CAPABILITIES
+# For AI Capabilities
 pip install chuscraper[ai]
 ```

-**Note**: it is recommended to install the library in a virtual environment to avoid conflicts with other libraries 🐱
+> [!TIP]
+> Use within a virtual environment to avoid conflicts.

+---

-## 💻 Usage
-There are multiple standard scraping methods that can be used to extract information from a website (or local file).
-
-The most common one is the `ai_pilot`, which autonomously navigates and extracts information from a page given a user goal.
-
+## 💻 Quick Start (Async)

 ```python
 import asyncio
 from chuscraper import start

 async def main():
-    # Start the stealth browser
     browser = await start(headless=False)
     page = await browser.get("https://www.makemytrip.com/")

-    # Define the goal
-    print("AI is starting to search...")
-    await page.ai_pilot("Search for hotels in Goa for next weekend")
-
+    # Tell the AI what to extract
+    print("AI is navigating...")
+    await page.ai_pilot("Search hotels in Goa for next weekend")
+
     # Extract structured data
-    result = await page.ai_extract("Extract first 3 hotels with prices")
-
+    result = await page.ai_extract("Get the first 3 hotels with prices")
     import json
-    print(json.dumps(result, indent=4))
+    print(json.dumps(result, indent=2))

     await browser.stop()

 if __name__ == "__main__":
     asyncio.run(main())
 ```

-> [!NOTE]
-> For OpenAI and other models you just need to pass the provider!
-> ```python
-> from chuscraper.ai.providers import OpenAIProvider
-> provider = OpenAIProvider(api_key="YOUR_OPENAI_API_KEY")
-> await page.ai_extract("Extract data", provider=provider)
-> ```
-
+---

-The output will be a structured dictionary like the following:
+## 🤖 AI Usage with Providers
+Example using **OpenAIProvider**:

 ```python
-{
-    "hotels": [
-        {
-            "name": "Taj Exotica Resort & Spa",
-            "price": "₹ 25,000",
-            "rating": "4.8"
-        },
-        {
-            "name": "Cygnett Inn",
-            "price": "₹ 4,500",
-            "rating": "4.2"
-        }
-    ]
-}
+from chuscraper.ai.providers import OpenAIProvider
+
+provider = OpenAIProvider(api_key="YOUR_OPENAI_API_KEY")
+await page.ai_extract("Extract prices and listings", provider=provider)
 ```

+---
+
 ## 📖 Documentation
-The documentation for Chuscraper can be found in the [docs/](docs/) folder.
-
-## 🤝 Contributing
-
-Feel free to contribute and join our community to discuss improvements and give us suggestions!
-
-Please see the [contributing guidelines](CONTRIBUTING.md).
-
-## 🔥 AI Methods
-
-| Method Name | Description |
-|-------------------------|------------------------------------------------------------------------------------------------------------------|
-| ai_pilot | Single-goal autonomous navigator that handles interaction (clicks, types) to reach a target. |
-| ai_extract | Semantic data extractor that converts HTML content into structured JSON/Pydantic models. |
-| ai_visual_extract | Multi-modal Vision scraper that extracts data directly from the rendered page screenshot. |
-| ai_learn_selector | Self-healing tool that generates robust CSS/Xpath selectors for long-term automation. |
-| ai_ask | Context-aware Q&A that answers questions based on the current page's content. |
-
-## 🎓 Citations
-If you have used our library for research purposes please quote us with the following reference:
-```text
-@misc{chuscraper,
-author = {Toufiq Qureshi},
-title = {Chuscraper},
-year = {2026},
-url = {https://github.com/ToufiqQureshi/chuscraper},
-note = {An undetectable & agentic python library for scraping leveraging CDP and LLMs}
-}
-```
+Full docs available in the `docs/` folder:
+
+- [English](README.md)
+- [Chinese](docs/chinese.md)
+- [Japanese](docs/japanese.md)
+- [Korean](docs/korean.md)
+- [Russian](docs/russian.md)
+- [Turkish](docs/turkish.md)
+- [German](docs/german.md)
+- [Spanish](docs/spanish.md)
+- [French](docs/french.md)
+- [Portuguese](docs/portuguese.md)
+
+---
+
+## 🛠️ Contributing
+Want to contribute? Open an issue or send a pull request — all levels welcome! Please follow the `CONTRIBUTING.md` guidelines.
+
+---

 ## 📜 License
-Chuscraper is licensed under the MIT License. See the [LICENSE](LICENSE) file for more information.
+Chuscraper is licensed under the MIT License.

 Made with ❤️ by [Toufiq Qureshi]
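
Note on the diff: the commit removes the sample output dictionary while the new README still advertises JSON and CSV among its output formats. The block below is a minimal sketch of how a caller might turn a result of that shape into CSV text with only the Python standard library. It assumes `ai_extract` returns a plain dict like the removed sample. The helper name `hotels_to_csv` is hypothetical and is not part of Chuscraper.

```python
import csv
import io
import json

# Sample result copied from the README example removed in this commit.
# Assumption: ai_extract returns a plain dict of this shape.
result = {
    "hotels": [
        {"name": "Taj Exotica Resort & Spa", "price": "₹ 25,000", "rating": "4.8"},
        {"name": "Cygnett Inn", "price": "₹ 4,500", "rating": "4.2"},
    ]
}


def hotels_to_csv(data: dict) -> str:
    """Hypothetical helper: flatten data["hotels"] into CSV text."""
    rows = data.get("hotels", [])
    if not rows:
        return ""
    buffer = io.StringIO()
    # Column order follows the keys of the first record.
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # ensure_ascii=False keeps the rupee sign readable in the JSON output.
    print(json.dumps(result, indent=2, ensure_ascii=False))
    print(hotels_to_csv(result))
```

Prices that contain a comma are quoted by the `csv` module, so the file stays parseable. A record with keys missing from the first record would need `restval` or an explicit field list.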
