Commit d439622

docs: update README to highlight new features and breaking changes
1 parent 75c6e50 commit d439622

1 file changed: +274 -47 lines changed

README.md

Lines changed: 274 additions & 47 deletions
@@ -13,100 +13,327 @@
 <img src="https://github.com/thalissonvs/pydoll/actions/workflows/tests.yml/badge.svg" alt="Tests">
 <img src="https://github.com/thalissonvs/pydoll/actions/workflows/ruff-ci.yml/badge.svg" alt="Ruff CI">
 <img src="https://github.com/thalissonvs/pydoll/actions/workflows/release.yml/badge.svg" alt="Release">
-<img src="https://tokei.rs/b1/github/thalissonvs/pydoll" alt="Total lines">
-<img src="https://tokei.rs/b1/github/thalissonvs/pydoll?category=files" alt="Files">
-<img src="https://tokei.rs/b1/github/thalissonvs/pydoll?category=comments" alt="Comments">
 </p>

 <p align="center">
 <b>Pydoll</b> is revolutionizing browser automation! Unlike other solutions, it <b>eliminates the need for webdrivers</b>,
-providing a smooth and reliable automation experience with native asynchronous performance.
+providing a smooth and reliable automation experience with native asynchronous performance and advanced capabilities
+like intelligent captcha bypass and comprehensive network monitoring.
 </p>

 <p align="center">
 <a href="https://autoscrape-labs.github.io/pydoll/">Documentation</a> •
 <a href="#-quick-start">Quick Start</a> •
+<a href="#-breaking-changes">Breaking Changes</a> •
+<a href="#-advanced-features">Advanced Features</a> •
 <a href="#-contributing">Contributing</a> •
 <a href="#-support-my-work">Support</a> •
 <a href="#-license">License</a>
 </p>

-## Key Features
+## What Makes Pydoll Special

-🔹 **Zero Webdrivers!** Say goodbye to webdriver compatibility nightmares
-🔹 **Native Captcha Bypass!** Smoothly handles Cloudflare Turnstile and reCAPTCHA v3*
-🔹 **Async Performance** for lightning-fast automation
-🔹 **Human-like Interactions** that mimic real user behavior
-🔹 **Powerful Event System** for reactive automations
-🔹 **Multi-browser Support** including Chrome and Edge
+Pydoll isn't just another browser automation library. It's a complete solution built from the ground up for modern web automation challenges:
+
+🔹 **Zero Webdrivers!** Direct Chrome DevTools Protocol integration - no more compatibility nightmares
+🔹 **Intelligent Captcha Bypass** - Automatically handles Cloudflare Turnstile and reCAPTCHA v3*
+🔹 **True Async Performance** - Built for speed with native asyncio support
+🔹 **Human-like Interactions** - Advanced timing and behavior patterns that mimic real users
+🔹 **Powerful Network Monitoring** - Intercept, modify, and analyze all network traffic
+🔹 **Event-Driven Architecture** - React to page events, network requests, and user interactions
+🔹 **Multi-browser Support** - Chrome and Edge with consistent APIs
+🔹 **Intuitive Element Finding** - Modern `find()` and `query()` methods for effortless element location
+🔹 **Robust Type Safety** - Comprehensive type system for better IDE support and error prevention

 ## Installation

 ```bash
 pip install pydoll-python
 ```

-## Quick Start
+## Breaking Changes (v2.0+)
+
+If you're upgrading from an earlier version, please note these important changes:
+
+### Import Changes
+```python
+# Old way (deprecated)
+from pydoll.browser.options import Options
+from pydoll.browser import Chrome, Edge
+
+# New way
+from pydoll.browser.options import ChromiumOptions
+from pydoll.browser.chromium import Chrome, Edge
+```
+
+### Element Finding Methods
+```python
+# Old way
+element = await page.find_element(By.CSS_SELECTOR, 'button')
+
+# New intuitive methods
+element = await tab.find(tag_name='button')  # Find by attributes
+element = await tab.query('button')          # CSS selector or XPath
+```

-Get started with just a few lines of code:
+### Tab-Based Architecture
+```python
+# Old way
+async with Chrome() as browser:
+    await browser.start()
+    page = await browser.get_page()
+
+# New way - start() returns Tab directly
+async with Chrome() as browser:
+    tab = await browser.start()  # Returns Tab instance directly
+    # or create additional tabs
+    new_tab = await browser.new_tab()
+```
+
+## Quick Start

+### Basic Automation
 ```python
 import asyncio
-from pydoll.browser.chrome import Chrome
-from pydoll.constants import By
+from pydoll.browser import Chrome
+from pydoll.browser.options import ChromiumOptions

 async def main():
+    # Simple automation
     async with Chrome() as browser:
-        await browser.start()
-        page = await browser.get_page()
+        tab = await browser.start()  # Returns Tab directly

-        # Works with captcha-protected sites
-        await page.go_to('https://example-with-cloudflare.com')
-        button = await page.find_element(By.CSS_SELECTOR, 'button')
+        await tab.go_to('https://example.com')
+
+        # Modern element finding
+        button = await tab.find(tag_name='button', class_name='submit')
         await button.click()
+
+        # Or use CSS selectors/XPath directly
+        link = await tab.query('a[href*="contact"]')
+        await link.click()

 asyncio.run(main())
 ```

-Need to configure your browser? Easy!
+### Custom Browser Configuration
+```python
+from pydoll.browser import Chrome
+from pydoll.browser.options import ChromiumOptions
+
+async def main():
+    # Configure browser options
+    options = ChromiumOptions()
+    options.add_argument('--proxy-server=username:password@ip:port')
+    options.add_argument('--window-size=1920,1080')
+    options.add_argument('--disable-web-security')
+    options.binary_location = '/path/to/your/browser'
+
+    async with Chrome(options=options) as browser:
+        tab = await browser.start()
+
+        # Your automation code here
+        await tab.go_to('https://example.com')
+
+asyncio.run(main())
+```
+
+## Advanced Features
+
+### Intelligent Captcha Bypass
+
+Pydoll can automatically handle Cloudflare Turnstile captchas without external services:

 ```python
-from pydoll.browser.chrome import Chrome
-from pydoll.browser.options import Options
+import asyncio
+from pydoll.browser import Chrome

-options = Options()
-# Add a proxy
-options.add_argument('--proxy-server=username:password@ip:port')
-# Custom browser location
-options.binary_location = '/path/to/your/browser'
+async def bypass_cloudflare():
+    async with Chrome() as browser:
+        tab = await browser.start()
+
+        # Method 1: Context manager (waits for captcha completion)
+        async with tab.expect_and_bypass_cloudflare_captcha():
+            await tab.go_to('https://site-with-cloudflare.com')
+            print("Captcha automatically handled!")
+
+        # Method 2: Background processing
+        await tab.enable_auto_solve_cloudflare_captcha()
+        await tab.go_to('https://another-protected-site.com')
+        # Captcha solved in background while code continues
+
+        await tab.disable_auto_solve_cloudflare_captcha()

-async with Chrome(options=options) as browser:
-    await browser.start()
-    # Your code here
+asyncio.run(bypass_cloudflare())
 ```

-## Documentation
+### Advanced Element Finding

-For comprehensive documentation, examples, and deep dives into Pydoll's features, visit our [official documentation site](https://autoscrape-labs.github.io/pydoll/).
+Pydoll offers multiple intuitive ways to find elements:

-The documentation includes:
-- Detailed usage examples
-- API reference
-- Advanced techniques and patterns
-- Troubleshooting guides
+```python
+import asyncio
+from pydoll.browser import Chrome

-## Sponsors
+async def element_finding_examples():
+    async with Chrome() as browser:
+        tab = await browser.start()
+        await tab.go_to('https://example.com')
+
+        # Find by attributes (most intuitive)
+        submit_btn = await tab.find(
+            tag_name='button',
+            class_name='btn-primary',
+            text='Submit'
+        )
+
+        # Find by ID
+        username_field = await tab.find(id='username')
+
+        # Find multiple elements
+        all_links = await tab.find(tag_name='a', find_all=True)
+
+        # CSS selectors and XPath
+        nav_menu = await tab.query('nav.main-menu')
+        specific_item = await tab.query('//div[@data-testid="item-123"]')
+
+        # With timeout and error handling
+        delayed_element = await tab.find(
+            class_name='dynamic-content',
+            timeout=10,
+            raise_exc=False  # Returns None if not found
+        )
+
+        # Advanced: Custom attributes
+        custom_element = await tab.find(
+            data_testid='submit-button',
+            aria_label='Submit form'
+        )
+
+asyncio.run(element_finding_examples())
+```

-[CapSolver](https://www.capsolver.com/?utm_source=github&utm_medium=banner_repo&utm_campaign=scraping&utm_term=pydoll) is an AI-powered tool that easily bypasses Captchas, allowing uninterrupted access to public data with fast, reliable, and cost-effective. And please enjoy the code PYDOLL to get an extra 6% balance! and register [here](https://dashboard.capsolver.com/passport/?utm_source=github&utm_medium=banner_repo&utm_campaign=scraping&utm_term=pydoll)
+### Concurrent Automation

-<p align="left">
-<a href="https://www.capsolver.com/?utm_source=github&utm_medium=banner_repo&utm_campaign=scraping&utm_term=pydoll" target="_blank">
-<img src="https://github.com/user-attachments/assets/aaf49563-2b93-49c3-8f9c-c2dccc8dc0c8" alt="Pydoll Sponsors" width="1200" />
-</a>
-</p>
+Leverage async capabilities for parallel processing:
+
+```python
+import asyncio
+from pydoll.browser import Chrome
+
+async def scrape_page(url):
+    """Scrape a single page"""
+    async with Chrome() as browser:
+        tab = await browser.start()
+        await tab.go_to(url)
+
+        title = await tab.execute_script('return document.title')
+        links = await tab.find(tag_name='a', find_all=True)
+
+        return {
+            'url': url,
+            'title': title,
+            'link_count': len(links)
+        }
+
+async def concurrent_scraping():
+    urls = [
+        'https://example1.com',
+        'https://example2.com',
+        'https://example3.com'
+    ]
+
+    # Process all URLs concurrently
+    tasks = [scrape_page(url) for url in urls]
+    results = await asyncio.gather(*tasks)
+
+    for result in results:
+        print(f"{result['url']}: {result['title']} ({result['link_count']} links)")
+
+asyncio.run(concurrent_scraping())
+```
+
+### Event-Driven Automation
+
+React to page events and user interactions:
+
+```python
+import asyncio
+from pydoll.browser import Chrome
+from pydoll.protocol.page.events import PageEvent
+
+async def event_driven_automation():
+    async with Chrome() as browser:
+        tab = await browser.start()
+
+        # Enable page events
+        await tab.enable_page_events()
+
+        # React to page load
+        async def on_page_load(event):
+            print("Page loaded! Starting automation...")
+            # Perform actions after page loads (await find() before calling type())
+            await (await tab.find(id='search-box')).type('automation')
+
+        # React to navigation
+        async def on_navigation(event):
+            url = event['params']['frame']['url']  # frameNavigated nests the URL under 'frame'
+            print(f"Navigated to: {url}")
+
+        await tab.on(PageEvent.LOAD_EVENT_FIRED, on_page_load)
+        await tab.on(PageEvent.FRAME_NAVIGATED, on_navigation)
+
+        await tab.go_to('https://example.com')
+        await asyncio.sleep(5)  # Let events process
+
+asyncio.run(event_driven_automation())
+```
+
+### Working with iFrames
+
+Pydoll provides seamless iframe interaction through the `get_frame()` method:

-Pydoll is proudly supported by these amazing sponsors who believe in the future of webdriver-free automation. Their contributions make it possible for us to maintain and improve this project.
-Interested in becoming a sponsor? Check out our [GitHub Sponsors page](https://github.com/sponsors/thalissonvs) for more information about the perks and benefits of sponsoring this project!
+```python
+import asyncio
+from pydoll.browser.chromium import Chrome
+
+async def iframe_interaction():
+    async with Chrome() as browser:
+        tab = await browser.start()
+        await tab.go_to('https://example.com/page-with-iframe')
+
+        # Find the iframe element
+        iframe_element = await tab.query('.hcaptcha-iframe', timeout=10)
+
+        # Get a Tab instance for the iframe content
+        frame = await tab.get_frame(iframe_element)
+
+        # Now interact with elements inside the iframe
+        submit_button = await frame.find(tag_name='button', class_name='submit')
+        await submit_button.click()
+
+        # You can use all Tab methods on the frame
+        form_input = await frame.find(id='captcha-input')
+        await form_input.type('verification-code')
+
+        # Find elements by various methods
+        links = await frame.find(tag_name='a', find_all=True)
+        specific_element = await frame.query('#specific-id')
+
+asyncio.run(iframe_interaction())
+```
+
+## Documentation
+
+For comprehensive documentation, examples, and deep dives into Pydoll's features, visit our [official documentation site](https://autoscrape-labs.github.io/pydoll/).
+
+The documentation includes:
+- **Getting Started Guide** - Step-by-step tutorials
+- **API Reference** - Complete method documentation
+- **Advanced Techniques** - Network interception, event handling, performance optimization
+- **Migration Guide** - Upgrading from older versions
+- **Troubleshooting** - Common issues and solutions
+- **Best Practices** - Patterns for reliable automation

 ## Contributing

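Note on the README examples in this commit: `tab.find()` is awaited everywhere it appears, so calling it yields a coroutine that has to be awaited before any method is called on the result. The sketch below shows the difference. It uses hypothetical `FakeTab` and `FakeElement` stand-ins (they are not Pydoll classes and only mimic the async shape of `find()` and `type()`), so it runs without a browser.

```python
import asyncio


class FakeElement:
    """Hypothetical stand-in for an element object; not part of Pydoll."""

    def __init__(self):
        self.typed = []

    async def type(self, text):
        # Record the text instead of driving a real browser.
        self.typed.append(text)


class FakeTab:
    """Hypothetical stand-in for a tab whose find() is a coroutine function."""

    def __init__(self):
        self.element = FakeElement()

    async def find(self, **attrs):
        return self.element


async def chained_without_await(tab):
    # tab.find(...) returns a coroutine object, which has no .type attribute,
    # so the attribute lookup raises AttributeError before anything runs.
    coro = tab.find(id='search-box')
    try:
        await coro.type('automation')
    except AttributeError:
        coro.close()  # suppress the "coroutine was never awaited" warning
        return False
    return True


async def awaited_first(tab):
    # Await find() to obtain the element, then await the method call on it.
    element = await tab.find(id='search-box')
    await element.type('automation')
    return element.typed


async def demo():
    tab = FakeTab()
    chained_ok = await chained_without_await(tab)
    typed = await awaited_first(tab)
    return chained_ok, typed


chained_ok, typed = asyncio.run(demo())
```

In a handler this means either binding the element to a name first, as in `awaited_first`, or wrapping the inner call in parentheses: `await (await tab.find(...)).type(...)`.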