|
13 | 13 | <img src="https://github.com/thalissonvs/pydoll/actions/workflows/tests.yml/badge.svg" alt="Tests"> |
14 | 14 | <img src="https://github.com/thalissonvs/pydoll/actions/workflows/ruff-ci.yml/badge.svg" alt="Ruff CI"> |
15 | 15 | <img src="https://github.com/thalissonvs/pydoll/actions/workflows/release.yml/badge.svg" alt="Release"> |
16 | | - <img src="https://tokei.rs/b1/github/thalissonvs/pydoll" alt="Total lines"> |
17 | | - <img src="https://tokei.rs/b1/github/thalissonvs/pydoll?category=files" alt="Files"> |
18 | | - <img src="https://tokei.rs/b1/github/thalissonvs/pydoll?category=comments" alt="Comments"> |
19 | 16 | </p> |
20 | 17 |
|
21 | 18 | <p align="center"> |
22 | 19 | <b>Pydoll</b> is revolutionizing browser automation! Unlike other solutions, it <b>eliminates the need for webdrivers</b>, |
23 | | - providing a smooth and reliable automation experience with native asynchronous performance. |
| 20 | + providing a smooth and reliable automation experience with native asynchronous performance and advanced capabilities |
| 21 | + like intelligent captcha bypass and comprehensive network monitoring. |
24 | 22 | </p> |
25 | 23 |
|
26 | 24 | <p align="center"> |
27 | 25 | <a href="https://autoscrape-labs.github.io/pydoll/">Documentation</a> • |
28 | 26 | <a href="#-quick-start">Quick Start</a> • |
| 27 | + <a href="#-breaking-changes">Breaking Changes</a> • |
| 28 | + <a href="#-advanced-features">Advanced Features</a> • |
29 | 29 | <a href="#-contributing">Contributing</a> • |
30 | 30 | <a href="#-support-my-work">Support</a> • |
31 | 31 | <a href="#-license">License</a> |
32 | 32 | </p> |
33 | 33 |
|
34 | | -## Key Features |
| 34 | +## What Makes Pydoll Special |
35 | 35 |
|
36 | | -🔹 **Zero Webdrivers!** Say goodbye to webdriver compatibility nightmares |
37 | | -🔹 **Native Captcha Bypass!** Smoothly handles Cloudflare Turnstile and reCAPTCHA v3* |
38 | | -🔹 **Async Performance** for lightning-fast automation |
39 | | -🔹 **Human-like Interactions** that mimic real user behavior |
40 | | -🔹 **Powerful Event System** for reactive automations |
41 | | -🔹 **Multi-browser Support** including Chrome and Edge |
| 36 | +Pydoll isn't just another browser automation library. It's a complete solution built from the ground up for modern web automation challenges: |
| 37 | + |
| 38 | +🔹 **Zero Webdrivers!** Direct Chrome DevTools Protocol integration - no more compatibility nightmares |
| 39 | +🔹 **Intelligent Captcha Bypass** - Automatically handles Cloudflare Turnstile and reCAPTCHA v3* |
| 40 | +🔹 **True Async Performance** - Built for speed with native asyncio support |
| 41 | +🔹 **Human-like Interactions** - Advanced timing and behavior patterns that mimic real users |
| 42 | +🔹 **Powerful Network Monitoring** - Intercept, modify, and analyze all network traffic |
| 43 | +🔹 **Event-Driven Architecture** - React to page events, network requests, and user interactions |
| 44 | +🔹 **Multi-browser Support** - Chrome and Edge with consistent APIs |
| 45 | +🔹 **Intuitive Element Finding** - Modern `find()` and `query()` methods for effortless element location |
| 46 | +🔹 **Robust Type Safety** - Comprehensive type system for better IDE support and error prevention |
42 | 47 |
|
43 | 48 | ## Installation |
44 | 49 |
|
45 | 50 | ```bash |
46 | 51 | pip install pydoll-python |
47 | 52 | ``` |
48 | 53 |
|
49 | | -## Quick Start |
| 54 | +## Breaking Changes (v2.0+) |
| 55 | + |
| 56 | +If you're upgrading from an earlier version, please note these important changes: |
| 57 | + |
| 58 | +### Import Changes |
| 59 | +```python |
| 60 | +# Old way (deprecated) |
| 61 | +from pydoll.browser.options import Options |
| 62 | +from pydoll.browser import Chrome, Edge |
| 63 | + |
| 64 | +# New way |
| 65 | +from pydoll.browser.options import ChromiumOptions |
| 66 | +from pydoll.browser.chromium import Chrome, Edge |
| 67 | +``` |
| 68 | + |
| 69 | +### Element Finding Methods |
| 70 | +```python |
| 71 | +# Old way |
| 72 | +element = await page.find_element(By.CSS_SELECTOR, 'button') |
| 73 | + |
| 74 | +# New intuitive methods |
| 75 | +element = await tab.find(tag_name='button') # Find by attributes |
| 76 | +element = await tab.query('button') # CSS selector or XPath |
| 77 | +``` |
50 | 78 |
|
51 | | -Get started with just a few lines of code: |
| 79 | +### Tab-Based Architecture |
| 80 | +```python |
| 81 | +# Old way |
| 82 | +async with Chrome() as browser: |
| 83 | +    await browser.start() |
| 84 | +    page = await browser.get_page() |
| 85 | + |
| 86 | +# New way - start() returns Tab directly |
| 87 | +async with Chrome() as browser: |
| 88 | +    tab = await browser.start()  # Returns Tab instance directly |
| 89 | +    # or create additional tabs |
| 90 | +    new_tab = await browser.new_tab() |
| 91 | +``` |
| 92 | + |
| 93 | +## Quick Start |
52 | 94 |
|
| 95 | +### Basic Automation |
53 | 96 | ```python |
54 | 97 | import asyncio |
55 | | -from pydoll.browser.chrome import Chrome |
56 | | -from pydoll.constants import By |
| 98 | +from pydoll.browser.chromium import Chrome |
| 99 | +from pydoll.browser.options import ChromiumOptions |
57 | 100 |
|
58 | 101 | async def main(): |
| 102 | +    # Simple automation |
59 | 103 |     async with Chrome() as browser: |
60 | | -        await browser.start() |
61 | | -        page = await browser.get_page() |
| 104 | +        tab = await browser.start()  # Returns Tab directly |
62 | 105 |
|
63 | | -        # Works with captcha-protected sites |
64 | | -        await page.go_to('https://example-with-cloudflare.com') |
65 | | -        button = await page.find_element(By.CSS_SELECTOR, 'button') |
| 106 | +        await tab.go_to('https://example.com') |
| 107 | + |
| 108 | +        # Modern element finding |
| 109 | +        button = await tab.find(tag_name='button', class_name='submit') |
66 | 110 |         await button.click() |
| 111 | + |
| 112 | +        # Or use CSS selectors/XPath directly |
| 113 | +        link = await tab.query('a[href*="contact"]') |
| 114 | +        await link.click() |
67 | 115 |
|
68 | 116 | asyncio.run(main()) |
69 | 117 | ``` |
70 | 118 |
|
71 | | -Need to configure your browser? Easy! |
| 119 | +### Custom Browser Configuration |
| 120 | +```python |
| | +import asyncio |
| 121 | +from pydoll.browser.chromium import Chrome |
| 122 | +from pydoll.browser.options import ChromiumOptions |
| 123 | + |
| 124 | +async def main(): |
| 125 | +    # Configure browser options |
| 126 | +    options = ChromiumOptions() |
| 127 | +    options.add_argument('--proxy-server=username:password@ip:port') |
| 128 | +    options.add_argument('--window-size=1920,1080') |
| 129 | +    options.add_argument('--disable-web-security') |
| 130 | +    options.binary_location = '/path/to/your/browser' |
| 131 | + |
| 132 | +    async with Chrome(options=options) as browser: |
| 133 | +        tab = await browser.start() |
| 134 | + |
| 135 | +        # Your automation code here |
| 136 | +        await tab.go_to('https://example.com') |
| 137 | + |
| 138 | +asyncio.run(main()) |
| 139 | +``` |
| 140 | + |
| 141 | +## Advanced Features |
| 142 | + |
| 143 | +### Intelligent Captcha Bypass |
| 144 | + |
| 145 | +Pydoll can automatically handle Cloudflare Turnstile captchas without external services: |
72 | 146 |
|
73 | 147 | ```python |
74 | | -from pydoll.browser.chrome import Chrome |
75 | | -from pydoll.browser.options import Options |
| 148 | +import asyncio |
| 149 | +from pydoll.browser.chromium import Chrome |
76 | 150 |
|
77 | | -options = Options() |
78 | | -# Add a proxy |
79 | | -options.add_argument('--proxy-server=username:password@ip:port') |
80 | | -# Custom browser location |
81 | | -options.binary_location = '/path/to/your/browser' |
| 151 | +async def bypass_cloudflare(): |
| 152 | +    async with Chrome() as browser: |
| 153 | +        tab = await browser.start() |
| 154 | + |
| 155 | +        # Method 1: Context manager (waits for captcha completion) |
| 156 | +        async with tab.expect_and_bypass_cloudflare_captcha(): |
| 157 | +            await tab.go_to('https://site-with-cloudflare.com') |
| 158 | +        print("Captcha automatically handled!") |
| 159 | + |
| 160 | +        # Method 2: Background processing |
| 161 | +        await tab.enable_auto_solve_cloudflare_captcha() |
| 162 | +        await tab.go_to('https://another-protected-site.com') |
| 163 | +        # Captcha solved in background while code continues |
| 164 | + |
| 165 | +        await tab.disable_auto_solve_cloudflare_captcha() |
82 | 166 |
|
83 | | -async with Chrome(options=options) as browser: |
84 | | -    await browser.start() |
85 | | -    # Your code here |
| 167 | +asyncio.run(bypass_cloudflare()) |
86 | 168 | ``` |
87 | 169 |
|
88 | | -## Documentation |
| 170 | +### Advanced Element Finding |
89 | 171 |
|
90 | | -For comprehensive documentation, examples, and deep dives into Pydoll's features, visit our [official documentation site](https://autoscrape-labs.github.io/pydoll/). |
| 172 | +Pydoll offers multiple intuitive ways to find elements: |
91 | 173 |
|
92 | | -The documentation includes: |
93 | | -- Detailed usage examples |
94 | | -- API reference |
95 | | -- Advanced techniques and patterns |
96 | | -- Troubleshooting guides |
| 174 | +```python |
| 175 | +import asyncio |
| 176 | +from pydoll.browser.chromium import Chrome |
97 | 177 |
|
98 | | -## Sponsors |
| 178 | +async def element_finding_examples(): |
| 179 | +    async with Chrome() as browser: |
| 180 | +        tab = await browser.start() |
| 181 | +        await tab.go_to('https://example.com') |
| 182 | + |
| 183 | +        # Find by attributes (most intuitive) |
| 184 | +        submit_btn = await tab.find( |
| 185 | +            tag_name='button', |
| 186 | +            class_name='btn-primary', |
| 187 | +            text='Submit' |
| 188 | +        ) |
| 189 | + |
| 190 | +        # Find by ID |
| 191 | +        username_field = await tab.find(id='username') |
| 192 | + |
| 193 | +        # Find multiple elements |
| 194 | +        all_links = await tab.find(tag_name='a', find_all=True) |
| 195 | + |
| 196 | +        # CSS selectors and XPath |
| 197 | +        nav_menu = await tab.query('nav.main-menu') |
| 198 | +        specific_item = await tab.query('//div[@data-testid="item-123"]') |
| 199 | + |
| 200 | +        # With timeout and error handling |
| 201 | +        delayed_element = await tab.find( |
| 202 | +            class_name='dynamic-content', |
| 203 | +            timeout=10, |
| 204 | +            raise_exc=False  # Returns None if not found |
| 205 | +        ) |
| 206 | + |
| 207 | +        # Advanced: Custom attributes |
| 208 | +        custom_element = await tab.find( |
| 209 | +            data_testid='submit-button', |
| 210 | +            aria_label='Submit form' |
| 211 | +        ) |
| 212 | + |
| 213 | +asyncio.run(element_finding_examples()) |
| 214 | +``` |
99 | 215 |
|
100 | | -[CapSolver](https://www.capsolver.com/?utm_source=github&utm_medium=banner_repo&utm_campaign=scraping&utm_term=pydoll) is an AI-powered tool that easily bypasses Captchas, allowing uninterrupted access to public data with fast, reliable, and cost-effective. And please enjoy the code PYDOLL to get an extra 6% balance! and register [here](https://dashboard.capsolver.com/passport/?utm_source=github&utm_medium=banner_repo&utm_campaign=scraping&utm_term=pydoll) |
| 216 | +### Concurrent Automation |
101 | 217 |
|
102 | | -<p align="left"> |
103 | | - <a href="https://www.capsolver.com/?utm_source=github&utm_medium=banner_repo&utm_campaign=scraping&utm_term=pydoll" target="_blank"> |
104 | | - <img src="https://github.com/user-attachments/assets/aaf49563-2b93-49c3-8f9c-c2dccc8dc0c8" alt="Pydoll Sponsors" width="1200" /> |
105 | | - </a> |
106 | | -</p> |
| 218 | +Leverage async capabilities for parallel processing: |
| 219 | + |
| 220 | +```python |
| 221 | +import asyncio |
| 222 | +from pydoll.browser.chromium import Chrome |
| 223 | + |
| 224 | +async def scrape_page(url): |
| 225 | +    """Scrape a single page""" |
| 226 | +    async with Chrome() as browser: |
| 227 | +        tab = await browser.start() |
| 228 | +        await tab.go_to(url) |
| 229 | + |
| 230 | +        title = await tab.execute_script('return document.title') |
| 231 | +        links = await tab.find(tag_name='a', find_all=True) |
| 232 | + |
| 233 | +        return { |
| 234 | +            'url': url, |
| 235 | +            'title': title, |
| 236 | +            'link_count': len(links) |
| 237 | +        } |
| 238 | + |
| 239 | +async def concurrent_scraping(): |
| 240 | +    urls = [ |
| 241 | +        'https://example1.com', |
| 242 | +        'https://example2.com', |
| 243 | +        'https://example3.com' |
| 244 | +    ] |
| 245 | + |
| 246 | +    # Process all URLs concurrently |
| 247 | +    tasks = [scrape_page(url) for url in urls] |
| 248 | +    results = await asyncio.gather(*tasks) |
| 249 | + |
| 250 | +    for result in results: |
| 251 | +        print(f"{result['url']}: {result['title']} ({result['link_count']} links)") |
| 252 | + |
| 253 | +asyncio.run(concurrent_scraping()) |
| 254 | +``` |
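The example above starts one browser per URL, so a long URL list would launch that many browsers at once. A minimal sketch of capping the number in flight with `asyncio.Semaphore` follows. `fake_scrape` and `scrape_bounded` are hypothetical names that are not part of Pydoll, and `fake_scrape` only sleeps so the snippet runs without a browser.

```python
import asyncio


async def fake_scrape(url, delay=0.01):
    """Hypothetical stand-in for scrape_page(); sleeps instead of opening a browser."""
    await asyncio.sleep(delay)
    return {'url': url, 'title': f'Title of {url}'}


async def scrape_bounded(urls, worker=fake_scrape, max_concurrent=2):
    """Run worker(url) for every URL with at most max_concurrent running at once."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(url):
        # Each task waits here until one of the max_concurrent slots is free
        async with semaphore:
            return await worker(url)

    # gather() returns results in the same order as the input URLs;
    # return_exceptions=True keeps one failing URL from discarding the rest
    return await asyncio.gather(
        *(guarded(url) for url in urls), return_exceptions=True
    )


if __name__ == '__main__':
    demo_urls = [f'https://example.com/page{i}' for i in range(5)]
    for result in asyncio.run(scrape_bounded(demo_urls)):
        print(result)
```

In practice the `scrape_page` coroutine from the example above would be passed as `worker`, and `max_concurrent` tuned to what the machine can handle.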
| 255 | + |
| 256 | +### Event-Driven Automation |
| 257 | + |
| 258 | +React to page events and user interactions: |
| 259 | + |
| 260 | +```python |
| 261 | +import asyncio |
| 262 | +from pydoll.browser.chromium import Chrome |
| 263 | +from pydoll.protocol.page.events import PageEvent |
| 264 | + |
| 265 | +async def event_driven_automation(): |
| 266 | +    async with Chrome() as browser: |
| 267 | +        tab = await browser.start() |
| 268 | + |
| 269 | +        # Enable page events |
| 270 | +        await tab.enable_page_events() |
| 271 | + |
| 272 | +        # React to page load |
| 273 | +        async def on_page_load(event): |
| 274 | +            print("Page loaded! Starting automation...") |
| 275 | +            # Perform actions after page loads |
| 276 | +            await (await tab.find(id='search-box')).type('automation') |
| 277 | + |
| 278 | +        # React to navigation |
| 279 | +        async def on_navigation(event): |
| 280 | +            url = event['params']['frame']['url'] |
| 281 | +            print(f"Navigated to: {url}") |
| 282 | + |
| 283 | +        await tab.on(PageEvent.LOAD_EVENT_FIRED, on_page_load) |
| 284 | +        await tab.on(PageEvent.FRAME_NAVIGATED, on_navigation) |
| 285 | + |
| 286 | +        await tab.go_to('https://example.com') |
| 287 | +        await asyncio.sleep(5)  # Let events process |
| 288 | + |
| 289 | +asyncio.run(event_driven_automation()) |
| 290 | +``` |
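The fixed `asyncio.sleep(5)` above can wait longer than needed or not long enough. One alternative, sketched here with the standard library only, is to have the callback set an `asyncio.Event` and wait on it with a timeout. `wait_for_signal` and its `register` argument are hypothetical names. With Pydoll, `register` would presumably wrap a call such as `tab.on(PageEvent.LOAD_EVENT_FIRED, callback)`, but that wiring is an assumption and is not shown.

```python
import asyncio


async def wait_for_signal(register, timeout=5.0):
    """Register a callback, then wait until it fires or the timeout expires.

    Returns True if the callback fired in time, False otherwise.
    """
    fired = asyncio.Event()

    async def on_event(event):
        # Called by the event source; wakes the waiter immediately
        fired.set()

    await register(on_event)
    try:
        await asyncio.wait_for(fired.wait(), timeout)
        return True
    except asyncio.TimeoutError:
        return False


async def demo():
    background = []  # keep a reference so the task is not garbage collected

    # Hypothetical event source standing in for a real subscription:
    # it invokes the callback once, 50 ms after registration
    async def register(callback):
        async def emit():
            await asyncio.sleep(0.05)
            await callback({'method': 'Page.loadEventFired'})
        background.append(asyncio.create_task(emit()))

    return await wait_for_signal(register, timeout=1.0)


if __name__ == '__main__':
    print(asyncio.run(demo()))
```

The waiter resumes as soon as the event arrives, and the timeout bounds the wait when it never does.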
| 291 | + |
| 292 | +### Working with iFrames |
| 293 | + |
| 294 | +Pydoll provides seamless iframe interaction through the `get_frame()` method: |
107 | 295 |
|
108 | | -Pydoll is proudly supported by these amazing sponsors who believe in the future of webdriver-free automation. Their contributions make it possible for us to maintain and improve this project. |
109 | | -Interested in becoming a sponsor? Check out our [GitHub Sponsors page](https://github.com/sponsors/thalissonvs) for more information about the perks and benefits of sponsoring this project! |
| 296 | +```python |
| 297 | +import asyncio |
| 298 | +from pydoll.browser.chromium import Chrome |
| 299 | + |
| 300 | +async def iframe_interaction(): |
| 301 | +    async with Chrome() as browser: |
| 302 | +        tab = await browser.start() |
| 303 | +        await tab.go_to('https://example.com/page-with-iframe') |
| 304 | + |
| 305 | +        # Find the iframe element |
| 306 | +        iframe_element = await tab.query('.hcaptcha-iframe', timeout=10) |
| 307 | + |
| 308 | +        # Get a Tab instance for the iframe content |
| 309 | +        frame = await tab.get_frame(iframe_element) |
| 310 | + |
| 311 | +        # Now interact with elements inside the iframe |
| 312 | +        submit_button = await frame.find(tag_name='button', class_name='submit') |
| 313 | +        await submit_button.click() |
| 314 | + |
| 315 | +        # You can use all Tab methods on the frame |
| 316 | +        form_input = await frame.find(id='captcha-input') |
| 317 | +        await form_input.type('verification-code') |
| 318 | + |
| 319 | +        # Find elements by various methods |
| 320 | +        links = await frame.find(tag_name='a', find_all=True) |
| 321 | +        specific_element = await frame.query('#specific-id') |
| 322 | + |
| 323 | +asyncio.run(iframe_interaction()) |
| 324 | +``` |
| 325 | + |
| 326 | +## Documentation |
| 327 | + |
| 328 | +For comprehensive documentation, examples, and deep dives into Pydoll's features, visit our [official documentation site](https://autoscrape-labs.github.io/pydoll/). |
| 329 | + |
| 330 | +The documentation includes: |
| 331 | +- **Getting Started Guide** - Step-by-step tutorials |
| 332 | +- **API Reference** - Complete method documentation |
| 333 | +- **Advanced Techniques** - Network interception, event handling, performance optimization |
| 334 | +- **Migration Guide** - Upgrading from older versions |
| 335 | +- **Troubleshooting** - Common issues and solutions |
| 336 | +- **Best Practices** - Patterns for reliable automation |
110 | 337 |
|
111 | 338 | ## Contributing |
112 | 339 |
|
|