21 commits, all by thalissonvs, Mar 22, 2026:

- `0b8fca9` feat: add pydantic as required dependency
- `a11380c` feat(extractor): add extraction exception hierarchy
- `e84cc1a` feat(extractor): add Field descriptor and ExtractionMetadata
- `0bb1bd3` feat(extractor): add ExtractionModel base class
- `c852fe4` feat(extractor): add extraction engine with CSS/XPath support
- `9ac2221` feat(extractor): add module public API exports
- `5f36762` feat(extractor): integrate extract and extract_all into Tab
- `e21b1a3` test(extractor): add integration tests with real browser
- `219267e` docs(extractor): add usage example with quotes.toscrape.com
- `a20858b` Revert "docs(extractor): add usage example with quotes.toscrape.com"
- `48e26ce` chore(deps): update dependencies and add new packages to poetry.lock
- `4c9ed4d` fix(extractor): resolve mypy type errors
- `510b461` style: apply ruff formatting
- `533603c` refactor(extractor): use asyncio.gather for concurrent field extraction
- `dffc2ac` test(extractor): add concurrent extraction tests
- `67a7421` docs: restructure README with extraction showcase and updated positio…
- `af7cc3c` docs: update landing pages with extractor examples in all languages
- `35f4898` docs(extractor): add structured extraction guide in en, pt, zh
- `16eb0ef` fix(extractor): correct coroutine type annotation for mypy
- `b4408b4` fix(test): filter only DeprecationWarning in interval deprecated test
- `597a914` refactor(extractor): parallelize list field extraction with asyncio.g…
`README.md`: 200 changes (120 additions, 80 deletions)
<a href="#support">Support</a>
</p>

Pydoll automates Chromium-based browsers (Chrome, Edge) by connecting directly to the Chrome DevTools Protocol over WebSocket. **No WebDriver binary, no `navigator.webdriver` flag, no compatibility issues.**

It combines a high-level API for stealthy automation with low-level CDP access for fine-grained control over network, fingerprinting, and browser behavior. And with its new **Pydantic-powered extraction engine**, it maps the DOM directly to structured Python objects, delivering an unmatched Developer Experience (DX).

### Top Sponsors


### Why Pydoll

- **Structured extraction**: Define a [Pydantic](https://docs.pydantic.dev/) model, call `tab.extract()`, get typed and validated data back. No manual element-by-element querying.
- **Async and typed**: Built on `asyncio` from the ground up, 100% type-checked with `mypy`. Full IDE autocompletion and static error checking.
- **Stealth built in**: Human-like mouse movement, realistic typing, and granular [browser preference](https://pydoll.tech/docs/features/configuration/browser-preferences/) control for fingerprint management.
- **Network control**: [Intercept](https://pydoll.tech/docs/features/network/interception/) requests to block ads/trackers, [monitor](https://pydoll.tech/docs/features/network/monitoring/) traffic for API discovery, and make [authenticated HTTP requests](https://pydoll.tech/docs/features/network/http-requests/) that inherit the browser session.
- **Shadow DOM and iframes**: Full support for [shadow roots](https://pydoll.tech/docs/deep-dive/architecture/shadow-dom/) (including closed) and cross-origin iframes. Discover, query, and interact with elements inside them using the same API.
- **Ergonomic API**: `tab.find()` for most cases, `tab.query()` for complex [CSS/XPath selectors](https://pydoll.tech/docs/deep-dive/guides/selectors-guide/).

## Installation

```bash
pip install pydoll-python
```

No WebDriver binaries or external dependencies required.

## Getting Started

### 1. Stateful Automation & Evasion

When you need to navigate, bypass challenges, or interact with dynamic UI, Pydoll's imperative API handles it with humanized timing by default.

```python
import asyncio

from pydoll.browser import Chrome
from pydoll.constants import Key

async def google_search(query: str):
    async with Chrome() as browser:
        tab = await browser.start()
        await tab.go_to('https://www.google.com')

        # Find elements and interact with human-like timing
        search_box = await tab.find(tag_name='textarea', name='q')
        await search_box.insert_text(query)
        await tab.keyboard.press(Key.ENTER)

        first_result = await tab.find(
            tag_name='h3',
            text='autoscrape-labs/pydoll',
            timeout=10,
        )
        await first_result.click()
        print(f'Page loaded: {await tab.title}')

asyncio.run(google_search('pydoll site:github.com'))
```

### 2. Structured Data Extraction

Once you reach the target page, switch to the declarative engine. Define what you want with a model, and Pydoll extracts it — typed, validated, and ready to use.

```python
import asyncio

from pydoll.browser.chromium import Chrome
from pydoll.extractor import ExtractionModel, Field

class Quote(ExtractionModel):
    text: str = Field(selector='.text', description='The quote text')
    author: str = Field(selector='.author', description='Who said it')
    tags: list[str] = Field(selector='.tag', description='Tags')
    year: int | None = Field(selector='.year', description='Year', default=None)

async def extract_quotes():
    async with Chrome() as browser:
        tab = await browser.start()
        await tab.go_to('https://quotes.toscrape.com')

        quotes = await tab.extract_all(Quote, scope='.quote', timeout=5)

        for q in quotes:
            print(f'{q.author}: {q.text}')  # fully typed, IDE autocomplete works
            print(q.tags)  # list[str], not a raw element
            print(q.model_dump_json())  # pydantic serialization built-in

asyncio.run(extract_quotes())
```
Models support CSS/XPath auto-detection, HTML attribute targeting, custom transforms, and nested models.
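The auto-detection mentioned above can be pictured as a small dispatch on the selector string. This is a hypothetical sketch of the idea only, not Pydoll's actual implementation:

```python
def looks_like_xpath(selector: str) -> bool:
    # Heuristic: XPath expressions typically begin with '/', '//', '(' or './'
    s = selector.strip()
    return s.startswith(('/', '(', './'))

print(looks_like_xpath('//div[@class="quote"]'))  # True
print(looks_like_xpath('.quote > .text'))         # False
```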

<details>
<summary><b>Nested models, transforms, and attribute extraction</b></summary>
<br>

```python
from datetime import datetime
from pydoll.extractor import ExtractionModel, Field

def parse_date(raw: str) -> datetime:
    return datetime.strptime(raw.strip(), '%B %d, %Y')

class Author(ExtractionModel):
    name: str = Field(selector='.author-title')
    born: datetime = Field(
        selector='.author-born-date',
        transform=parse_date,
    )

class Article(ExtractionModel):
    title: str = Field(selector='h1')
    url: str = Field(selector='.source-link', attribute='href')
    author: Author = Field(selector='.author-card', description='Nested model')

article = await tab.extract(Article, timeout=5)
article.author.born.year  # int — types are preserved all the way down
```
</details>
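Because `transform` hooks are plain callables, they can be checked in isolation before wiring them into a model. For instance, the `parse_date` helper above:

```python
from datetime import datetime

def parse_date(raw: str) -> datetime:
    # Same helper as in the model above: strips padding, then parses
    return datetime.strptime(raw.strip(), '%B %d, %Y')

born = parse_date('  March 14, 1879  ')
print(born.year)  # 1879
```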

## Features

<details>
<summary><b>Humanized Mouse Movement</b></summary>
<br>

Mouse operations produce human-like cursor movement by default:

- **Bezier curve paths** with asymmetric control points
- **Fitts's Law timing**: duration scales with distance
- **Minimum-jerk velocity**: bell-shaped speed profile
- **Physiological tremor**: Gaussian noise scaled with velocity
- **Overshoot correction**: ~70% chance on fast movements, then corrects back

```python
await tab.mouse.move(500, 300)
await tab.mouse.click(500, 300)
await tab.mouse.drag(100, 200, 500, 400)

button = await tab.find(id='submit')
await button.click()

# Opt out when speed matters
await tab.mouse.click(500, 300, humanize=False)
```

[Mouse Control Docs](https://pydoll.tech/docs/features/automation/mouse-control/)
</details>
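The first two timing ideas can be sketched in a few lines: Fitts's Law for duration, plus a minimum-jerk curve for progress along the path. This is a toy illustration, not Pydoll's implementation, and the constants are made up:

```python
import math

def fitts_duration(distance: float, target_width: float = 20.0,
                   a: float = 0.1, b: float = 0.12) -> float:
    # Fitts's Law: T = a + b * log2(D / W + 1); farther targets take longer
    return a + b * math.log2(distance / target_width + 1)

def minimum_jerk(t: float) -> float:
    # Normalized minimum-jerk position profile for t in [0, 1]:
    # starts and ends at zero velocity, with a bell-shaped speed peak in the middle
    return 10 * t**3 - 15 * t**4 + 6 * t**5

print(round(fitts_duration(800), 3))
print(minimum_jerk(0.5))  # 0.5, the symmetric midpoint of the curve
```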

<details>
<summary><b>Shadow DOM and iFrames</b></summary>
<br>

Highlights:
- `deep=True` traverses cross-origin iframes (OOPIFs)
- Standard `find()`, `query()`, `click()` API inside shadow roots

```python
# Cloudflare Turnstile inside a cross-origin iframe
shadow_roots = await tab.find_shadow_roots(deep=True, timeout=10)
for sr in shadow_roots:
checkbox = await sr.query('input[type="checkbox"]', raise_exc=False)
if checkbox:
await checkbox.click()
```

[Shadow DOM Docs](https://pydoll.tech/docs/deep-dive/architecture/shadow-dom/)
</details>

<details>
<summary><b>HAR Network Recording</b></summary>
<br>

Record network activity during a browser session and export as HAR 1.2. Replay recorded requests to reproduce exact API sequences.

```python
from pydoll.browser.chromium import Chrome

async with Chrome() as browser:
    tab = await browser.start()

    async with tab.request.record() as capture:
        await tab.go_to('https://example.com')

    capture.save('flow.har')
    print(f'Captured {len(capture.entries)} requests')

    responses = await tab.request.replay('flow.har')
```

[HAR Recording Docs](https://pydoll.tech/docs/features/network/network-recording/)
</details>

<details>
<summary><b>Page Bundles</b></summary>
<br>

Save the current page and all its assets (CSS, JS, images, fonts) as a `.zip` bundle for offline viewing. Optionally inline everything into a single HTML file.

```python
await tab.save_bundle('page.zip')
await tab.save_bundle('page-inline.zip', inline_assets=True)
```

[Screenshots, PDFs & Bundles Docs](https://pydoll.tech/docs/features/automation/screenshots-and-pdfs/)
</details>
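Since a saved HAR file is ordinary JSON, recorded flows can be inspected without a browser at all. A minimal sketch follows, with field names taken from the HAR 1.2 spec; the entry here is hand-built for illustration, not a real capture:

```python
import json

# Minimal HAR 1.2 skeleton
har = {
    'log': {
        'version': '1.2',
        'creator': {'name': 'pydoll', 'version': '0'},
        'entries': [
            {
                'request': {'method': 'GET', 'url': 'https://example.com/api'},
                'response': {'status': 200},
            }
        ],
    }
}

# Round-trip through JSON the way a .har file would be written and read back
data = json.loads(json.dumps(har))
urls = [e['request']['url'] for e in data['log']['entries']]
print(urls)  # ['https://example.com/api']
```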

<details>
<summary><b>Hybrid Automation (UI + API)</b></summary>