autoscrape-labs
diff --git a/.github/workflows/tests.yml b/.github/workflows/tests.yml
Lines changed: 4 additions & 0 deletions
diff --git a/.gitignore b/.gitignore
Lines changed: 4 additions & 1 deletion
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 42 additions & 0 deletions
diff --git a/LICENSE b/LICENSE
Lines changed: 1 addition & 1 deletion
diff --git a/README.md b/README.md
Lines changed: 1 addition & 39 deletions
diff --git a/README_zh.md b/README_zh.md
Lines changed: 40 additions & 1 deletion
diff --git a/cz.yaml b/cz.yaml
Lines changed: 1 addition & 1 deletion
diff --git a/public/docs/api/commands/target.md b/public/docs/api/commands/target.md
Lines changed: 3 additions & 0 deletions
diff --git a/public/docs/deep-dive/browser-domain.md b/public/docs/deep-dive/browser-domain.md
Lines changed: 79 additions & 0 deletions
diff --git a/public/docs/deep-dive/tab-domain.md b/public/docs/deep-dive/tab-domain.md
Lines changed: 15 additions & 0 deletions
@@ -22,6 +22,10 @@ jobs:
         run: |
           python -m pip install poetry
           poetry install
+      - name: Install Chrome
+        uses: browser-actions/setup-chrome@v1
+        with:
+          chrome-version: 132
       - name: Run tests with coverage
         run: |
           poetry run pytest -s -x --cov=pydoll -vv --cov-report=xml
 
@@ -161,4 +161,7 @@ cython_debug/
 #.idea/
 
 .czrc
-.ruff_cache/
+.ruff_cache/
+
+# Dev test file
+dev_test_file.py
@@ -1,3 +1,45 @@
+## 2.8.2 (2025-10-03)
+
+### Fix
+
+- implement proxy authentication handling for browser tabs
+- map exception when trying to take a screenshot of an iframe
+
+## 2.8.1 (2025-09-27)
+
+### Fix
+
+- store the opened tab in the _tabs_opened dictionary
+- **elements**: correctly detect parenthesized XPath expressions
+
+### Refactor
+
+- simplify FindElementsMixin._get_expression_type startswith checks into single tuple
+
+## 2.8.0 (2025-08-28)
+
+### Feat
+
+- adding get_siblings_elements method
+- adding get_children_elements method
+- refactor Tab class to support optional WebSocket address handling
+- add WebSocket connection support for existing browser instances
+- add optional WebSocket address support in connection handler
+
+### Fix
+
+- add a raise_exc option to the get siblings and get children methods
+- improve the children and parent retrieval docstrings and create a private generic method for them
+- using new execute_script public method
+- solving conflicts
+- rename page fixture files and add an error test
+
+### Refactor
+
+- refactor Tab class to improve initialization and error handling
+- refactor Browser class to manage opened tabs and WebSocket setup
+- add new exception classes for connection and WebSocket errors
+
 ## 2.7.0 (2025-08-22)
 
 ### Feat
 
@@ -1,6 +1,6 @@
 The MIT License (MIT)
 
-Copyright © 2025 <copyright holders>
+Copyright © 2025 AutoscrapeLabs
 
 Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
 
 
@@ -1,7 +1,7 @@
 <p align="center">
     <img src="https://github.com/user-attachments/assets/219f2dbc-37ed-4aea-a289-ba39cdbb335d" alt="Pydoll Logo" /> <br>
 </p>
-<h1 align="center">Pydoll: Automate the Web, Naturally</h1>
+<h1 align="center">Pydoll: scraping, the easier way</h1>
 
 <p align="center">
     <a href="https://github.com/autoscrape-labs/pydoll/stargazers"><img src="https://img.shields.io/github/stars/autoscrape-labs/pydoll?style=social"></a>
@@ -45,43 +45,6 @@ We believe that powerful automation shouldn't require you to become an expert in
 - **Humanized Interactions**: Mimic real user behavior
 - **Simplicity**: With Pydoll, you install and you're ready to automate.
 
-## What's New
-
-### WebElement: state waiting and new public APIs
-
-- New `wait_until(...)` on `WebElement` to await element states with minimal code:
-
-```python
-# Wait until it becomes visible OR the timeout expires
-await element.wait_until(is_visible=True, timeout=5)
-
-# Wait until it becomes interactable (visible, on top, receiving pointer events)
-await element.wait_until(is_interactable=True, timeout=10)
-```
-
-- Methods now public on `WebElement`:
-  - `is_visible()`
-    - Checks that the element has a visible area (> 0), isn’t hidden by CSS and is in the viewport (after `scroll_into_view()` when needed). Useful pre-check before interactions.
-  - `is_interactable()`
-    - “Click-ready” state: combines visibility, enabledness and pointer-event hit testing. Ideal for robust flows that avoid lost clicks.
-  - `is_on_top()`
-    - Verifies the element is the top hit-test target at the intended click point, avoiding overlays.
-  - `execute_script(script: str, return_by_value: bool = False)`
-    - Executes JavaScript in the element’s own context (where `this` is the element). Great for fine-tuning and quick inspections.
-
-```python
-# Visually outline the element via JS
-await element.execute_script("this.style.outline='2px solid #22d3ee'")
-
-# Confirm states
-visible = await element.is_visible()
-interactable = await element.is_interactable()
-on_top = await element.is_on_top()
-```
-
-These additions simplify waiting and state validation before clicking/typing, reducing flakiness and making automations more predictable.
-
-
 ## 📦 Installation
 
 ```bash
@@ -208,7 +171,6 @@ Pydoll offers a series of advanced features to please even the most
 demanding users.
 
 
-
 ### Advanced Element Search
 
 We have several ways to find elements on the page. No matter how you prefer, we have a way that makes sense for you:
 
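@@ -49,6 +49,45 @@ Pydoll takes a fresh design approach, built from scratch to talk directly to the Chrome DevTools Pr
 
 ## What's New
 
+### Remote connections via WebSocket: control browsers from anywhere!
+
+You can now connect directly to an already running instance using the browser's WebSocket address and immediately use the full Pydoll API: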
+
+```python
+from pydoll.browser.chromium import Chrome
+
+chrome = Chrome()
+tab = await chrome.connect('ws://YOUR_HOST:9222/devtools/browser/XXXX')
+
+# Get straight to work: navigation, element automation, requests, events…
+await tab.go_to('https://example.com')
+title = await tab.execute_script('return document.title')
+print(title)
+```
+
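+This makes it easy to hook into remote/CI browsers, containers, or shared debugging targets: no local launch needed, just point at the WS endpoint and automate.
+
+### Walk the DOM like a pro: get_children_elements() and get_siblings_elements()
+
+Two small helpers that make traversing complex layouts more elegant: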
+
+```python
+# Get the container's direct children
+container = await tab.find(id='cards')
+cards = await container.get_children_elements(max_depth=1)
+
+# Want to go deeper? This returns the children's children (and so on)
+elements = await container.get_children_elements(max_depth=2)
+
+# Walk sibling elements in a horizontal list painlessly
+active = await tab.find(class_name='item--active')
+siblings = await active.get_siblings_elements()
+
+print(len(cards), len(siblings))
+```
+
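+Express more intent with less boilerplate. This is especially handy for dynamic grids, lists, and menus, and keeps scraping/automation logic clearer and more readable.
+
 ### WebElement: state waiting and new public APIs
 
 - New `wait_until(...)` for awaiting element states with minimal code: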
@@ -212,7 +251,7 @@ options.browser_preferences = {
 
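 This level of control used to be available only to Chrome extension developers. Now it is part of your automation toolkit!
 
-Check the [documentation](https://autoscrape-labs.github.io/pydoll/features/custom-browser-preferences/) for more details.
+Check the [documentation](https://pydoll.tech/docs/zh/features/#custom-browser-preferences/) for more details.
 
 ### New `get_parent_element()` method
 Retrieve the parent element of any WebElement, making it easier to navigate the DOM structure: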
 
@@ -2,4 +2,4 @@
 commitizen:
   name: cz_conventional_commits
   tag_format: $version
-  version: 2.7.0
+  version: 2.8.2
@@ -100,6 +100,9 @@ incognito_tab = await create_target(
 )
 ```
 
+!!! info "Headless vs Headed: how contexts show up"
+    Browser contexts are isolated logical environments. In headed mode, the first page created inside a new context will usually open in a new OS window. In headless mode, no window is shown — the isolation remains purely logical (cookies, storage, cache and auth state are still separate per context). Prefer contexts in headless/CI pipelines for performance and clean isolation.
+
 ## Advanced Features
 
 ### Target Events
 
@@ -327,6 +327,85 @@ Browser contexts are essential for several automation scenarios:
 4. **Session Isolation**: Prevent cross-contamination between test scenarios
 5. **Parallel Scraping**: Scrape multiple sites with different configurations
 
+### Headless vs Headed: Windows and Best Practices
+
+Browser contexts are a logical isolation layer. What you actually see is the page created inside a context:
+
+- In headed mode (visible UI), creating the first page inside a new browser context will typically open a new OS window. The context is the isolated environment; the page is what renders in a tab or window.
+- In headless mode (no visible UI), no windows appear. The isolation still exists logically in the background, keeping cookies, storage, cache and auth state fully separate per context.
+
+Recommendations:
+
+- Prefer using multiple contexts in headless environments (e.g., CI/CD) for cleaner isolation, faster startup, and lower resource usage compared to launching multiple browser processes.
+- Use contexts to simulate multiple users or sessions in parallel without cross-contamination.
+
+Why contexts are efficient:
+
+- Creating a new browser context is significantly faster and lighter than starting a whole new browser instance. This makes test suites and scraping jobs more reliable and scalable.
+
+### CDP Hierarchy and Context Window Semantics (Advanced)
+
+To reason precisely about contexts, it's useful to map Pydoll concepts to CDP:
+
+- Browser (process): single Chromium process running the DevTools endpoint.
+- BrowserContext: isolated profile inside that process (cookies, storage, cache, permissions).
+- Target/Page: an individual top-level page, popup, or background target that you control.
+
+CDP and `browserContextId`:
+
+- When creating a page via `Target.createTarget`, passing `browserContextId` tells the browser which isolated profile the new page should belong to. Without this ID, the target is created in the default context.
+- The ID is essential for isolation — it binds the new target to the correct storage/auth/permission boundary.
+
+Why the first page in a context opens a window (headed):
+
+- In headed mode, a page needs a top-level native window to render. A freshly created context initially has no window associated with it — it exists only in memory.
+- The first page created in that context implicitly materializes a window for that context. Subsequent pages can open as tabs within that window.
+
+Implications for `new_window`/`newWindow` semantics:
+
+- If you attempt to create a page with "tab-like" behavior (no new top-level window) in a context that has no existing window (first page), the browser may error because there is no host window to attach the tab to.
+- Practically: treat the first page in a new context (headed) as requiring a top-level window. Afterwards, you can create additional pages as tabs.
+
+Headless mode makes this distinction moot:
+
+- With no visible UI, windows vs tabs are logical constructs only. Context isolation is enforced the same way, but nothing is rendered, so there is no requirement to bootstrap a native window for the first page.
+
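Reviewer sketch (not part of the diff): the `browserContextId` and `newWindow` points above can be illustrated by building the raw CDP message by hand. `Target.createTarget` and its `url`, `browserContextId`, and `newWindow` parameters come from the Chrome DevTools Protocol; the helper name `build_create_target` and the context ID value are hypothetical, and this is not how Pydoll itself constructs commands.

```python
import json
from typing import Optional


def build_create_target(
    message_id: int,
    url: str,
    browser_context_id: Optional[str] = None,
    new_window: bool = False,
) -> dict:
    """Build a raw CDP Target.createTarget message (hypothetical helper)."""
    params: dict = {'url': url}
    if browser_context_id is not None:
        # Binds the new page to an isolated context; omitted => default context.
        params['browserContextId'] = browser_context_id
    if new_window:
        # Ask for a top-level window instead of a tab.
        params['newWindow'] = True
    return {'id': message_id, 'method': 'Target.createTarget', 'params': params}


# First page in a fresh context (headed): request a top-level window.
first_page = build_create_target(
    1, 'https://example.com', browser_context_id='CTX-1', new_window=True
)
# Later pages in the same context can be created as tabs.
second_page = build_create_target(2, 'https://example.com', browser_context_id='CTX-1')

print(json.dumps(first_page, indent=2))
```

Leaving `browserContextId` out of `params` entirely, rather than sending a null value, mirrors the rule stated above: without the ID, the target lands in the default context.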
+### Context-specific Proxy: sanitize + auth via Fetch events
+
+When creating a browser context with a private proxy (credentials embedded in the URL), Pydoll follows a two-step strategy to avoid leaking credentials and reliably authenticate:
+
+1) Sanitize the proxy server in the CDP command
+
+- If you pass `proxy_server='http://user:pass@host:port'`, only the credential-free URL is sent to CDP (`http://host:port`).
+- Internally, Pydoll extracts and stores the credentials keyed by `browserContextId`.
+
+2) Attach per-context auth handlers on first tab
+
+- When you open a `Tab` inside that context, Pydoll enables Fetch events for that tab and registers two temporary listeners:
+  - `Fetch.requestPaused`: continues normal requests.
+  - `Fetch.authRequired`: automatically responds with the stored `user`/`pass`, then disables Fetch to avoid intercepting further requests.
+
+Why this design?
+
+- Prevents credential exposure in command logs and CDP parameters.
+- Keeps the auth scope strictly limited to the context that requested the proxy.
+- Works in both headed and headless modes (the auth flow is network-level, not UI-dependent).
+
+Code flow highlights (simplified):
+
+```python
+# On context creation
+context_id = await browser.create_browser_context(proxy_server='http://user:pwd@host:port')
+# => sends Target.createBrowserContext with 'http://host:port'
+# => stores {context_id: ('user', 'pwd')} internally
+
+# On first tab in that context
+tab = await browser.new_tab(browser_context_id=context_id)
+# => tab.enable_fetch_events(handle_auth=True)
+# => tab.on('Fetch.requestPaused', continue_request)
+# => tab.on('Fetch.authRequired', continue_with_auth(user, pwd))
+```
+
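Reviewer sketch (not part of the diff): the two-step strategy described above, sanitizing the proxy URL and then answering the auth challenge, can be approximated with the standard library alone. The helper names `split_proxy_credentials` and `build_continue_with_auth` are hypothetical and are not Pydoll's internals; the `Fetch.continueWithAuth` payload shape follows the Chrome DevTools Protocol. Defaulting a missing scheme to `http` is an assumption made only for this example.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def split_proxy_credentials(proxy_server: str) -> Tuple[str, Optional[Tuple[str, str]]]:
    """Return (credential-free proxy URL, (username, password) or None)."""
    # Assumption: default to http when no scheme is given so urlsplit sees a netloc.
    if '://' not in proxy_server:
        proxy_server = f'http://{proxy_server}'
    parts = urlsplit(proxy_server)
    host = parts.hostname or ''
    if parts.port is not None:
        host = f'{host}:{parts.port}'
    sanitized = f'{parts.scheme}://{host}'
    if parts.username is None:
        return sanitized, None
    return sanitized, (parts.username, parts.password or '')


def build_continue_with_auth(request_id: str, username: str, password: str) -> dict:
    """Build the CDP reply to a Fetch.authRequired event."""
    return {
        'method': 'Fetch.continueWithAuth',
        'params': {
            'requestId': request_id,
            'authChallengeResponse': {
                'response': 'ProvideCredentials',
                'username': username,
                'password': password,
            },
        },
    }


sanitized, creds = split_proxy_credentials('http://user:pwd@proxy.example:8080')
# Only `sanitized` would be sent to CDP; `creds` stays in memory, keyed by context ID.
stored = {'CTX-1': creds}
reply = build_continue_with_auth('req-1', *stored['CTX-1'])
```

Keeping the credentials in a mapping keyed by context ID is what limits the auth scope to the context that requested the proxy.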
 ### Creating and Managing Contexts
 
 ```python
 
@@ -368,6 +368,21 @@ These visual capture capabilities are invaluable for:
 - Debugging automation scripts
 - Archiving page content
 
+!!! warning "Top-level targets vs iFrames for Tab screenshots"
+    `Tab.take_screenshot()` relies on CDP's `Page.captureScreenshot`, which only works for top-level targets. If you obtained a `Tab` for an iframe using `await tab.get_frame(iframe_element)`, calling `take_screenshot()` on that iframe tab will raise `TopLevelTargetRequired`.
+    
+    Use `WebElement.take_screenshot()` inside iframes. It captures via the viewport and works within the iframe context.
+    
+    ```python
+    # Wrong: iframe Tab screenshot (raises TopLevelTargetRequired)
+    iframe_tab = await tab.get_frame(iframe_element)
+    await iframe_tab.take_screenshot(as_base64=True)  # will raise an exception
+
+    # Correct: element screenshot inside iframe (uses viewport)
+    element = await iframe_tab.find(id='captcha')
+    await element.take_screenshot('captcha.png')  # will work!
+    ```
+
 ## Event System Overview
 
 The Tab domain provides a comprehensive event system for monitoring and reacting to browser events: