Commit 06a8d67

2 parents 472aecd + e1e9f3a commit 06a8d67

22 files changed: +615 -372 lines changed

CHANGELOG.md

Lines changed: 24 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -1,3 +1,27 @@
1+
## 2.8.0 (2025-08-28)
2+
3+
### Feat
4+
5+
- adding get_siblings_elements method
6+
- adding get_children_elements method
7+
- refactor Tab class to support optional WebSocket address handling
8+
- add WebSocket connection support for existing browser instances
9+
- add optional WebSocket address support in connection handler
10+
11+
### Fix
12+
13+
- add a raise_exc option to the get siblings and get children methods
14+
- improving children and parent retrieval docstrings and creating a private generic method for them
15+
- using new execute_script public method
16+
- solving conflicts
17+
- rename pages fixtures files and adding an error test
18+
19+
### Refactor
20+
21+
- refactor Tab class to improve initialization and error handling
22+
- refactor Browser class to manage opened tabs and WebSocket setup
23+
- add new exception classes for connection and WebSocket errors
24+
125
## 2.7.0 (2025-08-22)
226

327
### Feat

LICENSE

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
11
The MIT License (MIT)
22

3-
Copyright © 2025 <copyright holders>
3+
Copyright © 2025 AutoscrapeLabs
44

55
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
66

README.md

Lines changed: 39 additions & 0 deletions
@@ -47,6 +47,45 @@ We believe that powerful automation shouldn't require you to become an expert in
4747

4848
## What's New
4949

50+
### Remote connections via WebSocket — control any Chrome from anywhere!
51+
52+
You asked for it, we delivered. You can now connect to an already running browser remotely via its WebSocket address and use the full Pydoll API immediately.
53+
54+
```python
55+
from pydoll.browser.chromium import Chrome
56+
57+
chrome = Chrome()
58+
tab = await chrome.connect('ws://YOUR_HOST:9222/devtools/browser/XXXX')
59+
60+
# Full power unlocked: navigation, element automation, requests, events…
61+
await tab.go_to('https://example.com')
62+
title = await tab.execute_script('return document.title')
63+
print(title)
64+
```
65+
66+
This makes it effortless to run Pydoll against remote/CI browsers, containers, or shared debugging targets — no local launch required. Just point to the WS endpoint and automate.
67+
68+
### Navigate the DOM like a pro: get_children_elements() and get_siblings_elements()
69+
70+
Two delightful helpers to traverse complex layouts with intention:
71+
72+
```python
73+
# Grab direct children of a container
74+
container = await tab.find(id='cards')
75+
cards = await container.get_children_elements(max_depth=1)
76+
77+
# Want to go deeper? This will return children of children (and so on)
78+
elements = await container.get_children_elements(max_depth=2)
79+
80+
# Walk horizontal lists without re-querying the DOM
81+
active = await tab.find(class_name='item-active')
82+
siblings = await active.get_siblings_elements()
83+
84+
print(len(cards), len(siblings))
85+
```
86+
87+
Use them to cut boilerplate, express intent, and keep your scraping/automation logic clean and readable — especially in dynamic grids, lists and menus.
88+
5089
### WebElement: state waiting and new public APIs
5190

5291
- New `wait_until(...)` on `WebElement` to await element states with minimal code:

README_zh.md

Lines changed: 39 additions & 0 deletions
@@ -49,6 +49,45 @@ Pydoll takes a fresh design approach, built from scratch and talking directly to the Chrome DevTools Pr
4949

5050
## What's New
5151

52+
### Remote connections via WebSocket: control your browser anytime, anywhere!
53+
54+
You can now connect directly to an already running instance using the browser's WebSocket address and use the full Pydoll API immediately:
55+
56+
```python
57+
from pydoll.browser.chromium import Chrome
58+
59+
chrome = Chrome()
60+
tab = await chrome.connect('ws://YOUR_HOST:9222/devtools/browser/XXXX')
61+
62+
# Get straight to work: navigation, element automation, requests, events…
63+
await tab.go_to('https://example.com')
64+
title = await tab.execute_script('return document.title')
65+
print(title)
66+
```
67+
68+
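This makes it easy to work with remote/CI browsers, containers, or shared debugging targets: no local launch is required, just point to the WS endpoint and automate.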
69+
70+
### Navigate the DOM like a pro: get_children_elements() and get_siblings_elements()
71+
72+
Two small helpers that make traversing complex layouts more elegant:
73+
74+
```python
75+
# Get the direct children of the container
76+
container = await tab.find(id='cards')
77+
cards = await container.get_children_elements(max_depth=1)
78+
79+
# Want to go deeper? This returns children of children (and so on)
80+
elements = await container.get_children_elements(max_depth=2)
81+
82+
# Walk the siblings in a horizontal list painlessly
83+
active = await tab.find(class_name='item--active')
84+
siblings = await active.get_siblings_elements()
85+
86+
print(len(cards), len(siblings))
87+
```
88+
89+
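Express more intent with less boilerplate. This is especially useful for dynamic grids, lists, and menus, and it keeps scraping/automation logic clearer and more readable.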
90+
5291
### WebElement: state waiting and new public APIs
5392

5493
- New `wait_until(...)` for waiting on element states with simpler usage:

cz.yaml

Lines changed: 1 addition & 1 deletion
@@ -2,4 +2,4 @@
22
commitizen:
33
name: cz_conventional_commits
44
tag_format: $version
5-
version: 2.7.0
5+
version: 2.8.0

public/docs/features.md

Lines changed: 85 additions & 0 deletions
@@ -57,6 +57,53 @@ Capture visual content from web pages:
5757
- **High-Quality PDF Export**: Generate PDF documents from web pages
5858
- **Custom Formatting**: Coming soon!
5959

60+
## Remote Connections and Hybrid Automation
61+
62+
### Connect to a running browser via WebSocket
63+
64+
Control an already running browser remotely by pointing Pydoll to its DevTools WebSocket address.
65+
66+
```python
67+
import asyncio
68+
from pydoll.browser.chromium import Chrome
69+
70+
async def main():
71+
    chrome = Chrome()
72+
    tab = await chrome.connect('ws://YOUR_HOST:9222/devtools/browser/XXXX')
73+
74+
    await tab.go_to('https://example.com')
75+
    title = await tab.execute_script('return document.title')
76+
    print(title)
77+
78+
asyncio.run(main())
79+
```
80+
81+
Perfect for CI, containers, remote hosts, or shared debugging targets—no local launch required. Just provide the WS endpoint and automate.
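The WebSocket address has to come from somewhere. A minimal sketch, assuming the target is a Chromium-based browser started with `--remote-debugging-port=9222`: such browsers expose an HTTP endpoint at `/json/version` whose JSON body includes a `webSocketDebuggerUrl` field. The helper names `parse_ws_url` and `discover_ws_url` are hypothetical and not part of Pydoll; only the Python standard library is used.

```python
import json
from urllib.request import urlopen


def parse_ws_url(version_payload: str) -> str:
    """Extract the browser-level WebSocket URL from a /json/version response body."""
    data = json.loads(version_payload)
    return data["webSocketDebuggerUrl"]


def discover_ws_url(host: str = "127.0.0.1", port: int = 9222, timeout: float = 5.0) -> str:
    """Ask a browser started with --remote-debugging-port for its WebSocket URL.

    Hypothetical helper: host and port defaults are assumptions for illustration.
    """
    with urlopen(f"http://{host}:{port}/json/version", timeout=timeout) as response:
        return parse_ws_url(response.read().decode("utf-8"))
```

The string returned by `discover_ws_url()` is the kind of value the example above passes to `chrome.connect(...)`. Splitting parsing from fetching keeps the parsing step testable without a running browser.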
82+
83+
### Bring your own CDP: wrap existing sessions with Pydoll objects
84+
85+
If you already have your own CDP integration, you can still leverage Pydoll’s high-level API by wiring it to an existing DevTools session. As long as you know an element’s `objectId`, you can create a `WebElement` directly:
86+
87+
```python
88+
from pydoll.connection import ConnectionHandler
89+
from pydoll.elements.web_element import WebElement
90+
91+
# Your DevTools WebSocket endpoint and an element objectId you resolved via CDP
92+
ws = 'ws://YOUR_HOST:9222/devtools/page/ABCDEF...'
93+
object_id = 'REMOTE_ELEMENT_OBJECT_ID'
94+
95+
connection_handler = ConnectionHandler(ws_address=ws)
96+
element = WebElement(object_id=object_id, connection_handler=connection_handler)
97+
98+
# Use the full WebElement API immediately
99+
visible = await element.is_visible()
100+
await element.wait_until(is_interactable=True, timeout=10)
101+
await element.click()
102+
text = await element.text
103+
```
104+
105+
This hybrid approach lets you blend your low-level CDP tooling (for discovery, instrumentation, or custom flows) with Pydoll’s ergonomic element API.
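For context on where an `objectId` might come from, here is a rough sketch of the raw CDP side, assuming your own tooling sends JSON messages over the page WebSocket. `Runtime.evaluate` is a standard CDP method whose response carries a remote object; the helper names `build_evaluate_command` and `extract_object_id` are hypothetical and are not Pydoll APIs.

```python
import json
from typing import Optional


def build_evaluate_command(message_id: int, selector: str) -> str:
    """Build a Runtime.evaluate command that resolves a CSS selector to a remote object."""
    # json.dumps quotes and escapes the selector so it is a valid JS string literal.
    expression = f"document.querySelector({json.dumps(selector)})"
    return json.dumps({
        "id": message_id,
        "method": "Runtime.evaluate",
        "params": {"expression": expression},
    })


def extract_object_id(raw_response: str) -> Optional[str]:
    """Return the objectId from a Runtime.evaluate response, or None if no node matched."""
    response = json.loads(raw_response)
    remote_object = response.get("result", {}).get("result", {})
    # A null result (no matching element) carries no objectId.
    return remote_object.get("objectId")
```

The value returned by `extract_object_id` is what you would hand to `WebElement(object_id=...)` in the snippet above. Sending the command and matching the response by `id` is left to your existing CDP client.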
106+
60107
## Intuitive Element Finding
61108

62109
Pydoll v2.0+ introduces a revolutionary approach to finding elements that's both more intuitive and more powerful than traditional selector-based methods.
@@ -137,6 +184,44 @@ async def query_examples():
137184
asyncio.run(query_examples())
138185
```
139186

187+
### DOM Traversal Helpers: get_children_elements() and get_siblings_elements()
188+
189+
These helpers let you traverse the DOM tree from a known anchor, preserving scope and intent.
190+
191+
- get_children_elements(max_depth: int = 1, tag_filter: list[str] | None = None, raise_exc: bool = False) -> list[WebElement]
192+
    - Returns descendants up to max_depth using pre-order traversal (direct children first, then their descendants)
193+
    - max_depth=1 returns only direct children; 2 includes grandchildren, and so on
194+
    - tag_filter restricts results to specific tags (use lowercase names, e.g. ['a', 'li'])
195+
    - raise_exc=True raises ElementNotFound if the underlying script fails to resolve
196+
197+
- get_siblings_elements(tag_filter: list[str] | None = None, raise_exc: bool = False) -> list[WebElement]
198+
    - Returns elements sharing the same parent, excluding the current element
199+
    - tag_filter narrows by tag; order follows the parent’s child order
200+
201+
```python
202+
# Direct children in document order
203+
container = await tab.find(id='cards')
204+
children = await container.get_children_elements(max_depth=1)
205+
206+
# Include grandchildren
207+
descendants = await container.get_children_elements(max_depth=2)
208+
209+
# Filter by tag
210+
links = await container.get_children_elements(max_depth=4, tag_filter=['a'])
211+
212+
# Horizontal traversal
213+
active = await tab.find(class_name='item-active')
214+
siblings = await active.get_siblings_elements()
215+
link_siblings = await active.get_siblings_elements(tag_filter=['a'])
216+
```
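To make the `max_depth` and `tag_filter` semantics concrete, the following is a toy model on an in-memory tree, not Pydoll's implementation. It assumes that pre-order means each child is emitted immediately before its own subtree, and that `tag_filter` only decides which nodes are returned while traversal still descends through non-matching nodes. `Node`, `children_elements`, and `siblings_elements` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity-based equality avoids recursing through parent links
class Node:
    tag: str
    children: list["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        """Attach a child and return it, so trees can be built step by step."""
        child.parent = self
        self.children.append(child)
        return child


def children_elements(node, max_depth=1, tag_filter=None):
    """Depth-limited pre-order walk: each child first, then that child's descendants."""
    found = []
    if max_depth < 1:
        return found
    for child in node.children:
        if tag_filter is None or child.tag in tag_filter:
            found.append(child)
        found.extend(children_elements(child, max_depth - 1, tag_filter))
    return found


def siblings_elements(node, tag_filter=None):
    """Nodes sharing the same parent, in the parent's child order, excluding the node itself."""
    if node.parent is None:
        return []
    return [
        sibling
        for sibling in node.parent.children
        if sibling is not node and (tag_filter is None or sibling.tag in tag_filter)
    ]


# Build: <ul><li><a/></li><li><span/></li></ul>
ul = Node("ul")
first_li = ul.add(Node("li"))
first_li.add(Node("a"))
second_li = ul.add(Node("li"))
second_li.add(Node("span"))

print([n.tag for n in children_elements(ul, max_depth=1)])  # ['li', 'li']
print([n.tag for n in children_elements(ul, max_depth=2)])  # ['li', 'a', 'li', 'span']
print([n.tag for n in siblings_elements(first_li)])         # ['li']
```

Whether the real helpers descend through nodes that fail the filter is not stated in the docs above, so treat that part of the sketch as an assumption to verify against the library.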
217+
218+
Performance and correctness notes:
219+
220+
- DOM is a tree: breadth expands quickly with depth. Prefer small max_depth values and apply tag_filter to minimize work.
221+
- Ordering: children follow document order; siblings follow the parent’s order for stable iteration.
222+
- iFrames: each iframe has its own tree. Use `tab.get_frame(iframe_element)` to traverse inside the frame, then call these helpers there.
223+
- Large documents: deep traversals can touch many nodes. Combine shallow traversal with targeted `find()`/`query()` on subtree anchors for best performance.
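As a back-of-the-envelope illustration of the first note, assume an idealized tree in which every element has the same number of children (real pages are far less regular). The number of descendants within `max_depth` is then a geometric sum; `nodes_within_depth` is a hypothetical helper for the arithmetic only.

```python
def nodes_within_depth(branching_factor: int, max_depth: int) -> int:
    """Count descendants at depths 1..max_depth in a uniform tree: b + b^2 + ... + b^d."""
    return sum(branching_factor ** level for level in range(1, max_depth + 1))


for depth in (1, 2, 4):
    print(depth, nodes_within_depth(10, depth))
# 1 10
# 2 110
# 4 11110
```

With ten children per element, moving from `max_depth=1` to `max_depth=4` grows the candidate set from 10 to 11,110 nodes, which is why a shallow traversal plus a tag filter is usually the cheaper choice.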
224+
140225
## Native Cloudflare Captcha Bypass
141226

142227
!!! warning "Important Information About Captcha Bypass"

public/docs/zh/features.md

Lines changed: 85 additions & 0 deletions
@@ -59,6 +59,53 @@ Pydoll supports any Chromium-based browser:
5959
- **High-Quality PDF Export**: Generate PDF documents from web pages
6060
- **Custom Formatting**: Coming soon!
6161

62+
## Remote Connections and Hybrid Automation
63+
64+
### Connect to a running browser via WebSocket
65+
66+
Provide the DevTools WebSocket address to remotely control a browser instance that is already running:
67+
68+
```python
69+
import asyncio
70+
from pydoll.browser.chromium import Chrome
71+
72+
async def main():
73+
    chrome = Chrome()
74+
    tab = await chrome.connect('ws://YOUR_HOST:9222/devtools/browser/XXXX')
75+
76+
    await tab.go_to('https://example.com')
77+
    title = await tab.execute_script('return document.title')
78+
    print(title)
79+
80+
asyncio.run(main())
81+
```
82+
83+
Ideal for CI, containers, remote hosts, or shared debugging targets: no local launch is required, just point to the WS endpoint and automate.
84+
85+
### Bring your own CDP: wrap existing sessions with Pydoll
86+
87+
If you already have your own CDP integration, you can combine it with Pydoll's high-level API. As long as you know an element's `objectId`, you can construct a `WebElement` directly:
88+
89+
```python
90+
from pydoll.connection import ConnectionHandler
91+
from pydoll.elements.web_element import WebElement
92+
93+
# Your DevTools WebSocket address and an element objectId obtained via CDP
94+
ws = 'ws://YOUR_HOST:9222/devtools/page/ABCDEF...'
95+
object_id = 'REMOTE_ELEMENT_OBJECT_ID'
96+
97+
connection_handler = ConnectionHandler(ws_address=ws)
98+
element = WebElement(object_id=object_id, connection_handler=connection_handler)
99+
100+
# Use the full WebElement API right away
101+
visible = await element.is_visible()
102+
await element.wait_until(is_interactable=True, timeout=10)
103+
await element.click()
104+
text = await element.text
105+
```
106+
107+
This hybrid mode lets you smoothly combine low-level CDP capabilities (for discovery, injection, or custom flows) with Pydoll's easier-to-use element API.
108+
62109
## Intuitive Element Finding
63110

64111
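Pydoll v2.0+ introduces a revolutionary approach to finding elements that is more intuitive and more powerful than traditional selector-based methods.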
@@ -140,6 +187,44 @@ async def query_examples():
140187
asyncio.run(query_examples())
141188
```
142189

190+
### DOM Traversal Helpers: get_children_elements() and get_siblings_elements()
191+
192+
Traverse the DOM as a tree from a known anchor, which is more explicit and safer:
193+
194+
- get_children_elements(max_depth: int = 1, tag_filter: list[str] | None = None, raise_exc: bool = False) -> list[WebElement]
195+
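    - Returns descendants using pre-order traversal (direct children first, then their descendants), no deeper than max_depth
196+
    - max_depth=1 returns only direct children; 2 includes grandchildren, and so on
197+
    - tag_filter filters by tag name (lowercase, e.g. ['a', 'li'])
198+
    - When raise_exc=True and the script fails to resolve, ElementNotFound is raised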
199+
200+
- get_siblings_elements(tag_filter: list[str] | None = None, raise_exc: bool = False) -> list[WebElement]
201+
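    - Returns sibling elements under the same parent as the current element (excluding the current element)
202+
    - tag_filter filters by tag name; results follow the order of the parent's children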
203+
204+
```python
205+
# Direct children in document order
206+
container = await tab.find(id='cards')
207+
children = await container.get_children_elements(max_depth=1)
208+
209+
# Include grandchildren
210+
descendants = await container.get_children_elements(max_depth=2)
211+
212+
# Filter by tag
213+
links = await container.get_children_elements(max_depth=4, tag_filter=['a'])
214+
215+
# Horizontal traversal
216+
active = await tab.find(class_name='item-active')
217+
siblings = await active.get_siblings_elements()
218+
link_siblings = await active.get_siblings_elements(tag_filter=['a'])
219+
```
220+
221+
Performance and correctness notes:
222+
223+
- The DOM is a tree: breadth expands quickly as depth increases. Prefer small max_depth values and combine them with tag_filter to limit scope.
224+
- Ordering: children follow document order; siblings follow the order of the parent's children, which keeps iteration stable.
225+
- iFrames: each iframe is a separate DOM tree. Use `tab.get_frame(iframe_element)` to enter it, then call these helpers inside that frame.
226+
- Large documents: deep traversals can visit many nodes. Combine shallow traversal with precise, anchor-based `find()`/`query()` calls for better performance.
227+
143228
## Native Cloudflare Captcha Bypass
144229

145230
!!! warning "Important Information About Captcha Bypass"
