Commit b0c9a17

fix: map exception when trying to take a screenshot of an iframe
1 parent 375a0d2 commit b0c9a17

26 files changed: +1455 -564 lines changed

.github/workflows/tests.yml

Lines changed: 4 additions & 0 deletions
@@ -22,6 +22,10 @@ jobs:
         run: |
           python -m pip install poetry
           poetry install
+      - name: Install Chrome
+        uses: browser-actions/setup-chrome@v1
+        with:
+          chrome-version: 132
       - name: Run tests with coverage
         run: |
           poetry run pytest -s -x --cov=pydoll -vv --cov-report=xml

.gitignore

Lines changed: 4 additions & 1 deletion
@@ -161,4 +161,7 @@ cython_debug/
 #.idea/
 
 .czrc
-.ruff_cache/
+.ruff_cache/
+
+# Dev test file
+dev_test_file.py

CHANGELOG.md

Lines changed: 35 additions & 0 deletions
@@ -1,3 +1,38 @@
+## 2.8.1 (2025-09-27)
+
+### Fix
+
+- store the opened tab in the _tabs_opened dictionary
+- **elements**: correctly detect parenthesized XPath expressions
+
+### Refactor
+
+- simplify FindElementsMixin._get_expression_type startswith checks into single tuple
+
+## 2.8.0 (2025-08-28)
+
+### Feat
+
+- adding get_siblings_elements method
+- adding get_children_elements method
+- refactor Tab class to support optional WebSocket address handling
+- add WebSocket connection support for existing browser instances
+- add optional WebSocket address support in connection handler
+
+### Fix
+
+- add a raise_exc option to the get siblings and get children methods
+- improving children and parent retrieval docstrings and creating a private generic method for them
+- using new execute_script public method
+- solving conflicts
+- rename pages fixtures files and adding an error test
+
+### Refactor
+
+- refactor Tab class to improve initialization and error handling
+- refactor Browser class to manage opened tabs and WebSocket setup
+- add new exception classes for connection and WebSocket errors
+
 ## 2.7.0 (2025-08-22)
 
 ### Feat

LICENSE

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 The MIT License (MIT)
 
-Copyright © 2025 <copyright holders>
+Copyright © 2025 AutoscrapeLabs
 
 Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
 

README.md

Lines changed: 39 additions & 0 deletions
@@ -47,6 +47,45 @@ We believe that powerful automation shouldn't require you to become an expert in
 
 ## What's New
 
+### Remote connections via WebSocket: control any Chrome from anywhere!
+
+You asked for it, we delivered. You can now connect to an already running browser remotely via its WebSocket address and use the full Pydoll API immediately.
+
+```python
+from pydoll.browser.chromium import Chrome
+
+chrome = Chrome()
+tab = await chrome.connect('ws://YOUR_HOST:9222/devtools/browser/XXXX')
+
+# Full power unlocked: navigation, element automation, requests, events…
+await tab.go_to('https://example.com')
+title = await tab.execute_script('return document.title')
+print(title)
+```
+
+This makes it effortless to run Pydoll against remote/CI browsers, containers, or shared debugging targets, with no local launch required. Just point to the WS endpoint and automate.
+
+### Navigate the DOM like a pro: get_children_elements() and get_siblings_elements()
+
+Two delightful helpers to traverse complex layouts with intention:
+
+```python
+# Grab direct children of a container
+container = await tab.find(id='cards')
+cards = await container.get_children_elements(max_depth=1)
+
+# Want to go deeper? This will return children of children (and so on)
+elements = await container.get_children_elements(max_depth=2)
+
+# Walk horizontal lists without re-querying the DOM
+active = await tab.find(class_name='item-active')
+siblings = await active.get_siblings_elements()
+
+print(len(cards), len(siblings))
+```
+
+Use them to cut boilerplate, express intent, and keep your scraping/automation logic clean and readable, especially in dynamic grids, lists and menus.
+
 ### WebElement: state waiting and new public APIs
 
 - New `wait_until(...)` on `WebElement` to await element states with minimal code:

README_zh.md

Lines changed: 40 additions & 1 deletion
@@ -49,6 +49,45 @@ Pydoll takes a fresh design approach, built from scratch to talk directly to the Chrome DevTools Pr
 
 ## What's New
 
+### Remote connections via WebSocket: control the browser anytime, anywhere!
+
+You can now use a browser's WebSocket address to connect directly to an already running instance and immediately use the full Pydoll API:
+
+```python
+from pydoll.browser.chromium import Chrome
+
+chrome = Chrome()
+tab = await chrome.connect('ws://YOUR_HOST:9222/devtools/browser/XXXX')
+
+# Get straight to work: navigation, element automation, requests, events…
+await tab.go_to('https://example.com')
+title = await tab.execute_script('return document.title')
+print(title)
+```
+
+This makes it easy to work with remote/CI browsers, containers, or shared debugging targets: no local launch required, just point to the WS endpoint and automate.
+
+### Navigate the DOM like a pro: get_children_elements() and get_siblings_elements()
+
+Two small helpers that make traversing complex layouts more elegant:
+
+```python
+# Get the container's direct children
+container = await tab.find(id='cards')
+cards = await container.get_children_elements(max_depth=1)
+
+# Want to go deeper? This returns children of children (and so on)
+elements = await container.get_children_elements(max_depth=2)
+
+# Painlessly traverse sibling elements in horizontal lists
+active = await tab.find(class_name='item--active')
+siblings = await active.get_siblings_elements()
+
+print(len(cards), len(siblings))
+```
+
+Express more intent with less boilerplate, keeping scraping/automation logic clearer and more readable, especially in dynamic grids, lists, and menus.
+
 ### WebElement: state waiting and new public APIs
 
 - New `wait_until(...)` for waiting on element states, with simpler usage:
@@ -212,7 +251,7 @@ options.browser_preferences = {
 
 This level of control was previously available only to Chrome extension developers. Now it's in your automation toolkit!
 
-See the [documentation](https://autoscrape-labs.github.io/pydoll/features/custom-browser-preferences/) for more details.
+See the [documentation](https://pydoll.tech/docs/zh/features/#custom-browser-preferences/) for more details.
 
 ### New `get_parent_element()` method
 Retrieve the parent element of any WebElement, making it easier to navigate the DOM structure:

cz.yaml

Lines changed: 1 addition & 1 deletion
@@ -2,4 +2,4 @@
 commitizen:
   name: cz_conventional_commits
   tag_format: $version
-  version: 2.7.0
+  version: 2.8.1

public/docs/features.md

Lines changed: 86 additions & 5 deletions
@@ -57,6 +57,53 @@ Capture visual content from web pages:
 - **High-Quality PDF Export**: Generate PDF documents from web pages
 - **Custom Formatting**: Coming soon!
 
+## Remote Connections and Hybrid Automation
+
+### Connect to a running browser via WebSocket
+
+Control an already running browser remotely by pointing Pydoll to its DevTools WebSocket address.
+
+```python
+import asyncio
+from pydoll.browser.chromium import Chrome
+
+async def main():
+    chrome = Chrome()
+    tab = await chrome.connect('ws://YOUR_HOST:9222/devtools/browser/XXXX')
+
+    await tab.go_to('https://example.com')
+    title = await tab.execute_script('return document.title')
+    print(title)
+
+asyncio.run(main())
+```
+
+Perfect for CI, containers, remote hosts, or shared debugging targets, with no local launch required. Just provide the WS endpoint and automate.
+
+### Bring your own CDP: wrap existing sessions with Pydoll objects
+
+If you already have your own CDP integration, you can still leverage Pydoll’s high-level API by wiring it to an existing DevTools session. As long as you know an element’s `objectId`, you can create a `WebElement` directly:
+
+```python
+from pydoll.connection import ConnectionHandler
+from pydoll.elements.web_element import WebElement
+
+# Your DevTools WebSocket endpoint and an element objectId you resolved via CDP
+ws = 'ws://YOUR_HOST:9222/devtools/page/ABCDEF...'
+object_id = 'REMOTE_ELEMENT_OBJECT_ID'
+
+connection_handler = ConnectionHandler(ws_address=ws)
+element = WebElement(object_id=object_id, connection_handler=connection_handler)
+
+# Use the full WebElement API immediately
+visible = await element.is_visible()
+await element.wait_until(is_interactable=True, timeout=10)
+await element.click()
+text = await element.text
+```
+
+This hybrid approach lets you blend your low-level CDP tooling (for discovery, instrumentation, or custom flows) with Pydoll’s ergonomic element API.
+
 ## Intuitive Element Finding
 
 Pydoll v2.0+ introduces a revolutionary approach to finding elements that's both more intuitive and more powerful than traditional selector-based methods.
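Review note on the remote-connection docs in this hunk: the examples leave the `ws://YOUR_HOST:9222/devtools/browser/XXXX` address as a placeholder. A Chrome instance started with `--remote-debugging-port` exposes a `/json/version` HTTP endpoint whose JSON body carries a `webSocketDebuggerUrl` field, so one way to obtain the address might look like the sketch below. The helper names `extract_ws_url` and `fetch_ws_url` are hypothetical and not part of Pydoll, and the sample payload is made up for an offline demonstration.

```python
import json
from urllib.request import urlopen


def extract_ws_url(version_payload: str) -> str:
    """Pull the browser-level WebSocket URL out of a /json/version response body."""
    data = json.loads(version_payload)
    try:
        return data['webSocketDebuggerUrl']
    except KeyError:
        raise ValueError('response has no webSocketDebuggerUrl field') from None


def fetch_ws_url(host: str = 'localhost', port: int = 9222, timeout: float = 5.0) -> str:
    """Ask a running Chrome's DevTools HTTP endpoint for its WebSocket URL.

    Needs a reachable browser started with --remote-debugging-port.
    """
    with urlopen(f'http://{host}:{port}/json/version', timeout=timeout) as response:
        return extract_ws_url(response.read().decode('utf-8'))


# Offline demonstration with a made-up payload shaped like Chrome's response.
sample = (
    '{"Browser": "Chrome/132.0.0.0", '
    '"webSocketDebuggerUrl": "ws://localhost:9222/devtools/browser/abc123"}'
)
ws_url = extract_ws_url(sample)
print(ws_url)
```

The returned string could then be passed to `chrome.connect(...)` as in the documented example; parsing is kept separate from the network call so it can be checked without a live browser.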
@@ -137,6 +184,44 @@ async def query_examples():
 asyncio.run(query_examples())
 ```
 
+### DOM Traversal Helpers: get_children_elements() and get_siblings_elements()
+
+These helpers let you traverse the DOM tree from a known anchor, preserving scope and intent.
+
+- get_children_elements(max_depth: int = 1, tag_filter: list[str] | None = None, raise_exc: bool = False) -> list[WebElement]
+  - Returns descendants up to max_depth using pre-order traversal (direct children first, then their descendants)
+  - max_depth=1 returns only direct children; 2 includes grandchildren, and so on
+  - tag_filter restricts results to specific tags (use lowercase names, e.g. ['a', 'li'])
+  - raise_exc=True raises ElementNotFound if the underlying script fails to resolve
+
+- get_siblings_elements(tag_filter: list[str] | None = None, raise_exc: bool = False) -> list[WebElement]
+  - Returns elements sharing the same parent, excluding the current element
+  - tag_filter narrows by tag; order follows the parent’s child order
+
+```python
+# Direct children in document order
+container = await tab.find(id='cards')
+children = await container.get_children_elements(max_depth=1)
+
+# Include grandchildren
+descendants = await container.get_children_elements(max_depth=2)
+
+# Filter by tag
+links = await container.get_children_elements(max_depth=4, tag_filter=['a'])
+
+# Horizontal traversal
+active = await tab.find(class_name='item-active')
+siblings = await active.get_siblings_elements()
+link_siblings = await active.get_siblings_elements(tag_filter=['a'])
+```
+
+Performance and correctness notes:
+
+- DOM is a tree: breadth expands quickly with depth. Prefer small max_depth values and apply tag_filter to minimize work.
+- Ordering: children follow document order; siblings follow the parent’s order for stable iteration.
+- iFrames: each iframe has its own tree. Use `tab.get_frame(iframe_element)` to traverse inside the frame, then call these helpers there.
+- Large documents: deep traversals can touch many nodes. Combine shallow traversal with targeted `find()`/`query()` on subtree anchors for best performance.
+
 ## Native Cloudflare Captcha Bypass
 
 !!! warning "Important Information About Captcha Bypass"
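Review note on the traversal semantics documented in this hunk: the `max_depth` and `tag_filter` behavior can be illustrated on a plain in-memory tree. The sketch below is not Pydoll's implementation. `Node`, `descendants`, and `siblings` are hypothetical names, and it assumes a depth-first pre-order walk in which `tag_filter` limits which nodes are returned but not which subtrees are visited.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a DOM element: a tag plus ordered children."""
    tag: str
    children: list[Node] = field(default_factory=list)


def descendants(root: Node, max_depth: int = 1, tag_filter: list[str] | None = None) -> list[Node]:
    """Collect descendants of root down to max_depth in depth-first pre-order."""
    found: list[Node] = []

    def visit(node: Node, depth: int) -> None:
        for child in node.children:
            # The filter decides what is returned, not what is traversed.
            if tag_filter is None or child.tag in tag_filter:
                found.append(child)
            if depth < max_depth:
                visit(child, depth + 1)

    visit(root, 1)
    return found


def siblings(parent: Node, node: Node) -> list[Node]:
    """Return the parent's other children, in the parent's child order."""
    return [child for child in parent.children if child is not node]


first_li = Node('li', [Node('a')])
ul = Node('ul', [first_li, Node('li')])
tree = Node('div', [ul, Node('p')])

direct = [n.tag for n in descendants(tree, max_depth=1)]
two_levels = [n.tag for n in descendants(tree, max_depth=2)]
links = [n.tag for n in descendants(tree, max_depth=3, tag_filter=['a'])]
print(direct, two_levels, links)
```

Each extra level of `max_depth` can multiply the number of visited nodes, which is the reason the notes in the hunk recommend shallow depths combined with a tag filter.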
@@ -440,12 +525,8 @@ async def main():
     async with Chrome() as browser:
         # Start the browser once
         await browser.start()
-
-        # Create partial function with browser parameter
-        scrape_with_browser = partial(scrape_page, browser)
-
         # Process all URLs concurrently using the same browser
-        results = await asyncio.gather(*(scrape_with_browser(url) for url in urls))
+        results = await asyncio.gather(*(scrape_page(browser, url) for url in urls))
 
         # Print results
         for result in results: