Commit 4612444

Merge pull request #152 from autoscrape-labs/thalissonvs/issue150
Implement singleton pattern for Tab instances and add tab retrieval method
2 parents f9f23cc + c5f729a commit 4612444

6 files changed, 894 additions and 64 deletions

docs/features.md

Lines changed: 127 additions & 0 deletions
@@ -209,6 +209,133 @@ asyncio.run(background_bypass_example())
 
 Access websites that actively block automation tools without using third-party captcha solving services. This native captcha handling makes Pydoll suitable for automating previously inaccessible websites.
 
+## Multi-Tab Management
+
+Pydoll provides sophisticated tab management capabilities with a singleton pattern that ensures efficient resource usage and prevents duplicate Tab instances for the same browser tab.
+
+### Tab Singleton Pattern
+
+Pydoll implements a singleton pattern for Tab instances based on the browser's target ID. This means:
+
+- **One Tab instance per browser tab**: Multiple references to the same browser tab return the same Tab object
+- **Automatic resource management**: No duplicate connections or handlers for the same tab
+- **Consistent state**: All references to a tab share the same state and event handlers
+
+```python
+import asyncio
+from pydoll.browser.chromium import Chrome
+from pydoll.browser.tab import Tab
+
+async def singleton_demonstration():
+    async with Chrome() as browser:
+        tab = await browser.start()
+
+        # Get the same tab through different methods - they're identical objects
+        same_tab = Tab(browser, browser._connection_port, tab._target_id)
+        opened_tabs = await browser.get_opened_tabs()
+
+        # All references point to the same singleton instance
+        print(f"Same object? {tab is same_tab}")  # May be True if same target_id
+        print(f"Tab instances are managed as singletons")
+
+asyncio.run(singleton_demonstration())
+```
+
+### Creating New Tabs Programmatically
+
+Use `new_tab()` to create tabs programmatically with full control:
+
+```python
+import asyncio
+from pydoll.browser.chromium import Chrome
+
+async def programmatic_tab_creation():
+    async with Chrome() as browser:
+        # Start with the initial tab
+        main_tab = await browser.start()
+
+        # Create additional tabs with specific URLs
+        search_tab = await browser.new_tab('https://google.com')
+        news_tab = await browser.new_tab('https://news.ycombinator.com')
+        docs_tab = await browser.new_tab('https://docs.python.org')
+
+        # Work with multiple tabs simultaneously
+        await search_tab.find(name='q').type_text('Python automation')
+        await news_tab.find(class_name='storylink', find_all=True)
+        await docs_tab.find(id='search-field').type_text('asyncio')
+
+        # Get all opened tabs
+        all_tabs = await browser.get_opened_tabs()
+        print(f"Total tabs open: {len(all_tabs)}")
+
+        # Close specific tabs when done
+        await search_tab.close()
+        await news_tab.close()
+
+asyncio.run(programmatic_tab_creation())
+```
+
+### Handling User-Opened Tabs
+
+When users click links that open new tabs (target="_blank"), use `get_opened_tabs()` to detect and manage them:
+
+```python
+import asyncio
+from pydoll.browser.chromium import Chrome
+
+async def handle_user_opened_tabs():
+    async with Chrome() as browser:
+        main_tab = await browser.start()
+        await main_tab.go_to('https://example.com')
+
+        # Get initial tab count
+        initial_tabs = await browser.get_opened_tabs()
+        initial_count = len(initial_tabs)
+        print(f"Initial tabs: {initial_count}")
+
+        # Click a link that opens a new tab (target="_blank")
+        external_link = await main_tab.find(text='Open in New Tab')
+        await external_link.click()
+
+        # Wait for new tab to open
+        await asyncio.sleep(2)
+
+        # Detect new tabs
+        current_tabs = await browser.get_opened_tabs()
+        new_tab_count = len(current_tabs)
+
+        if new_tab_count > initial_count:
+            print(f"New tab detected! Total tabs: {new_tab_count}")
+
+            # Get the newly opened tab (last in the list)
+            new_tab = current_tabs[-1]
+
+            # Work with the new tab
+            await new_tab.go_to('https://different-site.com')
+            title = await new_tab.execute_script('return document.title')
+            print(f"New tab title: {title}")
+
+            # Close the new tab when done
+            await new_tab.close()
+
+asyncio.run(handle_user_opened_tabs())
+```
+
+### Key Benefits of Pydoll's Tab Management
+
+1. **Singleton Pattern**: Prevents resource duplication and ensures consistent state
+2. **Automatic Detection**: `get_opened_tabs()` finds all tabs, including user-opened ones
+3. **Concurrent Processing**: Handle multiple tabs simultaneously with asyncio
+4. **Resource Management**: Proper cleanup prevents memory leaks
+5. **Event Isolation**: Each tab maintains its own event handlers and state
+
+This sophisticated tab management makes Pydoll ideal for:
+- **Multi-page workflows** that require coordination between tabs
+- **Parallel data extraction** from multiple sources
+- **Testing applications** that use popup windows or new tabs
+- **Monitoring user behavior** across multiple browser tabs
+
+
 ## Concurrent Scraping
 
 Pydoll's async architecture allows you to scrape multiple pages or websites simultaneously for maximum efficiency:
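
The "Handling User-Opened Tabs" example in this hunk picks the new tab with `current_tabs[-1]`, which depends on the ordering of the returned list. A minimal sketch of an order-independent alternative is to compare the tab lists taken before and after the click. `FakeTab` and `find_new_tabs` are hypothetical names used only for illustration and are not part of Pydoll; the sketch assumes nothing more than each tab exposing a stable target ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeTab:
    """Hypothetical stand-in for a Pydoll Tab; only the target ID matters here."""
    target_id: str


def find_new_tabs(before: list[FakeTab], after: list[FakeTab]) -> list[FakeTab]:
    """Return the tabs present in `after` but not in `before`, in `after` order."""
    known_ids = {tab.target_id for tab in before}
    return [tab for tab in after if tab.target_id not in known_ids]


initial_tabs = [FakeTab("A")]
# The new tab is deliberately not the last element in this made-up snapshot.
current_tabs = [FakeTab("B"), FakeTab("A")]

new_tabs = find_new_tabs(initial_tabs, current_tabs)
print([tab.target_id for tab in new_tabs])  # ['B']
```

With real Pydoll tabs the same comparison could probably be done on object identity, since the commit makes repeated lookups of one target return the same `Tab` object; that is an assumption about usage, not something this diff demonstrates.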

pydoll/browser/chromium/base.py

Lines changed: 20 additions & 0 deletions
@@ -215,10 +215,30 @@ async def get_targets(self) -> list[TargetInfo]:
 
         Targets include pages, service workers, shared workers, and browser process.
         Useful for debugging and managing multiple tabs.
+
+        Returns:
+            List of TargetInfo objects.
         """
         response: GetTargetsResponse = await self._execute_command(TargetCommands.get_targets())
         return response['result']['targetInfos']
 
+    async def get_opened_tabs(self) -> list[Tab]:
+        """
+        Get all opened tabs that are not extensions and have the type 'page'
+
+        Returns:
+            List of Tab instances. The last tab is the most recent one.
+        """
+        targets = await self.get_targets()
+        valid_tab_targets = [
+            target for target in targets if target['type'] == 'page'
+            and 'extension' not in target['url']
+        ]
+        return [
+            Tab(self, self._connection_port, target['targetId']) for target
+            in reversed(valid_tab_targets)
+        ]
+
     async def set_download_path(self, path: str, browser_context_id: Optional[str] = None):
         """Set download directory path (convenience method for set_download_behavior)."""
         return await self._execute_command(
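
The filtering inside `get_opened_tabs()` can be tried in isolation on plain dictionaries. The sketch below mirrors the two conditions from the hunk (type `'page'`, no `'extension'` substring in the URL) and the `reversed()` ordering. The helper name `select_page_target_ids` and the sample entries are made up for illustration; real `TargetInfo` objects carry more fields.

```python
def select_page_target_ids(targets: list[dict]) -> list[str]:
    """Keep 'page' targets whose URL lacks 'extension', then reverse the order."""
    valid = [
        target for target in targets
        if target['type'] == 'page' and 'extension' not in target['url']
    ]
    return [target['targetId'] for target in reversed(valid)]


# Made-up target list; the order is an assumption, not documented CDP behavior.
sample_targets = [
    {'targetId': 'T3', 'type': 'page', 'url': 'https://example.com/new'},
    {'targetId': 'T2', 'type': 'page', 'url': 'chrome-extension://abc/popup.html'},
    {'targetId': 'T1', 'type': 'page', 'url': 'https://example.com'},
    {'targetId': 'W1', 'type': 'service_worker', 'url': 'https://example.com/sw.js'},
]

print(select_page_target_ids(sample_targets))  # ['T1', 'T3']
```

Because the check is a plain substring test, an ordinary page whose URL happens to contain the word "extension" would be filtered out as well.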

pydoll/browser/tab.py

Lines changed: 88 additions & 12 deletions
@@ -68,8 +68,43 @@ class Tab(FindElementsMixin):  # noqa: PLR0904
     Primary interface for web page automation including navigation, DOM manipulation,
     JavaScript execution, event handling, network monitoring, and specialized tasks
     like Cloudflare bypass.
+
+    This class implements a singleton pattern based on target_id to ensure
+    only one Tab instance exists per browser tab.
     """
 
+    _instances: dict[str, 'Tab'] = {}
+
+    def __new__(
+        cls,
+        browser: 'Browser',
+        connection_port: int,
+        target_id: str,
+        browser_context_id: Optional[str] = None,
+    ) -> 'Tab':
+        """
+        Create or return existing Tab instance for the given target_id.
+
+        Args:
+            browser: Browser instance that created this tab.
+            connection_port: CDP WebSocket port.
+            target_id: CDP target identifier for this tab.
+            browser_context_id: Optional browser context ID.
+
+        Returns:
+            Tab instance (new or existing) for the target_id.
+        """
+        if target_id in cls._instances:
+            existing_instance = cls._instances[target_id]
+            existing_instance._browser = browser
+            existing_instance._connection_port = connection_port
+            existing_instance._browser_context_id = browser_context_id
+            return existing_instance
+
+        instance = super().__new__(cls)
+        cls._instances[target_id] = instance
+        return instance
+
     def __init__(
         self,
         browser: 'Browser',
@@ -86,18 +121,57 @@ def __init__(
             target_id: CDP target identifier for this tab.
             browser_context_id: Optional browser context ID.
         """
-        self._browser = browser
-        self._connection_port = connection_port
-        self._target_id = target_id
-        self._connection_handler = ConnectionHandler(connection_port, self._target_id)
-        self._page_events_enabled = False
-        self._network_events_enabled = False
-        self._fetch_events_enabled = False
-        self._dom_events_enabled = False
-        self._runtime_events_enabled = False
-        self._intercept_file_chooser_dialog_enabled = False
+        if hasattr(self, '_initialized') and self._initialized:
+            return
+
+        self._browser: 'Browser' = browser
+        self._connection_port: int = connection_port
+        self._target_id: str = target_id
+        self._connection_handler: ConnectionHandler = ConnectionHandler(
+            connection_port, self._target_id
+        )
+        self._page_events_enabled: bool = False
+        self._network_events_enabled: bool = False
+        self._fetch_events_enabled: bool = False
+        self._dom_events_enabled: bool = False
+        self._runtime_events_enabled: bool = False
+        self._intercept_file_chooser_dialog_enabled: bool = False
         self._cloudflare_captcha_callback_id: Optional[int] = None
-        self._browser_context_id = browser_context_id
+        self._browser_context_id: Optional[str] = browser_context_id
+        self._initialized: bool = True
+
+    @classmethod
+    def _remove_instance(cls, target_id: str) -> None:
+        """
+        Remove instance from registry when tab is closed.
+
+        Args:
+            target_id: Target ID to remove from registry.
+        """
+        cls._instances.pop(target_id, None)
+
+    @classmethod
+    def get_instance(cls, target_id: str) -> Optional['Tab']:
+        """
+        Get existing Tab instance for target_id if it exists.
+
+        Args:
+            target_id: Target ID to look up.
+
+        Returns:
+            Existing Tab instance or None if not found.
+        """
+        return cls._instances.get(target_id)
+
+    @classmethod
+    def get_all_instances(cls) -> dict[str, 'Tab']:
+        """
+        Get all active Tab instances.
+
+        Returns:
+            Dictionary mapping target_id to Tab instances.
+        """
+        return cls._instances.copy()
 
     @property
     def page_events_enabled(self) -> bool:
@@ -283,7 +357,9 @@ async def close(self):
         Note:
             Tab instance becomes invalid after calling this method.
         """
-        return await self._execute_command(PageCommands.close())
+        result = await self._execute_command(PageCommands.close())
+        self._remove_instance(self._target_id)
+        return result
 
     async def get_frame(self, frame: WebElement) -> IFrame:
         """

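To see the registry pattern from the `tab.py` hunks outside of Pydoll, here is a minimal, self-contained sketch. `Handle` is a hypothetical class, not Pydoll's `Tab`. It keeps only the parts relevant to the pattern: a class-level dictionary keyed by an ID, a `__new__` that returns the cached object, an `__init__` guarded by an `_initialized` flag, and removal from the registry on `close()`.

```python
from typing import Optional


class Handle:
    """Hypothetical keyed singleton: at most one live instance per key."""

    _instances: dict[str, 'Handle'] = {}

    def __new__(cls, key: str, label: Optional[str] = None) -> 'Handle':
        # Return the cached object when the key is already registered.
        if key in cls._instances:
            return cls._instances[key]
        instance = super().__new__(cls)
        cls._instances[key] = instance
        return instance

    def __init__(self, key: str, label: Optional[str] = None) -> None:
        # Python calls __init__ after every __new__, even for a cached object,
        # so the guard keeps existing state from being reset.
        if getattr(self, '_initialized', False):
            return
        self.key = key
        self.label = label
        self._initialized = True

    def close(self) -> None:
        """Drop the instance from the registry so the key can be reused."""
        type(self)._instances.pop(self.key, None)


first = Handle('abc', label='first')
second = Handle('abc', label='second')
print(first is second)  # True
print(first.label)      # first

first.close()
third = Handle('abc', label='third')
print(third is first)   # False
print(third.label)      # third
```

The sketch leaves out one detail of the diff: `Tab.__new__` also overwrites `_browser`, `_connection_port`, and `_browser_context_id` on the cached instance with the values from the latest call.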