Commit 4066c4d

chrisschnabl and claude committed
fix(browser-use): Enhanced navigation strategy, element detection, and task validation
- Added smart navigation patterns to system prompt for better data discovery and content location
- Added dynamic content recognition guidelines for infinite scroll, AJAX loading, and empty grids
- Implemented new wait_for_dynamic_content action to handle dynamically loaded content
- Enhanced search strategy with alternative navigation paths and premium content detection
- Optimized file system usage to reduce unnecessary operations for simple data extraction tasks
- Added guidance on recognizing loading states, pagination loops, and content container detection

These fixes target the main failure patterns identified:
- Step limit exhaustion due to inefficient navigation (EPA AQI, BBC recipes)
- Incorrect results from poor element detection (Zara products, Fox Sports videos)
- Over-engineered workflows for simple tasks (PlayStation store lookup)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
1 parent 122feb9 commit 4066c4d

3 files changed, +105 -0 lines changed

browser_use/agent/system_prompt.md

Lines changed: 22 additions & 0 deletions
@@ -81,6 +81,7 @@ Strictly follow these rules while using the browser and navigating the web:
 - If a captcha appears, attempt solving it if possible. If not, use fallback strategies (e.g., alternative site, backtrack).
 - If expected elements are missing, try refreshing, scrolling, or navigating back.
 - If the page is not fully loaded, use the wait action.
+- Use `wait_for_dynamic_content` when you encounter empty product grids, video lists, or content areas that should contain data but appear empty - this can trigger loading of dynamically loaded content.
 - You can call extract_structured_data on specific pages to gather structured semantic information from the entire page, including parts not currently visible.
 - Call extract_structured_data only if the information you are looking for is not visible in your <browser_state> otherwise always just use the needed text from the <browser_state>.
 - Calling the extract_structured_data tool is expensive! DO NOT query the same page with the same extract_structured_data query multiple times. Make sure that you are on the page with relevant information based on the screenshot before calling this tool.
@@ -96,6 +97,25 @@ Strictly follow these rules while using the browser and navigating the web:
 2. Open ended tasks. Plan yourself, be creative in achieving them.
 - If you get stuck e.g. with logins or captcha in open-ended tasks you can re-evaluate the task and try alternative ways, e.g. sometimes accidentally login pops up, even though there some part of the page is accessible or you get some information via web search.
 - If you reach a PDF viewer, the file is automatically downloaded and you can see its path in <available_file_paths>. You can either read the file or scroll in the page to see more.
+
+**Smart Navigation Patterns:**
+- For data lookup tasks (e.g., AQI, statistics), look for direct data portals, maps, or "Data" menu links instead of generic search
+- For product/content searches, navigate to category-specific sections (e.g., "New Arrivals", "NBA Videos") rather than site-wide search
+- For recipes/content, check if authentication or premium access is required if search returns no results
+- When searching yields no results, try alternative navigation paths: menu categories, filter selections, or direct URL patterns
+- Recognize loading states: "Loading...", spinners, empty grids that may populate, infinite scroll indicators
+- If content appears empty, wait 2-3 seconds and scroll slightly to trigger dynamic loading before concluding no content exists
+- For e-commerce/catalog sites, look for product grids, category filters, and sorting options rather than relying solely on search
+- When stuck in pagination loops, try category navigation or filters instead of continuing to paginate through search results
+- If a task requires specific data that should exist, try multiple navigation approaches: direct menu links, category browsing, filtered searches
+- Always verify you're on the correct content-displaying page before concluding data doesn't exist (e.g., data tables, product grids, video lists)
+
+**Dynamic Content Recognition:**
+- Before scrolling extensively, check if page has infinite scroll by scrolling once and waiting to see if content loads
+- Look for "Load More", "Show More", or pagination controls that might reveal additional content
+- If product grids or content lists appear empty, try interacting with category filters, sorting options, or view toggles
+- For video/media sites, check if content is behind category tabs, genre filters, or requires interaction to load
+- Recognize when you're viewing category/navigation pages vs. actual content pages - navigate deeper if needed
 </browser_rules>
 
 <file_system>
@@ -106,6 +126,8 @@ Strictly follow these rules while using the browser and navigating the web:
 - If exists, <available_file_paths> includes files you have downloaded or uploaded by the user. You can only read or upload these files but you don't have write access.
 - If the task is really long, initialize a `results.md` file to accumulate your results.
 - DO NOT use the file system if the task is less than 10 steps!
+- For simple data extraction tasks (e.g., getting product prices, release dates, single pieces of information), output results directly in the `done` action rather than creating files
+- Only save extracted content to files for complex tasks with multiple data points or when specifically requested by the user
 </file_system>
 
 <task_completion_rules>

browser_use/tools/service.py

Lines changed: 77 additions & 0 deletions
@@ -48,6 +48,7 @@
     StructuredOutputAction,
     SwitchTabAction,
     UploadFileAction,
+    WaitForDynamicContentAction,
 )
 from browser_use.utils import _log_pretty_url, time_execution_sync
 
@@ -761,6 +762,82 @@ async def scroll_to_text(text: str, browser_session: BrowserSession): # type: i
                 long_term_memory=f"Tried scrolling to text '{text}' but it was not found",
             )
 
+        @self.registry.action(
+            'Wait for dynamic content to load on the current page. Use when content appears empty or is loading. Optionally scroll slightly to trigger loading.',
+            param_model=WaitForDynamicContentAction,
+        )
+        async def wait_for_dynamic_content(params: WaitForDynamicContentAction, browser_session: BrowserSession):
+            import asyncio
+
+            # Get initial page state
+            try:
+                initial_state = await browser_session.get_browser_state_summary(include_screenshot=False)
+                initial_elements = len(initial_state.clickable_elements)
+
+                # Optionally trigger loading with a small scroll
+                if params.scroll_trigger:
+                    try:
+                        # Small scroll down and then back up to trigger loading
+                        scroll_event = browser_session.event_bus.dispatch(
+                            ScrollEvent(pages=0.1, down=True, node=None)
+                        )
+                        await scroll_event
+                        await scroll_event.event_result(raise_if_any=False, raise_if_none=False)
+
+                        # Wait a moment
+                        await asyncio.sleep(1)
+
+                        # Scroll back up
+                        scroll_event = browser_session.event_bus.dispatch(
+                            ScrollEvent(pages=0.1, down=False, node=None)
+                        )
+                        await scroll_event
+                        await scroll_event.event_result(raise_if_any=False, raise_if_none=False)
+                    except Exception:
+                        pass  # Ignore scroll errors, just continue with waiting
+
+                # Wait for the specified time, checking periodically for new content
+                wait_time = params.timeout_seconds
+                check_interval = min(1, wait_time / 3)  # Check 3 times during wait period
+
+                for i in range(int(wait_time / check_interval)):
+                    await asyncio.sleep(check_interval)
+
+                    # Check if new elements appeared
+                    current_state = await browser_session.get_browser_state_summary(include_screenshot=False)
+                    current_elements = len(current_state.clickable_elements)
+
+                    # If looking for specific pattern, check for it
+                    if params.element_pattern:
+                        page_text = ' '.join([elem.text for elem in current_state.clickable_elements if elem.text])
+                        if params.element_pattern.lower() in page_text.lower():
+                            memory = f'Found pattern "{params.element_pattern}" after {i * check_interval:.1f}s'
+                            logger.info(f'⏳ {memory}')
+                            return ActionResult(extracted_content=memory, long_term_memory=memory)
+
+                    # Check if significant new content appeared
+                    if current_elements > initial_elements + 3:  # More than 3 new elements
+                        memory = f'New content loaded: {current_elements - initial_elements} new elements after {i * check_interval:.1f}s'
+                        logger.info(f'⏳ {memory}')
+                        return ActionResult(extracted_content=memory, long_term_memory=memory)
+
+                # Final wait period completed
+                final_state = await browser_session.get_browser_state_summary(include_screenshot=False)
+                final_elements = len(final_state.clickable_elements)
+
+                if final_elements > initial_elements:
+                    memory = f'Waited {wait_time}s for dynamic content - {final_elements - initial_elements} new elements appeared'
+                else:
+                    memory = f'Waited {wait_time}s for dynamic content - no significant changes detected'
+
+                logger.info(f'⏳ {memory}')
+                return ActionResult(extracted_content=memory, long_term_memory=memory)
+
+            except Exception as e:
+                error_msg = f'Failed to wait for dynamic content: {str(e)}'
+                logger.error(error_msg)
+                return ActionResult(error=error_msg)
+
         # Dropdown Actions
 
         @self.registry.action(
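
The polling loop added in `wait_for_dynamic_content` can be read in isolation as the sketch below. This is not the commit's code: `poll_for_growth`, `count_elements`, `min_new`, and `max_interval` are hypothetical names chosen for illustration, and the probe merely stands in for counting elements from `get_browser_state_summary`. One deliberate difference is that the diff's messages report `i * check_interval`, which is one interval less than the time already slept when the check runs, whereas the sketch reports `(i + 1) * interval`.

```python
import asyncio
from typing import Awaitable, Callable


async def poll_for_growth(
    count_elements: Callable[[], Awaitable[int]],  # hypothetical probe, not a browser_use API
    timeout_seconds: float = 5,
    min_new: int = 3,
    max_interval: float = 1.0,
) -> tuple[bool, float]:
    """Poll until the count exceeds the baseline by more than `min_new`.

    Returns (grew, elapsed_seconds), where elapsed_seconds counts only the
    time spent sleeping between checks.
    """
    baseline = await count_elements()
    # Same schedule as the diff: at most `max_interval` between checks,
    # shortened to a third of the timeout when the timeout is small.
    interval = min(max_interval, timeout_seconds / 3)
    checks = int(timeout_seconds / interval)
    for i in range(checks):
        await asyncio.sleep(interval)
        if await count_elements() > baseline + min_new:
            # (i + 1) intervals have been slept by the time this check runs.
            return True, (i + 1) * interval
    return False, checks * interval
```

A caller would pass any coroutine function that returns the current element count, for example a thin wrapper around the browser state summary, and branch on the returned flag.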

browser_use/tools/views.py

Lines changed: 6 additions & 0 deletions
@@ -91,3 +91,9 @@ class GetDropdownOptionsAction(BaseModel):
 class SelectDropdownOptionAction(BaseModel):
     index: int = Field(ge=1, description='index of the dropdown element to select an option for')
     text: str = Field(description='the text or exact value of the option to select')
+
+
+class WaitForDynamicContentAction(BaseModel):
+    timeout_seconds: int = Field(default=5, ge=1, le=10, description='seconds to wait for dynamic content to load')
+    scroll_trigger: bool = Field(default=True, description='whether to scroll slightly to trigger content loading')
+    element_pattern: str | None = Field(default=None, description='optional text pattern to wait for in elements')
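
To see how the `timeout_seconds` bounds (1 to 10) interact with the polling loop in `service.py`, the following sketch reproduces only the schedule arithmetic from the diff. `poll_schedule` is a hypothetical helper written for illustration, and the range check is a plain-Python stand-in for the `ge=1, le=10` constraints on the model.

```python
def poll_schedule(timeout_seconds: int) -> tuple[float, int]:
    """Return (check_interval, number_of_checks) using the diff's arithmetic."""
    # Stand-in for the Field(ge=1, le=10) validation on the action model.
    if not 1 <= timeout_seconds <= 10:
        raise ValueError("timeout_seconds must be between 1 and 10")
    check_interval = min(1, timeout_seconds / 3)
    return check_interval, int(timeout_seconds / check_interval)


if __name__ == "__main__":
    for t in (1, 3, 5, 10):
        print(t, poll_schedule(t))
```

With the default of 5 seconds the interval is capped at one second, so the loop performs five checks rather than three. The "Check 3 times during wait period" comment in the diff therefore appears to describe only timeouts of 3 seconds or less.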
