Feat/video issue solved #679
base: main
LangGraph structure change
…d dealt with max iterations
…ations Enhanced prompt enhancer agent to enhance the prompts accordingly an…
…ations Changed the number of max steps
…ing displayed on the frontend
…r completely stop when the user asks about testing the UI/animation
Enhancing intent classifier
…animation testing
Modified the prompt enhancer agent's prompt not to entertain the UI/…
Integrated Ollama to run the LLM locally
Local Gemma model in Ollama integration
Generation of video locally
…ated to UI/animation and respective changes in the agent orchestration
…thin LangGraph and QA possibility prompt enhancer agent
…nd make changes.txt file
…e UI and debugged the intent classifier function
…n, debugged screenshot not being taken
…lity within the QA possibility agent
areeba-latif seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it.
cubic analysis
40 issues found across 81 files
React with 👍 or 👎 to teach cubic. You can also tag @cubic-dev-ai to give feedback, ask questions, or re-run the review.
@@ -0,0 +1,46 @@
from fastapi import WebSocket, logger
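fastapi does not export a `logger` object: `from fastapi import logger` binds the `fastapi.logger` submodule, so calls such as `logger.info(...)` will raise an AttributeError at runtime. The correct import is `from fastapi.logger import logger`.
Prompt for AI agents
Address the following comment on src/websocket/websocket_manager.py at line 1:
<comment>fastapi does not export a `logger` object: `from fastapi import logger` binds the `fastapi.logger` submodule, so calls such as `logger.info(...)` will raise an AttributeError at runtime. The correct import is `from fastapi.logger import logger`.</comment>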
<file context>
@@ -0,0 +1,46 @@
+from fastapi import WebSocket, logger
+from typing import List
+import asyncio
</file context>
Suggested change:
- from fastapi import WebSocket, logger
+ from fastapi import WebSocket
+ from fastapi.logger import logger
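A stdlib sketch of the same import shape: when `name` is not an attribute of a package, `from package import name` imports the submodule `package.name` and binds the module object, not a Logger. Here `logging.handlers` stands in for `fastapi.logger`; the `logging.getLogger("fastapi")` call assumes that is how `fastapi.logger` creates its logger.

```python
import logging
import types

# Binds the *submodule* logging.handlers, just as `from fastapi import logger`
# binds the module fastapi.logger rather than the Logger defined inside it.
from logging import handlers

print(isinstance(handlers, types.ModuleType))  # True

# A module object has no logging methods, so a call like this fails at call time.
try:
    handlers.info("websocket connected")
except AttributeError as exc:
    print(type(exc).__name__)  # AttributeError

# The fix imports the Logger object itself. Assumption: fastapi.logger.logger is
# created as logging.getLogger("fastapi"), so this yields an equivalent Logger.
logger = logging.getLogger("fastapi")
logger.info("websocket connected")
print(isinstance(logger, logging.Logger))  # True
```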
# Get page content and viewport size
content = await self.page.content()
viewport_size = await self.page.viewport_size()
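In Playwright's Python API, `Page.viewport_size` is a synchronous property, not a coroutine method. `self.page.viewport_size()` tries to call the returned dict (or `None`) and raises a TypeError, and the value could not be awaited either. Read the property directly: `viewport_size = self.page.viewport_size`.
Prompt for AI agents
Address the following comment on src/browser_use/browser/dolphin_service.py at line 330:
<comment>In Playwright's Python API, `Page.viewport_size` is a synchronous property, not a coroutine method. `self.page.viewport_size()` tries to call the returned dict (or `None`) and raises a TypeError, and the value could not be awaited either. Read the property directly: `viewport_size = self.page.viewport_size`.</comment>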
<file context>
@@ -0,0 +1,348 @@
+import logging
+import os
+
+import aiohttp
+from playwright.async_api import Page, async_playwright
+
+from browser_use.browser.service import Browser
+from browser_use.browser.views import BrowserState, TabInfo
+
</file context>
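A minimal asyncio sketch of the corrected pattern, assuming a hypothetical `StubPage` that mirrors only the two members involved: in Playwright's async API `content()` is a coroutine method, while `viewport_size` is a plain property returning a dict or `None`.

```python
import asyncio
from typing import Optional, Tuple, TypedDict


class ViewportSize(TypedDict):
    width: int
    height: int


class StubPage:
    """Hypothetical stand-in shaped like Playwright's async Page for these two members."""

    def __init__(self, html: str, viewport: Optional[ViewportSize]) -> None:
        self._html = html
        self._viewport = viewport

    async def content(self) -> str:
        # Coroutine method, like Page.content(): must be awaited.
        return self._html

    @property
    def viewport_size(self) -> Optional[ViewportSize]:
        # Plain property, like Page.viewport_size: read it, never call or await it.
        return self._viewport


async def capture(page: StubPage) -> Tuple[str, Optional[ViewportSize]]:
    content = await page.content()
    viewport_size = page.viewport_size
    return content, viewport_size


page = StubPage("<html><body>demo</body></html>", {"width": 1280, "height": 720})
print(asyncio.run(capture(page)))

# The reviewed line's shape, `page.viewport_size()`, calls the dict and fails.
try:
    page.viewport_size()
except TypeError as exc:
    print("TypeError:", exc)
```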
fonts-dejavu-core \
fonts-dejavu-extra \
vim \
# Video recording dependencies
Because the previous line ("vim \") ends with a line-continuation back-slash, this comment becomes part of the same shell command executed by /bin/sh. The leading "#" therefore starts a shell comment inside the continued line, so every token that follows (ffmpeg, libavcodec-extra, …) is ignored. As a result none of the intended video packages are installed, breaking the new feature.
Prompt for AI agents
Address the following comment on Dockerfile at line 47:
<comment>Because the previous line ("vim \") ends with a line-continuation back-slash, this comment becomes part of the same shell command executed by /bin/sh. The leading "#" therefore starts a shell comment inside the continued line, so every token that follows (ffmpeg, libavcodec-extra, …) is ignored. As a result none of the intended video packages are installed, breaking the new feature.</comment>
<file context>
@@ -44,6 +44,16 @@ RUN apt-get update && apt-get install -y \
fonts-dejavu-core \
fonts-dejavu-extra \
vim \
+ # Video recording dependencies
+ ffmpeg \
+ libavcodec-extra \
</file context>
chat_message = {
"role": "assistant",
"content": final_content.strip(), # Remove leading/trailing whitespace
async def run_agent_task(query: str, url: str, message_callback: Optional[Callable[[str], Awaitable[None]]] = None) -> Dict[str, Any]:
The new `run_agent_task` signature no longer matches the existing call sites (e.g. `handle_submit` still calls `run_agent_task(webui_manager, components)`). That call still binds without an error, but it passes `webui_manager` as `query` and `components` as `url`, so the function receives objects of the wrong types and the tab breaks at runtime.
Prompt for AI agents
Address the following comment on src/webui/components/browser_use_agent_tab.py at line 355:
<comment>The new `run_agent_task` signature no longer matches the existing call sites (e.g. `handle_submit` still calls `run_agent_task(webui_manager, components)`). That call still binds without an error, but it passes `webui_manager` as `query` and `components` as `url`, so the function receives objects of the wrong types and the tab breaks at runtime.</comment>
<file context>
@@ -65,732 +89,893 @@ async def _initialize_llm(
)
return None
-
-def _get_config_value(
- webui_manager: WebuiManager,
- comp_dict: Dict[gr.components.Component, Any],
- comp_id_suffix: str,
- default: Any = None,
</file context>
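A minimal sketch of why a stale two-argument call does not fail at the call itself: Python binds the two positional arguments to `query` and `url` without complaint. Making the parameters keyword-only (the hypothetical `run_agent_task_kw` below, an assumption rather than the PR's code) turns such stale calls into an immediate TypeError. The function bodies are placeholders, and `FakeWebuiManager` is a hypothetical stand-in.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional

MessageCallback = Optional[Callable[[str], Awaitable[None]]]


async def run_agent_task(query: str, url: str, message_callback: MessageCallback = None) -> Dict[str, Any]:
    # Placeholder body: echoes what was bound to each parameter.
    return {"query": query, "url": url}


async def run_agent_task_kw(*, query: str, url: str, message_callback: MessageCallback = None) -> Dict[str, Any]:
    # Same placeholder body, but every parameter must be passed by keyword.
    return {"query": query, "url": url}


class FakeWebuiManager:  # hypothetical stand-in for the old first argument
    pass


# The stale call site still runs: the manager object silently lands in `query`.
result = asyncio.run(run_agent_task(FakeWebuiManager(), {"components": []}))
print(type(result["query"]).__name__)  # FakeWebuiManager

# With keyword-only parameters the same stale call fails before any work starts.
try:
    run_agent_task_kw(FakeWebuiManager(), {"components": []})
except TypeError as exc:
    print("TypeError:", exc)

print(asyncio.run(run_agent_task_kw(query="Check the login form", url="https://example.com")))
```

Argument binding happens when an `async def` function is called, before any coroutine object exists, so the keyword-only version surfaces the mismatch at the exact call site.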
<span style="float:right; color:#28a745;">● Connected</span>
</div>
<iframe
src="http://localhost:6080/vnc.html?autoconnect=true&resize=scale&password=youvncpassword&autoconnect=true&resize=scale&quality=6&compression=6"
Hard-coded credential ("youvncpassword") is embedded directly in the HTML iframe URL. Exposing passwords in source code is a security risk and violates secret-management best practices.
Prompt for AI agents
Address the following comment on src/webui/components/browser_use_agent_tab.py at line 1230:
<comment>Hard-coded credential ("youvncpassword") is embedded directly in the HTML iframe URL. Exposing passwords in source code is a security risk and violates secret-management best practices.</comment>
<file context>
@@ -996,23 +1197,48 @@ def create_browser_use_agent_tab(webui_manager: WebuiManager):
interactive=True,
elem_id="user_input",
)
+ url_input = gr.Textbox(
+ label="Enter URL",
+ placeholder="Enter the URL to analyze (optional)",
+ lines=1,
+ interactive=True,
+ elem_id="url_input",
</file context>
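A minimal sketch of keeping the password out of source, assuming a hypothetical `VNC_PASSWORD` environment variable and a hypothetical `build_vnc_url` helper that also drops the duplicated `autoconnect`/`resize` parameters. It only moves the secret out of the repository: whatever ends up in the iframe URL still reaches the browser, so leaving `password` out entirely (noVNC's page normally prompts for credentials when none are supplied) is the stricter option.

```python
import html
import os
from urllib.parse import urlencode


def build_vnc_url(base: str = "http://localhost:6080/vnc.html") -> str:
    # Each noVNC option appears once (the original URL repeated autoconnect/resize).
    params = {"autoconnect": "true", "resize": "scale", "quality": "6", "compression": "6"}
    # Hypothetical env var: the secret comes from the deployment, not the source tree.
    password = os.environ.get("VNC_PASSWORD")
    if password:
        params["password"] = password
    return f"{base}?{urlencode(params)}"


# html.escape turns "&" into "&amp;" so the URL is a valid attribute value.
iframe_html = f'<iframe src="{html.escape(build_vnc_url())}"></iframe>'
print(iframe_html)
```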
y: int


class DragDropAction(BaseModel):
The model allows creating a drag-drop action without specifying either element selectors or absolute coordinates, which would make the action impossible to execute. Add a validator to ensure that at least one pair of source/target references (selectors or coordinates) is provided.
Prompt for AI agents
Address the following comment on src/browser_use/controller/views.py at line 72:
<comment>The model allows creating a drag-drop action without specifying either element selectors or absolute coordinates, which would make the action impossible to execute. Add a validator to ensure that at least one pair of source/target references (selectors or coordinates) is provided.</comment>
<file context>
@@ -0,0 +1,91 @@
+from pydantic import BaseModel, ConfigDict, Field, model_validator
+
+
+# Action Input Models
+class SearchGoogleAction(BaseModel):
+ query: str
+
+
+class GoToUrlAction(BaseModel):
</file context>
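A minimal sketch of the suggested validator, using the pydantic v2 `model_validator` already imported in the file. The field names below are assumptions (one selector pair, one coordinate pair) and may not match the PR's model.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError, model_validator


class DragDropAction(BaseModel):
    # Element-based drag (assumed field names): selectors for source and target.
    element_source: Optional[str] = None
    element_target: Optional[str] = None
    # Coordinate-based drag (assumed field names): absolute page coordinates.
    coord_source_x: Optional[int] = None
    coord_source_y: Optional[int] = None
    coord_target_x: Optional[int] = None
    coord_target_y: Optional[int] = None

    @model_validator(mode="after")
    def require_source_and_target(self) -> "DragDropAction":
        has_selectors = self.element_source is not None and self.element_target is not None
        coords = (self.coord_source_x, self.coord_source_y, self.coord_target_x, self.coord_target_y)
        has_coords = all(c is not None for c in coords)
        if not (has_selectors or has_coords):
            # pydantic wraps this ValueError in a ValidationError for the caller.
            raise ValueError("provide element_source and element_target, or all four coord_* values")
        return self


print(DragDropAction(element_source="#card", element_target="#done-column"))
try:
    DragDropAction(coord_source_x=10)
except ValidationError as exc:
    print(exc.error_count(), "validation error")
```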
```json
{
"agent_msg": "QA is possible on the extracted Snippet",
"qa_possibilty": true
Field name "qa_possibilty" is misspelled; the real schema uses "qa_possibility". Keeping the typo in examples will mislead developers and break JSON validation.
Prompt for AI agents
Address the following comment on src/agent/qa_possibilty_checker/README.md at line 22:
<comment>Field name "qa_possibilty" is misspelled; the real schema uses "qa_possibility". Keeping the typo in examples will mislead developers and break JSON validation.</comment>
<file context>
@@ -0,0 +1,143 @@
+The **QA Possibility Checker Agent** is like a smart gatekeeper in our AI-based QA automation system. Its job is to check whether the test the user wants to perform can actually be done on the current webpage snippet.
+
+
+## What This Agent Does
+
+* When a user says something like:
+ **"Check if the 'Submit' button works"**
+
+ And the webpage snippet actually includes a `<button>Submit</button>` –
</file context>
Suggested change:
- "qa_possibilty": true
+ "qa_possibility": true
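A minimal sketch of how the typo breaks validation, assuming the output model has exactly the two fields shown in the README's JSON example (`agent_msg: str`, `qa_possibility: bool`). With pydantic v2's default handling of extra keys, the misspelled key is silently ignored and the required field is reported as missing.

```python
from pydantic import BaseModel, ValidationError


class QAPossibilityCheckerOutput(BaseModel):
    # Fields assumed from the README's JSON example.
    agent_msg: str
    qa_possibility: bool


good = '{"agent_msg": "QA is possible on the extracted Snippet", "qa_possibility": true}'
typo = '{"agent_msg": "QA is possible on the extracted Snippet", "qa_possibilty": true}'

print(QAPossibilityCheckerOutput.model_validate_json(good).qa_possibility)  # True

try:
    QAPossibilityCheckerOutput.model_validate_json(typo)
except ValidationError as exc:
    # The misspelled key is ignored as an extra field, so the real one is missing.
    first = exc.errors()[0]
    print(first["type"], first["loc"])  # missing ('qa_possibility',)
```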
This file defines a simple Pydantic class that sets the structure for the agent's response:

```python
class QAPossibiltyChecker(BaseModel):
Class name is misspelled ("QAPossibiltyChecker") and does not match the real implementation ("QAPossibilityCheckerOutput") inside src/agent/qa_possibilty_checker/output.py; this mismatch will confuse readers and lead to incorrect usage.
Prompt for AI agents
Address the following comment on src/agent/qa_possibilty_checker/README.md at line 93:
<comment>Class name is misspelled ("QAPossibiltyChecker") and does not match the real implementation ("QAPossibilityCheckerOutput") inside src/agent/qa_possibilty_checker/output.py; this mismatch will confuse readers and lead to incorrect usage.</comment>
<file context>
@@ -0,0 +1,143 @@
+The **QA Possibility Checker Agent** is like a smart gatekeeper in our AI-based QA automation system. Its job is to check whether the test the user wants to perform can actually be done on the current webpage snippet.
+
+
+## What This Agent Does
+
+* When a user says something like:
+ **"Check if the 'Submit' button works"**
+
+ And the webpage snippet actually includes a `<button>Submit</button>` –
</file context>
Suggested change:
- class QAPossibiltyChecker(BaseModel):
+ class QAPossibilityCheckerOutput(BaseModel):
# glob patterns are very easy to mess up and match too many domains by accident
# e.g. if you only need to access gmail, don't use *.google.com because an attacker could convince the agent to visit a malicious doc
# on docs.google.com/s/some/evil/doc to set up a prompt injection attack
"⚠️ Allowing agent to visit {domain} based on allowed_domains=['{glob}', ...]. Set allowed_domains=['{domain}', ...] explicitly to avoid the security risks of glob patterns!"
The log message uses string formatting placeholders but is not declared as an f-string, so the variables `domain` and `glob` will not be interpolated and the warning will always show the literal braces. Use an f-string or `.format()` to include the actual values.
Prompt for AI agents
Address the following comment on src/browser_use/browser/context.py at line 963:
<comment>The log message uses string formatting placeholders but is not declared as an f-string, so the variables `domain` and `glob` will not be interpolated and the warning will always show the literal braces. Use an f-string or `.format()` to include the actual values.</comment>
<file context>
@@ -0,0 +1,2027 @@
+"""
+Playwright browser on steroids.
+"""
+
+import asyncio
+import base64
+import gc
+import json
+import logging
</file context>
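A minimal sketch of the fix, assuming a hypothetical `warn_glob_match` helper and logger name; the message text is taken from the flagged line.

```python
import logging

logger = logging.getLogger("allowed_domains_demo")  # hypothetical logger name


def warn_glob_match(domain: str, glob: str) -> str:
    # The f prefix on each literal is what makes {domain} and {glob} interpolate.
    message = (
        f"⚠️ Allowing agent to visit {domain} based on allowed_domains=['{glob}', ...]. "
        f"Set allowed_domains=['{domain}', ...] explicitly to avoid the security risks of glob patterns!"
    )
    logger.warning(message)
    return message


print(warn_glob_match("mail.google.com", "*.google.com"))
```

The logging module's own lazy formatting, `logger.warning("... %s ...", domain, glob)`, would also work and defers building the string until a handler actually emits the record.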
class CustomBrowser(Browser):

    def __init__(self, *args, **kwargs):
The constructor is overridden but adds no new behavior and accepts arbitrary `*args`/`**kwargs`, hiding the explicit `config: BrowserConfig | None = None` signature defined in the base class. This weakens static type checking and makes the override redundant, reducing maintainability.
Prompt for AI agents
Address the following comment on src/browser/custom_browser.py at line 36:
<comment>The constructor is overridden but adds no new behavior and accepts arbitrary *args/**kwargs, hiding the explicit `config: BrowserConfig | None = None` signature defined in the base class. This weakens static-type checking and makes the override redundant, reducing maintainability.</comment>
<file context>
@@ -33,6 +33,9 @@
class CustomBrowser(Browser):
+ def __init__(self, *args, **kwargs):
+ super().__init__(*args, **kwargs)
+
</file context>
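A minimal sketch comparing the options, using hypothetical stand-ins for `Browser` and `BrowserConfig` with the base signature quoted in the review. Either drop the override so the signature is inherited, or, if an override becomes necessary, restate the parameters explicitly.

```python
import inspect
from typing import Optional


class BrowserConfig:  # hypothetical stand-in for the real config class
    pass


class Browser:  # hypothetical stand-in with the base signature quoted in the review
    def __init__(self, config: Optional[BrowserConfig] = None) -> None:
        self.config = config or BrowserConfig()


class PassthroughBrowser(Browser):
    # The flagged pattern: adds no behavior and erases the visible signature.
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)


class CustomBrowser(Browser):
    # Option 1: no __init__ at all, so the base signature is inherited unchanged.
    pass


class ExplicitBrowser(Browser):
    # Option 2: if an override is needed, restate the parameters explicitly.
    def __init__(self, config: Optional[BrowserConfig] = None) -> None:
        super().__init__(config)


print(inspect.signature(PassthroughBrowser))  # (*args, **kwargs)
print(inspect.signature(CustomBrowser))
print(inspect.signature(ExplicitBrowser))
```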
…oving the unused stop api
Summary by cubic
Added a live browser testing workflow with real-time frontend viewing, new API endpoints, and agent orchestration for SQA automation. Integrated video recording, local LLM support (Ollama, Gemma), enhanced agent logic for prompt and QA possibility checking, and improved Docker deployment.
New Features
Live browser automation view in the frontend with instant feedback.
FastAPI backend with WebSocket messaging and new endpoints.
Output data and videos saved locally and exposed to UI.
Enhanced agents for intent classification, prompt improvement, and QA possibility.
Support for Ollama and Gemma local models.
Simple HTML and Gradio UIs provided.
Migration
Update the Docker image and use `uvicorn src.API.main:app` for the web API.
Mount the `src/outputdata` volume to access agent data and recordings.
Start with `python start_live_browser.py` or Docker Compose for the full workflow.