@ghost ghost commented Aug 15, 2025

Summary by cubic

Added a live browser testing workflow with real-time frontend viewing, new API endpoints, and agent orchestration for SQA automation. Integrated video recording, local LLM support (Ollama, Gemma), enhanced agent logic for prompt and QA possibility checking, and improved Docker deployment.

  • New Features
      • Live browser automation view in the frontend with instant feedback.
      • FastAPI backend with WebSocket messaging and new endpoints.
      • Output data and videos saved locally and exposed to UI.
      • Enhanced agents for intent classification, prompt improvement, and QA possibility.
      • Support for Ollama and Gemma local models.
      • Simple HTML and Gradio UIs provided.

  • Migration
      • Update Docker image and use uvicorn src.API.main:app for web API.
      • Mount src/outputdata volume to access agent data and recordings.
      • Start with python start_live_browser.py or Docker Compose for full workflow.

MohsinAliIrfan and others added 30 commits June 10, 2025 20:22
…ations

Enhanced prompt enhancer agent to enhance the prompts accordingly an…
…r completely stop when the user asks about testing the UI/animation
Modified the prompt enhancer agent's prompt to not entertain the ui/…
Integrated Ollama to run the LLM locally
Local Gemma model in Ollama integration
…ated to UI/animation and respective changes in the agent orchestration
…thin langgraph and qa possibility prompt enhancer agent

CLAassistant commented Aug 15, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
0 out of 3 committers have signed the CLA.

❌ MohsinAliIrfan
❌ itsareebalatif
❌ areeba-latif


areeba-latif does not seem to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.

@cubic-dev-ai cubic-dev-ai bot left a comment

cubic analysis

40 issues found across 81 files • Review in cubic

React with 👍 or 👎 to teach cubic. You can also tag @cubic-dev-ai to give feedback, ask questions, or re-run the review.

@@ -0,0 +1,46 @@
from fastapi import WebSocket, logger

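fastapi does not export a logger object, so this import binds logger to the fastapi.logger submodule rather than to a logging.Logger. The import itself succeeds, but calls such as logger.info(...) will then raise an AttributeError; the logger object lives at fastapi.logger.logger.

Prompt for AI agents
Address the following comment on src/websocket/websocket_manager.py at line 1:

<comment>fastapi does not export a logger object, so this import binds logger to the fastapi.logger submodule rather than to a logging.Logger. The import itself succeeds, but calls such as logger.info(...) will then raise an AttributeError; the logger object lives at fastapi.logger.logger.</comment>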

<file context>
@@ -0,0 +1,46 @@
+from fastapi import WebSocket, logger
+from typing import List
+import asyncio
</file context>
Suggested change
-from fastapi import WebSocket, logger
+from fastapi import WebSocket
+from fastapi.logger import logger
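
A minimal sketch of the corrected import, assuming the module only needs FastAPI's shared logger object. The try/except fallback is an addition for illustration so the snippet also runs where FastAPI is not installed; it relies on fastapi.logger creating its logger with logging.getLogger("fastapi").

```python
import logging

# The original line also imported WebSocket; if the module uses it, keep
# that import on its own line: `from fastapi import WebSocket`.
try:
    # fastapi.logger is a module; the Logger object is its `logger` attribute.
    from fastapi.logger import logger
except ImportError:
    # Fallback for environments without FastAPI (illustration only).
    logger = logging.getLogger("fastapi")

# A Logger object exposes the usual methods; the bare module would not.
logger.info("WebSocket manager initialised")
```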


# Get page content and viewport size
content = await self.page.content()
viewport_size = await self.page.viewport_size()

In Playwright's Python API, Page.viewport_size is a property, not a coroutine method, so calling and awaiting it will raise a TypeError at runtime. Read it as a plain attribute instead: viewport_size = self.page.viewport_size.

Prompt for AI agents
Address the following comment on src/browser_use/browser/dolphin_service.py at line 330:

<comment>In Playwright's Python API, Page.viewport_size is a property, not a coroutine method, so calling and awaiting it will raise a TypeError at runtime. Read it as a plain attribute instead: viewport_size = self.page.viewport_size.</comment>

<file context>
@@ -0,0 +1,348 @@
+import logging
+import os
+
+import aiohttp
+from playwright.async_api import Page, async_playwright
+
+from browser_use.browser.service import Browser
+from browser_use.browser.views import BrowserState, TabInfo
+
</file context>
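
A small sketch of the difference, using a hypothetical FakePage stand-in so it runs without Playwright. It assumes, as in Playwright's Python API, that viewport_size is a property while content() is a coroutine method; the viewport values are made up.

```python
import asyncio
from typing import Dict, Optional


class FakePage:
    """Hypothetical stand-in for playwright.async_api.Page (not the real class)."""

    @property
    def viewport_size(self) -> Optional[Dict[str, int]]:
        # A plain property: no call and no await needed.
        return {"width": 1280, "height": 720}

    async def content(self) -> str:
        # A coroutine method: must be awaited.
        return "<html></html>"


async def get_state(page: FakePage):
    content = await page.content()
    viewport_size = page.viewport_size  # property access, no parentheses
    return content, viewport_size


async def broken(page: FakePage):
    # Mirrors the flagged line: tries to call the value the property returns.
    return await page.viewport_size()


content, viewport = asyncio.run(get_state(FakePage()))

try:
    asyncio.run(broken(FakePage()))
    error = None
except TypeError as exc:
    error = exc  # the dict returned by the property is not callable
```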

fonts-dejavu-core \
fonts-dejavu-extra \
vim \
# Video recording dependencies

The previous line ("vim \") ends with a line-continuation backslash, so this comment sits in the middle of the continued RUN instruction. Docker's Dockerfile parser removes comment lines before the instruction is passed to /bin/sh, so the packages that follow (ffmpeg, libavcodec-extra, …) are still installed. The placement is easy to misread, though, because in a plain shell script the same layout would end the command at the comment; consider moving the comment above the RUN instruction.

Prompt for AI agents
Address the following comment on Dockerfile at line 47:

<comment>The previous line ("vim \") ends with a line-continuation backslash, so this comment sits in the middle of the continued RUN instruction. Docker's Dockerfile parser removes comment lines before the instruction is passed to /bin/sh, so the packages that follow (ffmpeg, libavcodec-extra, …) are still installed. The placement is easy to misread, though, because in a plain shell script the same layout would end the command at the comment; consider moving the comment above the RUN instruction.</comment>

<file context>
@@ -44,6 +44,16 @@ RUN apt-get update && apt-get install -y \
     fonts-dejavu-core \
     fonts-dejavu-extra \
     vim \
+    # Video recording dependencies
+    ffmpeg \
+    libavcodec-extra \
</file context>

chat_message = {
"role": "assistant",
"content": final_content.strip(), # Remove leading/trailing whitespace
async def run_agent_task(query: str, url: str, message_callback: Optional[Callable[[str], Awaitable[None]]] = None) -> Dict[str, Any]:

The new run_agent_task signature no longer matches the existing call sites (e.g. handle_submit still calls run_agent_task(webui_manager, components)). The call still binds, because two positional arguments fit the new parameters, but webui_manager is passed as query and components as url. The task is therefore expected to fail as soon as those values are used as strings, breaking the tab completely.

Prompt for AI agents
Address the following comment on src/webui/components/browser_use_agent_tab.py at line 355:

<comment>The new run_agent_task signature no longer matches the existing call sites (e.g. handle_submit still calls run_agent_task(webui_manager, components)). The call still binds, because two positional arguments fit the new parameters, but `webui_manager` is passed as `query` and `components` as `url`. The task is therefore expected to fail as soon as those values are used as strings, breaking the tab completely.</comment>

<file context>
@@ -65,732 +89,893 @@ async def _initialize_llm(
         )
         return None
 
-
-def _get_config_value(
-        webui_manager: WebuiManager,
-        comp_dict: Dict[gr.components.Component, Any],
-        comp_id_suffix: str,
-        default: Any = None,
</file context>
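
To check how the mismatched call binds, the sketch below reuses the new signature from the diff with a placeholder body. The object() values are hypothetical stand-ins for webui_manager and components.

```python
import inspect
from typing import Any, Awaitable, Callable, Dict, Optional


async def run_agent_task(
    query: str,
    url: str,
    message_callback: Optional[Callable[[str], Awaitable[None]]] = None,
) -> Dict[str, Any]:
    # Placeholder body: the real function runs the agent.
    return {"query": query, "url": url}


# Hypothetical stand-ins for the objects the old call site passes.
webui_manager, components = object(), object()

# Two positional arguments fit the new parameters, so binding succeeds
# and no TypeError is raised at the call site.
bound = inspect.signature(run_agent_task).bind(webui_manager, components)

# The arguments end up on the wrong parameters without any warning.
assert bound.arguments["query"] is webui_manager
assert bound.arguments["url"] is components
```

Type annotations are not enforced at runtime, so a mismatch like this only surfaces once the function body treats the values as strings.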

<span style="float:right; color:#28a745;">● Connected</span>
</div>
<iframe
src="http://localhost:6080/vnc.html?autoconnect=true&resize=scale&password=youvncpassword&autoconnect=true&resize=scale&quality=6&compression=6"

Hard-coded credential ("youvncpassword") is embedded directly in the HTML iframe URL. Exposing passwords in source code is a security risk and violates secret-management best practices.

Prompt for AI agents
Address the following comment on src/webui/components/browser_use_agent_tab.py at line 1230:

<comment>Hard-coded credential ("youvncpassword") is embedded directly in the HTML iframe URL.  Exposing passwords in source code is a security risk and violates secret-management best practices.</comment>

<file context>
@@ -996,23 +1197,48 @@ def create_browser_use_agent_tab(webui_manager: WebuiManager):
             interactive=True,
             elem_id="user_input",
         )
+        url_input = gr.Textbox(
+            label="Enter URL",
+            placeholder="Enter the URL to analyze (optional)",
+            lines=1,
+            interactive=True,
+            elem_id="url_input",
</file context>
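
One way to keep the password out of the source, sketched under assumptions: the environment variable name VNC_PASSWORD and the helper build_vnc_url are hypothetical, and the query parameters are copied from the iframe URL with the duplicated autoconnect and resize entries dropped. Any value placed in the iframe URL is still visible to whoever loads the page, so this only removes the secret from the repository.

```python
import os
from urllib.parse import urlencode


def build_vnc_url(base: str = "http://localhost:6080/vnc.html") -> str:
    """Build the noVNC iframe URL, reading the password from the environment."""
    password = os.environ.get("VNC_PASSWORD")  # hypothetical variable name
    if not password:
        raise RuntimeError("VNC_PASSWORD is not set")
    params = {
        "autoconnect": "true",
        "resize": "scale",
        "password": password,
        "quality": "6",
        "compression": "6",
    }
    # urlencode escapes any special characters in the password.
    return f"{base}?{urlencode(params)}"
```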

y: int


class DragDropAction(BaseModel):

The model allows creating a drag-drop action without specifying either element selectors or absolute coordinates, which would make the action impossible to execute. Add a validator to ensure that at least one pair of source/target references (selectors or coordinates) is provided.

Prompt for AI agents
Address the following comment on src/browser_use/controller/views.py at line 72:

<comment>The model allows creating a drag-drop action without specifying either element selectors or absolute coordinates, which would make the action impossible to execute. Add a validator to ensure that at least one pair of source/target references (selectors or coordinates) is provided.</comment>

<file context>
@@ -0,0 +1,91 @@
+from pydantic import BaseModel, ConfigDict, Field, model_validator
+
+
+# Action Input Models
+class SearchGoogleAction(BaseModel):
+	query: str
+
+
+class GoToUrlAction(BaseModel):
</file context>
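
A sketch of the requested check. It uses a standard-library dataclass so it runs without dependencies; in the Pydantic model the same logic would sit in a method decorated with @model_validator(mode="after"). The class name and field names below are assumptions for illustration, since the diff excerpt does not show the model's fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DragDropActionSketch:
    # Hypothetical fields: selectors for element-based dragging...
    element_source: Optional[str] = None
    element_target: Optional[str] = None
    # ...or absolute coordinates for pixel-based dragging.
    coord_source_x: Optional[int] = None
    coord_source_y: Optional[int] = None
    coord_target_x: Optional[int] = None
    coord_target_y: Optional[int] = None

    def __post_init__(self) -> None:
        has_selectors = (
            self.element_source is not None and self.element_target is not None
        )
        coords = (
            self.coord_source_x,
            self.coord_source_y,
            self.coord_target_x,
            self.coord_target_y,
        )
        # `is not None` rather than truthiness, because 0 is a valid coordinate.
        has_coords = all(c is not None for c in coords)
        if not (has_selectors or has_coords):
            raise ValueError(
                "Provide both element selectors or all four coordinates"
            )
```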

```json
{
"agent_msg": "QA is possible on the extracted Snippet",
"qa_possibilty": true

Field name "qa_possibilty" is misspelled; the real schema uses "qa_possibility". Keeping the typo in examples will mislead developers and break JSON validation.

Prompt for AI agents
Address the following comment on src/agent/qa_possibilty_checker/README.md at line 22:

<comment>Field name "qa_possibilty" is misspelled; the real schema uses "qa_possibility". Keeping the typo in examples will mislead developers and break JSON validation.</comment>

<file context>
@@ -0,0 +1,143 @@
+The **QA Possibility Checker Agent** is like a smart gatekeeper in our AI-based QA automation system. Its job is to check whether the test the user wants to perform can actually be done on the current webpage snippet.
+
+
+## What This Agent Does
+
+* When a user says something like:
+  **"Check if the 'Submit' button works"**
+
+  And the webpage snippet actually includes a `<button>Submit</button>` –
</file context>
Suggested change
-"qa_possibilty": true
+"qa_possibility": true

This file defines a simple Pydantic class that sets the structure for the agent's response:

```python
class QAPossibiltyChecker(BaseModel):

Class name is misspelled ("QAPossibiltyChecker") and does not match the real implementation ("QAPossibilityCheckerOutput") inside src/agent/qa_possibilty_checker/output.py; this mismatch will confuse readers and lead to incorrect usage.

Prompt for AI agents
Address the following comment on src/agent/qa_possibilty_checker/README.md at line 93:

<comment>Class name is misspelled ("QAPossibiltyChecker") and does not match the real implementation ("QAPossibilityCheckerOutput") inside src/agent/qa_possibilty_checker/output.py; this mismatch will confuse readers and lead to incorrect usage.</comment>

<file context>
@@ -0,0 +1,143 @@
+The **QA Possibility Checker Agent** is like a smart gatekeeper in our AI-based QA automation system. Its job is to check whether the test the user wants to perform can actually be done on the current webpage snippet.
+
+
+## What This Agent Does
+
+* When a user says something like:
+  **"Check if the 'Submit' button works"**
+
+  And the webpage snippet actually includes a `<button>Submit</button>` –
</file context>
Suggested change
-class QAPossibiltyChecker(BaseModel):
+class QAPossibilityCheckerOutput(BaseModel):

# glob patterns are very easy to mess up and match too many domains by accident
# e.g. if you only need to access gmail, don't use *.google.com because an attacker could convince the agent to visit a malicious doc
# on docs.google.com/s/some/evil/doc to set up a prompt injection attack
"⚠️ Allowing agent to visit {domain} based on allowed_domains=['{glob}', ...]. Set allowed_domains=['{domain}', ...] explicitly to avoid the security risks of glob patterns!"

The log message uses string formatting placeholders but is not declared as an f-string, so the variables domain and glob will not be interpolated and the warning will always show the literal braces. Use an f-string or .format() to include the actual values.

Prompt for AI agents
Address the following comment on src/browser_use/browser/context.py at line 963:

<comment>The log message uses string formatting placeholders but is not declared as an f-string, so the variables `domain` and `glob` will not be interpolated and the warning will always show the literal braces. Use an f-string or `.format()` to include the actual values.</comment>

<file context>
@@ -0,0 +1,2027 @@
+"""
+Playwright browser on steroids.
+"""
+
+import asyncio
+import base64
+import gc
+import json
+import logging
</file context>
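
The difference is easy to check in isolation. The sketch below uses made-up values for domain and glob and a shortened message, and also shows the %s style that the logging module formats only when a record is emitted.

```python
import logging

logger = logging.getLogger("allowed_domains_demo")

domain = "docs.google.com"  # made-up example values
glob = "*.google.com"

# Plain string: the braces are kept literally.
plain = "Allowing agent to visit {domain} based on allowed_domains=['{glob}', ...]"

# f-string: the names are interpolated when the literal is evaluated.
formatted = f"Allowing agent to visit {domain} based on allowed_domains=['{glob}', ...]"

# str.format() on the plain template gives the same result as the f-string.
via_format = plain.format(domain=domain, glob=glob)

# logging's %-style arguments are formatted lazily by the logging module.
logger.warning(
    "Allowing agent to visit %s based on allowed_domains=['%s', ...]", domain, glob
)
```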


class CustomBrowser(Browser):

def __init__(self, *args, **kwargs):

The constructor is overridden but adds no new behavior and accepts arbitrary *args/**kwargs, hiding the explicit config: BrowserConfig | None = None signature defined in the base class. This weakens static-type checking and makes the override redundant, reducing maintainability.

Prompt for AI agents
Address the following comment on src/browser/custom_browser.py at line 36:

<comment>The constructor is overridden but adds no new behavior and accepts arbitrary *args/**kwargs, hiding the explicit `config: BrowserConfig | None = None` signature defined in the base class. This weakens static-type checking and makes the override redundant, reducing maintainability.</comment>

<file context>
@@ -33,6 +33,9 @@
 
 class CustomBrowser(Browser):
 
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+
</file context>
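
The effect on the visible signature can be shown with stand-in classes. Everything below is hypothetical: Browser and BrowserConfig are minimal substitutes that only mirror the base-class signature quoted in the comment, not the real implementations.

```python
import inspect
from typing import Optional


class BrowserConfig:
    """Hypothetical stand-in for the real config class."""


class Browser:
    """Hypothetical stand-in mirroring the base-class signature."""

    def __init__(self, config: Optional[BrowserConfig] = None):
        self.config = config or BrowserConfig()


class WithOverride(Browser):
    def __init__(self, *args, **kwargs):  # pass-through, adds no behavior
        super().__init__(*args, **kwargs)


class WithoutOverride(Browser):
    pass  # inherits __init__ together with its explicit signature


# What tools that introspect the constructor get to see in each case.
override_sig = str(inspect.signature(WithOverride))
inherited_sig = str(inspect.signature(WithoutOverride))
```

With the override, introspection reports only (*args, **kwargs); without it, the config parameter and its annotation remain visible to type checkers and editors.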


3 participants