Feat/video issue solved #679

ghost · 2025-08-15T13:20:54Z

Summary by cubic

Added a live browser testing workflow with real-time frontend viewing, new API endpoints, and agent orchestration for SQA automation. Integrated video recording, local LLM support (Ollama, Gemma), enhanced agent logic for prompt and QA possibility checking, and improved Docker deployment.

New Features
Live browser automation view in the frontend with instant feedback.
FastAPI backend with WebSocket messaging and new endpoints.
Output data and videos saved locally and exposed to UI.
Enhanced agents for intent classification, prompt improvement, and QA possibility.
Support for Ollama and Gemma local models.
Simple HTML and Gradio UIs provided.
Migration
Update Docker image and use uvicorn src.API.main:app for web API.
Mount src/outputdata volume to access agent data and recordings.
Start with python start_live_browser.py or Docker Compose for full workflow.

CLAassistant · 2025-08-15T13:21:00Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
0 out of 3 committers have signed the CLA.

❌ areeba-latif
❌ MohsinAliIrfan
❌ itsareebalatif

areeba-latif seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

cubic-dev-ai

cubic analysis

40 issues found across 81 files

cubic-dev-ai · 2025-08-15T13:32:14Z

src/websocket/websocket_manager.py

@@ -0,0 +1,46 @@
+from fastapi import WebSocket, logger


from fastapi import logger does not raise; it binds the fastapi.logger submodule rather than the logger object, so calls such as logger.info(...) will fail with an AttributeError at runtime. The logger object lives at fastapi.logger.logger.

Prompt for AI agents

Address the following comment on src/websocket/websocket_manager.py at line 1: <comment>from fastapi import logger does not raise; it binds the fastapi.logger submodule rather than the logger object, so calls such as logger.info(...) will fail with an AttributeError at runtime. The logger object lives at fastapi.logger.logger.</comment> <file context> @@ -0,0 +1,46 @@ +from fastapi import WebSocket, logger +from typing import List +import asyncio </file context>

Suggested change

from fastapi import WebSocket, logger

from fastapi import WebSocket
from fastapi.logger import logger
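One way to catch this class of import mistake early is to check the type of the imported name at module load. The sketch below uses only the standard library: `ensure_logger` is a hypothetical helper, `logging.getLogger("fastapi")` stands in for `fastapi.logger.logger` (which, to my knowledge, is a standard `logging.Logger` with that name), and `logging.handlers` is just an example of an object that is not a logger.

```python
import logging
from logging import handlers  # a module, used here as the "wrong object" example


def ensure_logger(obj: object) -> logging.Logger:
    """Return obj if it is a logging.Logger; otherwise raise TypeError."""
    if not isinstance(obj, logging.Logger):
        raise TypeError(f"expected logging.Logger, got {type(obj).__name__}")
    return obj


# Stand-in for `from fastapi.logger import logger` (assumption, see above).
logger = ensure_logger(logging.getLogger("fastapi"))
```

Passing a module such as `handlers` to `ensure_logger` raises `TypeError` immediately instead of failing later on the first logging call.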

cubic-dev-ai · 2025-08-15T13:32:15Z

src/browser_use/browser/dolphin_service.py

+
+		# Get page content and viewport size
+		content = await self.page.content()
+		viewport_size = await self.page.viewport_size()


In Playwright's Python API, Page.viewport_size is a property, not a coroutine method; calling it and awaiting the result raises a TypeError. Read it as self.page.viewport_size, without parentheses or await.

Prompt for AI agents

Address the following comment on src/browser_use/browser/dolphin_service.py at line 330: <comment>In Playwright's Python API, Page.viewport_size is a property, not a coroutine method; calling it and awaiting the result raises a TypeError. Read it as self.page.viewport_size, without parentheses or await.</comment> <file context> @@ -0,0 +1,348 @@ +import logging +import os + +import aiohttp +from playwright.async_api import Page, async_playwright + +from browser_use.browser.service import Browser +from browser_use.browser.views import BrowserState, TabInfo + </file context>
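A minimal sketch of the intended access pattern, assuming `viewport_size` is read as a plain attribute while `content()` is awaited. `FakePage` is a hypothetical stand-in so the example runs without Playwright, `read_page_state` is an illustrative name, and the 1280x720 fallback is an assumption rather than something taken from the PR.

```python
import asyncio
from typing import Dict, Optional


class FakePage:
    """Hypothetical stand-in for playwright.async_api.Page, for illustration only."""

    def __init__(self, viewport: Optional[Dict[str, int]]) -> None:
        self._viewport = viewport

    @property
    def viewport_size(self) -> Optional[Dict[str, int]]:
        # Plain property: no call parentheses and no await.
        return self._viewport

    async def content(self) -> str:
        # Coroutine method: must be awaited.
        return "<html></html>"


async def read_page_state(page: FakePage) -> Dict[str, object]:
    """Collect page content and viewport size, tolerating a missing viewport."""
    content = await page.content()
    viewport = page.viewport_size or {"width": 1280, "height": 720}  # assumed fallback
    return {"content": content, "viewport": viewport}


state = asyncio.run(read_page_state(FakePage({"width": 800, "height": 600})))
```

The fallback matters because the property can be `None` when no viewport has been configured.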

cubic-dev-ai · 2025-08-15T13:32:15Z

Dockerfile

    fonts-dejavu-core \
    fonts-dejavu-extra \
    vim \
+    # Video recording dependencies


Because the previous line ("vim ") ends with a line-continuation back-slash, this comment becomes part of the same shell command executed by /bin/sh. The leading "#" therefore starts a shell comment inside the continued line, so every token that follows (ffmpeg, libavcodec-extra, …) is ignored. As a result none of the intended video packages are installed, breaking the new feature.

Prompt for AI agents

Address the following comment on Dockerfile at line 47: <comment>Because the previous line ("vim \") ends with a line-continuation back-slash, this comment becomes part of the same shell command executed by /bin/sh. The leading "#" therefore starts a shell comment inside the continued line, so every token that follows (ffmpeg, libavcodec-extra, …) is ignored. As a result none of the intended video packages are installed, breaking the new feature.</comment> <file context> @@ -44,6 +44,16 @@ RUN apt-get update && apt-get install -y \ fonts-dejavu-core \ fonts-dejavu-extra \ vim \ + # Video recording dependencies + ffmpeg \ + libavcodec-extra \ </file context>

cubic-dev-ai · 2025-08-15T13:32:15Z

src/webui/components/browser_use_agent_tab.py

-    chat_message = {
-        "role": "assistant",
-        "content": final_content.strip(),  # Remove leading/trailing whitespace
+async def run_agent_task(query: str, url: str, message_callback: Optional[Callable[[str], Awaitable[None]]] = None) -> Dict[str, Any]:


The new run_agent_task signature no longer matches the existing call sites (e.g. handle_submit still calls run_agent_task(webui_manager, components)). The call still binds, with webui_manager passed as query and components as url, so the function receives objects of the wrong type and the tab is likely to break at runtime.

Prompt for AI agents

Address the following comment on src/webui/components/browser_use_agent_tab.py at line 355: <comment>The new run_agent_task signature no longer matches the existing call sites (e.g. handle_submit still calls run_agent_task(webui_manager, components)). The call still binds, with webui_manager passed as query and components as url, so the function receives objects of the wrong type and the tab is likely to break at runtime.</comment> <file context> @@ -65,732 +89,893 @@ async def _initialize_llm( ) return None - -def _get_config_value( - webui_manager: WebuiManager, - comp_dict: Dict[gr.components.Component, Any], - comp_id_suffix: str, - default: Any = None, </file context>
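A quick way to see how an existing call site lines up with the new signature is `inspect.signature(...).bind(...)`, which reports the argument binding without running the function. In this sketch the signature is copied from the diff, the function body is a placeholder, and `bind_call`, `webui_manager`, and `components` are illustrative stand-ins.

```python
import inspect
from typing import Any, Awaitable, Callable, Dict, Optional


async def run_agent_task(
    query: str,
    url: str,
    message_callback: Optional[Callable[[str], Awaitable[None]]] = None,
) -> Dict[str, Any]:
    """Signature copied from the diff; the body is a placeholder."""
    return {"query": query, "url": url}


def bind_call(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Dict[str, Any]:
    """Return how the arguments would bind to func's parameters, without calling it."""
    bound = inspect.signature(func).bind(*args, **kwargs)
    return dict(bound.arguments)


# Placeholders for the objects the old call site passes.
webui_manager, components = object(), {}
binding = bind_call(run_agent_task, webui_manager, components)
```

Here the two positional arguments land on `query` and `url`, which is why a type checker or an explicit check like this is more reliable than waiting for a runtime failure.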

cubic-dev-ai · 2025-08-15T13:32:15Z

src/webui/components/browser_use_agent_tab.py

+                    <span style="float:right; color:#28a745;">● Connected</span>
+                </div>
+                <iframe 
+                    src="http://localhost:6080/vnc.html?autoconnect=true&resize=scale&password=youvncpassword&autoconnect=true&resize=scale&quality=6&compression=6" 


Hard-coded credential ("youvncpassword") is embedded directly in the HTML iframe URL. Exposing passwords in source code is a security risk and violates secret-management best practices.

Prompt for AI agents

Address the following comment on src/webui/components/browser_use_agent_tab.py at line 1230: <comment>Hard-coded credential ("youvncpassword") is embedded directly in the HTML iframe URL. Exposing passwords in source code is a security risk and violates secret-management best practices.</comment> <file context> @@ -996,23 +1197,48 @@ def create_browser_use_agent_tab(webui_manager: WebuiManager): interactive=True, elem_id="user_input", ) + url_input = gr.Textbox( + label="Enter URL", + placeholder="Enter the URL to analyze (optional)", + lines=1, + interactive=True, + elem_id="url_input", </file context>
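One possible way to keep the credential out of source control is to read it from the environment when the iframe URL is built. This is a sketch under assumptions: `VNC_PASSWORD` is a hypothetical variable name, `build_vnc_url` is an illustrative helper, and the query parameters are the ones visible in the diff with the duplicates removed.

```python
import os
from urllib.parse import urlencode


def build_vnc_url(base: str = "http://localhost:6080/vnc.html") -> str:
    """Build the noVNC iframe URL, reading the password from the environment."""
    password = os.environ.get("VNC_PASSWORD")  # hypothetical variable name
    if not password:
        raise RuntimeError("VNC_PASSWORD is not set")
    params = {
        "autoconnect": "true",
        "resize": "scale",
        "quality": "6",
        "compression": "6",
        "password": password,
    }
    # urlencode escapes characters such as "&" that would otherwise break the query string.
    return f"{base}?{urlencode(params)}"
```

This only removes the secret from the repository; the password still reaches the browser as part of the URL, so it does not replace proper access control on the VNC endpoint.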

cubic-dev-ai · 2025-08-15T13:32:18Z

src/browser_use/controller/views.py

+	y: int
+
+
+class DragDropAction(BaseModel):


The model allows creating a drag-drop action without specifying either element selectors or absolute coordinates, which would make the action impossible to execute. Add a validator to ensure that at least one pair of source/target references (selectors or coordinates) is provided.

Prompt for AI agents

Address the following comment on src/browser_use/controller/views.py at line 72: <comment>The model allows creating a drag-drop action without specifying either element selectors or absolute coordinates, which would make the action impossible to execute. Add a validator to ensure that at least one pair of source/target references (selectors or coordinates) is provided.</comment> <file context> @@ -0,0 +1,91 @@ +from pydantic import BaseModel, ConfigDict, Field, model_validator + + +# Action Input Models +class SearchGoogleAction(BaseModel): + query: str + + +class GoToUrlAction(BaseModel): </file context>
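The requested check can be sketched with a standard-library dataclass so it runs without extra dependencies; in the actual pydantic v2 model the same logic would typically live in a method decorated with `@model_validator(mode="after")`. `DragDropSpec` and all of its field names are assumptions for illustration, since the diff excerpt does not show the real fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DragDropSpec:
    """Hypothetical stand-in for DragDropAction; field names are assumptions."""

    element_source: Optional[str] = None
    element_target: Optional[str] = None
    coord_source_x: Optional[int] = None
    coord_source_y: Optional[int] = None
    coord_target_x: Optional[int] = None
    coord_target_y: Optional[int] = None

    def __post_init__(self) -> None:
        # A usable action needs a complete selector pair or a complete coordinate pair.
        has_selectors = self.element_source is not None and self.element_target is not None
        coords = (
            self.coord_source_x,
            self.coord_source_y,
            self.coord_target_x,
            self.coord_target_y,
        )
        has_coords = all(value is not None for value in coords)
        if not (has_selectors or has_coords):
            raise ValueError("Provide either both element selectors or all four coordinates")
```

Rejecting the incomplete action at construction time keeps the failure close to where the model produced it, rather than deep inside the drag-and-drop handler.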

cubic-dev-ai · 2025-08-15T13:32:18Z

src/agent/qa_possibilty_checker/README.md

+```json
+{
+  "agent_msg": "QA is possible on the extracted Snippet",
+  "qa_possibilty": true


Field name "qa_possibilty" is misspelled; the real schema uses "qa_possibility". Keeping the typo in examples will mislead developers and break JSON validation.

Prompt for AI agents

Address the following comment on src/agent/qa_possibilty_checker/README.md at line 22: <comment>Field name "qa_possibilty" is misspelled; the real schema uses "qa_possibility". Keeping the typo in examples will mislead developers and break JSON validation.</comment> <file context> @@ -0,0 +1,143 @@ +The **QA Possibility Checker Agent** is like a smart gatekeeper in our AI-based QA automation system. Its job is to check whether the test the user wants to perform can actually be done on the current webpage snippet. + + +## What This Agent Does + +* When a user says something like: + **"Check if the 'Submit' button works"** + + And the webpage snippet actually includes a `<button>Submit</button>` – </file context>

Suggested change

"qa_possibilty": true

"qa_possibility": true
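To illustrate why the misspelled key matters, here is a small sketch of strict parsing for the two fields shown in the README example. It uses only the standard library; `QAPossibilityResult` and `parse_agent_output` are hypothetical names, not the project's actual classes.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class QAPossibilityResult:
    """Illustrative mirror of the two fields shown in the README example."""

    agent_msg: str
    qa_possibility: bool


def parse_agent_output(raw: str) -> QAPossibilityResult:
    """Parse the agent's JSON reply, rejecting missing or misspelled keys."""
    data = json.loads(raw)
    expected = {"agent_msg", "qa_possibility"}
    unexpected = set(data) - expected
    missing = expected - set(data)
    if unexpected or missing:
        raise ValueError(
            f"unexpected keys: {sorted(unexpected)}, missing keys: {sorted(missing)}"
        )
    return QAPossibilityResult(
        agent_msg=str(data["agent_msg"]),
        qa_possibility=bool(data["qa_possibility"]),
    )
```

A payload that uses `qa_possibilty` is rejected with a message naming both the unexpected and the missing key.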

cubic-dev-ai · 2025-08-15T13:32:19Z

src/agent/qa_possibilty_checker/README.md

+This file defines a simple Pydantic class that sets the structure for the agent's response:
+
+```python
+class QAPossibiltyChecker(BaseModel):


Class name is misspelled ("QAPossibiltyChecker") and does not match the real implementation ("QAPossibilityCheckerOutput") inside src/agent/qa_possibilty_checker/output.py; this mismatch will confuse readers and lead to incorrect usage.

Prompt for AI agents

Address the following comment on src/agent/qa_possibilty_checker/README.md at line 93: <comment>Class name is misspelled ("QAPossibiltyChecker") and does not match the real implementation ("QAPossibilityCheckerOutput") inside src/agent/qa_possibilty_checker/output.py; this mismatch will confuse readers and lead to incorrect usage.</comment> <file context> @@ -0,0 +1,143 @@ +The **QA Possibility Checker Agent** is like a smart gatekeeper in our AI-based QA automation system. Its job is to check whether the test the user wants to perform can actually be done on the current webpage snippet. + + +## What This Agent Does + +* When a user says something like: + **"Check if the 'Submit' button works"** + + And the webpage snippet actually includes a `<button>Submit</button>` – </file context>

Suggested change

class QAPossibiltyChecker(BaseModel):

class QAPossibilityCheckerOutput(BaseModel):

cubic-dev-ai · 2025-08-15T13:32:19Z

src/browser_use/browser/context.py

+					# glob patterns are very easy to mess up and match too many domains by accident
+					# e.g. if you only need to access gmail, don't use *.google.com because an attacker could convince the agent to visit a malicious doc
+					# on docs.google.com/s/some/evil/doc to set up a prompt injection attack
+					"⚠️ Allowing agent to visit {domain} based on allowed_domains=['{glob}', ...]. Set allowed_domains=['{domain}', ...] explicitly to avoid the security risks of glob patterns!"


The log message uses string formatting placeholders but is not declared as an f-string, so the variables domain and glob will not be interpolated and the warning will always show the literal braces. Use an f-string or .format() to include the actual values.

Prompt for AI agents

Address the following comment on src/browser_use/browser/context.py at line 963: <comment>The log message uses string formatting placeholders but is not declared as an f-string, so the variables `domain` and `glob` will not be interpolated and the warning will always show the literal braces. Use an f-string or `.format()` to include the actual values.</comment> <file context> @@ -0,0 +1,2027 @@ +""" +Playwright browser on steroids. +""" + +import asyncio +import base64 +import gc +import json +import logging </file context>
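The difference between the plain string and an f-string can be shown in a few lines. This is a sketch: `glob_warning` is a hypothetical helper, and the message text is taken from the diff.

```python
def glob_warning(domain: str, glob: str) -> str:
    """Build the warning text with the actual values interpolated."""
    return (
        f"⚠️ Allowing agent to visit {domain} based on allowed_domains=['{glob}', ...]. "
        f"Set allowed_domains=['{domain}', ...] explicitly to avoid the security risks of glob patterns!"
    )


# Without the f prefix the braces are kept literally.
plain = "Allowing agent to visit {domain}"

# With the f prefix the current values are substituted.
interpolated = glob_warning("docs.google.com", "*.google.com")
```

Calling `.format(domain=..., glob=...)` on the plain string would work as well; the f-string simply keeps the values next to the text that uses them.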

cubic-dev-ai · 2025-08-15T13:32:19Z

src/browser/custom_browser.py


 class CustomBrowser(Browser):

+    def __init__(self, *args, **kwargs):


The constructor is overridden but adds no new behavior and accepts arbitrary *args/**kwargs, hiding the explicit config: BrowserConfig | None = None signature defined in the base class. This weakens static-type checking and makes the override redundant, reducing maintainability.

Prompt for AI agents

Address the following comment on src/browser/custom_browser.py at line 36: <comment>The constructor is overridden but adds no new behavior and accepts arbitrary *args/**kwargs, hiding the explicit `config: BrowserConfig | None = None` signature defined in the base class. This weakens static-type checking and makes the override redundant, reducing maintainability.</comment> <file context> @@ -33,6 +33,9 @@ class CustomBrowser(Browser): + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + </file context>
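The effect on the visible signature can be checked with `inspect`. In this sketch `Browser` and `BrowserConfig` are simplified placeholders for the real classes, `PassThroughBrowser` mirrors the override added in the diff, and `init_params` is an illustrative helper.

```python
import inspect
from typing import List, Optional


class BrowserConfig:
    """Placeholder for the real configuration class."""


class Browser:
    """Placeholder base class with the explicit signature described in the comment."""

    def __init__(self, config: Optional[BrowserConfig] = None) -> None:
        self.config = config or BrowserConfig()


class PassThroughBrowser(Browser):
    """Mirrors the override in the diff: it forwards everything and adds nothing."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)


class CustomBrowser(Browser):
    """No __init__ override: the base-class signature is inherited unchanged."""


def init_params(cls: type) -> List[str]:
    """List the constructor's parameter names, excluding self."""
    return [name for name in inspect.signature(cls.__init__).parameters if name != "self"]
```

With the override in place, tools see only `args` and `kwargs`; without it, the `config` parameter stays visible to type checkers and editors.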

MohsinAliIrfan and others added 30 commits June 10, 2025 20:22

Added external agents and changed input to the deep_research

be15899

Changed the structure of agents, merged them and made them work

f1d1447

Merge pull request #2 from MohsinAliIrfan/LanggraphStructureChange

d558663

Langgraph structure change

Enhanced Prompt enhancer agent to enhance the prompts accordingly an…

c2ba960

…d dealt with max iterations

Merge pull request #3 from MohsinAliIrfan/DeebugginAgents_and_maxiter…

0cb3377

…ations Enhanced Prompt enhancer agent to enhance the prompts accordingly an…

Changed the number of max steps

c3fb5b0

Merge pull request #4 from MohsinAliIrfan/DeebugginAgents_and_maxiter…

b300d83

…ations Changed the number of max steps

changed the local saving of agents history

d3ecaaa

Agents response saving locally along with the screenshots and data be…

ebd65ec

…ing displayed on the frontend

Changed the Intent classifier such that it either displays a warning o…

2557d8e

…r completely stops when the user asks about testing the UI/animation

Merge pull request #5 from MohsinAliIrfan/EnhancingIntentClassifier

f18a973

Enhancing intent classifier

Modified the prompt enhancer agent's prompt to not entertain the ui/…

c912165

…animation testing

Merge pull request #6 from MohsinAliIrfan/Modified_PromptEnhancer

fd4f23b

Modified the prompt enhancer agent's prompt to not entertain the ui/…

Integrated ollama locally to run llm locally

e9cd9dd

Merge pull request #7 from MohsinAliIrfan/LocalOllamaIntegration

fa94679

Integrated ollama locally to run llm locally

Local gemma model in ollama integration

46658c0

Merge pull request #8 from MohsinAliIrfan/LocalOllamaIntegration

1f37ba3

Local gemma model in ollama integration

Generation of video locally

fa700d5

Merge pull request #9 from MohsinAliIrfan/Feature_VideoGen

8924753

Generation of video locally

Merge remote-tracking branch 'upstream/main'

1bdd520

Debugged the webpage checker issue

7ccfebd

Changed the out structure and debugged the issue

8058aad

Modified and debugged the agent output structure issue

dbf5ab7

Modified the intent classifier agent to also modify the prompt if rel…

6995325

…ated to UI/animation and respective changes in the agent orchestration

Implementation of vision model for QA possibility generator

9a125cc

Implementation of vision model for QA possibility generator

ae17616

Changes in the structure of the agentic workflow and vision model wii…

0404433

…thin langgraph and qa possibility prompt enhancer agent

🔧 Update: run_agent_task ,create API and new frontend

2f79e27

updates by Areeba in browser-use-web-ui

8804983

Changes to push everything within the submodule

21d2344

itsareebalatif and others added 13 commits July 28, 2025 13:46

Update intent classifier and QA checker and prompt Enhancer output, a…

65cb10f

…nd make changes.txt file

new changes

4452057

Displayed the video within the UI, dockerized the api and modified th…

f354be4

…e ui and debugged the intent classifier function

Add problematic SSH key files to .gitignore and remove from tracking

b1b2c38

HTML file with new live browser view

b2d3902

Resolved Path issues, docker issues, and other bugs

dff3e76

Implemented Web Socket

d739cdd

Increased number of steps. changes in the prompt, gpt-5 implementatio…

4aff463

…n, debugged screenshot not being taken

Eliminated the use of Intent classifier agent and merged the functiona…

6d8b587

…lity within the QA possibility agent

Debugged the api and docker file

7bfa83b

schema

30da510

videos are saved in an ordered folder and a merged video is created

d5767e9

new changes

c3a9f3f

cubic-dev-ai bot reviewed Aug 15, 2025

MohsinAliIrfan and others added 5 commits August 15, 2025 20:36

Debugged Docker

7c7998b

env issues not mounting within the container

f16937f

Getting the entire output from agent including all the sets and rem…

09dfe01

…oving the unused stop api

port issue resolved for swagger and got rid of all the unused apis

94886ec

Merge branch 'main' into feat/video_issue_solved

49b23af
