Feat/video issue solved #679
Open: ghost wants to merge 48 commits into browser-use:main from codeupscale:feat/video_issue_solved
Changes from 41 commits
be15899
Added external agents and changed input to the deeo_research
MohsinAliIrfan f1d1447
Changed the structore of agents, merged them and made them work
MohsinAliIrfan d558663
Merge pull request #2 from MohsinAliIrfan/LanggraphStructureChange
MohsinAliIrfan c2ba960
Enhanced Prompt enahncer agent to enahnce the prompts acciordingly an…
MohsinAliIrfan 0cb3377
Merge pull request #3 from MohsinAliIrfan/DeebugginAgents_and_maxiter…
MohsinAliIrfan c3fb5b0
Changed the bumber of max steps
MohsinAliIrfan b300d83
Merge pull request #4 from MohsinAliIrfan/DeebugginAgents_and_maxiter…
MohsinAliIrfan d3ecaaa
changed the local savbing of agents history
MohsinAliIrfan ebd65ec
Agents response saving locally along with the screenshots and data be…
MohsinAliIrfan 2557d8e
Changed the Intent classifier such that it either display a warning o…
MohsinAliIrfan f18a973
Merge pull request #5 from MohsinAliIrfan/EnhancingIntentClassifier
MohsinAliIrfan c912165
Modified the prompt enhancer agents promp to not to entertain the ui/…
MohsinAliIrfan fd4f23b
Merge pull request #6 from MohsinAliIrfan/Modified_PromptEnhancer
MohsinAliIrfan e9cd9dd
Integrated ollama locally to run llm locally
MohsinAliIrfan fa94679
Merge pull request #7 from MohsinAliIrfan/LocalOllamaIntegration
MohsinAliIrfan 46658c0
Local gemma model in ollama integratuion
MohsinAliIrfan 1f37ba3
Merge pull request #8 from MohsinAliIrfan/LocalOllamaIntegration
MohsinAliIrfan fa700d5
Generation of video locally
MohsinAliIrfan 8924753
Merge pull request #9 from MohsinAliIrfan/Feature_VideoGen
MohsinAliIrfan 1bdd520
Merge remote-tracking branch 'upstream/main'
MohsinAliIrfan 7ccfebd
Debugged the webpage checker issue
MohsinAliIrfan 8058aad
Changed the out structure and debugged the issue
MohsinAliIrfan dbf5ab7
Modified and debugged the agent output structure issue
MohsinAliIrfan 6995325
Modified the intent classifier agent to also modify the prompt if rel…
MohsinAliIrfan 9a125cc
Implementation of vision model for QA possibilty generator
MohsinAliIrfan ae17616
Implementation of vision model for QA possibilty generator
MohsinAliIrfan 0404433
Changes in the structure of the agentic workflow and vision model wii…
MohsinAliIrfan 2f79e27
🔧 Update: run_agent_task ,create API and new frontend
itsareebalatif 8804983
updates by Areeba in browser-use-web-ui
itsareebalatif 21d2344
Changes to push everyting within the submodule
itsareebalatif 65cb10f
Update intent classifier and QA checker and prompt Enahncer ouput , a…
itsareebalatif 4452057
new changes
itsareebalatif f354be4
Displayed the video within the UI, dockerized the api and modified th…
MohsinAliIrfan b1b2c38
Add problematic SSH key files to .gitignore and remove from tracking
MohsinAliIrfan b2d3902
HTML file with new live browser view
MohsinAliIrfan dff3e76
Resolved Path issues, docker issues, and other bugs
MohsinAliIrfan d739cdd
Implemented Web Socket
itsareebalatif 4aff463
Increased number of steps. changes in the prompt, gpt-5 implementatio…
MohsinAliIrfan 6d8b587
Elimated the use of Intewnt classifier agent and merged the functiona…
MohsinAliIrfan 7bfa83b
Debugged the api and dodkcer file
MohsinAliIrfan 30da510
schmea
MohsinAliIrfan d5767e9
video are saved in orderd folder and merged video is creating
itsareebalatif c3a9f3f
new changes
itsareebalatif 7c7998b
Debugged Docker
MohsinAliIrfan f16937f
env issues not mounting within the container
MohsinAliIrfan 09dfe01
Gertting the entire output from agent inlcluding all the sets and rem…
MohsinAliIrfan 94886ec
port issue resolved for swagger and got rid of all the unused apis
MohsinAliIrfan 49b23af
Merge branch 'main' into feat/video_issue_solved
@@ -0,0 +1,164 @@
# 🧪 Live Browser Testing Agent

This system now supports **real-time browser automation viewing** directly in your frontend! Instead of watching video recordings after the fact, you can see the browser automation happening live as it occurs.

## 🎯 What's New

### ✅ **Live Browser Viewing**
- **Real-time automation** visible in your frontend
- **Live mouse movements** and clicks
- **Page navigation** and form filling
- **Immediate feedback** when agent starts working
- **Final browser state** remains visible after completion

### ✅ **Two Interface Options**
1. **Simple HTML Frontend** (`static/index.html`) - Clean, focused interface
2. **Gradio WebUI** (`http://localhost:7788`) - Full-featured interface with live VNC

## 🚀 Quick Start

### Option 1: Automated Setup (Recommended)
```bash
# Run the startup script
python start_live_browser.py
```

### Option 2: Manual Setup
```bash
# Start Docker with VNC support
docker compose up --build

# Wait for services to start (about 30 seconds)
# Then access your interface
```

## 📱 Access Your Applications

| Service | URL / Value | Purpose |
|---------|-------------|---------|
| **Simple Frontend** | `http://localhost:7788` | Clean HTML interface |
| **Gradio WebUI** | `http://localhost:7788` | Full-featured interface |
| **VNC Viewer** | `http://localhost:6080/vnc.html` | Direct VNC access |
| **VNC Password** | `youvncpassword` | Default password |

## 🎨 How It Works

### **Before Agent Runs:**
- Clean browser window (empty or showing your app)
- Status indicator showing "Ready"

### **During Agent Execution:**
- **Real-time browser automation** happening right in the UI
- **Live mouse movements** and clicks
- **Page navigation** and form filling
- **Screenshot updates** as the agent works

### **After Agent Completes:**
- Final state of the browser
- Results visible in the browser
- Status showing "Completed"

## 🔧 Technical Details

### **VNC Architecture**
```
User Frontend → VNC Viewer (port 6080) → VNC Server (port 5901) → Virtual Display (:99) → Browser
```

### **Components**
- **Xvfb**: Virtual display server (`:99`)
- **x11vnc**: VNC server sharing the virtual display
- **noVNC**: Web-based VNC client
- **Supervisor**: Manages all services

### **Browser Configuration**
- **Headless**: `False` (browser visible for VNC)
- **Window Size**: 1280x1100
- **Display**: `:99` (virtual display)

## 🛠️ Troubleshooting

### **VNC Not Showing**
1. Check if Docker is running: `docker ps`
2. Verify VNC service: `docker logs <container_name>`
3. Check ports: `netstat -an | grep 6080`

### **Browser Not Visible**
1. Ensure `headless=False` in browser config
2. Check if virtual display is working
3. Verify VNC connection

### **Performance Issues**
1. Reduce VNC quality settings
2. Increase Docker memory allocation
3. Close unnecessary browser tabs

## 🔒 Security Notes

- **VNC Password**: Change default password in `.env` file
- **Network Access**: VNC is only accessible on localhost by default
- **Browser Isolation**: Each session runs in isolated container

## 📝 Configuration

### **Environment Variables**
```bash
# VNC Settings
VNC_PASSWORD=your_custom_password
RESOLUTION=1920x1080x24

# Browser Settings
DISPLAY=:99
USE_OWN_BROWSER=false
KEEP_BROWSER_OPEN=true
```
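
As an illustration of how these values could be consumed, here is a minimal sketch that reads them from the environment. It assumes `RESOLUTION` follows the `WIDTHxHEIGHTxDEPTH` form shown above; `DisplaySettings`, `parse_resolution`, `parse_bool`, and `load_settings` are hypothetical names and are not taken from this PR.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DisplaySettings:
    width: int
    height: int
    depth: int
    display: str
    keep_browser_open: bool


def parse_resolution(value: str) -> tuple[int, int, int]:
    """Split a 'WIDTHxHEIGHTxDEPTH' string such as '1920x1080x24' into integers."""
    parts = value.lower().split("x")
    if len(parts) != 3:
        raise ValueError(f"expected WIDTHxHEIGHTxDEPTH, got {value!r}")
    width, height, depth = (int(part) for part in parts)
    return width, height, depth


def parse_bool(value: str) -> bool:
    """Interpret common truthy spellings; everything else counts as False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env=None) -> DisplaySettings:
    """Build settings from a mapping (defaults to os.environ), using the README defaults."""
    env = os.environ if env is None else env
    width, height, depth = parse_resolution(env.get("RESOLUTION", "1920x1080x24"))
    return DisplaySettings(
        width=width,
        height=height,
        depth=depth,
        display=env.get("DISPLAY", ":99"),
        keep_browser_open=parse_bool(env.get("KEEP_BROWSER_OPEN", "true")),
    )
```

Passing a plain dictionary to `load_settings` instead of relying on `os.environ` keeps the parsing easy to check in isolation.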

### **Custom VNC Settings**
Edit `supervisord.conf` to modify:
- VNC port (default: 5901)
- Display resolution
- Authentication settings

## 🎯 Usage Examples

### **Simple Test**
1. Open `http://localhost:7788`
2. Enter query: "Click the login button"
3. Enter URL: "https://example.com"
4. Click "Start Live Test"
5. Watch the browser automation happen live!
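
The same test can be started without the UI. The API file in this PR defines a `POST /run-agent` endpoint whose body has `query` and `url` fields, so a client could be sketched as below. The base URL is an assumption (the port the API listens on is not stated in the PR), and the function names are hypothetical.

```python
import json
import urllib.request


def build_run_agent_request(base_url: str, query: str, url: str) -> urllib.request.Request:
    """Build a POST request for /run-agent with the JSON body the endpoint expects."""
    payload = json.dumps({"query": query, "url": url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/run-agent",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_agent(base_url: str, query: str, url: str, timeout: float = 600.0) -> dict:
    """Send the request and decode the JSON response (blocks until the agent finishes)."""
    request = build_run_agent_request(base_url, query, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `run_agent("http://localhost:7788", "Click the login button", "https://example.com")` would mirror the steps above, assuming the API is reachable at that address. The generous timeout reflects that the endpoint only responds once the agent run has finished.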

### **Complex Workflow**
1. Start with simple navigation
2. Watch form filling in real-time
3. See error handling and retries
4. Observe final state and results

## 🚀 Advanced Features

### **Multiple Browser Sessions**
- Each test runs in isolated browser context
- No interference between concurrent tests
- Clean state for each automation

### **Debugging Support**
- Live view helps identify automation issues
- Real-time feedback on agent decisions
- Visual confirmation of actions

### **Integration Options**
- Embed VNC viewer in any web application
- Customize VNC viewer appearance
- Add status indicators and controls

## 📞 Support

If you encounter issues:
1. Check the Docker logs: `docker compose logs`
2. Verify all services are running
3. Ensure ports are not blocked
4. Check browser console for errors

---

**🎉 Enjoy your live browser automation experience!**
@@ -0,0 +1,40 @@
browser-use-agent-tab.py
-------------------------
- Removed all extra/unused variables.
- Rewrote run_agent_task() without using gradio, webui_manager, or other UI dependencies.
- Commented out all unused functions:
  - pause_button
  - resume_button
  - _ask_assistant_callback
  - handle_done
  - handle_new_step
  - _get_config_value
  - _format_agent_output
- Created a FastAPI endpoint to run main-agent-task.
- Created a "static/" folder containing a simple UI for "Website Testing Agent".

browser_recorder.py
--------------------
- Cleans the entire video directory before starting a new recording session.
- Uses glob.glob() to recursively find .webm files after context closure.
- Does not use page.on() or page.video.start(); relies on Playwright's built-in recording mechanism.
- Stores just the filenames in a list: self.recorded_videos.
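
The cleanup and discovery steps described for browser_recorder.py can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: the function names and the directory layout are hypothetical, and only the standard library is used.

```python
import glob
import os
import shutil


def clean_video_dir(video_dir: str) -> None:
    """Remove everything under video_dir and recreate it empty."""
    shutil.rmtree(video_dir, ignore_errors=True)
    os.makedirs(video_dir, exist_ok=True)


def collect_recorded_videos(video_dir: str) -> list[str]:
    """Recursively find .webm files and return just their filenames, sorted."""
    pattern = os.path.join(video_dir, "**", "*.webm")
    paths = glob.glob(pattern, recursive=True)
    return sorted(os.path.basename(path) for path in paths)
```

Searching only after the browser context has closed matters because Playwright finalizes its video files when the context is closed. Storing bare filenames, as the changelog describes, means two recordings with the same name in different subfolders would be indistinguishable in the list.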

agent/mainagent.py
-------------------
- Modified loop logic to ensure the agent runs correctly only when required.

agent/qa_possibility_checker/
------------------------------
- Updated prompt.py with refined prompt structure.
- Added custom_validate() function in output.py for validating the agent output.

agent/intent_classifier/
------------------------------
- Added custom_validate() function in output.py for validating the agent output.

agent/prompt_enhancer/
------------------------------
- Added custom_validate() function in output.py for validating the agent output.
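
The changelog does not show what custom_validate() checks. As a purely hypothetical sketch of such a validator, the example below parses an agent's raw JSON output and verifies required keys and their types. The signature, the exception class, and the field names used in the usage note are all assumptions.

```python
import json


class OutputValidationError(ValueError):
    """Raised when an agent's raw output does not match the expected shape."""


def custom_validate(raw: str, required: dict[str, type]) -> dict:
    """Parse `raw` as a JSON object and check that each required key has the expected type."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise OutputValidationError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise OutputValidationError("output must be a JSON object")
    for key, expected_type in required.items():
        if key not in data:
            raise OutputValidationError(f"missing required field: {key}")
        if not isinstance(data[key], expected_type):
            raise OutputValidationError(
                f"field {key!r} should be {expected_type.__name__}, "
                f"got {type(data[key]).__name__}"
            )
    return data
```

A call such as `custom_validate(raw_output, {"possible": bool, "reason": str})` would either return the parsed dictionary or raise `OutputValidationError`, which lets the calling agent loop decide whether to retry.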
@@ -0,0 +1,77 @@
import os

from fastapi import FastAPI, HTTPException, WebSocket, WebSocketDisconnect
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import FileResponse
from fastapi.staticfiles import StaticFiles
from pydantic import BaseModel

from src.webui.components.browser_use_agent_tab import run_agent_task
from src.websocket.websocket_manager import WebSocketManager

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
manager = WebSocketManager()

# Mount static files.
# Note: this serves the entire working directory under /static, not just static/.
app.mount("/static", StaticFiles(directory=os.getcwd()), name="static")

# Default to the virtual display (:99) when DISPLAY is not already set, e.g. in Docker
if not os.getenv("DISPLAY"):
    os.environ["DISPLAY"] = ":99"


# 🧠 Request body model
class AgentRequest(BaseModel):
    query: str
    url: str


# 🎯 Run agent task and send logs via WebSocket
@app.post("/run-agent")
async def run_agent(request: AgentRequest):
    try:
        print(f"🔄 Starting agent with DISPLAY={os.getenv('DISPLAY')}")

        # Forward each log message from the agent to all connected WebSocket clients
        async def message_callback(message: str):
            await manager.send_message(message)

        result = await run_agent_task(request.query, request.url, message_callback=message_callback)

        return {
            "status": "success",
            "task_id": result["task_id"],
            "final_result": result["final_result"],
        }
    except Exception as e:
        print(f"❌ Agent error: {e}")
        raise HTTPException(status_code=500, detail=str(e))


# ⛔ Optional Stop Agent (placeholder: returns a status without cancelling a running task)
@app.post("/stop-agent")
def stop_agent():
    return {"status": "stopped"}


# 🌐 Serve frontend
@app.get("/")
async def serve_frontend():
    return FileResponse("static/index.html")


@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await manager.connect(websocket)
    try:
        while True:
            await websocket.receive_text()  # keep alive
    except WebSocketDisconnect:
        pass  # client closed the connection
    finally:
        # Always drop the socket from the manager, whatever ended the loop
        manager.disconnect(websocket)
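
The `WebSocketManager` imported from `src.websocket.websocket_manager` is not shown in this part of the diff. A minimal sketch that is compatible with the three calls used above (`await connect(...)`, `disconnect(...)`, `await send_message(...)`) might look like the following. The attribute name `active_connections` and the broadcast-to-all behavior are assumptions, and the actual class in the PR may differ.

```python
class WebSocketManager:
    """Track connected clients and broadcast text messages to all of them."""

    def __init__(self):
        self.active_connections = []

    async def connect(self, websocket):
        # Complete the WebSocket handshake, then remember the connection.
        await websocket.accept()
        self.active_connections.append(websocket)

    def disconnect(self, websocket):
        # Safe to call more than once for the same connection.
        if websocket in self.active_connections:
            self.active_connections.remove(websocket)

    async def send_message(self, message: str):
        # Iterate over a copy so failed connections can be removed while looping.
        for websocket in list(self.active_connections):
            try:
                await websocket.send_text(message)
            except Exception:
                # Drop connections that can no longer receive messages.
                self.disconnect(websocket)
```

The class is duck-typed: it only needs objects with `accept()` and `send_text()` coroutines, which FastAPI's `WebSocket` provides. Broadcasting to every client means all open browser tabs see the logs of every agent run; routing by task ID would need extra bookkeeping.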
Because the previous line ("vim ") ends with a line-continuation back-slash, this comment becomes part of the same shell command executed by /bin/sh. The leading "#" therefore starts a shell comment inside the continued line, so every token that follows (ffmpeg, libavcodec-extra, …) is ignored. As a result none of the intended video packages are installed, breaking the new feature.