48 commits
be15899
Added external agents and changed input to the deeo_research
MohsinAliIrfan Jun 10, 2025
f1d1447
Changed the structore of agents, merged them and made them work
MohsinAliIrfan Jun 11, 2025
d558663
Merge pull request #2 from MohsinAliIrfan/LanggraphStructureChange
MohsinAliIrfan Jun 11, 2025
c2ba960
Enhanced Prompt enahncer agent to enahnce the prompts acciordingly an…
MohsinAliIrfan Jun 12, 2025
0cb3377
Merge pull request #3 from MohsinAliIrfan/DeebugginAgents_and_maxiter…
MohsinAliIrfan Jun 12, 2025
c3fb5b0
Changed the bumber of max steps
MohsinAliIrfan Jun 12, 2025
b300d83
Merge pull request #4 from MohsinAliIrfan/DeebugginAgents_and_maxiter…
MohsinAliIrfan Jun 12, 2025
d3ecaaa
changed the local savbing of agents history
MohsinAliIrfan Jun 12, 2025
ebd65ec
Agents response saving locally along with the screenshots and data be…
MohsinAliIrfan Jun 13, 2025
2557d8e
Changed the Intent classifier such that it either display a warning o…
MohsinAliIrfan Jun 16, 2025
f18a973
Merge pull request #5 from MohsinAliIrfan/EnhancingIntentClassifier
MohsinAliIrfan Jun 16, 2025
c912165
Modified the prompt enhancer agents promp to not to entertain the ui/…
MohsinAliIrfan Jun 16, 2025
fd4f23b
Merge pull request #6 from MohsinAliIrfan/Modified_PromptEnhancer
MohsinAliIrfan Jun 16, 2025
e9cd9dd
Integrated ollama locally to run llm locally
MohsinAliIrfan Jun 16, 2025
fa94679
Merge pull request #7 from MohsinAliIrfan/LocalOllamaIntegration
MohsinAliIrfan Jun 16, 2025
46658c0
Local gemma model in ollama integratuion
MohsinAliIrfan Jun 17, 2025
1f37ba3
Merge pull request #8 from MohsinAliIrfan/LocalOllamaIntegration
MohsinAliIrfan Jun 17, 2025
fa700d5
Generation of video locally
MohsinAliIrfan Jun 18, 2025
8924753
Merge pull request #9 from MohsinAliIrfan/Feature_VideoGen
MohsinAliIrfan Jun 18, 2025
1bdd520
Merge remote-tracking branch 'upstream/main'
MohsinAliIrfan Jun 19, 2025
7ccfebd
Debugged the webpage checker issue
MohsinAliIrfan Jun 19, 2025
8058aad
Changed the out structure and debugged the issue
MohsinAliIrfan Jun 20, 2025
dbf5ab7
Modified and debugged the agent output structure issue
MohsinAliIrfan Jun 20, 2025
6995325
Modified the intent classifier agent to also modify the prompt if rel…
MohsinAliIrfan Jun 20, 2025
9a125cc
Implementation of vision model for QA possibilty generator
MohsinAliIrfan Jun 23, 2025
ae17616
Implementation of vision model for QA possibilty generator
MohsinAliIrfan Jun 23, 2025
0404433
Changes in the structure of the agentic workflow and vision model wii…
MohsinAliIrfan Jun 23, 2025
2f79e27
🔧 Update: run_agent_task ,create API and new frontend
itsareebalatif Jul 8, 2025
8804983
updates by Areeba in browser-use-web-ui
itsareebalatif Jul 9, 2025
21d2344
Changes to push everyting within the submodule
itsareebalatif Jul 9, 2025
65cb10f
Update intent classifier and QA checker and prompt Enahncer ouput , a…
itsareebalatif Jul 28, 2025
4452057
new changes
itsareebalatif Jul 29, 2025
f354be4
Displayed the video within the UI, dockerized the api and modified th…
MohsinAliIrfan Jul 29, 2025
b1b2c38
Add problematic SSH key files to .gitignore and remove from tracking
MohsinAliIrfan Jul 30, 2025
b2d3902
HTML file with new live browser view
MohsinAliIrfan Jul 30, 2025
dff3e76
Resolved Path issues, docker issues, and other bugs
MohsinAliIrfan Aug 4, 2025
d739cdd
Implemented Web Socket
itsareebalatif Aug 8, 2025
4aff463
Increased number of steps. changes in the prompt, gpt-5 implementatio…
MohsinAliIrfan Aug 8, 2025
6d8b587
Elimated the use of Intewnt classifier agent and merged the functiona…
MohsinAliIrfan Aug 11, 2025
7bfa83b
Debugged the api and dodkcer file
MohsinAliIrfan Aug 11, 2025
30da510
schmea
MohsinAliIrfan Aug 12, 2025
d5767e9
video are saved in orderd folder and merged video is creating
itsareebalatif Aug 13, 2025
c3a9f3f
new changes
itsareebalatif Aug 15, 2025
7c7998b
Debugged Docker
MohsinAliIrfan Aug 15, 2025
f16937f
env issues not mounting within the container
MohsinAliIrfan Aug 18, 2025
09dfe01
Gertting the entire output from agent inlcluding all the sets and rem…
MohsinAliIrfan Aug 18, 2025
94886ec
port issue resolved for swagger and got rid of all the unused apis
MohsinAliIrfan Aug 18, 2025
49b23af
Merge branch 'main' into feat/video_issue_solved
Aug 21, 2025
15 changes: 14 additions & 1 deletion .dockerignore
@@ -2,4 +2,17 @@ data
tmp
results

.env
.env
.venv/
__pycache__/
*.pyc
*.pyo
*.pyd
.Python
*.db
*.sqlite3
*.log
*.mp4
*.avi
*.mkv
*.webm
9 changes: 8 additions & 1 deletion .gitignore
@@ -189,4 +189,11 @@ data/
.config.pkl
*.pdf

workflow
workflow.env
.venv/



# SSH keys (public and private)
"eval \"$(ssh-agent -s)\""
"eval \"$(ssh-agent -s)\".pub"
19 changes: 16 additions & 3 deletions Dockerfile
@@ -12,7 +12,7 @@ RUN apt-get update && apt-get install -y \
curl \
unzip \
xvfb \
libgconf-2-4 \
# libgconf-2-4 \
libxss1 \
libnss3 \
libnspr4 \
@@ -44,6 +44,16 @@ RUN apt-get update && apt-get install -y \
fonts-dejavu-core \
fonts-dejavu-extra \
vim \
# Video recording dependencies
Because the previous line ("vim ") ends with a line-continuation back-slash, this comment becomes part of the same shell command executed by /bin/sh. The leading "#" therefore starts a shell comment inside the continued line, so every token that follows (ffmpeg, libavcodec-extra, …) is ignored. As a result none of the intended video packages are installed, breaking the new feature.
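
This claim is worth checking against the Dockerfile reference, which states that comment lines are removed before an instruction is executed, including comment lines inside a continued `RUN`. Under that rule the packages listed after the comment would still be installed, though a test build is the reliable way to confirm. The sketch below is a simplified Python model of that preprocessing, not Docker's actual parser; `effective_run_command` is a hypothetical helper written only for illustration.

```python
def effective_run_command(lines: list[str]) -> str:
    """Simplified model of how a multi-line RUN instruction reaches the shell.

    Assumption: whole-line comments are dropped first, then the remaining
    lines are joined on their trailing backslash continuations.
    """
    parts = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # whole-line comment: removed before the shell runs
        if stripped.endswith("\\"):
            stripped = stripped[:-1].rstrip()  # drop the continuation marker
        if stripped:
            parts.append(stripped)
    return " ".join(parts)


if __name__ == "__main__":
    dockerfile_lines = [
        "RUN apt-get update && apt-get install -y \\",
        "    vim \\",
        "    # Video recording dependencies",
        "    ffmpeg \\",
        "    libavcodec-extra \\",
        "    && rm -rf /var/lib/apt/lists/*",
    ]
    print(effective_run_command(dockerfile_lines))
```

In this model `ffmpeg` and `libavcodec-extra` remain part of the command, which is the behavior the reference describes.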

ffmpeg \
libavcodec-extra \
libavformat-dev \
libavutil-dev \
libswscale-dev \
libx264-dev \
libx265-dev \
libvpx-dev \
libwebp-dev \
&& rm -rf /var/lib/apt/lists/*

# Install noVNC
@@ -65,6 +75,9 @@ RUN node -v && npm -v && npx -v
# Set up working directory
WORKDIR /app

# Add src directory to Python path for imports
ENV PYTHONPATH=/app/src:/app

# Copy requirements and install Python dependencies
COPY requirements.txt .

@@ -83,7 +96,7 @@ RUN mkdir -p $PLAYWRIGHT_BROWSERS_PATH
# RUN playwright install chrome --with-deps

# Alternative: Install Chromium if Google Chrome is problematic in certain environments
RUN playwright install chromium --with-deps
RUN playwright install chromium


# Copy the application code
@@ -96,4 +109,4 @@ COPY supervisord.conf /etc/supervisor/conf.d/supervisord.conf
EXPOSE 7788 6080 5901 9222

CMD ["/usr/bin/supervisord", "-c", "/etc/supervisor/conf.d/supervisord.conf"]
#CMD ["/bin/bash"]
#CMD ["/bin/bash"]
164 changes: 164 additions & 0 deletions LIVE_BROWSER_README.md
@@ -0,0 +1,164 @@
# 🧪 Live Browser Testing Agent

This system now supports **real-time browser automation viewing** directly in your frontend. Instead of watching a video recording afterwards, you can watch the browser automation as it happens.

## 🎯 What's New

### ✅ **Live Browser Viewing**
- **Real-time automation** visible in your frontend
- **Live mouse movements** and clicks
- **Page navigation** and form filling
- **Immediate feedback** when agent starts working
- **Final browser state** remains visible after completion

### ✅ **Two Interface Options**
1. **Simple HTML Frontend** (`static/index.html`) - Clean, focused interface
2. **Gradio WebUI** (`http://localhost:7788`) - Full-featured interface with live VNC

## 🚀 Quick Start

### Option 1: Automated Setup (Recommended)
```bash
# Run the startup script
python start_live_browser.py
```

### Option 2: Manual Setup
```bash
# Start Docker with VNC support
docker compose up --build

# Wait for services to start (about 30 seconds)
# Then access your interface
```

## 📱 Access Your Applications

| Service | URL / Value | Purpose |
|---------|-----|---------|
| **Simple Frontend** | `http://localhost:7788` | Clean HTML interface |
| **Gradio WebUI** | `http://localhost:7788` | Full-featured interface |
| **VNC Viewer** | `http://localhost:6080/vnc.html` | Direct VNC access |
| **VNC Password** | `youvncpassword` | Default password |

## 🎨 How It Works

### **Before Agent Runs:**
- Clean browser window (empty or showing your app)
- Status indicator showing "Ready"

### **During Agent Execution:**
- **Real-time browser automation** happening right in the UI
- **Live mouse movements** and clicks
- **Page navigation** and form filling
- **Screenshot updates** as the agent works

### **After Agent Completes:**
- Final state of the browser
- Results visible in the browser
- Status showing "Completed"

## 🔧 Technical Details

### **VNC Architecture**
```
User Frontend → VNC Viewer (port 6080) → VNC Server (port 5901) → Virtual Display (:99) → Browser
```

### **Components**
- **Xvfb**: Virtual display server (`:99`)
- **x11vnc**: VNC server sharing the virtual display
- **noVNC**: Web-based VNC client
- **Supervisor**: Manages all services

### **Browser Configuration**
- **Headless**: `False` (browser visible for VNC)
- **Window Size**: 1280x1100
- **Display**: `:99` (virtual display)

## 🛠️ Troubleshooting

### **VNC Not Showing**
1. Check if Docker is running: `docker ps`
2. Verify VNC service: `docker logs <container_name>`
3. Check ports: `netstat -an | grep 6080`
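
The port check in step 3 can also be done from Python when `netstat` is not available. This is a minimal sketch using only the standard library; `port_is_open` is a hypothetical helper, and the port numbers are the ones listed in this README.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved hosts.
        return False


if __name__ == "__main__":
    # Ports taken from this README: web UI, noVNC, and the VNC server.
    for name, port in {"Web UI": 7788, "noVNC": 6080, "VNC": 5901}.items():
        state = "open" if port_is_open("localhost", port) else "closed"
        print(f"{name} (port {port}): {state}")
```

A closed noVNC port while the container is running usually points at the VNC services rather than the browser, so the container logs are the next place to look.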

### **Browser Not Visible**
1. Ensure `headless=False` in browser config
2. Check if virtual display is working
3. Verify VNC connection

### **Performance Issues**
1. Reduce VNC quality settings
2. Increase Docker memory allocation
3. Close unnecessary browser tabs

## 🔒 Security Notes

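- **VNC Password**: Change the default password in the `.env` file
- **Network Access**: `docker-compose.yml` publishes the VNC ports (6080 and 5901) on the host, so bind them to `127.0.0.1` or firewall them if the machine is reachable from a network
- **Browser Isolation**: Each session runs in an isolated container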

## 📝 Configuration

### **Environment Variables**
```bash
# VNC Settings
VNC_PASSWORD=your_custom_password
RESOLUTION=1920x1080x24

# Browser Settings
DISPLAY=:99
USE_OWN_BROWSER=false
KEEP_BROWSER_OPEN=true
```
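
The `RESOLUTION` value packs width, height, and color depth into a single string. As a rough sketch of how such a value could be split, assuming the `WIDTHxHEIGHTxDEPTH` form shown above (`Resolution` and `parse_resolution` are hypothetical names, not part of this repository):

```python
from typing import NamedTuple


class Resolution(NamedTuple):
    width: int
    height: int
    depth: int


def parse_resolution(value: str, default_depth: int = 24) -> Resolution:
    """Parse 'WIDTHxHEIGHT' or 'WIDTHxHEIGHTxDEPTH' into integers."""
    parts = value.strip().lower().split("x")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected WIDTHxHEIGHT[xDEPTH], got {value!r}")
    width, height = int(parts[0]), int(parts[1])
    # Fall back to the default depth when only width and height are given.
    depth = int(parts[2]) if len(parts) == 3 else default_depth
    return Resolution(width, height, depth)
```

Parsing the value up front makes a malformed setting fail with a clear error instead of surfacing later as a display that never starts.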

### **Custom VNC Settings**
Edit `supervisord.conf` to modify:
- VNC port (default: 5901)
- Display resolution
- Authentication settings

## 🎯 Usage Examples

### **Simple Test**
1. Open `http://localhost:7788`
2. Enter query: "Click the login button"
3. Enter URL: "https://example.com"
4. Click "Start Live Test"
5. Watch the browser automation happen live!

### **Complex Workflow**
1. Start with simple navigation
2. Watch form filling in real-time
3. See error handling and retries
4. Observe final state and results

## 🚀 Advanced Features

### **Multiple Browser Sessions**
- Each test runs in isolated browser context
- No interference between concurrent tests
- Clean state for each automation

### **Debugging Support**
- Live view helps identify automation issues
- Real-time feedback on agent decisions
- Visual confirmation of actions

### **Integration Options**
- Embed VNC viewer in any web application
- Customize VNC viewer appearance
- Add status indicators and controls

## 📞 Support

If you encounter issues:
1. Check the Docker logs: `docker compose logs`
2. Verify all services are running
3. Ensure ports are not blocked
4. Check browser console for errors

---

**🎉 Enjoy your live browser automation experience!**
40 changes: 40 additions & 0 deletions changes.txt
@@ -0,0 +1,40 @@
browser-use-agent-tab.py
-------------------------
- Removed all extra/unused variables.
- Rewrote run_agent_task() without using gradio, webui_manager, or other UI dependencies.
- Commented out all unused functions:
    - pause_button
    - resume_button
    - _ask_assistant_callback
    - handle_done
    - handle_new_step
    - _get_config_value
    - _format_agent_output
- Created a FastAPI endpoint to run main-agent-task.
- Created a "static/" folder containing a simple UI for "Website Testing Agent".

browser_recorder.py
--------------------
- Cleans the entire video directory before starting a new recording session.
- Uses glob.glob() to recursively find .webm files after context closure.
- Does not use page.on() or page.video.start(); relies on Playwright's built-in recording mechanism.
- Stores just the filenames in a list: self.recorded_videos.
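
The clean-then-collect behavior described above could look roughly like the following. This is only a sketch of the technique under stated assumptions: `reset_video_dir` and `collect_webm_filenames` are hypothetical names, and the real `browser_recorder.py` may be organized differently.

```python
import glob
import os
import shutil


def reset_video_dir(video_dir: str) -> None:
    """Remove video_dir with all its contents, then recreate it empty."""
    shutil.rmtree(video_dir, ignore_errors=True)
    os.makedirs(video_dir, exist_ok=True)


def collect_webm_filenames(video_dir: str) -> list[str]:
    """Recursively find .webm files under video_dir; return filenames only."""
    # With recursive=True, "**" matches zero or more nested directories.
    pattern = os.path.join(video_dir, "**", "*.webm")
    paths = glob.glob(pattern, recursive=True)
    return sorted(os.path.basename(path) for path in paths)
```

Collecting after the browser context is closed matters because Playwright finalizes its video files on context closure.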

agent/mainagent.py
-------------------
- Modified loop logic to ensure the agent runs correctly only when required.

agent/qa_possibility_checker/
------------------------------
- Updated prompt.py with refined prompt structure.
- Added custom_validate() function in output.py for validating the agent output.

agent/intent_classifier/
------------------------------
- Added custom_validate() function in output.py for validating the agent output.

agent/prompt_enhancer/
------------------------------
- Added custom_validate() function in output.py for validating the agent output.


13 changes: 12 additions & 1 deletion docker-compose.yml
@@ -12,6 +12,7 @@ services:
- "6080:6080"
- "5901:5901"
- "9222:9222"
- "8000:8000"
environment:
# LLM API Keys & Endpoints
- OPENAI_ENDPOINT=${OPENAI_ENDPOINT:-https://api.openai.com/v1}
@@ -60,13 +61,23 @@ services:
- RESOLUTION=${RESOLUTION:-1920x1080x24}
- RESOLUTION_WIDTH=${RESOLUTION_WIDTH:-1920}
- RESOLUTION_HEIGHT=${RESOLUTION_HEIGHT:-1080}

- MONGO_URI=${MONGO_URI:-mongodb+srv://ahmadejaz:[email protected]/nextsqa_db}
- MONGODB_URI=${MONGODB_URI:-mongodb+srv://ahmadejaz:[email protected]/nextsqa_db}
- MONGO_DB_NAME=${MONGO_DB_NAME:-nextsqa_db}
- DB_NAME=${DB_NAME:-nextsqa_db}
# VNC Settings
- VNC_PASSWORD=${VNC_PASSWORD:-youvncpassword}

# Python Path Settings
- PYTHONPATH=/app/src

volumes:
- /tmp/.X11-unix:/tmp/.X11-unix
# - ./my_chrome_data:/app/data/chrome_data # Optional: persist browser data
# Mount output directory for saving screenshots, videos, and agent data
- ./src/outputdata:/app/src/outputdata
# Mount the single root .env so load_dotenv can read it inside the container
- ./.env:/app/.env:ro
restart: unless-stopped
shm_size: '2gb'
cap_add:
18 changes: 18 additions & 0 deletions requirements.txt
@@ -8,3 +8,21 @@ langchain-ibm==0.3.10
langchain_mcp_adapters==0.0.9
langgraph==0.3.34
langchain-community
langchain-ollama
# FastAPI and web framework
fastapi
uvicorn[standard] # Includes websockets library automatically
pydantic==2.10.6
fastapi-mail==1.4.1
# Database and authentication
motor==3.4.0
passlib[bcrypt]==1.7.4
python-jose==3.3.0
email-validator==2.1.1
pymongo==4.5.0
# Additional dependencies that might be needed
python-multipart
httpx
requests
pillow
python-dotenv
31 changes: 31 additions & 0 deletions src/API/Ai_Testing/models.py
@@ -0,0 +1,31 @@
from pydantic import BaseModel, Field
from datetime import datetime, timezone
from typing import Optional


class ActionsModel(BaseModel):
    action: dict = Field(..., description="The action taken by the agent")


class StepsModel(BaseModel):
    step: str = Field(..., description="The step taken by the agent")
    step_no: int = Field(..., description="The step number in the sequence")
    action: list[ActionsModel] = Field(..., description="The action taken in this step")
    created_at: Optional[datetime] = Field(..., description="Timestamp of when the step was created")
    updated_at: Optional[datetime] = Field(..., description="Timestamp of the last update")


class ResultModel(BaseModel):
    final_result: str = Field(..., description="The final result of the agent's task")
    steps: list[StepsModel] = Field(..., description="List of steps taken by the agent")


class AgentModel(BaseModel):
    query: str = Field(..., description="The query to run the agent on")
    url: str = Field(..., description="The URL to run the agent on")
    result: ResultModel = Field(..., description="The result of the agent's task")
    user_id: str = Field(..., description="The ID of the user who initiated the task")
    created_at: datetime = Field(..., description="Timestamp of when the agent was created")
    updated_at: datetime = Field(..., description="Timestamp of the last update")
10 changes: 10 additions & 0 deletions src/API/Ai_Testing/routes.py
@@ -0,0 +1,10 @@
from fastapi import APIRouter
from .schemas import AgentRequest
from .services import run_agent_work

router = APIRouter()

@router.post("/run-agent")
async def run_agent(request: AgentRequest):
    return await run_agent_work(request.query, request.url, {"sub": "[email protected]"})
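
For trying the endpoint above from a script, a client could be sketched as follows. Several things here are assumptions rather than facts from this diff: that the API is served at `http://localhost:8000` with no router prefix, that `AgentRequest` carries exactly the `query` and `url` fields the route reads, and that the response body is JSON. The helper names are hypothetical.

```python
import json
import urllib.request


def build_run_agent_request(base_url: str, query: str, url: str) -> urllib.request.Request:
    """Build a POST request for /run-agent with a JSON body."""
    payload = json.dumps({"query": query, "url": url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/run-agent",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_agent(base_url: str, query: str, url: str, timeout: float = 600.0) -> dict:
    """Send the request and decode the response (assumed to be JSON)."""
    request = build_run_agent_request(base_url, query, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Assumed base URL; adjust to wherever the API is actually exposed.
    result = run_agent("http://localhost:8000", "Click the login button", "https://example.com")
    print(result)
```

The generous timeout reflects that a browser automation run can take minutes; a shorter value may cut off a run that is still in progress.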
