Browser Automation Agent

[Crustdata Build Challenge : AI Agent for Browser Automation]

This project is a browser automation agent that can understand natural language instructions and perform actions in a web browser, using Gemini LLMs and Playwright.

Architecture

The agent consists of three main components:

Planner: Breaks down a user's natural language query into high-level steps.
Executor: Performs each step by interacting with the browser through Playwright.
Verifier: Verifies if each step was completed successfully.

Setup

Clone the repository
Install the requirements:
```
pip install -r requirements.txt
```
Install Playwright browsers:
```
playwright install chromium
```
Create a .env file with your Gemini API key:
```
GEMINI_API_KEY=your_api_key_here
```

Usage

Run the main script:

python interact.py

Enter your natural language query when prompted. For example:

login to reddit.com, search ghibli, save the first three posts with metadata like date, author etc.

The agent will:

Plan the steps needed to achieve your goal
Execute each step in the browser
Verify each step was completed successfully
Report progress and results

Configuration

Edit config.py to modify the agent's behavior:

Change model names
Enable/disable headless mode
Adjust retry limits
Enable/disable human-in-the-loop mode

Files

interact.py: Entry point script
models.py: Data models
planner.py: Step planning module
executor.py: Browser interaction module
verifier.py: Step verification module
utils.py: Utility functions
config.py: Configuration settings

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Browser Automation Agent

[Crustdata Build Challenge : AI Agent for Browser Automation]

Architecture

Setup

Usage

Configuration

Files

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.gitignore		.gitignore
README.md		README.md
config.py		config.py
design.jpg		design.jpg
dom_utils.py		dom_utils.py
executor.py		executor.py
interact.py		interact.py
models.py		models.py
planner.py		planner.py
requirements.txt		requirements.txt
utils.py		utils.py
verifier.py		verifier.py

Sparshsing/browser_automation_agent

Folders and files

Latest commit

History

Repository files navigation

Browser Automation Agent

[Crustdata Build Challenge : AI Agent for Browser Automation]

Architecture

Setup

Usage

Configuration

Files

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages