WebAgent

WebAgent is an automated web browsing tool powered by AI that can navigate websites, understand page content, and complete tasks based on user instructions.

Features

AI-Powered Visual Understanding: Uses Google's Gemini API to analyze screenshots and understand webpage content
Multiple Operation Modes: Choose between HTML-only mode (faster) or screenshot+HTML modes (more accurate)
Batch Processing: Capture multiple screenshots before analysis for better context awareness
Automated Navigation: Intelligently clicks, types, selects, and navigates through websites
Cookie Consent Handling: Automatically handles cookie consent popups
Scrollable Content Analysis: Captures content by scrolling through long pages

Requirements

Python 3.7+
Chrome/Chromium browser
ChromeDriver (compatible with your browser version)
Google Gemini API key

Dependencies

selenium
beautifulsoup4
google-genai

Installation

Clone this repository
Install dependencies: pip install -r requirements.txt
Make sure ChromeDriver is installed and in your PATH
Set your Google Gemini API key in the GEMINI_API_KEY variable in the code (By creating .env file where writes "GEMINI_API_KEY=YOUR_KEY")

Usage

Run the script with:

python WebAgent.py

Then follow the interactive prompts to enter your task instructions.

Command Options

-help: Display help information
-url [new_url]: Change the target URL
-set mode [option]: Set operation mode
- html: HTML-only mode (no screenshots)
- batch: Batch mode - capture all screenshots before processing (default)
- standard: Process each screenshot immediately
-set scrolls [number]: Set maximum number of page scrolls
quit: Exit the program

Examples

-url https://www.imdb.com
-set mode html
-set scrolls 5
Tell me top3 on IMDb this week

Operation Modes

HTML-Only Mode

Uses only HTML content for analysis without capturing screenshots. Much faster but potentially less accurate for visually complex pages.

Batch Mode (Default)

Captures multiple screenshots while scrolling through the page, then analyzes them together for better context understanding.

Standard Mode

Processes each screenshot immediately after capture. Useful for very long pages where batch processing might hit token limits.

How It Works

Opens the specified URL in a Chrome browser
Handles any cookie consent dialogs
Captures content through screenshots and/or HTML parsing
Analyzes content using Google's Gemini AI models
Determines the best actions to fulfill the user's goal
Executes actions on the page
Repeats the process until an answer is found or maximum steps reached

Notes

Screenshots are temporarily stored in the tmp/ folder and cleaned up after task completion
The Gemini API key should be kept secure and not committed to public repositories

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.gitattributes		.gitattributes
.gitignore		.gitignore
README.md		README.md
WebAgent.py		WebAgent.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

WebAgent

Features

Requirements

Dependencies

Installation

Usage

Command Options

Examples

Operation Modes

HTML-Only Mode

Batch Mode (Default)

Standard Mode

How It Works

Notes

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

InvokerMing/WebAgent

Folders and files

Latest commit

History

Repository files navigation

WebAgent

Features

Requirements

Dependencies

Installation

Usage

Command Options

Examples

Operation Modes

HTML-Only Mode

Batch Mode (Default)

Standard Mode

How It Works

Notes

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages