Visual Web Agent

Visual Web Agent is a visual web automation agent that analyzes Medium articles with AI-powered summarization and gives personalized reading recommendations based on your interests.

Features

Visual Web Automation: Uses Playwright to navigate and capture screenshots of Medium articles
AI-Powered Analysis: Leverages Google's Gemini 2.5 Flash model for intelligent content summarization
Multi-Stage Processing: Implements LangGraph workflow for systematic article analysis
Popup Handling: Automatically detects and closes Medium login popups during scrolling
Personalized Recommendations: Provides YES/NO recommendations based on your specified interests
Clean Browser Management: Opens a fresh Chromium instance without affecting existing Chrome sessions

Architecture

The agent follows a multi-stage workflow:

Initialize - Launch Chromium browser and navigate to the article
Screenshot - Capture visual content of the current viewport
Summarize - Use Gemini Vision to analyze and summarize screenshot content
Scroll Decision - Decide whether to keep scrolling (up to scroll_count) or move on to aggregating the results
Aggregate - Compile all summaries into final analysis and recommendation
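
The stages above can be read as a small loop. The sketch below is plain Python, not the LangGraph graph used in vwa_medium.py, and it only illustrates the order of the stages. The function run_workflow, the AgentState fields, and the counter-based scroll decision are assumptions made for illustration; the browser and Gemini calls are replaced by callables that you pass in.

```python
from typing import Callable, List, TypedDict


class AgentState(TypedDict):
    summaries: List[str]  # one summary per captured screenshot
    scrolls_done: int     # how many times the page has been scrolled so far
    scroll_count: int     # maximum number of scrolls allowed
    final: str            # aggregated result, filled in by the last stage


def run_workflow(
    capture: Callable[[int], str],
    summarize: Callable[[str], str],
    aggregate: Callable[[List[str]], str],
    scroll_count: int = 3,
) -> AgentState:
    """Run screenshot -> summarize -> scroll decision in a loop, then aggregate."""
    state: AgentState = {
        "summaries": [],
        "scrolls_done": 0,
        "scroll_count": scroll_count,
        "final": "",
    }
    while True:
        shot = capture(state["scrolls_done"])       # Screenshot stage
        state["summaries"].append(summarize(shot))  # Summarize stage
        # Scroll decision (assumed here to be a simple counter check)
        if state["scrolls_done"] >= state["scroll_count"]:
            break
        state["scrolls_done"] += 1                  # stands in for scrolling the page
    state["final"] = aggregate(state["summaries"])  # Aggregate stage
    return state
```

With stub callables, for example capture=lambda i: f"shot{i}", this runs without a browser or an API key, which can be handy for checking the control flow on its own.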

Prerequisites

Python 3.8+
Google Gemini API key
Chrome/Chromium browser

Installation

Clone the repository

git clone https://github.com/PrudhviGudla/Visual-Web-Agent
cd Visual-Web-Agent

Install dependencies

pip install -r requirements.txt

Install Playwright browsers

playwright install chromium

Set up environment variables by creating a .env file

echo "GEMINI_API_KEY=your_gemini_api_key_here" > .env
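
To check that the key is picked up, a sketch like the following reads GEMINI_API_KEY from the process environment and falls back to the .env file. It uses only the standard library; the helper names load_env_file and get_gemini_api_key are hypothetical, and the actual script may load the key differently (for example through a dotenv package).

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines from a .env file (no shell expansion)."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_gemini_api_key(path: str = ".env") -> str:
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("GEMINI_API_KEY") or load_env_file(path).get("GEMINI_API_KEY", "")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to .env or export it")
    return key
```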

Usage

Command Line Arguments

python vwa_medium.py --link "ARTICLE_URL" --interests interest1 interest2 --scroll_count 3

Parameters

--link: URL of the Medium article to analyze (required)
--interests: List of your interests for recommendation matching (default: AI Technology)
--scroll_count: Maximum number of scrolls through the article (default: 3)
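
For reference, the three parameters above map naturally onto argparse. This is a sketch of how such a parser could look, not the parser in vwa_medium.py. The default interest list is an assumption: it reads the documented default "AI Technology" as the two items "AI" and "Technology".

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Analyze a Medium article")
    parser.add_argument("--link", required=True,
                        help="URL of the Medium article to analyze")
    # nargs="+" collects one or more space-separated values into a list
    parser.add_argument("--interests", nargs="+", default=["AI", "Technology"],
                        help="Interests used for recommendation matching")
    parser.add_argument("--scroll_count", type=int, default=3,
                        help="Maximum number of scrolls through the article")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv[1:] when argv is None)."""
    return build_parser().parse_args(argv)
```

Multi-word interests need quoting on the command line, for example --interests "Machine Learning" "Computer Vision", otherwise each word becomes its own list item.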

Alternative JSON Configuration

python vwa_medium.py --config '{"link": "https://medium.com/@user/article", "interests": ["AI", "Python", "Technology"], "scroll_count": 5}'
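
A JSON string like the one above can be validated before the agent starts. The sketch below assumes the same keys as the command-line flags (link, interests, scroll_count); the load_config helper and the DEFAULTS values are hypothetical and only illustrate the idea.

```python
import json
from typing import Any, Dict

# Assumed defaults, mirroring the documented command-line defaults
DEFAULTS: Dict[str, Any] = {"interests": ["AI", "Technology"], "scroll_count": 3}


def load_config(raw: str) -> Dict[str, Any]:
    """Parse a --config JSON string and fill in missing optional keys."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("--config must be a JSON object")
    if not data.get("link"):
        raise ValueError('--config must include a "link" value')
    config = {**DEFAULTS, **data}  # values from the JSON override the defaults
    if not isinstance(config["interests"], list):
        raise ValueError('"interests" must be a JSON array of strings')
    config["interests"] = list(config["interests"])  # copy, so DEFAULTS is never shared
    config["scroll_count"] = int(config["scroll_count"])
    return config
```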

Example

python vwa_medium.py \
  --link "https://medium.com/@itberrios6/introduction-to-point-net-d23f43aa87d2" \
  --interests "AI" "Machine Learning" "Computer Vision" \
  --scroll_count 4

Sample Output

--- Starting LangGraph Agent ---

--- Step: init ---
-----Initializing new Chromium browser instance-----
-----Chromium browser initialized-----
-----Navigating to the provided URL-----

--- Step: screenshot ---
*****ACTION: taking screenshot of the current browser state
--- Step: summarizer ---

Individual Screenshot Summary:
The screenshot shows a Medium article about Point Net, discussing 3D point cloud processing...

--- Step: aggregate ---

Final Aggregated Summary and Recommendation:
SUMMARY: This article provides an intuitive introduction to Point Net, a neural network architecture designed for processing 3D point clouds. It covers the fundamentals of point cloud data, traditional feature extraction challenges, and how Point Net addresses these through end-to-end learning with transformation networks (T-nets) for rotation invariance.

RECOMMENDATION: YES - This article is highly relevant for someone interested in AI, Machine Learning, and Computer Vision. It bridges traditional computer vision with modern deep learning approaches for 3D data processing, making it valuable for understanding emerging AI applications in spatial computing.
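
If you want to consume the final text from another script, the SUMMARY and RECOMMENDATION parts can be split with a regular expression. This sketch assumes the model keeps the "SUMMARY: ... RECOMMENDATION: YES/NO - reason" layout shown in the sample above, which is not guaranteed for every run. The names Verdict and parse_final_output are hypothetical.

```python
import re
from typing import NamedTuple


class Verdict(NamedTuple):
    summary: str
    recommended: bool
    reason: str


_PATTERN = re.compile(
    r"SUMMARY:\s*(?P<summary>.*?)\s*"
    r"RECOMMENDATION:\s*(?P<answer>YES|NO)\b\s*[-:]?\s*(?P<reason>.*)",
    re.DOTALL | re.IGNORECASE,
)


def parse_final_output(text: str) -> Verdict:
    """Split the aggregated output into summary, YES/NO flag, and reason."""
    match = _PATTERN.search(text)
    if match is None:
        raise ValueError("text does not follow the SUMMARY/RECOMMENDATION layout")
    return Verdict(
        summary=match["summary"].strip(),
        recommended=match["answer"].upper() == "YES",
        reason=match["reason"].strip(),
    )
```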

Customization Options

Scroll Amount: Modify the viewport_height * 0.8 multiplier in scroll_down() tool
Screenshot Quality: Adjust Playwright screenshot options in take_ss() tool
Popup Selectors: Update close_selectors in close_login_popup_if_present() for different sites
Model Selection: Change the Gemini model in the llm initialization
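
As an illustration of the first and third options, the sketch below shows one way a scroll step and a popup check could be written against a Playwright sync-API page object. It is not the code from vwa_medium.py: the function names scroll_page and close_popup, the fallback height of 720, and the selectors in CLOSE_SELECTORS are placeholders chosen for this example. Only page.viewport_size, page.mouse.wheel, page.locator, and the locator methods count, first, is_visible, and click are Playwright calls.

```python
from typing import Iterable, Optional

SCROLL_FRACTION = 0.8  # fraction of the viewport height scrolled per step

# Hypothetical selectors; replace them with ones that match the target site
CLOSE_SELECTORS = ['button[aria-label="Close"]', '[data-testid="close-button"]']


def scroll_page(page, fraction: float = SCROLL_FRACTION) -> int:
    """Scroll down by a fraction of the viewport height; return pixels scrolled."""
    size = page.viewport_size            # dict with "width"/"height", or None
    height = size["height"] if size else 720
    delta_y = round(height * fraction)
    page.mouse.wheel(0, delta_y)         # dispatches a mouse wheel event
    return delta_y


def close_popup(page, selectors: Iterable[str] = CLOSE_SELECTORS) -> Optional[str]:
    """Click the first visible close button; return its selector, or None."""
    for selector in selectors:
        locator = page.locator(selector)
        if locator.count() > 0 and locator.first.is_visible():
            locator.first.click()
            return selector
    return None
```

A smaller fraction leaves more overlap between consecutive screenshots, at the cost of more scroll steps per article.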

Performance Tips

Reduce scroll_count for faster analysis of shorter articles
Increase scroll_count for comprehensive analysis of longer content
Adjust viewport height multiplier for different scroll distances
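
To pick a scroll_count, it can help to estimate how many scroll steps an article needs. The helper below is a rough sketch that assumes each scroll moves viewport_height * 0.8 pixels, as in the scroll multiplier mentioned under Customization Options. The function name and the example page height are hypothetical.

```python
import math


def scrolls_needed(page_height: int, viewport_height: int, fraction: float = 0.8) -> int:
    """Estimate the scroll steps needed to reach the bottom of a page."""
    if page_height <= viewport_height:
        return 0  # everything already fits in the first screenshot
    step = viewport_height * fraction
    return math.ceil((page_height - viewport_height) / step)
```

For example, a 6000 px tall article in a 900 px viewport scrolls 720 px per step, so covering the remaining 5100 px takes ceil(5100 / 720) = 8 scrolls.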
