This project implements a multi-agent system using AutoGen that collaboratively searches for and downloads PDF files related to specific topics.
The system consists of three main agents:
- Research Agent: Searches for PDF files related to a given topic and compiles a list of URLs.
- Download Agent: Receives URLs from the Research Agent and handles the downloading process.
- User Proxy Agent: Coordinates the interaction between agents and manages the overall workflow.
The Multi-Agent PDF Discovery System is an AI-powered collaborative system designed to automatically search for and download research PDFs on specified topics. It uses AutoGen's multi-agent framework to coordinate specialized agents, each handling a different part of the PDF discovery and download process.
- User Proxy Agent
  - Acts as an intermediary between the user and other agents
  - Coordinates the workflow between Research and Download agents
  - Handles user input and system output
- Research Agent
  - Specializes in finding relevant PDF URLs
  - Uses semantic search to identify appropriate academic papers
  - Filters and ranks results based on relevance
  - Returns structured URL data with paper titles
- Download Agent
  - Handles PDF file downloading
  - Implements robust error handling
  - Manages file naming and storage
  - Reports download status and results
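
The Research Agent's "structured URL data with paper titles" and its filter-and-rank step could look roughly like the sketch below. The `PdfCandidate` type, the `select_candidates` helper, the `score` field, and the default thresholds are all hypothetical names and values chosen for illustration; the project's actual data shape is not shown in this README.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PdfCandidate:
    """One search hit: a paper title, its URL, and a relevance score."""
    title: str
    url: str
    score: float  # hypothetical relevance score in [0, 1]


def select_candidates(hits: List[PdfCandidate], min_score: float = 0.5,
                      limit: int = 5) -> List[PdfCandidate]:
    """Keep links whose path ends in .pdf and whose score is at least
    min_score, then return the best `limit` hits, highest score first."""
    pdf_hits = [
        h for h in hits
        # Ignore any query string when checking the file extension.
        if h.url.lower().split("?")[0].endswith(".pdf") and h.score >= min_score
    ]
    return sorted(pdf_hits, key=lambda h: h.score, reverse=True)[:limit]
```

A list built this way can be handed to the Download Agent as plain title/URL pairs, which keeps the two agents' responsibilities separate.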
- Python Version: 3.8+
- Primary Framework: AutoGen v0.2.0
- Key Dependencies:
  - `pyautogen`: Multi-agent orchestration
  - `requests`: HTTP handling
  - `beautifulsoup4`: Web scraping
  - `python-dotenv`: Environment variable management
  - `hashlib`: Secure filename generation (part of the Python standard library, no installation needed)
# Agent Configuration
assistant_config = {
    "seed": 42,
    "temperature": 0,
    "config_list": config_list,
    "timeout": 120
}
# System Parameters
MAX_CONVERSATIONS = 5
DOWNLOAD_TIMEOUT = 120
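
The `config_list` referenced in the configuration above is not defined in this section. One way it might be built from the `OPENAI_API_KEY` environment variable is sketched below. The helper name `build_config_list` and the default model name `gpt-4` are assumptions for illustration, not values taken from the project.

```python
import os
from typing import Dict, List


def build_config_list(model: str = "gpt-4") -> List[Dict[str, str]]:
    """Return a one-entry config list built from OPENAI_API_KEY.

    The model name is an assumed default; use whichever model the
    project is actually configured for.
    """
    api_key = os.environ.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        # Corresponds to error code E001 (API Key Missing).
        raise RuntimeError("E001: OPENAI_API_KEY is not set; check the .env file")
    return [{"model": model, "api_key": api_key}]
```

Calling `load_dotenv()` from `python-dotenv` before this helper makes a key stored in `.env` visible through `os.environ`.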
- API Key Management
  - OpenAI API keys stored in `.env` file
  - Secure environment variable loading
  - No hardcoded credentials
- File Security
  - URL sanitization
  - Secure filename generation
  - Download validation checks
- Error Handling
  - Network timeout management
  - Invalid URL detection
  - Corrupt file checking
  - Rate limiting compliance
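
URL sanitization and secure filename generation from the list above could be approached as in the following sketch. It assumes that only `http` and `https` URLs should be accepted and that filenames combine a cleaned-up title with a short SHA-256 digest of the URL. Both function names are hypothetical, and the project's own rules may differ.

```python
import hashlib
import re
from urllib.parse import urlparse


def is_allowed_url(url: str) -> bool:
    """Accept only absolute http(s) URLs that name a host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def safe_filename(url: str, title: str = "") -> str:
    """Build a filesystem-safe PDF filename.

    The digest is derived from the URL, so the same URL always maps to
    the same name; the title only contributes letters, digits, and
    underscores, so it cannot introduce path separators.
    """
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    slug = re.sub(r"[^A-Za-z0-9]+", "_", title).strip("_")[:60]
    return f"{slug}_{digest}.pdf" if slug else f"{digest}.pdf"
```

Deriving the name from a hash rather than from the URL path avoids trusting attacker-controlled path segments such as `../`.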
- Initialization
    import autogen
    from dotenv import load_dotenv
    # Load environment variables
    load_dotenv()
    # Initialize agents
    user_proxy = autogen.UserProxyAgent(...)
    research_agent = autogen.AssistantAgent(...)
    download_agent = autogen.AssistantAgent(...)
- Research Phase
  - User provides research topic
  - Research Agent searches for relevant PDFs
  - Returns structured list of URLs and titles
- Download Phase
  - Download Agent processes each URL
  - Implements retry logic for failed downloads
  - Validates downloaded files
  - Reports success/failure status
- Error Management
    import logging
    import requests
    try:
        response = requests.get(url, timeout=DOWNLOAD_TIMEOUT)
        # Download handling
    except requests.exceptions.RequestException as e:
        logging.error(f"Download failed: {e}")
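
The Download Phase mentions retry logic and file validation without showing them. The sketch below is one possible shape, under these assumptions: the actual HTTP call is passed in as a `fetch` callable (for example a thin wrapper around `requests.get(...).content`), validation means checking for the `%PDF-` header that PDF files start with, and retries use exponential backoff. The function names and default values are hypothetical.

```python
import time
from typing import Callable, Optional


def looks_like_pdf(data: bytes) -> bool:
    """PDF files begin with the bytes '%PDF-'."""
    return data.startswith(b"%PDF-")


def download_with_retry(url: str,
                        fetch: Callable[[str], bytes],
                        max_attempts: int = 3,
                        backoff_seconds: float = 1.0,
                        sleep: Callable[[float], None] = time.sleep
                        ) -> Optional[bytes]:
    """Call fetch(url) up to max_attempts times.

    Returns the payload once it passes the PDF check, or None if every
    attempt fails or returns something that is not a PDF.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            data = fetch(url)
            if looks_like_pdf(data):
                return data
        except OSError:
            # requests.exceptions.RequestException derives from OSError,
            # so network failures raised by requests are caught here too.
            pass
        if attempt < max_attempts:
            # Wait 1x, 2x, 4x ... the base delay between attempts.
            sleep(backoff_seconds * 2 ** (attempt - 1))
    return None
```

Passing `fetch` and `sleep` in as parameters keeps the retry policy testable without network access or real delays.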
- CPU: Modern multi-core processor
- RAM: Minimum 4GB
- Storage: Sufficient for PDF storage
- Python 3.8 or higher
- pip package manager
- Virtual environment (recommended)
- Environment Setup
    python -m venv venv
    source venv/bin/activate  # Unix/macOS
    pip install -r requirements.txt
- Configuration
    cp .env.example .env
    # Edit .env with your OpenAI API key
# Initialize the system
user_proxy.initiate_chat(
    research_agent,
    message="Find PDFs about artificial intelligence in healthcare"
)
The system logs at INFO level with timestamped messages:
import logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
| Error Code | Description | Resolution |
|---|---|---|
| E001 | API Key Missing | Check `.env` file |
| E002 | Download Failed | Check network/retry |
| E003 | Invalid URL | Verify URL format |
| E004 | File Corruption | Retry download |
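
The error codes in the table could be kept in one place in code so that log messages stay consistent with the documentation. The sketch below uses an `Enum` for that. The class name `ErrorCode` and the `describe` helper are hypothetical, and nothing here is confirmed to exist in `pdf_finder_agents.py`.

```python
from enum import Enum


class ErrorCode(Enum):
    """Error codes from the table, each as (description, resolution)."""
    E001 = ("API Key Missing", "Check .env file")
    E002 = ("Download Failed", "Check network/retry")
    E003 = ("Invalid URL", "Verify URL format")
    E004 = ("File Corruption", "Retry download")


def describe(code: ErrorCode) -> str:
    """Format a code as a single log-friendly line."""
    description, resolution = code.value
    return f"{code.name}: {description}. Resolution: {resolution}."
```

For example, `logging.error(describe(ErrorCode.E002))` would record the failure together with its suggested resolution.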
- Implements rate limiting for API calls
- Chunked downloading for large files
- Efficient memory management
- Parallel download capabilities
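
Chunked downloading keeps memory use flat because only one chunk is held at a time. The sketch below writes any iterable of byte chunks to disk with a size cap. With `requests`, such an iterable can come from `requests.get(url, stream=True).iter_content(chunk_size=8192)`. The helper name `save_in_chunks`, the `.part` suffix, and the 50 MB default cap are assumptions for illustration.

```python
import os
from typing import Iterable


def save_in_chunks(chunks: Iterable[bytes], dest_path: str,
                   max_bytes: int = 50 * 1024 * 1024) -> int:
    """Stream chunks to dest_path and return the number of bytes written.

    Data goes to a temporary '.part' file first and is renamed only on
    success, so a failed or oversized download never leaves a partial
    file under the final name.
    """
    written = 0
    tmp_path = dest_path + ".part"
    try:
        with open(tmp_path, "wb") as fh:
            for chunk in chunks:
                written += len(chunk)
                if written > max_bytes:
                    raise ValueError("download exceeds max_bytes")
                fh.write(chunk)
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Clean up the partial file, then let the caller see the error.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return written
```

The size cap is a simple guard against unexpectedly large responses; the right limit depends on the kinds of PDFs being collected.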
- Advanced PDF content filtering
- Multiple search engine support
- Machine learning-based relevance scoring
- Academic database integration
- Enhanced metadata extraction
- Fork the repository
- Create feature branch
- Submit pull request
- Follow coding standards
MIT License - See LICENSE file for details
- Install the required dependencies:
    pip install -r requirements.txt
- Create a `.env` file in the project root with your OpenAI API key:
    OPENAI_API_KEY=your_api_key_here
- Run the script:
    python pdf_finder_agents.py
- `pdf_finder_agents.py`: Main script containing the agent implementations
- `requirements.txt`: List of Python dependencies
- `pdf_downloads/`: Directory where downloaded PDFs are stored
- Collaborative multi-agent system using AutoGen
- Automated PDF discovery based on topics
- Coordinated downloading of found PDFs
- Error handling and download verification
- Clean separation of agent responsibilities