
devgomesai/Research-Agent-Arxiv

🧠 Research Agent [arXiv]

Your AI-Powered Academic Research Companion

A research agent that searches, downloads, and analyzes academic papers on arXiv. It is built with LangChain and uses an MCP (Model Context Protocol) server to give the agent direct access to arXiv.

Features

  • πŸ” Intelligent Paper Search: Find academic papers by topic, title, or keywords
  • πŸ“₯ Automated Downloads: Download papers directly from arXiv with a single command
  • πŸ“„ Full Paper Analysis: Read and analyze complete paper content
  • πŸ€– AI-Powered Assistance: Leverages state-of-the-art LLMs for paper analysis
  • πŸ“Š Rich CLI Interface: Beautiful terminal interface with color-coded messages
  • πŸ”„ Interactive & Batch Modes: Support for both interactive and single-query modes
  • πŸ“š Local Paper Management: List and manage all locally downloaded papers
  • ⚑ Streamlined Workflow: Automated search β†’ download β†’ read β†’ analyze pipeline

Prerequisites

  • Python 3.12 or higher
  • An API key from Anthropic (for Claude models) or other supported providers
  • uv package manager (for MCP server management)

Setup

1. Clone and Install Dependencies

# Clone the repository
git clone https://github.com/devgomesai/Research-Agent-Arxiv.git
cd Research-Agent-Arxiv

# Create a virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
# Or if using uv: uv pip install -r requirements.txt

2. Create Your Configuration File

Create a .env file in the root directory with your API key:

ANTHROPIC_API_KEY=your_api_key_here
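The project lists python-dotenv for reading this file. As a dependency-free illustration of what loading it amounts to, here is a minimal standard-library stand-in. The helper name `load_env_file` is hypothetical, and unlike python-dotenv it handles only plain `KEY=VALUE` lines (no quoting, `export` prefixes, or variable expansion):

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env", override: bool = False) -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file into os.environ.

    Hypothetical stand-in for python-dotenv's load_dotenv(); it handles only
    plain assignments, blank lines, and full-line # comments.
    """
    loaded: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded  # Missing file: nothing to load.
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # Skip blanks, comments, and malformed lines.
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        loaded[key] = value
        # Existing environment variables win unless override is requested.
        if override or key not in os.environ:
            os.environ[key] = value
    return loaded
```

After calling `load_env_file()`, `os.environ.get("ANTHROPIC_API_KEY")` should return your key. With python-dotenv installed, `load_dotenv()` covers the same ground and more.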

3. Storage Path Configuration (Required)

Important: Before running the application, you must create a storage folder and reference its path in the --storage-path parameter.

  1. Create a new folder that will store downloaded papers:

    mkdir /path/to/your/storage/folder
    # Example: mkdir ./arxiv_storage
  2. Update the MCP server configuration in src/mcp_server.py to point to your storage folder:

    # In src/mcp_server.py, modify the storage path:
    "--storage-path",
    "/path/to/your/storage/folder/",  # Update this path

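For orientation, a server entry for langchain-mcp-adapters' `MultiServerMCPClient` typically consists of a `command`, its `args`, and a `transport`. The sketch below builds such an entry with the storage path injected, after checking that the folder exists. The `uv tool run arxiv-mcp-server --storage-path ...` invocation follows the arxiv-mcp-server README; the helper name `build_arxiv_server_config` and the `"arxiv"` server key are assumptions, not the actual contents of src/mcp_server.py:

```python
from pathlib import Path


def build_arxiv_server_config(storage_path: str) -> dict[str, dict]:
    """Return a MultiServerMCPClient-style config for the arXiv MCP server.

    Hypothetical helper: it raises if the storage folder is missing, because
    the folder must be created before the agent runs.
    """
    folder = Path(storage_path).expanduser().resolve()
    if not folder.is_dir():
        raise FileNotFoundError(f"Storage folder does not exist: {folder}")
    return {
        "arxiv": {  # Server name is an arbitrary label.
            "command": "uv",
            "args": [
                "tool",
                "run",
                "arxiv-mcp-server",
                "--storage-path",
                str(folder),
            ],
            "transport": "stdio",  # The server runs as a local subprocess.
        }
    }
```

The resulting dict could then be passed as `MultiServerMCPClient(build_arxiv_server_config("./arxiv_storage"))`. Failing early on a missing folder surfaces path mistakes before the MCP subprocess is started.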
Dependencies

This project relies on the following key packages:

  • langchain: Framework for developing LLM applications
  • langchain-mcp-adapters: MCP server integration
  • langchain-anthropic: Anthropic model integration
  • langgraph: State management for agents
  • rich: Rich text and beautiful formatting in the terminal
  • python-dotenv: Environment variable management
  • arxiv-mcp-server: Backend server for arXiv access

All dependencies are listed in requirements.txt and pyproject.toml.

Usage

Interactive Mode

Run the agent in interactive mode to have an ongoing conversation:

python -m src.agent

The agent will start in interactive mode where you can continuously ask questions about academic papers.

Single Query Mode

Run a single query and exit:

python -m src.agent --query "Find papers on Vision Transformers"

Using Different Models

Specify a different language model (default is Claude 3.5 Sonnet):

python -m src.agent --model anthropic:claude-3-5-sonnet-latest --query "Summarize recent advances in transformer architectures"

Other supported models include:

  • anthropic:claude-3-opus-latest
  • Other models, depending on your configuration and provider access

Paper Download Control

Control how many papers are downloaded per query using the --k parameter:

# Download and analyze 5 papers on the topic
python -m src.agent --query "Find papers on Vision Transformers" --k 5

Additional Options

# Verbose output for debugging
python -m src.agent --verbose

# Limit number of tools displayed
python -m src.agent --tools-limit 3

# Number of papers to download per query (default: 1, range: 1-20)
python -m src.agent --k 5

# Show help
python -m src.agent --help
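These options map naturally onto argparse. The sketch below shows one way such a parser could be declared, with the documented 1 to 20 range on --k enforced at parse time. The defaults for --k and --model come from this README; the `None` default for --tools-limit, the helper names, and the help strings are assumptions, and the real parser in the src.agent module may differ:

```python
import argparse

DEFAULT_MODEL = "anthropic:claude-3-5-sonnet-latest"


def bounded_k(value: str) -> int:
    """Parse --k and enforce the documented 1-20 range."""
    k = int(value)
    if not 1 <= k <= 20:
        raise argparse.ArgumentTypeError("--k must be between 1 and 20")
    return k


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="python -m src.agent",
        description="Search, download, and analyze arXiv papers.",
    )
    # No --query means interactive mode; a value means single-query mode.
    parser.add_argument("--query", help="Run a single query and exit.")
    parser.add_argument("--model", default=DEFAULT_MODEL,
                        help="provider:model identifier.")
    parser.add_argument("--k", type=bounded_k, default=1,
                        help="Papers to download per query (1-20).")
    parser.add_argument("--verbose", action="store_true",
                        help="Verbose output for debugging.")
    parser.add_argument("--tools-limit", type=int, default=None,
                        help="Limit the number of tools displayed.")
    return parser
```

Validating --k inside the `type` callable means an out-of-range value fails with a usage message before any model or MCP server is started.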

Architecture

┌─────────────────┐    ┌──────────────┐    ┌─────────────────┐
│   User Input    │───▶│ Research     │───▶│   arXiv MCP     │
│ (Query + k)     │    │   Agent      │    │   Server        │
│                 │    │              │    │ (Searches &     │
└─────────────────┘    └──────────────┘    │  Downloads)     │
                                           └─────────────────┘
                                                    │
                                                    ▼
                                           ┌─────────────────┐
                                           │  Storage Path   │
                                           │  (Local Papers) │
                                           └─────────────────┘

The system follows this workflow:

  1. Search: The user's request triggers a paper search on arXiv; the k parameter sets how many papers to download
  2. Download: Up to k relevant papers are downloaded to the configured storage path
  3. Read: Paper content is retrieved from local storage
  4. Analyze: The AI model analyzes the papers and responds to the user's query
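The four stages can be sketched as plain function composition. The tool callables here are hypothetical stand-ins for the MCP tools (in the real agent, the LLM decides when to call each one), so this shows only the data flow and how k caps the download step; `Paper` and `run_pipeline` are illustrative names:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Paper:
    arxiv_id: str
    title: str


def run_pipeline(
    query: str,
    k: int,
    search: Callable[[str], list[Paper]],
    download: Callable[[str], None],
    read: Callable[[str], str],
    analyze: Callable[[str, list[str]], str],
) -> str:
    """Search -> download (at most k) -> read -> analyze."""
    hits = search(query)                          # 1. Search arXiv.
    selected = hits[:k]                           # Keep at most k results.
    for paper in selected:
        download(paper.arxiv_id)                  # 2. Save to the storage path.
    texts = [read(p.arxiv_id) for p in selected]  # 3. Read local copies.
    return analyze(query, texts)                  # 4. Answer the user's query.
```

Passing the tools in as callables keeps the sketch testable with in-memory stubs; in the agent itself, the same sequence is driven by the model's tool calls rather than fixed code.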

How It Works

The research agent uses the Model Context Protocol (MCP) to interface with the arXiv server. The arxiv-mcp-server provides the following tools:

  • search_papers: Search arXiv for papers by topic, title, or keyword
  • download_paper: Download a paper using its arXiv ID
  • read_paper: Retrieve full content of a downloaded paper
  • list_papers: List all locally downloaded papers
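At the protocol level, MCP clients invoke these tools with JSON-RPC 2.0 `tools/call` requests; langchain-mcp-adapters wraps the server's tools as LangChain tools so the agent never builds these by hand. The sketch below constructs such payloads for the four tools. The argument names (`query`, `max_results`, `paper_id`) are assumptions based on the arxiv-mcp-server tool schemas; the server's `tools/list` response is the authoritative source:

```python
import itertools
import json

_request_ids = itertools.count(1)  # JSON-RPC request ids, starting at 1.


def tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


# Assumed argument names; confirm them against the server's tools/list.
search_req = tool_call("search_papers", {"query": "vision transformers", "max_results": 5})
download_req = tool_call("download_paper", {"paper_id": "2010.11929"})  # Example arXiv ID.
read_req = tool_call("read_paper", {"paper_id": "2010.11929"})
listing_req = tool_call("list_papers", {})
```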

The agent follows a strict 4-step workflow:

Step 1: Search for Papers

  • User provides: topic, title, or keywords
  • Agent calls: search_papers with user's query
  • Result: List of relevant papers with arXiv IDs

Step 2: Download Selected Papers

  • Agent calls: download_paper with arXiv IDs from search results
  • Result: Papers saved to local storage at the configured path

Step 3: Read Paper Content

  • Agent calls: read_paper to access full content of downloaded papers
  • Result: Full text (title, authors, abstract, body, references)

Step 4: Perform Analysis

Based on user's request, the agent provides:

  • Summary: 2-3 sentence overview with main contribution and key results
  • Detailed Analysis: Executive Summary, Research Context, Methodology, Results, Implications, Future Directions
  • Comparison: Analysis of multiple papers with comparison table/summary
  • Code/Implementation: Pseudocode and implementation details from papers

The workflow is fully automated: the agent completes every step from search to analysis on its own, so the user never needs to supply arXiv IDs.
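For the Comparison format, the table could be rendered as Markdown from per-paper fields extracted by the model. A hypothetical helper follows; the field names are illustrative and not defined by the project:

```python
def comparison_table(papers: list[dict[str, str]], fields: list[str]) -> str:
    """Render papers as a Markdown table with one row per paper.

    Missing fields show as "n/a"; pipe characters are escaped so they do not
    break the table layout.
    """
    header = "| " + " | ".join(fields) + " |"
    divider = "| " + " | ".join("---" for _ in fields) + " |"
    rows = [
        "| " + " | ".join(p.get(f, "n/a").replace("|", "\\|") for f in fields) + " |"
        for p in papers
    ]
    return "\n".join([header, divider, *rows])
```

For example, `comparison_table(papers, ["title", "method", "key_result"])` yields one header row, one divider row, and one row per paper.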

Storage Management

All downloaded papers are stored in the directory specified by the --storage-path parameter in the MCP server configuration. The default location is C:/arxiv_storage/, but you should update this to a folder that exists on your system.

Once the folder exists, the MCP server manages its contents, organizing papers by their arXiv IDs for efficient retrieval.
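To inspect the storage folder outside the agent, something like the following lists what has been downloaded. It assumes each paper is saved as a single file named after its arXiv ID (for example `2010.11929.md`); that layout is an assumption about arxiv-mcp-server, so adjust `suffix` if your folder looks different. `list_local_papers` is a hypothetical helper, not the server's `list_papers` tool:

```python
from pathlib import Path


def list_local_papers(storage_path: str, suffix: str = ".md") -> list[str]:
    """Return the arXiv IDs of papers stored as <arxiv_id><suffix> files."""
    folder = Path(storage_path).expanduser()
    if not folder.is_dir():
        return []  # No storage folder yet, so nothing downloaded.
    return sorted(
        p.name.removesuffix(suffix)
        for p in folder.iterdir()
        if p.is_file() and p.name.endswith(suffix)
    )
```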

Development

To modify the agent's behavior, you can update the prompt in src/prompt.py. The PAPER_ANALYSIS_PROMPT defines the agent's instructions and workflow for paper analysis.
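As a rough illustration of the shape such a prompt can take, here is a hypothetical template that spells out the four-step workflow and analysis formats described above. It is not the contents of src/prompt.py, and the `{k}` placeholder is an assumption about how the download count could be injected:

```python
# Hypothetical prompt template; the real PAPER_ANALYSIS_PROMPT may differ.
EXAMPLE_ANALYSIS_PROMPT = """\
You are a research assistant with access to arXiv tools.
Always follow these steps in order:
1. Call search_papers with the user's topic, title, or keywords.
2. Call download_paper for the {k} most relevant arXiv IDs.
3. Call read_paper on each downloaded paper.
4. Answer in the format the user asked for:
   - Summary: 2-3 sentences with the main contribution and key results.
   - Detailed Analysis: Executive Summary, Research Context, Methodology,
     Results, Implications, Future Directions.
   - Comparison: a table comparing the papers.
   - Code/Implementation: pseudocode and implementation details.
Never ask the user for arXiv IDs.
"""


def render_prompt(k: int = 1) -> str:
    """Fill in the number of papers to download."""
    return EXAMPLE_ANALYSIS_PROMPT.format(k=k)
```

Naming the tools explicitly in the prompt is what lets the model chain them without user input; if you add literal braces to a template like this, double them (`{{` and `}}`) so `str.format` leaves them alone.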

Troubleshooting

  • API Key Issues: Ensure your API key is correctly set in the .env file
  • Storage Path Issues: Verify the storage folder exists and has proper read/write permissions
  • Model Access: Confirm your account has access to the requested language model
  • MCP Server: Ensure the arXiv MCP server can be launched with the uv command
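Most of these checks can be run before launching the agent. A minimal standard-library preflight sketch; the function name and arguments are hypothetical, it only confirms that `uv` is on PATH (not that the server starts), and model access cannot be verified offline:

```python
import os
import shutil
from pathlib import Path


def preflight(storage_path: str, key_var: str = "ANTHROPIC_API_KEY") -> list[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems: list[str] = []
    if not os.environ.get(key_var):
        problems.append(f"{key_var} is not set (check your .env file).")
    folder = Path(storage_path).expanduser()
    if not folder.is_dir():
        problems.append(f"Storage folder not found: {folder}")
    elif not os.access(folder, os.R_OK | os.W_OK):
        problems.append(f"Storage folder is not readable and writable: {folder}")
    if shutil.which("uv") is None:
        problems.append("uv is not on PATH; the arXiv MCP server cannot be launched.")
    return problems
```

Printing each entry of `preflight("./arxiv_storage")` gives a quick checklist of what to fix before running `python -m src.agent`.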

Credits

This project uses the excellent arxiv-mcp-server by blazickjp for arXiv integration via the Model Context Protocol.
