Your AI-Powered Academic Research Companion
A sophisticated research agent that interfaces with arXiv to search, download, and analyze academic papers. Built with LangChain, this tool leverages MCP (Model Context Protocol) servers to provide seamless access to academic research.
- Intelligent Paper Search: Find academic papers by topic, title, or keywords
- Automated Downloads: Download papers directly from arXiv with a single command
- Full Paper Analysis: Read and analyze complete paper content
- AI-Powered Assistance: Leverages state-of-the-art LLMs for paper analysis
- Rich CLI Interface: Beautiful terminal interface with color-coded messages
- Interactive & Batch Modes: Support for both interactive and single-query modes
- Local Paper Management: List and manage all locally downloaded papers
- Streamlined Workflow: Automated search → download → read → analyze pipeline
- Python 3.12 or higher
- An API key from Anthropic (for Claude models) or other supported providers
- uv package manager (for MCP server management)
# Clone the repository
git clone https://github.com/devgomesai/Research-Agent-Arxiv.git
cd Research-Agent-Arxiv
# Create a virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Or if using uv: uv pip install -r requirements.txt
Create a .env file in the root directory with your API key:
ANTHROPIC_API_KEY=your_api_key_here
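Because python-dotenv is one of the project's dependencies, the key is presumably loaded from this file at startup. A minimal sketch of that step, assuming the key is read from the environment after load_dotenv(); the helper name require_api_key is hypothetical and not taken from the project's source:

```python
import os
from typing import Mapping, Optional

try:
    # python-dotenv: load_dotenv() copies key/value pairs from a .env file
    # into os.environ without overriding variables that are already set.
    from dotenv import load_dotenv
except ImportError:  # lets the sketch run even without python-dotenv installed
    load_dotenv = None


def require_api_key(name: str = "ANTHROPIC_API_KEY",
                    environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key, or fail early with an actionable message (hypothetical helper)."""
    if environ is None:
        if load_dotenv is not None:
            load_dotenv()  # loads variables from a .env file if one is found
        environ = os.environ
    key = environ.get(name, "").strip()
    # Reject both a missing key and the unedited placeholder from the example .env.
    if not key or key == "your_api_key_here":
        raise RuntimeError(f"{name} is not set; add it to the .env file in the project root")
    return key
```

Failing at startup like this surfaces a missing or placeholder key before any model call is attempted.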
Important: Before running the application, you must create a storage folder and reference its path in the --storage-path parameter.
- Create a new folder that will store downloaded papers:
mkdir /path/to/your/storage/folder  # Example: mkdir ./arxiv_storage
- Update the MCP server configuration in src/mcp_server.py to point to your storage folder:
# In src/mcp_server.py, modify the storage path:
"--storage-path", "/path/to/your/storage/folder/",  # Update this path
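The storage path ends up as a command-line argument of the arXiv MCP server process. The sketch below shows one way such a configuration might be built, assuming the server is started with uv tool run arxiv-mcp-server over stdio and that the resulting dict is handed to langchain-mcp-adapters' MultiServerMCPClient. The function name build_arxiv_server_config and the exact command are assumptions; check src/mcp_server.py for the values the project actually uses.

```python
from pathlib import Path
from typing import Any, Dict


def build_arxiv_server_config(storage_path: str) -> Dict[str, Dict[str, Any]]:
    """Build a stdio MCP server entry for arxiv-mcp-server (hypothetical helper)."""
    storage = Path(storage_path).expanduser().resolve()
    # Create the folder up front so the server never starts with a missing path.
    storage.mkdir(parents=True, exist_ok=True)
    return {
        "arxiv": {
            "command": "uv",  # assumption: the server is launched through uv
            "args": ["tool", "run", "arxiv-mcp-server",
                     "--storage-path", str(storage)],
            "transport": "stdio",
        }
    }
```

Resolving the path to an absolute one avoids surprises when the server process runs from a different working directory than the agent.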
This project relies on the following key packages:
- langchain: Framework for developing LLM applications
- langchain-mcp-adapters: MCP server integration
- langchain-anthropic: Anthropic model integration
- langgraph: State management for agents
- rich: Rich text and beautiful formatting in the terminal
- python-dotenv: Environment variable management
- arxiv-mcp-server: Backend server for arXiv access
All dependencies are listed in requirements.txt and pyproject.toml.
Run the agent in interactive mode to have an ongoing conversation:
python -m src.agent
The agent will start in interactive mode, where you can continuously ask questions about academic papers.
Run a single query and exit:
python -m src.agent --query "Find papers on Vision Transformers"
Specify a different language model (the default is Claude 3.5 Sonnet):
python -m src.agent --model anthropic:claude-3-5-sonnet-latest --query "Summarize recent advances in transformer architectures"
Other supported models include:
- anthropic:claude-3-opus
- And more, depending on your configuration
Control how many papers are downloaded per query using the --k parameter:
# Download and analyze 5 papers on the topic
python -m src.agent --query "Find papers on Vision Transformers" --k 5
Other command-line options:
# Verbose output for debugging
python -m src.agent --verbose
# Limit number of tools displayed
python -m src.agent --tools-limit 3
# Number of papers to download per query (default: 1, range: 1-20)
python -m src.agent --k 5
# Show help
python -m src.agent --help
┌─────────────────┐    ┌──────────────┐    ┌─────────────────┐
│   User Input    │───▶│   Research   │───▶│    arXiv MCP    │
│   (Query + k)   │    │    Agent     │    │     Server      │
│                 │    │              │    │   (Searches &   │
└─────────────────┘    └──────────────┘    │   Downloads)    │
                                           └─────────────────┘
                                                    │
                                                    ▼
                                           ┌─────────────────┐
                                           │  Storage Path   │
                                           │ (Local Papers)  │
                                           └─────────────────┘
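The User Input box in the diagram above corresponds to the command-line flags shown in the usage examples. A minimal argparse sketch of those flags, assuming the defaults stated in this README (Claude 3.5 Sonnet as the model, k = 1 within a range of 1-20); the project's actual parser in src/agent may differ, and the --tools-limit default here is a placeholder because the README does not state one.

```python
import argparse
from typing import List, Optional


def bounded_k(value: str) -> int:
    """Accept only integers in the documented 1-20 range for --k."""
    k = int(value)
    if not 1 <= k <= 20:
        raise argparse.ArgumentTypeError("--k must be between 1 and 20")
    return k


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="arXiv research agent")
    parser.add_argument("--query",
                        help="run a single query and exit; omit for interactive mode")
    parser.add_argument("--model", default="anthropic:claude-3-5-sonnet-latest")
    parser.add_argument("--k", type=bounded_k, default=1,
                        help="number of papers to download per query (1-20)")
    parser.add_argument("--verbose", action="store_true",
                        help="verbose output for debugging")
    parser.add_argument("--tools-limit", type=int, default=None,
                        help="limit the number of tools displayed (placeholder default)")
    return parser.parse_args(argv)
```

Validating k in the argument type means an out-of-range value is rejected with a usage error before the agent or the MCP server is started.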
The system follows this workflow:
- Search: The user's request triggers a paper search on arXiv (the k parameter sets how many papers to download)
- Download: Relevant papers (up to k) are downloaded to the configured storage path
- Read: Paper content is retrieved from local storage
- Analyze: The AI model analyzes the papers and responds to the user's query
The research agent uses the Model Context Protocol (MCP) to interface with the arXiv server. The arxiv-mcp-server provides the following tools:
- search_papers: Search arXiv for papers by topic, title, or keyword
- download_paper: Download a paper using its arXiv ID
- read_paper: Retrieve the full content of a downloaded paper
- list_papers: List all locally downloaded papers
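Under MCP, each of these tools is invoked with a JSON-RPC 2.0 tools/call request that names the tool and passes its arguments. The sketch below only builds such request payloads to show their shape; in the agent, langchain-mcp-adapters handles this exchange. The argument names (query, max_results, paper_id) are assumptions about arxiv-mcp-server's input schemas, which a client can confirm with a tools/list request.

```python
import itertools
import json
from typing import Any, Dict

_request_ids = itertools.count(1)  # each request in a session needs its own id


def tool_call(name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Assumed argument schemas for the four arxiv-mcp-server tools.
search = tool_call("search_papers", {"query": "vision transformers", "max_results": 5})
download = tool_call("download_paper", {"paper_id": "2010.11929"})
read = tool_call("read_paper", {"paper_id": "2010.11929"})
listing = tool_call("list_papers", {})
```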
The agent follows a strict 4-step workflow:
Step 1: Search
- User provides: a topic, title, or keywords
- Agent calls: search_papers with the user's query
- Result: a list of relevant papers with their arXiv IDs
Step 2: Download
- Agent calls: download_paper with the arXiv IDs from the search results
- Result: papers saved to local storage at the configured path
Step 3: Read
- Agent calls: read_paper to access the full content of the downloaded papers
- Result: full text (title, authors, abstract, body, references)
Step 4: Analyze
Based on the user's request, the agent provides:
- Summary: 2-3 sentence overview with main contribution and key results
- Detailed Analysis: Executive Summary, Research Context, Methodology, Results, Implications, Future Directions
- Comparison: Analysis of multiple papers with comparison table/summary
- Code/Implementation: Pseudocode and implementation details from papers
The workflow is fully automated: the agent completes every step from search to analysis on its own, without requiring arXiv IDs from the user.
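In the real agent the LLM decides when to call each tool, but the order it is instructed to follow can be sketched as a plain function. The callables passed in are stand-ins with hypothetical signatures, not the MCP tools' real return types, and slicing the search results to k mirrors the --k option.

```python
from typing import Callable, Dict, List


def run_pipeline(query: str, k: int,
                 search: Callable[[str], List[str]],
                 download: Callable[[str], None],
                 read: Callable[[str], str],
                 analyze: Callable[[str, Dict[str, str]], str]) -> str:
    """Search -> download -> read -> analyze, downloading at most k papers."""
    paper_ids = search(query)[:k]                  # step 1: arXiv IDs from the search results
    for pid in paper_ids:
        download(pid)                              # step 2: save each paper to local storage
    texts = {pid: read(pid) for pid in paper_ids}  # step 3: full text of each downloaded paper
    return analyze(query, texts)                   # step 4: summary, analysis, comparison, ...
```

Passing the steps in as functions keeps the ordering logic testable with stubs, independent of the LLM and the MCP server.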
All downloaded papers are stored in the directory specified by the --storage-path parameter in the MCP server configuration. The default location is C:/arxiv_storage/, but you should update this to a folder that exists on your system.
The application will automatically create and manage the storage directory, organizing papers by their arXiv IDs for efficient retrieval.
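Since papers are stored by arXiv ID, the set of papers already on disk can be approximated by scanning the storage folder. This sketch assumes one file per paper named after its arXiv ID (for example 2010.11929.md) and only recognizes new-style identifiers; the server's actual file layout may differ, and within the agent the list_papers tool remains the supported way to get this list.

```python
import re
from pathlib import Path
from typing import List

# New-style arXiv identifiers: YYMM.NNNN or YYMM.NNNNN, optionally with a version suffix.
ARXIV_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")


def list_local_papers(storage_path: str) -> List[str]:
    """Return the sorted arXiv IDs of paper files found in the storage folder."""
    ids = set()
    for entry in Path(storage_path).iterdir():
        # entry.stem drops the final extension, e.g. "2010.11929.md" -> "2010.11929".
        if entry.is_file() and ARXIV_ID.match(entry.stem):
            ids.add(entry.stem)
    return sorted(ids)
```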
To modify the agent's behavior, you can update the prompt in src/prompt.py. The PAPER_ANALYSIS_PROMPT defines the agent's instructions and workflow for paper analysis.
- API Key Issues: Ensure your API key is correctly set in the .env file
- Storage Path Issues: Verify the storage folder exists and has proper read/write permissions
- Model Access: Confirm your account has access to the requested language model
- MCP Server: Ensure the arXiv MCP server can be accessed through the uv command
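The storage and uv checks in this list can be scripted. A minimal diagnostic sketch using only the standard library; the function name diagnose is hypothetical, and storage_path should be whatever you passed to --storage-path.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import List


def diagnose(storage_path: str) -> List[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    storage = Path(storage_path).expanduser()
    if not storage.is_dir():
        problems.append(f"storage folder does not exist: {storage}")
    elif not os.access(storage, os.R_OK | os.W_OK):
        problems.append(f"storage folder is not readable and writable: {storage}")
    else:
        # os.access can be inaccurate on some filesystems, so also attempt a real write.
        try:
            with tempfile.NamedTemporaryFile(dir=storage):
                pass
        except OSError as exc:
            problems.append(f"cannot write to storage folder: {exc}")
    if shutil.which("uv") is None:
        problems.append("uv was not found on PATH; the arXiv MCP server is launched through it")
    return problems
```

For example, running `for problem in diagnose("./arxiv_storage"): print(problem)` prints nothing when the folder and uv are both usable.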
This project uses the excellent arxiv-mcp-server by blazickjp for arXiv integration via the Model Context Protocol.