π³ DOCKER-FIRST: Production-ready containerized ArXiv research capabilities for AI assistants
π¬ RESEARCH-FOCUSED: Complete academic workflow - search, download, analyze papers seamlessly
Why This Docker Implementation?:
- β Container Isolation: Secure, reproducible research environment
- β Volume Persistence: Papers survive container restarts
- β Production Grade: Multi-stage builds, optimized for performance
- β Cross-Platform: Works on any Docker-enabled system
- β MCP Compliant: Full Model Context Protocol 2024-11-05 support
Feature | Traditional MCP | This Docker Implementation |
---|---|---|
Deployment | Local Python install | Single docker run command |
Dependencies | Manual environment setup | All dependencies included |
Isolation | Host system dependencies | Complete container isolation |
Portability | Platform-specific setup | Works anywhere Docker runs |
Storage | Local filesystem only | Persistent volume mounting |
Scaling | Single instance | Easy multi-container deployment |
Security | Host system access | Sandboxed execution |
- Zero Setup Friction: No Python environment conflicts or dependency issues
- Reproducible Research: Same environment across different machines/platforms
- Storage Persistence: Downloaded papers persist outside container lifecycle
- Security Isolation: Research tools run in contained environment
- Production Ready: Battle-tested Docker deployment patterns
The ArXiv MCP Server provides a bridge between AI assistants and arXiv's research repository through the Model Context Protocol (MCP). It allows AI models to search for papers and access their content in a programmatic way.
π€ Contribute β’ π Report Bug β’ π³ Docker Registry β
- π Paper Search: Query arXiv papers with filters for date ranges and categories
- π Paper Access: Download and read paper content
- π Paper Listing: View all downloaded papers
- ποΈ Local Storage: Papers are saved locally for faster access
- π Prompts: A Set of Research Prompts
- π³ Docker Ready: Official Docker MCP Registry integration with volume mounting
# Pull and run the latest image
docker run -i --rm \
-v ./papers:/app/papers \
jasonleinart/arxiv-mcp-server:latest
# Clone this Docker-optimized repository
git clone https://github.com/jasonleinart/arxiv-mcp-server.git
cd arxiv-mcp-server
# Build the Docker image
docker build -t arxiv-mcp-server:local .
# Run your local build
docker run -i --rm \
-v ./papers:/app/papers \
arxiv-mcp-server:local
Configure Claude Code to use the Docker MCP server by adding this to your claude_desktop_config.json
:
{
"mcpServers": {
"arxiv-mcp-server-docker": {
"command": "docker",
"args": [
"run",
"--rm",
"-i",
"--name", "arxiv-mcp-server",
"-v", "/path/to/your/papers:/app/papers",
"jasonleinart/arxiv-mcp-server:latest"
],
"env": {
"ARXIV_STORAGE_PATH": "/app/papers"
}
}
}
}
Important: Replace /path/to/your/papers
with your desired local storage path.
# Mount source code for development
docker run -i --rm \
-v $(pwd):/app \
-v ./papers:/app/papers \
python:3.11-slim \
bash -c "cd /app && pip install -e . && python -m arxiv_mcp_server"
# Run with specific storage location
docker run -i --rm \
-v /your/research/papers:/app/papers \
-e ARXIV_STORAGE_PATH=/app/papers \
jasonleinart/arxiv-mcp-server:latest
# Run as background service
docker run -d \
--name arxiv-mcp-service \
-v ./papers:/app/papers \
--restart unless-stopped \
jasonleinart/arxiv-mcp-server:latest
- Base Image: Multi-stage build with
python:3.11-slim-bookworm
- Package Manager: UV for fast dependency resolution
- Build Optimization: Bytecode compilation enabled for performance
- Security: Non-root execution with minimal attack surface
- Size: Optimized layers for efficient image distribution
Critical Path: Papers MUST be mounted to /app/papers
inside container
# β
Correct - papers persist on host
docker run -v /host/papers:/app/papers jasonleinart/arxiv-mcp-server:latest
# β Wrong - papers lost when container stops
docker run jasonleinart/arxiv-mcp-server:latest
Variable | Default | Purpose |
---|---|---|
ARXIV_STORAGE_PATH |
/app/papers |
Container storage location |
PYTHONUNBUFFERED |
1 |
Real-time logging output |
version: '3.8'
services:
arxiv-mcp:
image: jasonleinart/arxiv-mcp-server:latest
volumes:
- ./research-papers:/app/papers
environment:
- ARXIV_STORAGE_PATH=/app/papers
restart: unless-stopped
stdin_open: true
tty: true
- x86_64: Intel/AMD processors
- ARM64: Apple Silicon (M1/M2/M3), AWS Graviton
- Linux: Ubuntu, Debian, CentOS, Alpine
- macOS: Docker Desktop integration
- Windows: WSL2 backend support
- Startup Time: < 2 seconds cold start
- Memory Usage: ~150MB baseline + paper storage
- Network: Efficient arXiv API usage with caching
- Storage: Papers stored as both PDF and optimized markdown
β Agent Validation Complete: Full tool functionality verified
- Search operations: β Successful arXiv queries
- Download pipeline: β PDFβMarkdown conversion working
- Volume persistence: β Papers survive container restarts
- MCP protocol: β Full 2024-11-05 compliance
- Claude Code integration: β Seamless AI assistant connectivity
The server provides four main tools designed to work together in research workflows:
π Purpose: Find relevant research papers by topic, author, or category
When to use: Starting research, finding recent papers, exploring a field
# Basic search
result = await call_tool("search_papers", {
"query": "transformer architecture"
})
# Advanced search with filters
result = await call_tool("search_papers", {
"query": "attention mechanism neural networks",
"max_results": 20,
"date_from": "2023-01-01",
"date_to": "2024-12-31",
"categories": ["cs.AI", "cs.LG", "cs.CL"]
})
# Search by author
result = await call_tool("search_papers", {
"query": "au:\"Vaswani, A\"",
"max_results": 10
})
π₯ Purpose: Download and convert papers to readable markdown format
When to use: After finding interesting papers, before reading full content
# Download a specific paper
result = await call_tool("download_paper", {
"paper_id": "1706.03762" # "Attention Is All You Need"
})
# Check download status
result = await call_tool("download_paper", {
"paper_id": "1706.03762",
"check_status": true
})
π Purpose: View your local paper library
When to use: Check what papers you have, avoid re-downloading, browse collection
# See all downloaded papers
result = await call_tool("list_papers", {})
π Purpose: Access full text content of downloaded papers
When to use: Deep analysis, quotation, detailed study of methodology/results
# Read full paper content
result = await call_tool("read_paper", {
"paper_id": "1706.03762"
})
Here's how the tools work together in real research scenarios:
# Step 1: Search for recent papers in the field
search_result = await call_tool("search_papers", {
"query": "large language model reasoning",
"max_results": 15,
"date_from": "2024-01-01",
"categories": ["cs.AI", "cs.CL"]
})
# Step 2: Download promising papers
await call_tool("download_paper", {"paper_id": "2401.12345"})
await call_tool("download_paper", {"paper_id": "2402.67890"})
# Step 3: List your collection to confirm downloads
library = await call_tool("list_papers", {})
# Step 4: Read papers for detailed analysis
paper_content = await call_tool("read_paper", {"paper_id": "2401.12345"})
# Find papers by specific researchers
result = await call_tool("search_papers", {
"query": "au:\"Anthropic\" OR au:\"OpenAI\"",
"max_results": 10,
"date_from": "2023-06-01"
})
# Download the most relevant papers
for paper in result['papers'][:3]:
await call_tool("download_paper", {"paper_id": paper['id']})
# Search multiple related topics
topics = [
"transformer interpretability",
"attention visualization",
"neural network explainability"
]
for topic in topics:
results = await call_tool("search_papers", {
"query": topic,
"max_results": 8,
"date_from": "2022-01-01"
})
# Download top papers from each topic
for paper in results['papers'][:2]:
await call_tool("download_paper", {"paper_id": paper['id']})
# Review your complete collection
library = await call_tool("list_papers", {})
The server offers specialized prompts to help analyze academic papers:
A comprehensive workflow for analyzing academic papers that only requires a paper ID:
result = await call_prompt("deep-paper-analysis", {
"paper_id": "2401.12345"
})
This prompt includes:
- Detailed instructions for using available tools (list_papers, download_paper, read_paper, search_papers)
- A systematic workflow for paper analysis
- Comprehensive analysis structure covering:
- Executive summary
- Research context
- Methodology analysis
- Results evaluation
- Practical and theoretical implications
- Future research directions
- Broader impacts
Configure through environment variables:
Variable | Purpose | Default |
---|---|---|
ARXIV_STORAGE_PATH |
Paper storage location | ~/.arxiv-mcp-server/papers |
Category | Description | Use Cases |
---|---|---|
cs.AI |
Artificial Intelligence | General AI research, reasoning, planning |
cs.LG |
Machine Learning | Neural networks, deep learning, training |
cs.CL |
Computation and Language | NLP, language models, text processing |
cs.CV |
Computer Vision | Image processing, visual recognition |
cs.RO |
Robotics | Autonomous systems, control theory |
stat.ML |
Machine Learning (Statistics) | Statistical learning theory, methods |
Topic searches: "transformer architecture"
, "reinforcement learning"
Author searches: "au:\"Hinton, Geoffrey\""
, "au:OpenAI OR au:Anthropic"
Title searches: "ti:\"Attention Is All You Need\""
, "ti:BERT OR ti:GPT"
Combined searches: "ti:transformer AND au:Vaswani"
, "abs:\"few-shot learning\" AND cat:cs.LG"
- Use explicit workflows: Guide your model through Search β Download β List β Read β Analyze
- Reference tool purposes: Mention why you're using each tool in your prompts
- Check library first: Always use
list_papers
before downloading to avoid duplicates - Be specific with parameters: Use the exact formats shown in tool examples
Run the test suite:
python -m pytest
- β Production deployment - Need reliable, consistent environments
- β Team collaboration - Multiple developers need identical setups
- β CI/CD integration - Automated testing and deployment pipelines
- β Security isolation - Research tools need sandboxed execution
- β Cross-platform - Supporting Windows, macOS, Linux users
- β Scaling requirements - Multiple instances or load balancing
- β Zero setup friction - Users want single-command deployment
- π§ Development workflow - Active code modification and debugging
- π§ Custom integrations - Need to modify source code extensively
- π§ Resource constraints - Minimal overhead requirements
- π§ Direct filesystem - Need native host filesystem access patterns
Already using traditional MCP? Easy migration:
# Traditional MCP
uv tool run arxiv-mcp-server
# Equivalent Docker command
docker run -i --rm -v ./papers:/app/papers jasonleinart/arxiv-mcp-server:latest
Your existing papers and workflows remain compatible!
Addressing Community Feedback: This Docker implementation specifically resolves issues with sparse tool descriptions that confuse local AI models.
Unlike minimal descriptions that cause local model confusion, each tool includes:
- Purpose Statement: Clear explanation of what the tool does
- Usage Context: When and why to use this tool
- Parameter Guidance: Detailed input specifications with examples
- Query Patterns: Built-in examples for search syntax and formatting
- Integration Flow: How tools work together in research workflows
- Docker MCP Gateway Ready: Seamless integration with local model deployments
- Llama/Mistral/Local Model Tested: Verified compatibility with popular local LLMs
- Context-Rich Responses: Tools provide detailed feedback to help models understand results
- Error Handling: Clear error messages that local models can interpret and act on
- Workflow Guidance: Tools suggest logical next steps in research processes
Before (Sparse): "search_papers": "Search arXiv papers"
After (Rich): "Search for academic research papers on arXiv.org using advanced filtering capabilities. This tool allows you to find papers by keywords, authors, categories, and date ranges. Use this when you need to discover relevant research papers on a specific topic, find papers by a particular author, or explore recent publications in a field..."
Impact: Local models now understand tool context and usage patterns, dramatically improving research workflow success rates.
This Docker implementation has been extensively tested:
- Agent Testing: Validated with Claude Code using real research workflows
- Multi-platform: Tested on macOS (Apple Silicon), Linux (x86_64)
- Volume Persistence: Papers verified to survive container restarts
- Performance: Sub-2-second startup, efficient memory usage
- MCP Compliance: Full protocol 2024-11-05 compatibility
Released under the Apache 2.0 License. See the LICENSE file for details.
This is a Docker-focused fork optimizing ArXiv MCP for containerized deployment.
- Original MCP Server: blazickjp/arxiv-mcp-server
- This Docker Implementation: Focus on production container deployment