| 1 | +--- |
| 2 | +title: "MCP for Research: How to Connect AI to Research Tools" |
| 3 | +thumbnail: /blog/assets/mcp-for-research/thumbnail.png |
| 4 | +authors: |
| 5 | +- user: dylanebert |
| 6 | +--- |
| 7 | + |
| 8 | +# MCP for Research: How to Connect AI to Research Tools |
| 9 | + |
| 10 | +<script |
| 11 | + type="module" |
| 12 | + src="https://gradio.s3-us-west-2.amazonaws.com/4.36.1/gradio.js" |
| 13 | +></script> |
| 14 | + |
| 15 | +<gradio-app theme_mode="light" space="dylanebert/research-tracker-mcp"></gradio-app> |
| 16 | + |
| 17 | +Academic research involves frequent **research discovery**: finding papers, code, and related models and datasets. This typically means switching between platforms like arXiv, GitHub, and Hugging Face, and manually piecing together the connections between them. |
| 18 | + |
| 19 | +The [Model Context Protocol (MCP)](https://huggingface.co/learn/mcp-course/unit0/introduction) is a standard that allows agentic models to communicate with external tools and data sources. For research discovery, this means AI can use research tools through natural language requests, automating platform switching and cross-referencing. |
| 20 | + |
| 21 | +## Research Discovery: Three Layers of Abstraction |
| 22 | + |
| 23 | +Much like software development, research discovery can be framed in terms of layers of abstraction. |
| 24 | + |
| 25 | +### 1. Manual Research |
| 26 | + |
| 27 | +At the lowest level of abstraction, researchers search manually and cross-reference by hand. |
| 28 | + |
| 29 | +```text |
| 30 | +# Typical workflow: |
| 31 | +1. Find paper on arXiv |
| 32 | +2. Search GitHub for implementations |
| 33 | +3. Check Hugging Face for models/datasets |
| 34 | +4. Cross-reference authors and citations |
| 35 | +5. Organize findings manually |
| 36 | +``` |
| 37 | + |
| 38 | +### 2. Scripted Tools |
| 39 | + |
| 40 | +Python scripts automate research discovery by handling web requests, parsing responses, and organizing results. |
| 41 | + |
| 42 | +```python |
| 43 | +# research_tracker.py |
| 44 | +def gather_research_info(paper_url): |
| 45 | + paper_data = scrape_arxiv(paper_url) |
| 46 | + github_repos = search_github(paper_data['title']) |
| 47 | + hf_models = search_huggingface(paper_data['authors']) |
| 48 | + return consolidate_results(paper_data, github_repos, hf_models) |
| 49 | + |
| 50 | +# Run for each paper you want to investigate |
| 51 | +results = gather_research_info("https://arxiv.org/abs/2103.00020") |
| 52 | +``` |
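
As a rough sketch of what the first step of such a script might involve, the snippet below normalizes an arXiv URL to a paper ID and builds a request URL for the public arXiv API. The helper names `arxiv_id_from_url` and `arxiv_api_url` are hypothetical and are not taken from the research tracker; the sketch assumes new-style arXiv IDs and makes no network request.

```python
import re
from urllib.parse import urlencode, urlparse

# Hypothetical helpers for illustration; not part of the research tracker.
ARXIV_API = "http://export.arxiv.org/api/query"


def arxiv_id_from_url(paper_url: str) -> str:
    """Extract a new-style arXiv ID (e.g. 2103.00020) from an abs/ or pdf/ URL."""
    path = urlparse(paper_url).path
    match = re.search(r"/(?:abs|pdf)/(\d{4}\.\d{4,5})(v\d+)?", path)
    if match is None:
        raise ValueError(f"Not a recognized arXiv URL: {paper_url}")
    return match.group(1)  # version suffix such as "v2" is dropped


def arxiv_api_url(paper_url: str) -> str:
    """Build the arXiv API query URL that returns Atom metadata for the paper."""
    return f"{ARXIV_API}?{urlencode({'id_list': arxiv_id_from_url(paper_url)})}"


print(arxiv_api_url("https://arxiv.org/abs/2103.00020"))
```

A real script would then fetch this URL and parse the Atom response for the title and authors before searching GitHub and Hugging Face.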
| 53 | + |
| 54 | +The [research tracker](https://huggingface.co/spaces/dylanebert/research-tracker) demonstrates systematic research discovery built from these types of scripts. |
| 55 | + |
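| 56 | +Scripts are faster than manual research, but they are brittle: a changed page layout, a rate limit, or an ambiguous title match can produce missing or wrong results unless a human reviews the output. |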
| 57 | + |
| 58 | +### 3. MCP Integration |
| 59 | + |
| 60 | +MCP makes these same Python tools accessible to AI systems through natural language. |
| 61 | + |
| 62 | +```markdown |
| 63 | +# Example research directive |
| 64 | +Find recent transformer architecture papers published in the last 6 months: |
| 65 | +- Must have available implementation code |
| 66 | +- Focus on papers with pretrained models |
| 67 | +- Include performance benchmarks when available |
| 68 | +``` |
| 69 | + |
| 70 | +The AI orchestrates multiple tools, fills information gaps, and reasons about results: |
| 71 | + |
| 72 | +```text |
| 73 | +# AI workflow: |
| 74 | +# 1. Use research tracker tools |
| 75 | +# 2. Search for missing information |
| 76 | +# 3. Cross-reference with other MCP servers |
| 77 | +# 4. Evaluate relevance to research goals |
| 78 | + |
| 79 | +user: "Find papers related to vision transformers with available code and models" |
| 80 | +ai: # Combines multiple tools to gather complete information |
| 81 | +``` |
| 82 | + |
| 83 | +This can be viewed as an additional layer of abstraction above scripting, where the "programming language" is natural language. It follows the [Software 3.0 analogy](https://youtu.be/LCEmiRjPEtQ?si=J7elM86eW9XCkMFj), in which the natural-language research direction is itself the software implementation. |
| 84 | + |
| 85 | +This comes with the same caveats as scripting: |
| 86 | + |
| 87 | +- Faster than manual research, but error-prone without human guidance |
| 88 | +- Quality depends on the implementation |
| 89 | +- Understanding the lower layers (both manual and scripted) leads to better implementations |
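
To make the "same Python tools" idea concrete, here is a minimal sketch of how one scripted helper could be exposed as an MCP tool with Gradio, following the approach in the Gradio MCP guide linked under Learn More. The function `github_search_url` and the wrapper `build_demo` are hypothetical examples, not the research tracker's actual tools; the sketch assumes Gradio is installed with its MCP extra (`pip install "gradio[mcp]"`).

```python
from urllib.parse import urlencode


def github_search_url(paper_title: str) -> str:
    """Build a GitHub repository search URL for a paper title.

    Args:
        paper_title: Title of the paper to look up implementations for.

    Returns:
        A github.com search URL restricted to repositories.
    """
    query = urlencode({"q": paper_title.strip(), "type": "repositories"})
    return f"https://github.com/search?{query}"


def build_demo():
    """Wrap the function in a Gradio interface (hypothetical wrapper)."""
    import gradio as gr  # imported lazily so the helper above works without Gradio

    return gr.Interface(fn=github_search_url, inputs="text", outputs="text")


# To serve the function as an MCP tool, per the Gradio MCP guide:
# build_demo().launch(mcp_server=True)
```

The type hints and docstring matter here: according to the Gradio guide, they are what the MCP client sees as the tool's schema and description, so a vague docstring tends to translate into a tool the model uses poorly.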
| 90 | + |
| 91 | +## Setup and Usage |
| 92 | + |
| 93 | +### Quick Setup |
| 94 | + |
| 95 | +To set up, open the [Research Tracker MCP](https://huggingface.co/spaces/dylanebert/research-tracker-mcp) Space and click `Use via MCP or API` at the bottom of the page. This shows instructions for adding the server over SSE, for example: |
| 96 | + |
| 97 | +**MCP config** |
| 98 | +```json |
| 99 | +{ |
| 100 | + "mcpServers": { |
| 101 | + "gradio": { |
| 102 | + "url": "https://dylanebert-research-tracker-mcp.hf.space/gradio_api/mcp/sse" |
| 103 | + } |
| 104 | + } |
| 105 | +} |
| 106 | +``` |
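
If you manage several Gradio-based MCP servers, a config entry like the one above could be generated rather than written by hand. The sketch below is illustrative only: `mcp_config` is a hypothetical helper, and the mapping from a Space ID (`owner/name`) to the `owner-name.hf.space` subdomain is an assumption generalized from the single example in this post, so check the URL shown on the Space's own `Use via MCP or API` panel.

```python
import json


def mcp_config(space_id: str, server_name: str = "gradio") -> dict:
    """Return an MCP client config entry for a Gradio Space's SSE endpoint."""
    # Assumption: "owner/name" maps to the "owner-name.hf.space" subdomain,
    # generalized from the one example URL in this post.
    subdomain = space_id.replace("/", "-")
    url = f"https://{subdomain}.hf.space/gradio_api/mcp/sse"
    return {"mcpServers": {server_name: {"url": url}}}


print(json.dumps(mcp_config("dylanebert/research-tracker-mcp"), indent=2))
```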
| 107 | + |
| 108 | +## Learn More |
| 109 | + |
| 110 | +**Get Started:** |
| 111 | +- [Hugging Face MCP Course](https://huggingface.co/learn/mcp-course/en/unit1/introduction) - Complete guide from basics to building your own tools |
| 112 | +- [MCP Official Documentation](https://modelcontextprotocol.io) - Protocol specifications and architecture |
| 113 | + |
| 114 | +**Build Your Own:** |
| 115 | +- [Gradio MCP Guide](https://www.gradio.app/guides/building-mcp-server-with-gradio) - Turn Python functions into MCP tools |
| 116 | +- [Building the Hugging Face MCP Server](https://huggingface.co/blog/building-hf-mcp) - Production implementation case study |
| 117 | + |
| 118 | +**Community:** |
| 119 | +- [Hugging Face Discord](https://hf.co/join/discord) - MCP development discussions |