Rare diseases collectively affect over 300 million individuals worldwide, yet timely and accurate diagnosis remains a pervasive challenge. This is largely due to their clinical heterogeneity, low individual prevalence, and the limited familiarity most clinicians have with rare conditions. Here, we introduce DeepRare, the first rare disease diagnosis agentic system powered by a large language model (LLM), capable of processing heterogeneous clinical inputs. The system generates ranked diagnostic hypotheses for rare diseases, each accompanied by a transparent chain of reasoning that links intermediate analytic steps to verifiable medical evidence.
DeepRare comprises three key components: a central host with a long-term memory module; specialized agent servers responsible for domain-specific analytical tasks integrating over 40 specialized tools and web-scale, up-to-date medical knowledge sources, ensuring access to the most current clinical information. This modular and scalable design enables complex diagnostic reasoning while maintaining traceability and adaptability.
We evaluate DeepRare on eight datasets. The system demonstrates exceptional diagnostic performance among 2,919 diseases. In HPO-based evaluations, DeepRare significantly outperforms other 15 methods, like traditional bioinformatics diagnostic tools, LLMs, and other agentic systems, achieving an average Recall@1 score of 57.18% and surpassing the second-best method (Reasoning LLM) by a substantial margin of 23.79 percentage points. For multi-modal input scenarios, DeepRare achieves 70.60% at Recall@1 compared to Exomiser's 53.20% in 109 cases. Manual verification of reasoning chains by clinical experts achieves 95.40% agreements. Furthermore, the DeepRare system has been implemented as a user-friendly web application http://raredx.cn/doctor.
For more detailed about our pipeline, please refer to our paper.
- RAM: Minimum 16GB (32GB recommended)
- Storage: 100GB+ free disk space (SSD preferred)
- GPU: Optional but recommended for faster model inference
- CPU: Any modern 64-bit processor
- OS: Any 64-bit operating system
- Java: Version 21 or above
- Python: 3.8+ (for model inference)
Note: GPU is optional - models can run on CPU with slower performance. Exomiser tool requires the specified minimum resources for optimal functionality.
The system supports multiple LLM providers. You need to obtain an API key from at least one of the following:
- How to obtain: Sign up at platform.openai.com
- Environment variable:
OPENAI_API_KEY
- How to obtain: Sign up at console.anthropic.com
- Environment variable:
ANTHROPIC_API_KEY
- How to obtain: Sign up at ai.google.dev
- Environment variable:
GOOGLE_API_KEY
- How to obtain: Sign up at platform.deepseek.com
- Environment variable:
DEEPSEEK_API_KEY
- Custom LLM Integration: Support for locally hosted or custom LLM endpoints
- Setup: Modify
api/interface.pyto adapt your custom LLM provider - Implementation:
- Extend the base LLM interface class in
api/interface.py - Configure endpoint URL and authentication if needed
- Extend the base LLM interface class in
-
Clone the repository and install dependencies:
git clone https://github.com/MAGIC-AI4Med/DeepRare.git cd DeepRare pip install -r requirements.txt -
Setup ChromeDriver:
Download ChromeDriver that matches your Chrome browser version:
- Visit ChromeDriver Downloads
- Download the version compatible with your Chrome browser
- Extract the downloaded file
Install ChromeDriver (Linux/Mac):
Open terminal, navigate to the directory containing chromedriver, and run:
sudo mv chromedriver /usr/local/bin/ sudo chmod +x /usr/local/bin/chromedriver
Verify installation:
chromedriver --version
For Windows:
# Place chromedriver.exe in your desired location, e.g.: C:\chromedriver\chromedriver.exe
Note: Make sure ChromeDriver version matches your installed Chrome browser version.
-
Install Exomizer (If required Gene Part):
Following Online document
Linux/Mac:
# download the distribution (won't take long) wget https://data.monarchinitiative.org/exomiser/latest/exomiser-cli-14.1.0-distribution.zip # download the data (this is ~20GB and will take a while) wget https://data.monarchinitiative.org/exomiser/latest/2410_hg19.zip wget https://data.monarchinitiative.org/exomiser/latest/2410_hg38.zip wget https://data.monarchinitiative.org/exomiser/latest/2410_phenotype.zip # unzip the distribution and data files - this will create a directory called 'exomiser-cli-14.1.0' in the current working directory unzip exomiser-cli-14.1.0-distribution.zip unzip 2410_*.zip -d exomiser-cli-14.1.0/data # Check the application.properties are pointing to the correct versions: # exomiser.hg19.data-version=2410 # exomiser.hg38.data-version=2410 # exomiser.phenotype.data-version=2410
Windows:
- Download pre-built binaries from Exomizer releases
- Extract and add to your PATH
Verify installation:
exomizer --version
Follow these steps to reproduce the results:
- Download database files from huggingface:
huggingface-cli download Angelakeke/DeepRare --repo-type=dataset --local-dir ./database
- Add your LLM API key to
inference.sh,inference_gene.sh,eval.sh. - Configure ChromeDriver path in
inference.shandinference_gene.sh. - Run the script:
# For HPO input bash inference.sh # For HPO+Gene input bash inference_gene.sh # For Free-text preprocess bash extract_hpo.sh # For Evaluation bash eval.sh
Due to complex environment setup and LLM API requirements, we strongly recommend using our pre-deployed web application DeepRare for easy access and testing.
For web engineering implementation, we package this workflow using FastAPI with DeepSeek-V3 locally deployed on 16 Ascend 910B cards serving as the central host to ensure system stability and data security. The system architecture employs a microservices design with Redis for session management and SQL databases for persistent data storage. More details can be found in our paper (Section 11.4).
@article{zhao2026agentic,
title={An agentic system for rare disease diagnosis with traceable reasoning},
author={Zhao, Weike and Wu, Chaoyi and Fan, Yanjie and Qiu, Pengcheng and Zhang, Xiaoman and Sun, Yuze and Zhou, Xiao and Zhang, Shuju and Peng, Yu and Wang, Yanfeng and others},
journal={Nature},
pages={1--10},
year={2026},
publisher={Nature Publishing Group UK London}
}We gratefully acknowledge the developers and contributors of publicly available rare disease datasets, foundational research works, bioinformatics tools, and large language models that have collectively enabled our research.


