ELDER is an algorithm that uses text embeddings for differential diagnosis. It takes phenotype terms as input and queries a vector database of diseases to find the most similar diseases.
ELDER provides several strategies for analyzing phenotype terms:
- Average Embedding: Calculates the average embedding for all phenotype terms and queries for similar diseases
- Weighted Average Embedding: Calculates a weighted average embedding based on phenotype frequencies
- Best Match (Term-set Pairwise Comparison): Finds the best matches between phenotype terms and diseases
# Clone the repository
git clone https://github.com/monarch-initiative/ELDER.git
cd ELDER
# Install dependencies using uv
uv syncELDER requires the following data:
- Phenopackets: JSON files containing phenotype terms
- ChromaDB Collections: Vector database collections with embeddings
- exomiser-results: TSV files containing the exomiser results when run in phenotype only. Only needed for Pheval Benchmark.
The paths to these data sources can be specified in the configuration file.
Pre-built embeddings are available on HuggingFace for the following collections. Examples for ada-002 and small3 are given below:
# Download ADA-002 embeddings
embeddings download -p /path/to/chromadb/ --repo-id iQuxLE/ada-002_lrd_hpo_embedding --collection ada_hpo_embeddings --embeddings-filename embeddings.parquet --metadata-filename metadata.yaml
# Download small3 embeddings
embeddings download -p /path/to/chromadb/ --repo-id iQuxLE/small3_lrd_hpo_embedding --collection small3_hpo_embeddings --embeddings-filename embeddings.parquet --metadata-filename metadata.yamlELDER uses a YAML configuration file for settings. By default, it looks for elder_config.yaml in the current directory.
Example configuration:
# Database settings
database:
chroma_db_path: "/path/to/chromadb/"
collection_name: "lrd_hpo_embeddings"
similarity_measure: "COSINE"
# Runner settings
runner:
runner_type: "avg" # avg, wgt_avg, tpc
model_type: "large" # small, large, ada, mxbai, bge, nomic, custom
nr_of_phenopackets: "5084"
nr_of_results: 10
# Data paths
data_paths:
phenopackets_dir: "5084_phenopackets"You can create a configuration file in several ways:
-
Use a template:
# Generate a config file from a template elder generate-config templates/elder_config_template.yaml --output .
-
Copy an example:
# Copy an example configuration for average embedding strategy cp examples/configs/average_config.yaml elder_config.yaml -
Create from scratch: Create a new file named
elder_config.yamland add the necessary settings.
ELDER provides a command-line interface (CLI) for running analysis:
# Run with average strategy
elder average --model large --phenopackets 5084 --results 10 --collection lrd_hpo_embeddings --db-path /path/to/chromadb
# Run with weighted average strategy
elder weighted --model ada --phenopackets 5084 --results 10 --collection ada002_lrd_hpo_embeddings --db-path /path/to/chromadb
# Run with best match strategy
elder bestmatch --model mxbai --phenopackets 5084 --results 10 --collection mxbai_lrd_hpo_embeddings --db-path /path/to/chromadbIf you have a configuration file, you can use it like this:
# Use the default configuration file elder_config.yaml
elder average
# Override configuration values
elder --config path/to/custom_elder_config.yaml average --model ada --results 20Available commands:
average: Run analysis with average embedding strategyweighted: Run analysis with weighted average embedding strategybestmatch: Run analysis with best match (term-set pairwise comparison) strategygenerate-config: Generate a configuration file from a templatecurate-index: Index ontologies, search, and manage collections using CurateGPT integration
Run elder --help for more information about the available commands and options.
You can also use ELDER programmatically in your Python code:
from pheval_elder.dis_avg_emb_runner import DiseaseAvgEmbRunner
from pheval_elder.prepare.config.unified_config import RunnerType
# Create and run a runner with configuration
runner = DiseaseAvgEmbRunner.from_config(
config_path="elder_config.yaml",
config_overrides={
"runner_type": RunnerType.AVERAGE.value,
}
)
# Run analysis
runner.prepare()
runner.run()The examples directory contains sample scripts for running ELDER:
# Run with a specific configuration
python examples/run_with_config.py --config examples/configs/average_config.yaml --strategy avgEDIT: This is currently setup to work only with the CBORG AI Portal
If you prefer to create your own embeddings rather than using pre-built ones, you can use the curate-index command which provides direct integration with CurateGPT:
# Index HP ontology with standard descriptions
curate-index index-ontology --db-path ./my_db --collection hp_standard
# Index with enhanced descriptions (requires OpenAI API key)
curate-index index-ontology --enhanced-descriptions --collection hp_enhanced
# Include aliases in the indexed fields
curate-index index-ontology --index-fields "label,definition,relationships,aliases"
# Use different model shorthand names
curate-index index-ontology --model ada002 # OpenAI's text-embedding-ada-002
curate-index index-ontology --model large3 # OpenAI's text-embedding-3-large
curate-index index-ontology --model bge-m3 # BAAI/bge-m3
# Search in indexed collections
curate-index search "cardiac arrhythmia"
# View collection information
curate-index infoThe Human Phenotype Ontology (HP) works best with ELDER's existing analysis tools, as that's what the phenopackets reference.
For the best quality embeddings, especially for rare phenotypes, the enhanced descriptions option uses OpenAI's o1 model to generate:
- Detailed clinical information about etiology and associated conditions
- Anatomical structures and physiological processes involved
- Presentation across different severities and contexts
- Distinguishing features from similar phenotypes
This results in embeddings that better capture the semantic meaning and clinical context of each term.
For development, you can use the example scripts and templates provided:
# Run with a specific configuration
python examples/run_with_config.py --config examples/configs/average_config.yaml --strategy avg