INDIGENA:

Dependencies

Groovy 4.0.26
Java 8+

Python KGE Methods

Python 3.10
mowl
pykeen
torch
wandb
pandas
tqdm
click

Installation

git clone https://github.com/bio-ontology-research-group/indigena.git
cd indigena/
conda env create -f environment.yml
conda activate indigena

Usage

1. Uncompress UPheno file

First, extract the data archive:

cd data
gunzip upheno.owl.gz

2. Semantic Similarity Baselines (Groovy)

These baseline methods compute ontology-based semantic similarity between genes and diseases using the SLIB library.

Run with default parameters:

groovy semantic_similarity.groovy -r data -fold 0

Run with custom parameters:

groovy semantic_similarity.groovy -r data -ic resnik -pw resnik -gw bma -fold 0

Run SimGIC variant:

groovy semantic_similarity_simgic.groovy -r data -ic resnik -fold 0

Parameters:

-r, --root_dir: Data directory (default: data)
-ic, --ic_measure: Information content measure (resnik, sanchez)
-pw, --pairwise_measure: Pairwise measure (resnik, lin)
-gw, --groupwise_measure: Groupwise measure (bma, bmm)
-fold: Cross-validation fold number (default: 0)
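To make the -ic, -pw and -gw options more concrete, here is a minimal pure-Python sketch of Resnik information content, Resnik pairwise similarity and best-match average (BMA) on a toy ontology. It is an illustration only: the baselines in this repository use the SLIB library from Groovy, and the ontology, annotations and function names below are hypothetical.

```python
import math

# Toy ontology (hypothetical): class -> set of direct parents.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
    "B1": {"B"},
}

# Toy annotation corpus (hypothetical): entity -> directly annotated classes.
ANNOTATIONS = {
    "gene1": {"A1"},
    "gene2": {"A2", "B1"},
    "disease1": {"A1", "B1"},
    "disease2": {"B"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """Resnik IC: log(N / n_c), where n_c counts the entities annotated
    to class c or to any of its descendants."""
    counts = {term: 0 for term in PARENTS}
    for classes in ANNOTATIONS.values():
        closure = set().union(*(ancestors(c) for c in classes))
        for term in closure:
            counts[term] += 1
    total = len(ANNOTATIONS)
    return {t: math.log(total / n) for t, n in counts.items() if n > 0}


IC = information_content()


def resnik(a, b):
    """Pairwise Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(a) & ancestors(b)
    return max(IC.get(term, 0.0) for term in common)


def bma(set_a, set_b):
    """Best-match average: mean of the best pairwise match in each direction."""
    best_a = [max(resnik(a, b) for b in set_b) for a in set_a]
    best_b = [max(resnik(a, b) for a in set_a) for b in set_b]
    return (sum(best_a) / len(best_a) + sum(best_b) / len(best_b)) / 2


if __name__ == "__main__":
    for disease in ("disease1", "disease2"):
        score = bma(ANNOTATIONS["gene1"], ANNOTATIONS[disease])
        print(f"gene1 vs {disease}: {score:.4f}")
```

In this toy setup gene1 scores higher against disease1, which shares the class A1, than against disease2, whose only common ancestor with gene1 is the root.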

Output: Results saved to data/baseline_results/

Evaluate results:

python evaluate_sem_sim.py data/baseline_results/<results_file>

3. Knowledge Graph Embeddings (Python)

This approach uses mOWL to project the ontology into triples and PyKEEN to train KGE models. We use W&B to track experiments, so before running kge.py, change the entity name in wandb.init to your W&B username.
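As an alternative to hard-coding the username, the entity could be read from the environment. The sketch below is an assumption about how one might adapt the call, not the code in kge.py: the helper names and the project name "indigena" are hypothetical, while entity and project are regular wandb.init arguments.

```python
import os


def wandb_init_kwargs(project="indigena"):
    """Build keyword arguments for wandb.init.

    The default project name is a placeholder, not taken from the repository.
    The entity is only set when WANDB_ENTITY is defined, so W&B falls back to
    the account's default entity otherwise.
    """
    kwargs = {"project": project}
    entity = os.environ.get("WANDB_ENTITY")
    if entity:
        kwargs["entity"] = entity
    return kwargs


def start_run():
    """Start a W&B run with the arguments built above (requires wandb)."""
    import wandb  # imported lazily so the helper above works without wandb

    return wandb.init(**wandb_init_kwargs())
```

With this pattern, running `export WANDB_ENTITY=<your-username>` before launching a script would avoid editing the source.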

Run basic KGE model:

python kge_transe.py --fold 0 --mode inductive --graph2 --no_sweep

Run with hyperparameters:

python kge_transd.py --fold 0 --mode inductive \
  --embedding_dim 100 --batch_size 128 --learning_rate 0.001 \
  --graph2 --no_sweep
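To repeat a run over several folds, the command line above can be generated programmatically. The following is a small sketch using only the flags shown in this README; the helper names are hypothetical, and the number of folds is an assumption that should be matched to the folds actually present in the data directory.

```python
import shlex
import subprocess
import sys


def build_command(script, fold, mode="inductive", graph_flags=("--graph2",), extra=()):
    """Return the argument list for one training run."""
    return [
        sys.executable, script,
        "--fold", str(fold),
        "--mode", mode,
        *graph_flags,
        "--no_sweep",
        *extra,
    ]


def run_folds(script, folds, dry_run=True, **kwargs):
    """Print each command and, unless dry_run is set, execute it."""
    commands = [build_command(script, fold, **kwargs) for fold in folds]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first failing fold
            subprocess.run(cmd, check=True)
    return commands
```

For example, `run_folds("kge_transe.py", range(3))` prints three commands without running them; pass `dry_run=False` to execute them, and `extra=("--embedding_dim", "100")` to forward hyperparameters.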

Parameters:

Each script defines its own hyperparameters, so check the script you are running. The most common options are:

--fold: Cross-validation fold number
--model_name: KGE model (transe, transd, convkb)
--mode: Evaluation mode (inductive, transductive)
--graph2: Add gene-phenotype edges
--graph3: Add disease-phenotype edges
--graph4: Add gene-disease training edges
--embedding_dim: Embedding dimensions
--batch_size: Training batch size
--learning_rate: Learning rate
--only_test: Only test existing model (skip training)
--description: Weights & Biases run description
--no_sweep: Disable W&B sweep mode
--pretrained_model: Pretrained model used to initialize ConvKB embeddings (transe, transd)
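For readers unfamiliar with what --model_name and --embedding_dim control, here is a toy sketch of the TransE scoring function, which rates a triple (head, relation, tail) by the negative distance between head + relation and tail. The scripts in this repository train models through PyKEEN; the vectors, entity names and relation name below are hypothetical and only serve to illustrate the idea.

```python
import math
import random

EMBEDDING_DIM = 4  # toy stand-in for --embedding_dim


def transe_score(head, relation, tail):
    """TransE plausibility: negative L2 distance between head + relation and tail."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


rng = random.Random(0)


def random_vector():
    """Draw a random toy embedding."""
    return [rng.uniform(-1, 1) for _ in range(EMBEDDING_DIM)]


gene = random_vector()
associated_with = random_vector()  # hypothetical relation embedding

# One disease sits exactly at gene + relation, the other is random.
diseases = {
    "disease_match": [g + r for g, r in zip(gene, associated_with)],
    "disease_other": random_vector(),
}

# Rank candidate diseases for the gene, best score first.
ranking = sorted(
    diseases,
    key=lambda name: transe_score(gene, associated_with, diseases[name]),
    reverse=True,
)

if __name__ == "__main__":
    print(ranking)
```

The disease placed at gene + relation has the highest possible score of zero and is therefore ranked first; training adjusts the embeddings so that true associations end up in this configuration.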

Problems running the models?

Please open a GitHub issue or pull request!

