Clarify embedding generation in README and codebase #29

@julesjacobsen

The curate-index command is not part of ELDER. Please clarify where this external dependency (or these dependencies) is hosted:

https://github.com/monarch-initiative/ELDER?tab=readme-ov-file#creating-your-own-embeddings

In the paper you state that the ontology term relationships are used and produce the best results, yet it's not clear from the codebase how you generated these embeddings. It looks like they could come from the curate-index commands, but there is also this file, https://github.com/monarch-initiative/ELDER/blob/7bb21744f1eaec37f82a314411b24711a514c4a0/src/pheval_elder/prepare/core/llm_descriptions/hpo_def_generator.py, which appears to use an LLM to generate descriptions for an ontology term based on its ID and label.

Is this file still relevant to the project as described in the paper? If not, it should be deleted or moved into a different repo.

Lastly, what were the exact commands and data versions used for each dataset you generated on Hugging Face? Ideally this should be stated on Hugging Face as part of the data card, e.g.

iQuxLE/ada-002_lrd_hpo_embedding
---
HPO version: v2025-08-11
Software: curate-index
Software version: ????
Software source: ???
Command: curate-index index-ontology --index-fields "label,definition,relationships,aliases" --model ada002
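
A block like the one above could be produced by a small script so that every dataset card records the same fields. The following is only a minimal sketch: `render_provenance` is a hypothetical helper, not part of ELDER or curate-index, and the field values are copied from my example above rather than confirmed. Unknown values are left as visible placeholders for the maintainers to fill in.

```python
def render_provenance(dataset_id: str, fields: dict[str, str]) -> str:
    """Render a plain-text provenance block for a dataset card.

    Hypothetical helper for illustration only.
    """
    lines = [dataset_id, "---"]
    for key, value in fields.items():
        # Keep unknown values as visible placeholders instead of dropping them.
        lines.append(f"{key}: {value if value else '????'}")
    return "\n".join(lines)


card = render_provenance(
    "iQuxLE/ada-002_lrd_hpo_embedding",
    {
        "HPO version": "v2025-08-11",
        "Software": "curate-index",
        "Software version": "",  # unknown, to be filled in
        "Software source": "",   # unknown, to be filled in
        # Example command from this issue, not a confirmed invocation.
        "Command": 'curate-index index-ontology --index-fields "label,definition,relationships,aliases" --model ada002',
    },
)
print(card)
```

Keeping the empty fields as `????` makes missing provenance obvious on the card instead of silently omitting it.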
