This repository contains the implementation associated with the paper Klironomos A., Zhou B., Zheng Z., Gad-Elrab M., Paulheim H., Kharlamov E. ReaLitE: Enrichment of Relation Embeddings in Knowledge Graphs using Numeric Literals, accepted at ESWC 2025. The paper presents a novel Knowledge Graph Embedding (KGE) model, ReaLitE, that demonstrates improved performance over the state of the art in both link prediction and node classification tasks.
Our contributions are summarized below:
- We propose ReaLitE, an approach that can be combined with any vanilla KGE method that has relation embeddings. In addition, we demonstrate the integration of our method into existing KGE frameworks, highlighting its versatility and ease of adoption.
- We experiment with different methods of aggregating numeric literals, including an automated method that learns a combination of multiple aggregation types.
- We evaluate ReaLitE extensively and compare it with the state of the art on two tasks: link prediction and node classification. For the former, we evaluate on the standard link prediction setting, along with a more granular relation-focused evaluation. The results show that our approach is comparable or superior to the state-of-the-art methods, particularly for numeric literals with higher correlation and for long-tail relations.
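The second contribution mentions combining several aggregation types with learned weights. The sketch below is a hypothetical, dependency-free illustration of that general idea and is not the ReaLitE implementation (which lives in pykeen-with-realite/src/pykeen/models/multimodal/). The aggregator names, the softmax weighting and the input layout (one numeric-literal vector per entity) are all assumptions made for the example.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical illustration: generic aggregators, not the code in pykeen-with-realite.
Vector = List[float]


def _per_dimension(vectors: Sequence[Vector], fn: Callable[[Sequence[float]], float]) -> Vector:
    """Apply fn to every feature dimension across all literal vectors."""
    return [fn(column) for column in zip(*vectors)]


AGGREGATORS: Dict[str, Callable[[Sequence[Vector]], Vector]] = {
    "mean": lambda vs: _per_dimension(vs, lambda col: sum(col) / len(col)),
    "max": lambda vs: _per_dimension(vs, max),
    "min": lambda vs: _per_dimension(vs, min),
}


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax; turns free parameters into weights that sum to 1."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def learned_combination(
    vectors: Sequence[Vector],
    logits: Sequence[float],
    names: Tuple[str, ...] = ("mean", "max", "min"),
) -> Vector:
    """Weighted sum of several aggregations; in a trained model the logits would be learned."""
    weights = softmax(logits)
    aggregated = [AGGREGATORS[name](vectors) for name in names]
    dims = len(aggregated[0])
    return [sum(w * agg[i] for w, agg in zip(weights, aggregated)) for i in range(dims)]


if __name__ == "__main__":
    # Two entities, each with two numeric literals (made-up values).
    literals = [[1.0, 10.0], [3.0, 30.0]]
    print(learned_combination(literals, logits=[0.0, 0.0, 0.0]))
```

With equal logits every aggregation gets the same weight; training the logits lets the model shift weight toward whichever aggregation is most useful, which is the intuition behind a learnable combination.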
- Dependencies for the outsourced and modified codebase in MKGA/:
  - Details in the original repo
- Dependencies for the codebase in experiments/:
  - Python 3.10.13
  - Poetry 1.7.1 (https://python-poetry.org/docs/#installation)
  - The remaining dependencies are listed in pyproject.toml and are installed using poetry (see section Preparation).
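A quick pre-flight check for the experiments/ requirements might look like the following. check_environment is a hypothetical helper, not part of the repository; it only compares the major.minor Python version and whether a poetry executable is on PATH, so it does not verify the exact 3.10.13 and 1.7.1 versions listed above.

```python
import shutil
import sys
from typing import Callable, List, Optional, Sequence

# Major/minor version of the Python release listed for experiments/ (3.10.13).
REQUIRED_PYTHON = (3, 10)


def check_environment(
    version_info: Sequence[int] = tuple(sys.version_info[:2]),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of problems; an empty list means the basic requirements look satisfied."""
    problems = []
    major, minor = version_info[0], version_info[1]
    if (major, minor) != REQUIRED_PYTHON:
        problems.append(f"expected Python 3.10.x, found {major}.{minor}")
    if which("poetry") is None:
        problems.append("poetry executable not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```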
This project consists of a main codebase (in experiments/) and 2 outsourced codebases. The top-level structure is shown below:
├── MKGA # outsourced and modified codebase for training and testing KGE models on node classification
├── pykeen-with-realite # outsourced and modified codebase for training and testing KGE models on link prediction, incl. proposed KGE model
└── experiments # codebase for link prediction evaluation as presented in the paper
More detailed structure of the main codebase (experiments/):
├── datasets # download locations for datasets used for link prediction
│   ├── fb15k237 # FB15k-237 dataset enhanced with numeric literals
│   ├── yago15k # YAGO15k dataset enhanced with numeric literals
├── results # link prediction results for ReaLitE incl. the best found configurations
│   ├── FB15K237Literal
│   ├── YAGO15KLiteral
└── model_evaluation # scripts for extended link prediction evaluation
    └── trained_model_tests # results of additional link prediction tests
        ├── test_with_relation_filter_kga # results of relation-focused link prediction tests for existing KGE models
        ├── test_with_relation_filter_pykeen # results of relation-focused link prediction tests for ReaLitE
The implementation of ReaLitE can be found in pykeen-with-realite/src/pykeen/models/multimodal/ in the following files:
├── base.py
├── complex_realite_variations.py
├── conve_realite_variations.py
├── distmult_realite_variations.py
├── rotate_realite_variations.py
├── transe_realite_variations.py
└── tucker_realite_variations.py
- Run poetry install
- For downloading the datasets and preparing the KGA codebase, follow the instructions in the Preparation_README.md file.
python experiments/reproduce_pipeline.py <dataset> <model>
- Options for <dataset>: YAGO15KLiteral or FB15K237Literal
- Options for <model>: TransEReaLitE, DistMultReaLitE, ComplExReaLitE, RotatEReaLitE or TuckERReaLitE
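To reproduce every dataset/model combination, the command above can be looped over all listed option values. A minimal sketch, assuming it is launched from the repository root inside the Poetry environment and that the runs are independent; build_commands and run_all are hypothetical helpers, and dry_run defaults to True so nothing is executed until you opt in.

```python
import itertools
import subprocess
import sys

# Option values as listed in this README.
DATASETS = ("YAGO15KLiteral", "FB15K237Literal")
MODELS = ("TransEReaLitE", "DistMultReaLitE", "ComplExReaLitE", "RotatEReaLitE", "TuckERReaLitE")


def build_commands(datasets=DATASETS, models=MODELS, python=sys.executable):
    """Return one argv list per (dataset, model) pair, rejecting unknown option values."""
    bad = [d for d in datasets if d not in DATASETS] + [m for m in models if m not in MODELS]
    if bad:
        raise ValueError(f"unknown option(s): {bad}")
    return [
        [python, "experiments/reproduce_pipeline.py", dataset, model]
        for dataset, model in itertools.product(datasets, models)
    ]


def run_all(dry_run=True):
    """Print each command; execute them sequentially only when dry_run is False."""
    for argv in build_commands():
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Running sequentially with check=True stops at the first failing run, which keeps partial results easy to attribute.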
- Activate Environment: Ensure you are in the Python environment for the main ReaLitE project, not the KGA environment.
- Navigate: Change your current directory to the root of the ReaLitE project.
- Run: Execute the following script. It internally calls the KGA environment (using the KGA_ENV_PYTHON_PATH you configured) to run tests with the trained KGA models.
  python experiments/model_evaluation/relation_focused_test_kga.py
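Relation-focused testing restricts the usual link prediction metrics to test triples of selected relations (for example, relations with high literal correlation, or symmetric relations). A minimal sketch, assuming a 1-based filtered rank has already been computed for each test triple; relation_focused_metrics is a hypothetical helper, and the repository's scripts may compute ranks differently (e.g. combining head and tail ranks).

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def relation_focused_metrics(
    ranked: Iterable[Tuple[Triple, int]],
    relations: Set[str],
    ks: Tuple[int, ...] = (1, 3, 10),
) -> Dict[str, float]:
    """Compute MRR and Hits@k over test triples whose relation is in `relations`."""
    ranks = [rank for (_head, relation, _tail), rank in ranked if relation in relations]
    if not ranks:
        raise ValueError("no test triples use the selected relations")
    metrics: Dict[str, float] = {
        "count": len(ranks),
        "mrr": sum(1.0 / rank for rank in ranks) / len(ranks),
    }
    for k in ks:
        # Fraction of selected triples whose correct entity is ranked in the top k.
        metrics[f"hits@{k}"] = sum(rank <= k for rank in ranks) / len(ranks)
    return metrics
```

Filtering before averaging is what makes the evaluation "relation-focused": a model can look strong overall yet weak on a small group of long-tail relations, which the overall MRR hides.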
ReaLitE model. This should be done using either the configurations from the paper (see Link Prediction Overall Evaluation) or the PyKEEN interface.
Note: Step 1 generates (and overwrites) the experiments/model_evaluation/best_runs.csv file, so these steps should be repeated for each dataset.
- Identify Best Model Instances: Find the best trained model instances per vanilla KGE model on a specific dataset.
  python experiments/model_evaluation/best_runs_finder.py <dataset>
  Options for <dataset>: YAGO15KLiteral or FB15K237Literal
- Perform Relation-Focused Testing: Execute relation-specific tests using the identified model instances.
  Note: This step is configured for the YAGO15KLiteral dataset. To run it for other datasets, modify the YAGO15K_RELS_WITH_HIGH_LITERAL_CORR, YAGO15K_SYMMETRIC_RELS, and DATASET_CLASS_TO_EVALUATE variables in the script.
  python experiments/model_evaluation/relation_focused_test_pykeen.py
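Step 1 keeps the best trained instance per vanilla KGE model. A minimal sketch of that selection logic, assuming each run is one row with a model name and a validation metric where higher is better; best_runs is a hypothetical helper, and the column names (model, run_id, mrr) and dummy values below are illustrative only, not the actual best_runs.csv format or any reported result.

```python
import csv
import io
from typing import Dict, List

# Dummy rows for illustration; column names and values are hypothetical.
SAMPLE = """model,run_id,mrr
TransEReaLitE,run-1,0.10
TransEReaLitE,run-2,0.20
DistMultReaLitE,run-3,0.30
"""


def best_runs(
    rows: List[Dict[str, str]], group_key: str = "model", metric_key: str = "mrr"
) -> Dict[str, Dict[str, str]]:
    """Keep the row with the highest metric for each group (e.g. each vanilla KGE model)."""
    best: Dict[str, Dict[str, str]] = {}
    for row in rows:
        group = row[group_key]
        if group not in best or float(row[metric_key]) > float(best[group][metric_key]):
            best[group] = row
    return best


if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(SAMPLE)))
    for model, row in best_runs(rows).items():
        print(model, row["run_id"], row["mrr"])
```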
- Configure: Modify the contents of MKGA/config/multiple_realite.yaml. Choose the dataset(s) and ReaLitE model variation(s) you want to evaluate by adjusting the dataload and embed properties.
- Overwrite: Copy the contents of MKGA/config/multiple_realite.yaml and use them to overwrite the contents of MKGA/config/multiple.yaml.
- Activate Environment: Switch to the Python environment you created for the MKGA/ codebase.
- Navigate: Change your current directory to MKGA/src/ (adjust the path relative to your current location if needed):
  cd MKGA/src/
- Run: Execute the auto-evaluation script:
  python autoevaluate.py
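The Overwrite step can also be scripted. A minimal sketch, assuming it is run from the repository root; activate_realite_config is a hypothetical helper that, beyond what the steps above require, keeps a backup (multiple.yaml.bak) so the original MKGA configuration can be restored later.

```python
import shutil
import tempfile
from pathlib import Path


def activate_realite_config(config_dir: Path) -> Path:
    """Copy multiple_realite.yaml over multiple.yaml, backing up the old multiple.yaml first."""
    source = config_dir / "multiple_realite.yaml"
    target = config_dir / "multiple.yaml"
    if not source.is_file():
        raise FileNotFoundError(source)
    if target.exists():
        # Hypothetical extra step: keep the previous config as multiple.yaml.bak.
        shutil.copy2(target, target.with_name(target.name + ".bak"))
    shutil.copyfile(source, target)
    return target


def demo() -> tuple:
    """Exercise the helper in a throwaway directory; return (new config, backup) contents."""
    with tempfile.TemporaryDirectory() as tmp:
        config_dir = Path(tmp)
        (config_dir / "multiple_realite.yaml").write_text("realite settings\n")
        (config_dir / "multiple.yaml").write_text("original settings\n")
        target = activate_realite_config(config_dir)
        return target.read_text(), (config_dir / "multiple.yaml.bak").read_text()


if __name__ == "__main__":
    # Real usage from the repository root would be:
    # activate_realite_config(Path("MKGA/config"))
    print(demo())
```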
This software is open-sourced under the AGPL-3.0 license. See the LICENSE file for details. For a list of open source components included in this project, see the file 3rd-party-licenses.txt.
If you use our software in your scientific work, please cite our paper:
@inproceedings{klironomos2025realite,
author = {Klironomos, Antonis and Zhou, Baifan and Zheng, Zhuoxun and Gad-Elrab, Mohamed and Paulheim, Heiko and Kharlamov, Evgeny},
title = {ReaLitE: Enrichment of Relation Embeddings in Knowledge Graphs Using Numeric Literals},
year = {2025},
isbn = {978-3-031-94574-8},
publisher = {Springer-Verlag},
address = {Berlin, Heidelberg},
url = {https://doi.org/10.1007/978-3-031-94575-5_3},
doi = {10.1007/978-3-031-94575-5_3},
abstract = {Most knowledge graph embedding (KGE) methods tailored for link prediction focus on the entities and relations in the graph, giving little attention to other literal values, which might encode important information. Therefore, some literal-aware KGE models attempt to either integrate numerical values into the embeddings of the entities or convert these numerics into entities during preprocessing, leading to information loss. Other methods concerned with creating relation-specific numerical features assume completeness of numerical data, which does not apply to real-world graphs. In this work, we propose ReaLitE, a novel relation-centric KGE model that dynamically aggregates and merges entities’ numerical attributes with the embeddings of the connecting relations. ReaLitE is designed to complement existing conventional KGE methods while supporting multiple variations for numerical aggregations, including a learnable method. We comprehensively evaluated the proposed relation-centric embedding using several benchmarks for link prediction and node classification tasks. The results showed the superiority of ReaLitE (Pronounced as “reality”, code: ) over the state of the art in both tasks.},
booktitle = {The Semantic Web: 22nd European Semantic Web Conference, ESWC 2025, Portoroz, Slovenia, June 1–5, 2025, Proceedings, Part I},
pages = {41–58},
numpages = {18},
location = {Portoroz, Slovenia}
}