This document provides instructions for downloading the datasets and for setting up and using the external KGA codebase. These steps are required to reproduce the Link Prediction evaluation presented in the ReaLitE paper.
- Prerequisites
- Dataset Preparation for Link Prediction
- KGA Code Modifications
- Existing Models Training for Link Prediction
- Clone the `KGA` repository: Inside the main `ReaLitE` project directory, clone the specified KGA repository into a directory named `KGA/`: `git clone https://github.com/Otamio/KGA.git`
- Set up Python Environment:
  - Create and activate a dedicated Python environment for the `KGA` codebase.
  - Install the required dependencies for the `KGA` codebase. Refer to the original `KGA` repository's documentation (e.g., `README.md` or `requirements.txt`) for the specific list and installation instructions.
- Modify `experiments/model_evaluation/relation_focused_test_kga.py`:
  - Find the variable `KGA_ENV_PYTHON_PATH` within this script.
  - Update its value to the absolute path of the Python interpreter inside your dedicated `KGA` environment. For example: `KGA_ENV_PYTHON_PATH = '/path/to/your/kga_env/bin/python'`
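One way to obtain the interpreter path is to run `python -c "import sys; print(sys.executable)"` with the `KGA` environment active. The sketch below is not part of ReaLitE or KGA; it introduces a hypothetical helper, `is_usable_interpreter`, that checks a candidate path is absolute, exists, and is executable before you assign it to `KGA_ENV_PYTHON_PATH`.

```python
import os


def is_usable_interpreter(path: str) -> bool:
    """Return True if `path` is an absolute path to an existing, executable file.

    Hypothetical helper for illustration; it is not part of ReaLitE or KGA.
    """
    return os.path.isabs(path) and os.path.isfile(path) and os.access(path, os.X_OK)


if __name__ == "__main__":
    import sys

    # Run this with the KGA environment active: sys.executable is then the
    # interpreter path to assign to KGA_ENV_PYTHON_PATH.
    print(sys.executable, is_usable_interpreter(sys.executable))
```

A relative path such as `bin/python` is rejected by this check, which matches the requirement above that the value be an absolute path.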
Run `./download_datasets.sh` to download the datasets used for link prediction. This script will save each dataset in a separate directory within `datasets/`:
- `fb15k237/` - FB15k-237 dataset enhanced with numeric literals.
- `yago15k/` - YAGO15k dataset enhanced with numeric literals.
- `fb15k237_synthetic_lit_eval/` - FB15k-237-based synthetic dataset for literal usage evaluation.
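After the download finishes, a quick check that the three directories exist can catch an interrupted run early. This is a minimal sketch, assuming it is run from the directory that contains `datasets/`; the helper name `missing_dataset_dirs` is hypothetical and the check looks only at directory names, not at file contents.

```python
from pathlib import Path

# Directory names taken from the dataset list in this document.
EXPECTED_DATASET_DIRS = ("fb15k237", "yago15k", "fb15k237_synthetic_lit_eval")


def missing_dataset_dirs(root="datasets"):
    """Return the expected dataset directories that do not exist under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_DATASET_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs()
    if missing:
        print(f"Missing dataset directories: {missing}")
    else:
        print("All dataset directories found.")
```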
After cloning the KGA repository, you need to apply specific modifications to ensure compatibility and add functionality required by ReaLitE:
- Modify `KGA/augment/augment_utils.py`:
  - Change line 82 from its original content to: `start = float(bins[i]) + 1e-6`
- Modify `KGA/rotate/model.py`:
  - Add the following line after line 459: `model.enrich_embedding()`
- Modify `KGA/run.py`:
  - Add the following code block after line 53:

        elif model.startswith("transe_gate"):
            command = f"python rotate/create_mapping.py --dataset {args.input}/{dataset} && " \
                      f"CUDA_VISIBLE_DEVICES={gpu} python -u rotate/run.py --do_train --cuda --do_valid --do_test --use_literal " \
                      f"--data_path {args.input}/{dataset} --model {model} -n 256 -b 1024 -d 1000 -g 24.0 -a 1.0 -adv " \
                      f"-lr 0.0001 --max_steps 150000 --valid_steps 5000 -save {args.output}/{dataset}_{model} " \
                      "--test_batch_size 16"

  - Add the following code block after line 59:

        elif model.startswith("rotate_gate"):
            command = f"python rotate/create_mapping.py --dataset {args.input}/{dataset} && " \
                      f"CUDA_VISIBLE_DEVICES={gpu} python -u rotate/run.py --do_train --cuda --do_valid --do_test --use_literal " \
                      f"--data_path {args.input}/{dataset} --model {model} -n 256 -b 1024 -d 1000 -g 24.0 -a 1.0 -adv " \
                      f"-lr 0.0001 --max_steps 150000 --valid_steps 5000 -save {args.output}/{dataset}_{model} " \
                      "--test_batch_size 16 -de"

  - Comment out lines 55, 61, 67, 73 and add `command = ` in front of lines 56, 62, 68, 74.
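Because these modifications are addressed by line number, it can help to look at the target lines before editing, in case the cloned repository differs from the revision these instructions were written against. The following is a minimal sketch with hypothetical helpers (`numbered_lines`, `show`) that print a range of lines with their 1-indexed numbers; it only reads files and changes nothing.

```python
from pathlib import Path


def numbered_lines(path, first, last):
    """Return (line_number, text) pairs for the 1-indexed lines first..last of `path`.

    The range is clipped to the lines that actually exist in the file.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    start = max(first, 1)
    stop = min(last, len(lines))
    return [(n, lines[n - 1]) for n in range(start, stop + 1)]


def show(path, first, last):
    """Print the selected lines, each prefixed with its line number."""
    for n, text in numbered_lines(path, first, last):
        print(f"{n:>5}  {text}")
```

For example, `show("KGA/augment/augment_utils.py", 80, 84)`, `show("KGA/rotate/model.py", 457, 461)`, and `show("KGA/run.py", 50, 76)` display the regions touched by the steps above. The context windows chosen here are arbitrary.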
- Activate Environment: Ensure you are in the Python environment for the `KGA` project.
- Navigate: Change your current directory to the root of the `KGA` project (adjust the path relative to your current location if needed): `cd KGA`
- Run: Train existing KGE models as per KGA's instructions, which can be found in the `KGA` repository. The following is an example command for training a model: `python run.py --dataset <dataset> --model <model> --save_best`
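When several dataset and model combinations need to be trained, the example command can be generated in a loop. This is a sketch only: `build_command` and `train_all` are hypothetical helpers, the flags are copied from the example command above, and the dataset and model values you pass must be ones that KGA's `run.py` accepts (consult the `KGA` repository for the valid names). It assumes the script is run from the root of the `KGA` project with the `KGA` environment active.

```python
import subprocess


def build_command(dataset, model, save_best=True):
    """Build the argument list for KGA's run.py, mirroring the example command."""
    cmd = ["python", "run.py", "--dataset", dataset, "--model", model]
    if save_best:
        cmd.append("--save_best")
    return cmd


def train_all(datasets, models, dry_run=True):
    """Print one training command per (dataset, model) pair and optionally run it."""
    for dataset in datasets:
        for model in models:
            cmd = build_command(dataset, model)
            print(" ".join(cmd))
            if not dry_run:
                # check=True raises CalledProcessError at the first failing run.
                subprocess.run(cmd, check=True)
```

With the default `dry_run=True` the helper only prints the commands, so the list can be reviewed before any training starts.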