Preparation for Link Prediction Experiments

This document explains how to download the datasets and how to set up and use the external KGA codebase. These steps are required to reproduce the link prediction evaluation presented in the ReaLitE paper.

Table of Contents

  1. Prerequisites
  2. Dataset Preparation for Link Prediction
  3. KGA Code Modifications
  4. Existing Models Training for Link Prediction

Prerequisites

  1. Clone the KGA repository: Inside the main ReaLitE project directory, clone the KGA repository into a directory named KGA/:
    git clone https://github.com/Otamio/KGA.git
  2. Setup Python Environment:
    1. Create and activate a dedicated Python environment for the KGA codebase.
    2. Install the required dependencies for the KGA codebase. Refer to the original KGA repository's documentation (e.g., README.md or requirements.txt) for the specific list and installation instructions.
  3. Modify experiments/model_evaluation/relation_focused_test_kga.py:
    1. Find the variable KGA_ENV_PYTHON_PATH within this script.
    2. Update its value to the absolute path of the Python interpreter inside your dedicated KGA environment. For example:
      KGA_ENV_PYTHON_PATH = '/path/to/your/kga_env/bin/python'
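Before running the evaluation script, it can help to confirm that the value assigned to KGA_ENV_PYTHON_PATH really points at an interpreter. The following is a minimal sketch, not part of ReaLitE or KGA; the helper name check_interpreter is hypothetical, and the path shown is the placeholder from the step above.

```python
import os


def check_interpreter(path):
    """Return True if `path` is absolute and points to an executable file."""
    return os.path.isabs(path) and os.path.isfile(path) and os.access(path, os.X_OK)


if __name__ == "__main__":
    # Placeholder from the instructions above; replace it with your real path.
    KGA_ENV_PYTHON_PATH = "/path/to/your/kga_env/bin/python"
    if check_interpreter(KGA_ENV_PYTHON_PATH):
        print("OK:", KGA_ENV_PYTHON_PATH)
    else:
        print("Not a usable interpreter:", KGA_ENV_PYTHON_PATH)
```

A relative path is rejected on purpose, since the step above asks for an absolute path.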

Dataset Preparation for Link Prediction

Run ./download_datasets.sh to download the datasets used for link prediction. This script will save each dataset in a separate directory within datasets/:

  • fb15k237/ - FB15k-237 dataset enhanced with numeric literals.
  • yago15k/ - YAGO15k dataset enhanced with numeric literals.
  • fb15k237_synthetic_lit_eval/ - FB15k-237-based synthetic dataset for literal usage evaluation.
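After the download finishes, a quick check that the three directories listed above exist and are not empty can catch an interrupted download early. This is a sketch under the assumption that it is run from the directory containing datasets/; the helper missing_datasets is hypothetical and does not validate file contents.

```python
from pathlib import Path

# Directory names taken from the list above.
EXPECTED_DATASETS = ("fb15k237", "yago15k", "fb15k237_synthetic_lit_eval")


def missing_datasets(root="datasets"):
    """Return the expected dataset directories that are absent or empty."""
    root = Path(root)
    missing = []
    for name in EXPECTED_DATASETS:
        directory = root / name
        if not directory.is_dir() or not any(directory.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_datasets()
    print("All datasets present." if not missing else f"Missing or empty: {missing}")
```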

KGA Code Modifications

After cloning the KGA repository, you need to apply specific modifications to ensure compatibility and add functionality required by ReaLitE:

  1. Modify KGA/augment/augment_utils.py:

    • Replace line 82 with:
      start = float(bins[i]) + 1e-6
  2. Modify KGA/rotate/model.py:

    • Add the following line after line 459:
      model.enrich_embedding()
  3. Modify KGA/run.py:

    • Add the following code block after line 53:
      elif model.startswith("transe_gate"):
          command = f"python rotate/create_mapping.py --dataset {args.input}/{dataset} && " \
                    f"CUDA_VISIBLE_DEVICES={gpu} python -u rotate/run.py --do_train --cuda --do_valid --do_test --use_literal " \
                    f"--data_path {args.input}/{dataset} --model {model} -n 256 -b 1024 -d 1000 -g 24.0 -a 1.0 -adv " \
                    f"-lr 0.0001 --max_steps 150000 --valid_steps 5000 -save {args.output}/{dataset}_{model} " \
                    "--test_batch_size 16"
    • Add the following code block after line 59:
      elif model.startswith("rotate_gate"):
          command = f"python rotate/create_mapping.py --dataset {args.input}/{dataset} && " \
                    f"CUDA_VISIBLE_DEVICES={gpu} python -u rotate/run.py --do_train --cuda --do_valid --do_test --use_literal " \
                    f"--data_path {args.input}/{dataset} --model {model} -n 256 -b 1024 -d 1000 -g 24.0 -a 1.0 -adv " \
                    f"-lr 0.0001 --max_steps 150000 --valid_steps 5000 -save {args.output}/{dataset}_{model} " \
                    "--test_batch_size 16 -de"
    • Comment out lines 55, 61, 67, and 73, and add command = at the start of lines 56, 62, 68, and 74.
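Because the modifications above are addressed by line number, they will land in the wrong place if your KGA checkout differs from the one these instructions were written against. One way to reduce that risk is to print the surrounding lines before editing. The sketch below is only an illustration; the helper numbered_lines is hypothetical, and the range 50 to 76 is simply chosen to cover the KGA/run.py line numbers mentioned above.

```python
import os


def numbered_lines(path, start, end):
    """Return lines start..end (1-indexed, inclusive) of `path`, each prefixed with its number."""
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    first = max(start, 1)
    last = min(end, len(lines))
    return [f"{n}: {lines[n - 1]}" for n in range(first, last + 1)]


if __name__ == "__main__":
    target = "KGA/run.py"  # assumed to be run from the ReaLitE project root
    if os.path.exists(target):
        print("\n".join(numbered_lines(target, 50, 76)))
    else:
        print("File not found:", target)
```

The same call with a different path and range can be used for KGA/augment/augment_utils.py and KGA/rotate/model.py.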

Existing Models Training for Link Prediction

  1. Activate Environment: Ensure you are in the Python environment for the KGA project.
  2. Navigate: Change your current directory to the root of the KGA project.
    cd KGA
    (Adjust the path relative to your current location if needed.)
  3. Run: Train the existing KGE models by following the instructions in the KGA repository. An example training command:
    python run.py --dataset <dataset> --model <model> --save_best
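When several dataset and model combinations need to be trained, the example command can be assembled programmatically. The following is a sketch only: build_train_command is a hypothetical helper, and the dataset and model strings in the loop are assumptions for illustration (the dataset names mirror the directory names in datasets/, and transe_gate is the prefix matched by the code added to KGA/run.py). Check the KGA documentation for the values its run.py actually accepts.

```python
import shlex


def build_train_command(dataset, model, save_best=True):
    """Assemble the example command above as an argument list."""
    cmd = ["python", "run.py", "--dataset", dataset, "--model", model]
    if save_best:
        cmd.append("--save_best")
    return cmd


if __name__ == "__main__":
    # Illustrative values; these are assumptions, not confirmed KGA options.
    for dataset in ("fb15k237", "yago15k"):
        print(shlex.join(build_train_command(dataset, "transe_gate")))
```

The sketch only prints the commands so they can be reviewed first; passing the returned list to subprocess.run from the KGA root directory would execute them.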