Rationale behind path scoring #11

@yuancu

Hi, your work is impressive!

I have some questions while reading the source code regarding preprocessing.

  1. Why do you assign score 1 (max score) to paths that retrieve no leaves?
    In the cal_path_val function, you always return 1 when preds (the leaves deduced from a path) is an empty set. This means that when you filter paths for pretraining, paths that lead to no leaves will always be selected. Isn't it unreasonable to give an invalid path the highest score?

  2. Using the HIT score as the metric is also debatable. It makes sense for questions like "What are the books written by Ogai Mori?", but it does not help when you ask "What is the most famous book by Ogai Mori?". Even if the path Ogai Mori --write--> A, I, U, E, O, etc. is retrieved, it will likely be eliminated, since its HIT score will be very low (1 / n_books_from_ogai).

  3. Also, could you please briefly explain the rationale behind L35-L49 in negative_sampling.py?
    In my interpretation, it means if the number of candidate entities is too large, then you simply discard this path. Am I correct?
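
To make points 1 and 2 concrete, here is a minimal sketch of the scoring as I understand it. The function path_score and its body are my own reconstruction for illustration, not the repository's cal_path_val; I am assuming the score is the fraction of retrieved leaves that are correct answers, with 1 returned for an empty set as described in point 1.

```python
def path_score(preds, answers):
    """Hypothetical stand-in for cal_path_val (assumed logic, not the real code).

    preds:   set of leaf entities reached by following a path
    answers: set of gold answer entities for the question
    """
    if len(preds) == 0:
        # Behaviour questioned in point 1: an empty result gets the max score.
        return 1.0
    # Assumed HIT-style score: share of retrieved leaves that are answers.
    return len(preds & answers) / len(preds)


# Point 1: a path that retrieves nothing is scored as highly as a perfect path.
empty_path = path_score(set(), {"A"})

# Point 2: the correct relation retrieves five books, but only one is the answer
# to "most famous book", so the score drops to 1 / n_books.
write_path = path_score({"A", "I", "U", "E", "O"}, {"A"})

print(empty_path, write_path)
```

Under these assumptions the empty path scores 1.0 while the path through the correct relation scores 0.2, so a threshold-based filter would keep the former and drop the latter.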
