Hi, your work is impressive!
While reading the preprocessing source code, I came up with a few questions.
- Why do you assign a score of 1 (the maximum) to paths that retrieve no leaves?
  In the `cal_path_val` function, you always return `1` when `preds` (the leaves deduced from a path) is an empty set. This means that when you filter paths for pretraining, paths that lead to no leaves will always be selected. Isn't it irrational to give an invalid path the highest score?
- Using the HIT score as the metric is also a debatable choice. It makes sense for questions like "What are the books written by Ogai Mori?", but it does not help when you ask "What is the most famous book from Ogai Mori?". Even if the path `Ogai Mori --write--> A, I, U, E, O, etc.` is retrieved, it will likely be eliminated because its HIT score will be very low (1 / n_books_from_ogai).
- Besides, could you please briefly explain the rationale behind L35 - L49 in `negative_sampling.py`?
  In my interpretation, it means that if the number of candidate entities is too large, you simply discard the path. Am I correct?
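
To make the first two points concrete, here is a minimal sketch of the scoring behavior as I understand it. `path_score` is a hypothetical stand-in written for this issue, not the actual `cal_path_val`. I am assuming the score is the fraction of retrieved leaves that are gold answers, which is consistent with the 1 / n_books_from_ogai figure above, and the book names are placeholders.

```python
def path_score(preds, answers):
    """Hypothetical stand-in for cal_path_val (assumed, not the repository's code)."""
    preds, answers = set(preds), set(answers)
    if not preds:
        # Behavior described in the first question:
        # a path that retrieves no leaves still gets the maximum score.
        return 1.0
    # Assumed HIT score: share of retrieved leaves that are gold answers.
    return len(preds & answers) / len(preds)


books = ["A", "I", "U", "E", "O"]  # placeholder leaves of Ogai Mori --write-->

empty_path = path_score([], books)          # path that retrieves nothing
list_question = path_score(books, books)    # "What are the books written by ...?"
single_question = path_score(books, ["A"])  # "What is the most famous book ...?"

print(empty_path, list_question, single_question)
```

Under these assumptions, the empty path ties with the perfect path, while the correct `write` path for the single-answer question scores only 1/5 and would probably be filtered out.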