AdLinke, paperwork #70
Description
Hi,
I was trying to reproduce results by running your code, and couldn't get exactly the same precision on SQuAD.
Here is what I got for the bert_large model on SQuAD:
all_samples: 303
list_of_results: 303
global MRR: 0.3018861233236291
global Precision at 10: 0.5676567656765676
global Precision at 1: 0.16831683168316833
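For reference, the three metrics above can be sketched as follows. This is a minimal illustration assuming each sample is reduced to the 1-based rank of its gold object among the model's predictions (`None` if the object is never ranked). The function names and input format are hypothetical, and LAMA's own evaluation code may compute these slightly differently.

```python
from typing import List, Optional


def precision_at_k(gold_ranks: List[Optional[int]], k: int) -> float:
    """Fraction of samples whose gold object is ranked within the top k.

    gold_ranks[i] is the 1-based rank of the gold object for sample i,
    or None if it was not ranked (hypothetical input format).
    """
    hits = sum(1 for r in gold_ranks if r is not None and r <= k)
    return hits / len(gold_ranks)


def mean_reciprocal_rank(gold_ranks: List[Optional[int]]) -> float:
    """Average of 1/rank over all samples; unranked samples contribute 0."""
    return sum(1.0 / r for r in gold_ranks if r is not None) / len(gold_ranks)


if __name__ == "__main__":
    ranks = [1, 3, None, 12]  # toy example with four samples
    print(precision_at_k(ranks, 1))    # 0.25
    print(precision_at_k(ranks, 10))   # 0.5
    print(mean_reciprocal_rank(ranks))  # (1 + 1/3 + 1/12) / 4
```

Because every metric divides by the number of samples, the denominator (303 vs. 305) shifts all three values even when the set of correct predictions is unchanged.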
However, the table in the paper reports 305 samples and a precision of 17.4%.
At first, I guessed that this was because 2 samples were excluded for having object labels outside the common vocabulary. But even after testing without the common vocabulary, I got a global Precision at 1 of 0.1704918, which still differs from the result in the paper.
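One way to compare these numbers is to convert each precision back into a count of correct samples. The sketch below assumes each reported value is simply (correct samples) / (total samples) and rounds to the nearest integer. `implied_hits` is a hypothetical helper, and the paper's 17.4% is itself rounded, so its implied count is approximate.

```python
def implied_hits(precision: float, n: int) -> int:
    """Round precision * n to the nearest whole number of correct samples."""
    return round(precision * n)


# Values taken from the output and the paper figure quoted above.
reported = {
    "P@1, common vocab (303 samples)": (0.16831683168316833, 303),
    "P@10, common vocab (303 samples)": (0.5676567656765676, 303),
    "P@1, no common vocab (305 samples)": (0.1704918, 305),
    "P@1, paper (305 samples)": (0.174, 305),
}

if __name__ == "__main__":
    for label, (p, n) in reported.items():
        hits = implied_hits(p, n)
        print(f"{label}: {hits}/{n} = {hits / n:.4f}")
```

Under this assumption, the run without the common vocabulary gets 52/305 (about 17.05%), while 17.4% corresponds to 53/305 (about 17.38%). That would put the gap at a single sample.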
Is there a way to reproduce the same results in the paper?
Please correct me if I made any mistakes! Thanks!