AdLinke, paperwork

Hi,

I was trying to reproduce results by running your code, and couldn't get exactly the same precision on SQuAD.
Here is what I got for bert_large model on SQuAD:
all_samples: 303
list_of_results: 303
global MRR: 0.3018861233236291
global Precision at 10: 0.5676567656765676
global Precision at 1: 0.16831683168316833

However, in the paper, the table shows that there should be 305 samples and the precision should be 17.4%.

At first, I guessed that it is because 2 samples are excluded because their object labels are out of the common vocabulary, but even after testing without common vocabulary, I got global Precision at 1: 0.1704918, which is still different to results in the paper.

Is there a way to reproduce the same results in the paper?
Please correct me if I made any mistakes! Thanks!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AdLinke, paperwork #70

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

AdLinke, paperwork #70

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions