Unreasonable Samples in MANS. #2
Hi,
Thank you so much for providing this work. It is very inspiring, and we are keen to use the resources to compare against other newly proposed metrics.
However, I am not quite sure if I understand the paper and data correctly.
It seems that in Table 3 you split the unreasonable samples into 4 categories, while in the data you provide each generation from each model has a score that is a list of 5 integers (which I assume are the overall scores from 5 annotators?). However, there is no label indicating which unreasonable type each story belongs to.
Have I missed the details on how you decide which story belongs to which error type?
Also, when you mention in Section 4.2 that you give reasonable and unreasonable samples the binary labels 1 and 0, does that mean the reasonable samples are reused four times, once for each problem type?
For example, for ROC, would you have 46 reasonable samples labeled 1 and 22 unreasonable samples labeled 0 for Rept, and then the same 46 reasonable samples labeled 1 together with 319 unreasonable samples labeled 0 for Unrel?
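To make the question concrete, here is a minimal sketch of the setup as I currently understand it. This is only my guess at the procedure, not your code: the names `build_binary_labels` and `label_sets` are hypothetical, and only the ROC counts quoted above are used.

```python
# Hypothetical sketch of my reading of Section 4.2; NOT the authors' code.
# Counts are the ROC numbers quoted above (only two of the four error types).
N_REASONABLE = 46
UNREASONABLE_COUNTS = {"Rept": 22, "Unrel": 319}


def build_binary_labels(n_reasonable, n_unreasonable):
    """Return 1 for every reasonable sample and 0 for every unreasonable one."""
    return [1] * n_reasonable + [0] * n_unreasonable


# Assumption: the same reasonable samples are reused for every error type.
label_sets = {
    error_type: build_binary_labels(N_REASONABLE, count)
    for error_type, count in UNREASONABLE_COUNTS.items()
}

for error_type, labels in label_sets.items():
    print(error_type, "total:", len(labels), "positives:", sum(labels))
```

If this reading is right, the Rept evaluation would use 68 samples and the Unrel evaluation 365, with the same 46 positives in both.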
Any clarification on this would be much appreciated.
Thank you in advance!