Unreasonable Samples in MANS.

Hi,

Thank you so much for providing this work. It is very inspiring, and we are keen to use the resources to compare other newly proposed metrics.
However, I am not quite sure if I understand the paper and data correctly.
In Table 3, you split the unreasonable samples into 4 categories. In the provided data, however, each generation of each model has a score that is a list of 5 integers (which I assume are the overall scores from 5 annotators), but there is no label indicating which unreasonable type each story belongs to.
Have I missed a detail here? How do you decide which story belongs to which error type?
Also, when you mention in Section 4.2 that you assign reasonable and unreasonable samples the binary labels 1 and 0, does that mean all reasonable samples are reused four times, once for each problem type?
For example, for ROC, would you have 46 reasonable samples labeled 1 and 22 unreasonable samples labeled 0 for Rept, and then the same 46 reasonable samples labeled 1 again with 319 unreasonable samples labeled 0 for Unrel?
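To make my reading concrete, here is a minimal sketch of the setup as I understand it. It is only my assumption, not taken from your code: the function name `build_binary_eval_set`, the dictionary layout, and the placeholder stories are all hypothetical, and only the counts (46, 22, 319) come from the ROC numbers above.

```python
# Hypothetical sketch of my reading of Section 4.2; not the authors' code.
# All names and the data layout here are assumptions for illustration.

def build_binary_eval_set(reasonable, unreasonable_by_type, error_type):
    """Pair every reasonable sample (label 1) with the unreasonable
    samples of a single error type (label 0)."""
    positives = [(story, 1) for story in reasonable]
    negatives = [(story, 0) for story in unreasonable_by_type[error_type]]
    return positives + negatives

# Placeholder stories, sized with the ROC counts quoted above.
reasonable = [f"reasonable_{i}" for i in range(46)]
unreasonable_by_type = {
    "Rept": [f"rept_{i}" for i in range(22)],
    "Unrel": [f"unrel_{i}" for i in range(319)],
}

# The same 46 reasonable samples are reused for each error type.
rept_set = build_binary_eval_set(reasonable, unreasonable_by_type, "Rept")
unrel_set = build_binary_eval_set(reasonable, unreasonable_by_type, "Unrel")

print(len(rept_set), len(unrel_set))  # 68 365
```

If this matches what you did, then each error type gets its own evaluation set that shares the same positive samples. Please correct me if the actual setup differs.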
Any clarification on this would be much appreciated.
Thank you in advance!

