Tau and rho from the SummEval paper:

Tau and rho from the UniEval paper:

I believe the issue lies in how you compute the scores. Instead of computing ROUGE against the annotated references, you compute it directly against the source text. Don't you think this is unfair to scoring functions that have a limited token input, or to those that operate at the set level, like ROUGE? Thank you.
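P.S. To make the difference between the two setups concrete, here is a minimal sketch. It uses a simplified whitespace-token ROUGE-1 F1 that I wrote for illustration (not the official ROUGE implementation, and not your code), and the three texts are toy examples I made up.

```python
from collections import Counter


def rouge1_f1(candidate: str, target: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap on whitespace tokens.

    Illustrative only: no stemming, no stopword handling, no official tokenizer.
    """
    cand = Counter(candidate.lower().split())
    tgt = Counter(target.lower().split())
    overlap = sum((cand & tgt).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(tgt.values())
    return 2 * precision * recall / (precision + recall)


# Toy texts (hypothetical, not taken from any dataset).
source = (
    "the city council met on monday to discuss the new budget "
    "and after a long debate the council voted to approve funding "
    "for road repairs and public parks"
)
reference = "the council approved funding for road repairs and parks"
summary = "the council voted to approve funding for road repairs"

# Setup A: score the summary against the annotated reference.
score_vs_reference = rouge1_f1(summary, reference)

# Setup B: score the summary directly against the source document.
score_vs_source = rouge1_f1(summary, source)

print(f"ROUGE-1 F1 vs reference: {score_vs_reference:.3f}")
print(f"ROUGE-1 F1 vs source:    {score_vs_source:.3f}")
```

In setup B, every token of an extractive summary appears in the source, so precision saturates at 1 and the score is driven mostly by how long the summary is relative to the source. That is why I would not expect the resulting correlations to match those reported with reference-based scoring.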