Clean evaluation test for task1

Hello,

As promised, I pushed a new version of the evaluation dataset for task1.

@anuzzolese: feel free to review it and change anything that looks wrong to you.

Best.