I've just finished reading your paper and I'm trying to understand how accuracy was defined for these different tasks. Is the annotation / evaluation data available somewhere for reproducibility? I couldn't find it in this repo. Thanks in advance :)