Is the annotation data available?

I've just finished reading your paper, and I'm trying to understand how accuracy was defined for each of the different tasks.

Is the annotation / evaluation data available somewhere for reproducibility? I couldn't find it in this repo.

Thanks in advance :)