There seems to be a minor problem with measuring "text" relatedness.
Following the original paper, I ran an experiment on the Lee dataset and got a score of 0.67, which differs from the result reported in the paper (0.72).
Have you run any experiments on text relatedness?