There is a discrepancy between the loss formulae given in the paper (Appendix A), where both the lookup and the loss computation use normalized vectors, and the code (lookup here, losses here), where only the lookup uses normalized vectors. What do you think?
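
To make the difference concrete, here is a minimal NumPy sketch of the two variants as I understand them. It is not the repository's code: the helper names (`l2_normalize`, `nearest_code`, `codebook_loss`) and the toy values are hypothetical, and stop-gradients and the commitment weighting are left out so that only the normalization question remains.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm (eps guards against division by zero)."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def nearest_code(z, codebook):
    """Lookup on normalized vectors: index of the closest code for each row of z."""
    zn, cn = l2_normalize(z), l2_normalize(codebook)
    # For unit vectors, the smallest squared distance is the largest cosine similarity.
    return np.argmax(zn @ cn.T, axis=-1)

def codebook_loss(z, codebook, normalize_in_loss):
    """Mean squared distance between encoder outputs and their selected codes."""
    idx = nearest_code(z, codebook)  # the lookup is normalized in both variants
    e = codebook[idx]
    if normalize_in_loss:
        # Variant matching my reading of Appendix A: loss on normalized vectors.
        z, e = l2_normalize(z), l2_normalize(e)
    # Otherwise: loss on the raw vectors, as the code appears to do.
    return float(np.mean(np.sum((z - e) ** 2, axis=-1)))

# Toy data (hypothetical): directions match the codes exactly, norms do not.
z = np.array([[3.0, 0.0], [0.0, 0.5]])
codebook = np.array([[1.0, 0.0], [0.0, 2.0]])

loss_paper = codebook_loss(z, codebook, normalize_in_loss=True)
loss_code = codebook_loss(z, codebook, normalize_in_loss=False)
print(loss_paper, loss_code)
```

In this toy case the normalized loss is zero because the directions already agree, while the unnormalized loss is not, so the second variant also pushes the vector norms together.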