Can you explain the CIRR evaluation code you used?
When I evaluated the stage 2 model on the CIRR validation set, the R@5 values were very different from those reported in the paper.
Is your evaluation based on the Pic2Word evaluation code?
If possible, could you share the code via email?