Question about CIRR evaluation

Can you explain the CIRR evaluation code you used?

When I evaluated the stage 2 model on the CIRR validation set, my R@5 results were very different from those reported in the paper.

Is it based on the evaluation code from pic2word?

If possible, could you share the code via email?