I am using the memalpha script provided in the baselines to reproduce the results, but my test results are significantly lower than those reported in the paper. Is the script used in the paper the same as the one provided in the baselines, or have I overlooked some key aspect?