White-box model results are not as good as the official version

Great work re-implementing the white-box model in PyTorch. However, after testing, I found that the results are not as good as those of the authors' official version. There is still a large gap. What do you think could be the reason? Do we need to train the model longer, or is something else going on?