It seems that the MLP mapping in the code is applied after the compression module, whereas in the paper the mapping is performed before compression. Additionally, the code does not provide separate training scripts for the different stages. Based on the script parameters, it appears that the compression module and the mapping module are trained simultaneously during the pre-training phase, which contradicts the paper’s claim that the compression module is not introduced at this stage. Could you please clarify whether the code reflects a better practice than what is described in the paper, or whether I have misunderstood something?
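
To make the two orderings concrete, here is a toy sketch of what I mean. It is not taken from the repository: `toy_mlp` and `toy_compress` are hypothetical stand-ins (a per-token ReLU and average pooling over adjacent tokens), chosen only to show that the order matters once the mapping is nonlinear.

```python
from typing import List

Tokens = List[List[float]]


def toy_mlp(tokens: Tokens) -> Tokens:
    """Hypothetical stand-in for the mapping module: a per-token ReLU."""
    return [[max(0.0, x) for x in tok] for tok in tokens]


def toy_compress(tokens: Tokens, group: int = 2) -> Tokens:
    """Hypothetical stand-in for the compression module:
    average each group of adjacent tokens into one token."""
    out = []
    for i in range(0, len(tokens), group):
        chunk = tokens[i:i + group]
        out.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return out


# Two toy visual tokens with two features each.
visual_tokens = [[-1.0, 2.0], [3.0, -4.0]]

# Order described in the paper: mapping first, then compression.
paper_order = toy_compress(toy_mlp(visual_tokens))

# Order I believe the code uses: compression first, then mapping.
code_order = toy_mlp(toy_compress(visual_tokens))

print(paper_order)  # [[1.5, 1.0]]
print(code_order)   # [[1.0, 0.0]]
```

Both orderings produce the same number of tokens, but the values differ, so it would help to know which ordering was used for the results reported in the paper.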