I am interested in this work on masked diffusion models and would like to make a suggestion. The repository does not seem to explain the order in which the two options for training_mode should be run during supervised training and fine-tuning, or how the two are connected. It also does not explain the differences between the three models that can be selected when running the code at this stage. It would be helpful to add this to the documentation.