I made a reimplementation in TensorFlow 2, but it does not work very well. On the MIT dataset, the model's loss only declines to 2.3.