Hello, I ran the example training script for the BERT-BERT architecture, but the simplification results it produces are poor. Is this caused by an error in the program?

<img width="482" alt="1672366702490" src="https://user-images.githubusercontent.com/97029258/210027933-4f3042d7-6c23-4498-a381-239d4312cd01.png">