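I am using your model as the basis for training my own model. Could you tell me which training procedure you used?

1. random initialization -> fine-tuning with a fixed learning rate, or
2. freezing the BERT part and training the BiLSTM-CRF part -> fine-tuning the whole network with a small learning rate

I ask because, at test time, the code seems to use the original BERT representations.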
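
To make option 2 concrete, this is roughly the schedule I have in mind, written as a small PyTorch sketch. Everything in it is a placeholder of mine and not taken from your code: the `Tagger` class, the embedding layer standing in for BERT, the plain linear output used in place of a CRF layer, and the learning rates `1e-3` and `1e-5`.

```python
import torch
from torch import nn


class Tagger(nn.Module):
    """Toy stand-in: `encoder` plays the role of BERT, the rest the BiLSTM head."""

    def __init__(self, vocab=100, hidden=16, tags=5):
        super().__init__()
        self.encoder = nn.Embedding(vocab, hidden)  # placeholder for BERT
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, tags)  # CRF omitted; per-token classifier

    def forward(self, ids):
        h, _ = self.lstm(self.encoder(ids))
        return self.out(h)


def set_encoder_trainable(model, trainable):
    # Freezing = turning off gradients for the encoder parameters.
    for p in model.encoder.parameters():
        p.requires_grad = trainable


def train_steps(model, optimizer, ids, labels, steps):
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        optimizer.zero_grad()
        logits = model(ids)  # shape: (batch, seq, tags)
        loss = loss_fn(logits.reshape(-1, logits.size(-1)), labels.reshape(-1))
        loss.backward()
        optimizer.step()


torch.manual_seed(0)
model = Tagger()
ids = torch.randint(0, 100, (4, 7))    # fake token ids
labels = torch.randint(0, 5, (4, 7))   # fake tag ids

# Stage 1: encoder frozen, only the BiLSTM head is trained.
set_encoder_trainable(model, False)
encoder_before = model.encoder.weight.detach().clone()
head_params = [p for p in model.parameters() if p.requires_grad]
train_steps(model, torch.optim.Adam(head_params, lr=1e-3), ids, labels, steps=3)
encoder_unchanged_in_stage1 = torch.equal(encoder_before, model.encoder.weight)

# Stage 2: unfreeze everything and fine-tune with a much smaller learning rate.
set_encoder_trainable(model, True)
train_steps(model, torch.optim.Adam(model.parameters(), lr=1e-5), ids, labels, steps=3)
encoder_changed_in_stage2 = not torch.equal(encoder_before, model.encoder.weight)
```

If your released checkpoint was trained with only the first stage, that would explain why the BERT representations at test time look like the original ones.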