It seems that initializing the parameters with **randn** (https://github.com/graykode/xlnet-Pytorch/blob/cb793a1c75bdc59e3360f04ec641af726719811f/xlnet.py#L119) **leads to poor performance**. I tried **xavier_normal_** and **kaiming_uniform_** instead, and both reach a much higher AUC and F1 score on my task.
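
For anyone who wants to try the same swap, here is a minimal sketch of what I mean. The helper `init_param` and the `(256, 256)` shape are hypothetical and only for illustration; they are not taken from the repo. The `torch.nn.init` calls are the standard PyTorch ones, used with their default arguments.

```python
import torch
import torch.nn as nn


def init_param(shape, scheme="xavier_normal"):
    """Hypothetical helper: build an nn.Parameter of `shape` with the chosen init."""
    tensor = torch.empty(*shape)
    if scheme == "randn":
        # Standard normal (mean 0, std 1), same distribution as torch.randn.
        nn.init.normal_(tensor, mean=0.0, std=1.0)
    elif scheme == "xavier_normal":
        # Normal with std = sqrt(2 / (fan_in + fan_out)).
        nn.init.xavier_normal_(tensor)
    elif scheme == "kaiming_uniform":
        # Uniform in [-bound, bound], bound = sqrt(6 / fan_in) with default args.
        nn.init.kaiming_uniform_(tensor)
    else:
        raise ValueError(f"unknown init scheme: {scheme}")
    return nn.Parameter(tensor)


if __name__ == "__main__":
    torch.manual_seed(0)
    for scheme in ("randn", "xavier_normal", "kaiming_uniform"):
        weight = init_param((256, 256), scheme)
        print(scheme, "std =", weight.std().item())
```

The scaled schemes produce much smaller weights than `randn` for a layer of this size, which may be why training behaves better, though I have only checked this on my own task.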