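https://github.com/graykode/xlnet-Pytorch/blob/cb793a1c75bdc59e3360f04ec641af726719811f/xlnet.py#L163

In your implementation, the FFN module has only one linear layer. Is this a bug? As far as I know, the position-wise feed-forward block in a Transformer layer normally has two linear layers with an activation between them.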
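
For comparison, here is a minimal sketch of the block I expected, assuming the standard Transformer position-wise FFN (expand to an inner size, apply an activation, project back, then residual and layer norm). The class name `PositionwiseFFN`, the parameter names, and the sizes are hypothetical and not taken from this repo; the activation could just as well be GELU.

```python
import torch
import torch.nn as nn


class PositionwiseFFN(nn.Module):
    """Hypothetical two-layer position-wise feed-forward block."""

    def __init__(self, d_model: int, d_inner: int, dropout: float = 0.1):
        super().__init__()
        self.fc1 = nn.Linear(d_model, d_inner)   # expand: d_model -> d_inner
        self.fc2 = nn.Linear(d_inner, d_model)   # project back: d_inner -> d_model
        self.act = nn.ReLU()                     # assumption: ReLU; GELU is also common
        self.dropout = nn.Dropout(dropout)
        self.layer_norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (..., d_model); the same weights are applied at every position
        h = self.dropout(self.act(self.fc1(x)))
        h = self.dropout(self.fc2(h))
        # residual connection followed by layer normalization
        return self.layer_norm(x + h)


if __name__ == "__main__":
    ffn = PositionwiseFFN(d_model=32, d_inner=128)
    x = torch.randn(5, 2, 32)  # (seq_len, batch, d_model)
    print(ffn(x).shape)
```

With only one linear layer there is no hidden expansion and no nonlinearity between two projections, so the block would behave differently from the one sketched here.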