The current Dataset manually constructs the input tensors, masks, etc. We can make this clearer by using transformers.BertTokenizerFast, which builds them in a single call.
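A rough sketch of what this could look like, assuming the Dataset is a map-style `torch.utils.data.Dataset` over raw strings and integer labels. The names `TextDataset`, `build_toy_tokenizer`, and `max_length` are hypothetical and not taken from the existing code. The toy vocabulary only keeps the example self-contained; in practice the tokenizer would more likely come from `BertTokenizerFast.from_pretrained(...)` with whichever checkpoint the model uses.

```python
import os
import tempfile

import torch
from torch.utils.data import Dataset
from transformers import BertTokenizerFast


class TextDataset(Dataset):
    """Hypothetical dataset that delegates tensor and mask construction to the tokenizer."""

    def __init__(self, texts, labels, tokenizer, max_length=16):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_length = max_length  # assumed fixed length so samples can be batched

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        # One call yields input_ids, token_type_ids and attention_mask,
        # padded or truncated to max_length.
        enc = self.tokenizer(
            self.texts[idx],
            padding="max_length",
            truncation=True,
            max_length=self.max_length,
            return_tensors="pt",
        )
        # The tokenizer returns tensors of shape (1, max_length); drop the batch axis.
        item = {name: tensor.squeeze(0) for name, tensor in enc.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item


def build_toy_tokenizer():
    """Build a tokenizer from a tiny local vocab (illustration only, no download)."""
    vocab = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "hello", "world"]
    vocab_path = os.path.join(tempfile.mkdtemp(), "vocab.txt")
    with open(vocab_path, "w", encoding="utf-8") as f:
        f.write("\n".join(vocab) + "\n")
    return BertTokenizerFast(vocab_file=vocab_path)


tokenizer = build_toy_tokenizer()
dataset = TextDataset(["hello world", "hello"], [1, 0], tokenizer, max_length=6)
sample = dataset[0]
print({name: tensor.tolist() for name, tensor in sample.items()})
```

Tokenizing per item in `__getitem__` keeps the Dataset simple. Tokenizing the whole list once in `__init__`, or padding dynamically in a collate function, would be reasonable alternatives if speed or memory matters.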