One of the reviewers suggested a valuable additional experiment: running the HAnDS QA training directly from the pretrained language model, rather than starting from a model that was first fine-tuned on SQuAD.
This experiment is left for future research, along with testing other pretrained models, both with and without initial SQuAD fine-tuning.