Tokenization is not consistent across datasets. I don't know how big the issue is, but JNLPBA seems to be tokenized more coarsely than the other datasets.
For example, in JNLPBA, "interleukin-n" is kept as a single token, while in the other datasets it appears as three tokens: "interleukin", "-", "n".
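
To make the difference concrete, here is a rough sketch of how a coarse token could be split to match the finer style. The helper `split_fine` is hypothetical and only handles hyphens; it is not the tokenizer used by any of these datasets, just an illustration of the mismatch.

```python
import re


def split_fine(token):
    """Split a token at hyphens, keeping each hyphen as its own piece.

    Hypothetical helper for illustration only; real corpora may also
    split on other punctuation.
    """
    # The capture group makes re.split keep the hyphen in the result;
    # empty strings (e.g. from a leading hyphen) are dropped.
    return [piece for piece in re.split(r"(-)", token) if piece]


# Coarse, JNLPBA-style tokens
coarse = ["interleukin-n", "receptor"]

# Re-tokenized in the finer style seen in the other datasets
fine = [piece for tok in coarse for piece in split_fine(tok)]
print(fine)  # ['interleukin', '-', 'n', 'receptor']
```

Note that any re-tokenization like this would also need the labels to be carried over to the new pieces, which is one reason swapping in an already re-tokenized corpus may be simpler.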
Replace the JNLPBA corpus here with the one from: https://github.com/spyysalo/jnlpba
This will involve