For example, PubMed is available in the `MLDatasets.jl` package: https://juliaml.github.io/MLDatasets.jl/dev/datasets/PubMed/