Skip to content

Simple R interface for PubMed data retrieval: abstracts, full-text (PMC), bibliometrics (iCite), and entity annotations (PubTator3).

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md
Notifications You must be signed in to change notification settings

jaytimm/puremoe

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

43 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

puremoe

PubMed Unified REtrieval for Multi-Output Exploration. An R package that provides a single interface for accessing a range of NLM/PubMed databases, including:

This unified interface simplifies the data retrieval process, allowing users to interact with multiple PubMed services/APIs/output formats through a single R function.

Also included are MeSH thesaurus resources as simple data frames: Descriptor Terms, Descriptor Tree Structures, and Supplementary Concept Terms, via the mesh-resources library.

The package provides a straightforward retrieval interface for PubMed literature, with utility in LLM workflows and RAG applications requiring access to abstracts, full-text articles, entity annotations, and bibliometric data.

Installation

Get the released version from CRAN:

install.packages('puremoe')

Or the development version from GitHub with:

remotes::install_github("jaytimm/puremoe")

PubMed search

The package has two basic functions: search_pubmed and get_records. The former fetches PMIDs from the PubMed API based on user search; the latter scrapes PMID records from a user-specified PubMed endpoint – pubmed_abstracts, pubmed_affiliations, pubtations, icites, or pmc_fulltext.

Search syntax is the same as that implemented in standard PubMed search.

pmids <- puremoe::search_pubmed('("political ideology"[TiAb])') 

Get record-level data

pubmed <- pmids |> 
  puremoe::get_records(endpoint = 'pubmed_abstracts', 
                       cores = 3, 
                       sleep = 1,
                       ncbi_key = ncbi_key) 

affiliations <- pmids |> 
  puremoe::get_records(endpoint = 'pubmed_affiliations', 
                       cores = 1, 
                       sleep = 0.5)

icites <- pmids |>
  puremoe::get_records(endpoint = 'icites',
                       cores = 3,
                       sleep = 0.25)

pubtations <- pmids |> 
  puremoe::get_records(endpoint = 'pubtations',
                       cores = 2)

PMC full text

Full-text articles can be retrieved for PMIDs if available in PMC’s open-access collection. Use pmid_to_ftp() to get download URLs, then pass these to get_records(endpoint = 'pmc_fulltext') — useful for quick retrieval in LLM/chat contexts.

For bulk downloads, use data_pmc_list().

pmcs <- puremoe::pmid_to_ftp(pmids = pmids, ncbi_key = ncbi_key)
pmc_fulltext <- puremoe::get_records(pmcs[1:5]$url, endpoint = 'pmc_fulltext', cores = 1)

Endpoint information

Returns schema, columns, and rate limits for each endpoint. Useful in LLM app contexts for tool schemas.

endpoint_info() lists endpoints; endpoint_info('endpoint_name') returns details; format = 'json' for machine-readable output.

puremoe::endpoint_info()
## [1] "pubmed_abstracts"    "pubmed_affiliations" "icites"             
## [4] "pubtations"          "pmc_fulltext"
puremoe::endpoint_info('pmc_fulltext')
## $description
## [1] "Full-text articles from PubMed Central"
## 
## $returns
## [1] "data.frame"
## 
## $columns
## $columns$pmid
## [1] "PubMed ID (character)"
## 
## $columns$section
## [1] "Section heading (character)"
## 
## $columns$text
## [1] "Section text content (character)"
## 
## 
## $parameters
## $parameters$cores
## [1] "parallel workers"
## 
## 
## $input
## [1] "Requires FTP URLs from pmid_to_ftp()"
## 
## $rate_limit
## [1] "NCBI FTP: be respectful"
## 
## $notes
## [1] "One row per section; use after pmid_to_ftp() to get URLs. Not all PMIDs have PMC full text available."

License

MIT © [Jason Timm, MA, PhD]

Citation

If you use this package in your research, please cite:

citation("puremoe")

Issues

Report bugs or request features at https://github.com/username/puremoe/issues

Contributing

Contributions welcome! Please open an issue or submit a pull request.

About

Simple R interface for PubMed data retrieval: abstracts, full-text (PMC), bibliometrics (iCite), and entity annotations (PubTator3).

Topics

Resources

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages