Unix-style command-line tools for searching and parsing PubMed articles. Designed for researchers and AI agents who want quick access to publication data without leaving the terminal.
```bash
# Search, parse, and filter
pm search "CRISPR cancer therapy" | pm fetch | pm parse | jq '.title'

# Full pipeline: search to PDF download
pm search "CRISPR review" --max 5 | pm fetch | pm parse | pm download --output-dir ./pdfs/
```

Requirements:

- uv (Python package manager)
- Python >= 3.12 (installed automatically by uv if needed)
- Optional: `jq` for advanced JSON filtering (`sudo apt install jq` or `brew install jq`)
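If you are not sure whether the prerequisites are already on your `PATH`, a small check can tell you before you start. This is only a sketch: `check_pm_prereqs` is a hypothetical name, it uses the POSIX `command -v` builtin, and it treats `uv` as required and `jq` as optional, mirroring the list above.

```bash
# Hypothetical helper: report which pm-tools prerequisites are on PATH.
# uv is required (non-zero exit if missing); jq is optional and never fails the check.
check_pm_prereqs() {
  local tool status=0
  for tool in uv jq; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf 'found:   %s\n' "$tool"
    else
      printf 'missing: %s\n' "$tool"
      if [ "$tool" = uv ]; then status=1; fi
    fi
  done
  return "$status"
}

# Usage: check_pm_prereqs || echo "Install uv first"
```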
Install directly from GitHub (no PyPI release):

```bash
uv tool install git+https://github.com/lescientifik/pm-tools.git
```

This installs the `pm` command globally, so you can run `pm` from anywhere.

For a development install, clone the repository and sync its environment:

```bash
git clone https://github.com/lescientifik/pm-tools.git
cd pm-tools
uv sync
```

With a development install, prefix all commands with `uv run` (e.g., `uv run pm search ...`).
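If typing `uv run` gets tedious, a shell function can forward to the development checkout. This is a sketch under assumptions: `PM_TOOLS_DIR` is a hypothetical variable pointing at your clone (defaulting to `~/pm-tools` here), and it relies on `uv run --project`, available in recent uv versions, which selects the project without changing your working directory, so relative paths such as `./pdfs/` still resolve where you are.

```bash
# Hypothetical location of your pm-tools clone; set it before sourcing if yours differs.
PM_TOOLS_DIR=${PM_TOOLS_DIR:-"$HOME/pm-tools"}

# Shadow `pm` with a function that runs the checkout's pm through uv.
# uv starts a separate process, so the inner `pm` is the real executable,
# not this function.
pm() {
  uv run --project "$PM_TOOLS_DIR" pm "$@"
}
```

Add it to `~/.bashrc` (or your shell's startup file) to make it permanent.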
After installation, the first thing to do is run `--help` to discover available commands and options:

```bash
# Show all available commands and general usage
pm --help

# Show detailed help for a specific command (options, input/output format, examples)
pm search --help
pm fetch --help
pm parse --help
pm filter --help
pm download --help
pm cite --help
pm diff --help
pm quick --help
```

Every command supports `-h` / `--help`. This is the best way to learn what each command does, what options it accepts, and how to use it. When in doubt, run `--help` first.
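Because the help text is the authoritative reference, it can be handy (especially for AI agents) to capture all of it in one file. A minimal sketch, assuming the eight subcommands listed in this README; `pm_help_all` is a hypothetical name:

```bash
# Hypothetical helper: print the --help output of every pm subcommand,
# each preceded by a header line, so it can be saved or grepped in one place.
pm_help_all() {
  local cmd
  for cmd in search fetch parse filter download cite diff quick; do
    printf '===== pm %s =====\n' "$cmd"
    pm "$cmd" --help
    printf '\n'
  done
}

# Usage: pm_help_all > pm-help.txt
```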
All commands are subcommands of `pm`:

| Command | Input | Output | Purpose |
|---|---|---|---|
| `pm search` | Query string | PMIDs | Search PubMed |
| `pm fetch` | PMIDs (stdin) | XML | Download article data |
| `pm parse` | XML (stdin) | JSONL | Extract structured data |
| `pm filter` | JSONL (stdin) | JSONL | Filter by year/journal/author |
| `pm diff` | Two JSONL files | JSONL | Compare article collections |
| `pm download` | JSONL/PMIDs | PDFs | Download Open Access PDFs |
| `pm cite` | PMIDs (stdin) | CSL-JSON | Generate bibliography citations |
| `pm quick` | Query string | JSONL | One-command search pipeline |

Run `pm <command> --help` for detailed options, input/output formats, and examples for each command.
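Because each stage in the table writes plain text to stdout, you can keep the intermediate PMIDs and XML with `tee` and re-run later stages (for example `pm parse` on the saved XML) without contacting NCBI again. A sketch; the function name and the `<prefix>.pmids` / `.xml` / `.jsonl` file names are illustrative choices, not files pm-tools creates itself:

```bash
# Hypothetical helper: run search -> fetch -> parse, keeping every stage on disk.
#   $1 = query, $2 = output file prefix, $3 = max results (default 100)
pm_pipeline_saved() {
  local query="$1" prefix="$2" max="${3:-100}"
  pm search "$query" --max "$max" | tee "$prefix.pmids" |
    pm fetch | tee "$prefix.xml" |
    pm parse > "$prefix.jsonl"
}

# Usage:  pm_pipeline_saved "CRISPR review" crispr 20
# Later, re-parse without refetching:  pm parse < crispr.xml > crispr.jsonl
```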
```bash
# Simplest: one command for quick results
pm quick "CRISPR cancer therapy"

# Search and get titles
pm search "machine learning diagnosis" --max 10 | pm fetch | pm parse | jq -r '.title'

# Filter to recent Nature papers with abstracts
pm search "quantum computing" --max 50 | pm fetch | pm parse | \
  pm filter --year 2024- --journal nature --has-abstract

# Save results to JSONL for later use
pm search "alzheimer biomarkers" --max 100 | pm fetch | pm parse > papers.jsonl

# Export to CSV
pm search "alzheimer biomarkers" --max 100 | pm fetch | pm parse | \
  jq -r '[.pmid, .year, .journal, .title] | @csv' > papers.csv
```

`pm filter` lets you filter parsed articles without writing `jq` queries:
```bash
# Filter by year (exact, range, or open-ended)
pm filter --year 2024        # Exact year
pm filter --year 2020-2024   # Range
pm filter --year 2020-       # 2020 and later

# Filter by journal (case-insensitive substring)
pm filter --journal nature
pm filter --journal "cell reports"

# Filter by author (case-insensitive, matches any author)
pm filter --author zhang

# Boolean filters
pm filter --has-abstract     # Must have abstract
pm filter --has-doi          # Must have DOI

# Combine filters (AND logic)
pm filter --year 2023- --journal nature --has-abstract

# Verbose mode shows filter stats
pm filter --year 2024 -v     # Output: "15/50 articles passed filters"
```

For interactive use when you just want to see results quickly:
```bash
# Basic quick search (default 100 results)
pm quick "CRISPR cancer therapy"

# Limit results
pm quick --max 20 "machine learning diagnosis"

# Verbose mode shows progress
pm quick -v "protein folding"
```

`pm quick` is a convenience wrapper that runs the full pipeline (`pm search | pm fetch | pm parse`) in one command. For programmatic use or custom filtering, use the individual commands.
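If you keep typing the same search-plus-filter pipeline, a small shell function gives you `pm quick`-style convenience with filtering built in. A sketch; `pm_quick_filtered` is a hypothetical name, and every argument after the query and the result limit is passed unchanged to `pm filter`:

```bash
# Hypothetical helper: search | fetch | parse | filter in one call.
#   $1 = query, $2 = max results, remaining arguments = pm filter options
pm_quick_filtered() {
  local query="$1" max="$2"
  shift 2
  pm search "$query" --max "$max" | pm fetch | pm parse | pm filter "$@"
}

# Usage: pm_quick_filtered "protein folding" 100 --year 2023- --has-abstract
```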
```bash
# Papers by a specific researcher
pm search "Doudna JA[author]" --max 10 | pm fetch | pm parse | \
  jq -r '"\(.year) - \(.title[0:70])..."'

# Multiple authors (collaborations)
pm search "(Zhang F[author]) AND (Bhattacharya D[author])" | \
  pm fetch | pm parse | jq '.title'
```

Monitor specific journals for topics you care about:
```bash
# Recent Cell papers on organoids
pm search "organoids AND Cell[journal]" --max 20 | pm fetch | pm parse | \
  pm filter --year 2024- | jq -r '.title'

# Compare publication counts across journals
pm search "immunotherapy" --max 200 | pm fetch | pm parse | \
  jq -r '.journal' | sort | uniq -c | sort -rn | head -10
```

Build a reading list with abstracts:
```bash
# Generate markdown reading list
pm search "CAR-T cell therapy review" --max 15 | pm fetch | pm parse | \
  jq -r '"## \(.title)\n**\(.journal)** (\(.year)) - PMID: \(.pmid)\n\n\(.abstract // "No abstract")\n\n---\n"' \
  > reading-list.md

# Find review articles specifically
pm search "neuroplasticity AND review[pt]" --max 10 | pm fetch | pm parse | \
  jq -r '.title'
```

```bash
# Look up a specific PMID
echo "12345678" | pm fetch | pm parse | jq .

# Batch lookup from a file
cat pmids.txt | pm fetch | pm parse > articles.jsonl

# Get DOI for citation (the doi field is omitted when unavailable)
pm search "Yamanaka induced pluripotent" --max 1 | pm fetch | pm parse | \
  jq -r '"DOI: \(.doi // "n/a")\nTitle: \(.title)"'

# Get full citation in CSL-JSON format
echo "12345678" | pm cite | jq '.'
```

```bash
# Preview what would be downloaded (dry-run)
pm search "CRISPR review" --max 10 | pm fetch | pm parse | \
  pm download --dry-run

# Download PDFs to a directory
pm search "open access[filter] AND immunotherapy" --max 20 | \
  pm fetch | pm parse | pm download --output-dir ./papers/

# Download with Unpaywall fallback (more coverage, requires email)
pm search "machine learning radiology" --max 10 | pm fetch | pm parse | \
  pm download --output-dir ./pdfs/ --email you@university.edu

# Download from PMID list (auto-converts to DOI/PMCID)
cat pmids.txt | pm download --output-dir ./pdfs/
```

Sources: `pm download` tries PMC Open Access first, then falls back to Unpaywall (if `--email` is provided). Not all articles have free PDFs available.
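To avoid hard-coding your address in every command, you can keep it in an environment variable and pass `--email` only when it is set, so the Unpaywall fallback stays opt-in. A sketch; `UNPAYWALL_EMAIL` and `pm_download_oa` are hypothetical names, not something pm-tools reads on its own:

```bash
# Hypothetical helper: read article data on stdin and download PDFs.
# Adds "--email <address>" only when UNPAYWALL_EMAIL is non-empty,
# which enables the Unpaywall fallback.
pm_download_oa() {
  local dir="${1:-./pdfs/}"
  pm download --output-dir "$dir" ${UNPAYWALL_EMAIL:+--email "$UNPAYWALL_EMAIL"}
}

# Usage:
#   export UNPAYWALL_EMAIL=you@university.edu
#   pm search "machine learning radiology" --max 10 | pm fetch | pm parse | pm_download_oa ./pdfs/
```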
```bash
# Get CSL-JSON citations for specific PMIDs
pm cite 28012456 29886577 > citations.jsonl

# Pipeline: search -> cite
pm search "CRISPR review" --max 10 | pm cite > citations.jsonl

# Convert to Pandoc-compatible bibliography
jq -s '.' citations.jsonl > bibliography.json

# Use with Pandoc
pandoc paper.md --citeproc --bibliography=bibliography.json -o paper.pdf
```

Output format (CSL-JSON):
```json
{
  "id": "pmid:28012456",
  "type": "article-journal",
  "title": "Article title...",
  "author": [{"family": "Smith", "given": "John"}],
  "container-title": "Nature",
  "issued": {"date-parts": [[2024, 3, 15]]},
  "volume": "627",
  "page": "123-130",
  "PMID": "28012456",
  "DOI": "10.1038/xxxxx"
}
```

`pm cite` vs `pm parse`:
| Feature | pm parse | pm cite |
|---|---|---|
| Abstract | Yes | No |
| Page numbers | No | Yes |
| Volume/Issue | No | Yes |
| Citation tools | Needs conversion | Direct (Zotero, Pandoc) |
Use `pm cite` for generating bibliographies and `pm parse` for content analysis.
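Since `jq` is optional, here is a jq-free stand-in for the `jq -s '.'` step above, which wraps `pm cite`'s one-object-per-line output in a JSON array for Pandoc. It is a sketch that relies on each line being one complete JSON object (true of JSONL) and skips blank lines; `jsonl_to_array` is a hypothetical name:

```bash
# Hypothetical helper: turn JSONL on stdin into a JSON array on stdout.
# Prints "[", then each non-blank line with "," lines between them, then "]".
jsonl_to_array() {
  awk 'BEGIN { print "[" } NF { if (n++) print ","; print } END { print "]" }'
}

# Usage: pm cite 28012456 29886577 | jsonl_to_array > bibliography.json
```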
```bash
# Fetch your entire research area (be patient, respects rate limits)
pm search "your niche topic" --max 1000 | pm fetch | pm parse > my-field.jsonl

# Then query locally (instant!)
pm filter --year 2020- < my-field.jsonl
pm filter --author smith --has-abstract < my-field.jsonl

# Or use jq for complex queries (a missing abstract is treated as empty text)
jq 'select((.abstract // "") | test("novel"; "i"))' my-field.jsonl
```

```bash
# Papers per year for a topic
pm search "microbiome gut brain" --max 500 | pm fetch | pm parse | \
  jq -r '.year' | sort | uniq -c | sort -k2
# Output:
#  12 2018
#  34 2019
#  67 2020
# 145 2021
# 203 2022
```

```bash
# Desktop notification for new papers (Linux; -d '\n' keeps apostrophes in titles intact)
pm search "your topic AND 2024[dp]" --max 5 | pm fetch | pm parse | \
  jq -r '.title' | head -1 | xargs -d '\n' -I {} notify-send "New Paper" "{}"

# Email yourself a digest
pm search "CRISPR 2024" --max 10 | pm fetch | pm parse | \
  jq -r '"- \(.title) (\(.journal))"' | \
  mail -s "Daily PubMed Digest" you@email.com

# Pipe to fzf for interactive selection (the preview shows the selected article's abstract)
pm search "protein folding" --max 50 | pm fetch | pm parse | \
  jq -r '"\(.pmid)\t\(.title)"' | \
  fzf --delimiter '\t' --preview 'echo {1} | pm fetch | pm parse | jq -r .abstract'
```

For bulk analysis, download PubMed baseline files directly:
```bash
# Parse local baseline file (30,000 articles)
zcat pubmed25n0001.xml.gz | pm parse > baseline.jsonl

# Find all papers with a matching author name
# (the parsed output contains author names, not affiliations)
jq -c 'select(any(.authors[]?; test("Zhang")))' baseline.jsonl
```

Use `pm diff` to compare two JSONL files and find added, removed, or changed articles:
```bash
# Stream all differences as JSONL
pm diff baseline_v1.jsonl baseline_v2.jsonl

# Get list of new PMIDs (for fetching updates)
pm diff old.jsonl new.jsonl | jq -r 'select(.status=="added") | .pmid' | \
  pm fetch | pm parse > new_articles.jsonl

# Filter to just changed articles
pm diff old.jsonl new.jsonl | jq 'select(.status=="changed")'

# Summary counts by status
pm diff old.jsonl new.jsonl | jq -s 'group_by(.status) | map({(.[0].status): length}) | add'

# Compare only metadata (ignore abstract changes)
pm diff old.jsonl new.jsonl --ignore abstract

# Quick check if files differ (for scripts)
if pm diff file1.jsonl file2.jsonl --quiet; then
  echo "Files are identical"
else
  echo "Files differ"
fi
```

Output format: streaming JSONL with `{"pmid":"...","status":"added|removed|changed",...}`
Exit codes: 0 = identical, 1 = differences found, 2 = error
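The `if`/`else` example above reports an error (exit code 2) as "Files differ". When a script needs to tell the three outcomes apart, branch on the exit status explicitly. A sketch; `pm_diff_status` is a hypothetical name, and capturing the status with `|| rc=$?` keeps it working under `set -e`:

```bash
# Hypothetical helper: classify two JSONL files using pm diff's exit codes
# (0 = identical, 1 = differences found, 2 = error).
pm_diff_status() {
  local rc=0
  pm diff "$1" "$2" --quiet || rc=$?
  case $rc in
    0) echo identical ;;
    1) echo different ;;
    *) echo "pm diff failed (exit $rc) on $1 vs $2" >&2; return 2 ;;
  esac
}

# Usage: [ "$(pm_diff_status old.jsonl new.jsonl)" = different ] && echo "Update available"
```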
Each article is output as a JSON object (JSONL format):

```json
{
  "pmid": "12345678",
  "title": "Article title here",
  "authors": ["Smith John", "Doe Jane"],
  "journal": "Nature",
  "year": "2024",
  "date": "2024-03-15",
  "doi": "10.1038/xxxxx",
  "pmcid": "PMC1234567",
  "abstract": "Full abstract text..."
}
```

Fields `doi`, `pmcid`, `date`, and `abstract` are omitted when not available.
Use standard PubMed search syntax:

| Query | Meaning |
|---|---|
| `cancer AND therapy` | Both terms |
| `"gene editing"` | Exact phrase |
| `Smith J[author]` | Author search |
| `Nature[journal]` | Journal filter |
| `2024[dp]` | Publication date |
| `review[pt]` | Publication type |
| `2020:2024[dp]` | Date range |
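When a query combines several terms with the same field tag, as in the multi-author search earlier, building the string in a function avoids quoting mistakes. A sketch; `pm_and_query` is a hypothetical name, and it only joins terms with `AND`:

```bash
# Hypothetical helper: build "(term1[field]) AND (term2[field]) ..." from
# a field tag followed by one or more terms.
pm_and_query() {
  local field="$1" q= term
  shift
  for term in "$@"; do
    q="${q:+$q AND }(${term}[${field}])"
  done
  printf '%s\n' "$q"
}

# Usage: pm search "$(pm_and_query author 'Zhang F' 'Bhattacharya D')" | pm fetch | pm parse
```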
- Rate Limits: Tools respect NCBI's limit of 3 requests per second automatically
- Batch Size: `pm fetch` batches 200 PMIDs per request for efficiency
- Large Queries: Use `--max` to limit results, or paginate with date ranges
- Verbose Mode: Add `--verbose` to `pm parse` to see progress on large files
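The "paginate with date ranges" tip can be scripted by running one search per publication year and merging the PMID lists. A sketch: `pm_search_by_year` is a hypothetical name, the per-year `--max` default of 1000 is only an example value, and repeated PMIDs are dropped while keeping the original order:

```bash
# Hypothetical helper: one pm search per year, merged into a single PMID list.
#   $1 = query, $2 = first year, $3 = last year, $4 = max per year (default 1000)
pm_search_by_year() {
  local query="$1" year="$2" last="$3" max="${4:-1000}"
  while [ "$year" -le "$last" ]; do
    pm search "($query) AND ${year}[dp]" --max "$max"
    year=$((year + 1))
  done | awk '!seen[$0]++'   # drop repeated PMIDs, keep first occurrence
}

# Usage: pm_search_by_year "gut brain axis" 2020 2024 | pm fetch | pm parse > gut-brain.jsonl
```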
License: MIT