Agent-assisted systematic review (microbiome × metabolic disease)

Reproducible pipeline to identify and evaluate human randomized, placebo-controlled clinical trials (2014–2024) of live microbiome interventions in metabolic disease (T2D/prediabetes, obesity/overweight, MASLD/NAFLD, NASH).

This repo implements “skills” as standalone CLI scripts that are idempotent, schema-validated, and auditable:

Every step writes a run_manifest.json.
LLM steps store per-record request.json, response.json, parsed.json and validate against JSON Schemas.
Outputs are stable tables with a fixed column order, written as both CSV and Parquet under data/outputs/.
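
As an illustration of the auditing idea above, here is a minimal sketch of how a step might write its run_manifest.json. The function name `write_run_manifest` and the manifest fields (`step`, `started_at`, `python`, `params`, `inputs`) are assumptions made for this example, not the repo's actual schema.

```python
import hashlib
import json
import sys
from datetime import datetime, timezone
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def write_run_manifest(out_dir: Path, step: str, params: dict, inputs: list[Path]) -> Path:
    """Write run_manifest.json for one step (field names are hypothetical)."""
    manifest = {
        "step": step,
        "started_at": datetime.now(timezone.utc).isoformat(),
        "python": sys.version.split()[0],
        "params": params,
        # Hash each input so a later audit can tell whether it changed.
        "inputs": {str(p): sha256_file(p) for p in inputs},
    }
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / "run_manifest.json"
    # sort_keys keeps the file byte-stable for identical content.
    path.write_text(json.dumps(manifest, indent=2, sort_keys=True), encoding="utf-8")
    return path
```

Hashing inputs rather than recording only their paths is one way to make a rerun comparable with an earlier run; the real scripts may record different details.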

flowchart LR
  A[search_pubmed] --> B[build_table]
  B --> C[screen_abstracts_llm]
  C --> D[download_pdfs]
  D --> E[extract_fulltext_llm]
  E --> F[rob2_llm]
  F --> G[apply_overrides_and_export]
  G --> O[data/outputs/*]
  H[overrides/overrides.jsonl] -. human feedback .-> G

Quickstart

Create an environment and install:

make setup

Configure environment variables (at minimum, OpenAI credentials for the LLM steps):

cp .env.example .env

Run the pipeline:

make search
make table
make pdfs
make screen_abstracts
make extract
make rob2
make export
# or:
make all

Pipeline steps (“skills”)

scripts/00_search_pubmed.py: PubMed search + metadata/abstract fetch → data/raw/*.
scripts/01_build_table.py: hint prefill + master screening table → data/outputs/screening_table.*.
scripts/02_download_pdfs.py: best-effort PDF retrieval + text extraction → data/raw/pdfs/* and data/intermediate/fulltext_text/*.
scripts/03_screen_abstracts_llm.py: LLM abstract screening (structured outputs) → updates table + artifacts.
scripts/04_extract_fulltext_llm.py: LLM full-text extraction (structured outputs) → updates table + artifacts.
scripts/05_rob2_llm.py: LLM RoB2 judgments (structured outputs) → updates table + artifacts.
scripts/06_apply_overrides_and_export.py: deterministic tiering + overrides + exports + PRISMA counts.
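
The human-feedback step can be pictured with a short sketch. It assumes each line of overrides/overrides.jsonl is a JSON object with `record_id`, `field`, and `value` keys; that layout, and the function names, are hypothetical and chosen only for illustration.

```python
import json
from pathlib import Path


def load_overrides(path: Path) -> list[dict]:
    """Read one JSON object per non-empty line of a JSONL file."""
    overrides = []
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.strip():
            overrides.append(json.loads(line))
    return overrides


def apply_overrides(rows: list[dict], overrides: list[dict]) -> list[dict]:
    """Return copies of rows with overrides applied in file order.

    Assumed override shape: {"record_id": ..., "field": ..., "value": ...}.
    Later overrides win; overrides for unknown records are skipped.
    """
    by_id = {row["record_id"]: dict(row) for row in rows}
    for ov in overrides:
        target = by_id.get(ov["record_id"])
        if target is None:
            continue
        target[ov["field"]] = ov["value"]
    return list(by_id.values())
```

Applying overrides in file order, with the last entry winning, keeps the result deterministic for a given overrides file, and copying the rows leaves the input table untouched.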

Outputs

data/outputs/screening_table.parquet and .csv
data/outputs/final_table.parquet and .csv
data/outputs/prisma_counts.json
LLM artifacts under data/intermediate/llm/*/{record_id}/
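
The per-record artifact layout lends itself to an idempotent write pattern, sketched below. Everything named here is an assumption for illustration: `run_llm_step`, the `call_model` callable, the `output_text` key of its response, and the two-field `REQUIRED_FIELDS` check, which stands in for real JSON Schema validation.

```python
import json
from pathlib import Path
from typing import Callable

# Hypothetical stand-in for a JSON Schema: required keys and their types.
REQUIRED_FIELDS = {"decision": str, "rationale": str}


def validate_parsed(parsed: dict) -> None:
    """Raise ValueError if a required field is missing or has the wrong type."""
    for key, typ in REQUIRED_FIELDS.items():
        if not isinstance(parsed.get(key), typ):
            raise ValueError(f"field {key!r} missing or not {typ.__name__}")


def run_llm_step(root: Path, step: str, record_id: str, request: dict,
                 call_model: Callable[[dict], dict]) -> dict:
    """Store request.json, response.json, parsed.json under root/step/record_id/."""
    rec_dir = root / step / record_id
    parsed_path = rec_dir / "parsed.json"
    if parsed_path.exists():
        # Idempotent: reuse the validated result instead of calling the model again.
        return json.loads(parsed_path.read_text(encoding="utf-8"))
    rec_dir.mkdir(parents=True, exist_ok=True)
    (rec_dir / "request.json").write_text(json.dumps(request, indent=2), encoding="utf-8")
    response = call_model(request)
    (rec_dir / "response.json").write_text(json.dumps(response, indent=2), encoding="utf-8")
    # Assumed response shape: the model's JSON answer is a string under "output_text".
    parsed = json.loads(response["output_text"])
    validate_parsed(parsed)
    parsed_path.write_text(json.dumps(parsed, indent=2), encoding="utf-8")
    return parsed
```

Because parsed.json is written only after validation succeeds, a failed record leaves its request and response on disk for inspection and is retried on the next run.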

Documentation

See:

docs/protocol.md
docs/decision_rules.md
docs/rob2_guidance.md
docs/endpoint_taxonomy.md
docs/data_dictionary.md

Trhova/Agent-assisted-systematic-review
