This project is a research/competition prototype only and is not for clinical diagnosis, treatment, or medical decision-making.
This repository provides a local, offline-capable prototype for multimodal biomedical insight generation using isolated modality agents and a verifier layer.
Case Input
|-- vision_agent ------|
|-- ehr_agent ---------|
|-- genomics_agent ----|--> verifier_agent --> final JSON + audit trail
|-- literature_agent --|
Rules:
- Agents do not communicate with each other.
- Verifier is the only consumer of agent outputs.
- All outputs are strict Pydantic schemas and JSON-serializable.
- No invented citations. Literature evidence must come from retrieved local corpus documents.
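The rules above can be illustrated with a minimal sketch. It is not the repository's implementation: it uses stdlib dataclasses as a dependency-free stand-in for the Pydantic schemas, and the agent functions, field names, and document ids (`doc-001`, `doc-999`) are hypothetical. It shows agents running independently, the verifier acting as the only consumer of their outputs, and citations being checked against local corpus ids.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class AgentOutput:
    """Stand-in for a strict Pydantic schema; every field is JSON-serializable."""
    agent: str
    finding: str
    confidence: float
    citations: list = field(default_factory=list)  # ids of local corpus documents


# Hypothetical agents: each sees only the case input, never another agent's output.
def vision_agent(case):
    return AgentOutput("vision_agent", "no focal lesion flagged", 0.6)


def literature_agent(case):
    # "doc-999" is deliberately absent from the corpus used below.
    return AgentOutput("literature_agent", "risk factor reported", 0.7, ["doc-001", "doc-999"])


def verifier_agent(outputs, corpus_ids):
    """Sole consumer of agent outputs; drops citations absent from the local corpus."""
    audit, kept = [], []
    for out in outputs:
        valid = [c for c in out.citations if c in corpus_ids]
        for c in out.citations:
            if c not in corpus_ids:
                audit.append(f"{out.agent}: dropped unknown citation {c}")
        kept.append({**asdict(out), "citations": valid})
    return {"agent_outputs": kept, "audit_trail": audit}


def run_case(case, agents, corpus_ids):
    # Fan out: agents run independently and share no state.
    with ThreadPoolExecutor() as pool:
        outputs = list(pool.map(lambda agent: agent(case), agents))
    return verifier_agent(outputs, corpus_ids)


result = run_case({"case_id": "case_01"}, [vision_agent, literature_agent], {"doc-001"})
print(json.dumps(result, indent=2))
```

Keeping the citation check inside the verifier means an agent cannot introduce a reference that the local corpus does not contain, and every dropped citation leaves an entry in the audit trail.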
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
- Bootstrap base environment:
bash scripts/bootstrap_vm.sh
- Install CUDA-matched torch/torchvision (example, adjust for your CUDA version):
source venv/bin/activate
pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision
- Verify GPU:
python scripts/check_gpu.py
- Configure runtime:
cp .env.example .env
# edit .env and set USE_REAL_* flags, HF_TOKEN, model ids, DEVICE=cuda
Environment variables are loaded from .env (via python-dotenv).
Key flags:
- USE_REAL_VISION=true|false
- USE_REAL_EHR=true|false
- USE_REAL_LITERATURE=true|false
- DEVICE=cuda|cpu
- VISION_MODEL_ID
- EHR_MODEL_ID
- LIT_EMBED_MODEL_ID
- INFERENCE_TIMEOUT_SECONDS
- LITERATURE_CORPUS_PATH (default data/processed/papers.jsonl)
- LIT_EMBED_CACHE_PATH (default data/processed/lit_embeddings.npz)
Default mode remains backward compatible (USE_REAL_* = false).
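As a rough illustration of how these flags could be read, the sketch below parses them from an environment mapping. The `RuntimeConfig` class, the `env_flag` and `load_config` helpers, and the `cpu` default for DEVICE are assumptions made for this example, not the repository's code. Only the two path defaults and the false default for the USE_REAL_* flags come from the text above.

```python
import os
from dataclasses import dataclass


def env_flag(name, env, default=False):
    """Read a true|false flag; an unset variable yields the default."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() == "true"


@dataclass(frozen=True)
class RuntimeConfig:
    use_real_vision: bool
    use_real_ehr: bool
    use_real_literature: bool
    device: str
    literature_corpus_path: str
    lit_embed_cache_path: str


def load_config(env=None):
    # With python-dotenv installed, calling dotenv.load_dotenv() beforehand
    # copies the entries of .env into os.environ.
    env = os.environ if env is None else env
    return RuntimeConfig(
        use_real_vision=env_flag("USE_REAL_VISION", env),
        use_real_ehr=env_flag("USE_REAL_EHR", env),
        use_real_literature=env_flag("USE_REAL_LITERATURE", env),
        device=env.get("DEVICE", "cpu"),  # assumption: fall back to cpu when unset
        literature_corpus_path=env.get("LITERATURE_CORPUS_PATH", "data/processed/papers.jsonl"),
        lit_embed_cache_path=env.get("LIT_EMBED_CACHE_PATH", "data/processed/lit_embeddings.npz"),
    )


# Passing a plain dict keeps the example independent of the real environment.
config = load_config({"USE_REAL_VISION": "true", "DEVICE": "cuda"})
print(config)
```

Passing the mapping explicitly makes the parsing easy to test without touching the process environment.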
Create canonical directories:
mkdir -p data/raw data/processed
Literature (Europe PMC):
python scripts/download_literature_europepmc.py \
--query "biomedical malignancy risk factors" \
--page-size 100 --pages 2 \
--output data/processed/papers.jsonl
Vision manifest normalization:
python scripts/normalize_vision_manifest.py \
--input data/raw/vision_manifest.csv \
--output data/processed/vision_manifest.csv \
--base-dir data/raw
EHR manifest normalization:
python scripts/normalize_ehr_manifest.py \
--input data/raw/ehr_manifest.csv \
--output data/processed/ehr_manifest.csv
Genomics QC report:
python scripts/genomics_qc.py data/raw/*.csv --output data/processed/genomics_qc.json
Prewarm models:
python scripts/prewarm_models.py
Build literature embedding cache (if USE_REAL_LITERATURE=true):
python scripts/build_literature_embeddings.py
CLI pipeline:
python -m orchestrator.run sample_cases/case_01
python -m demo.run_case sample_cases/case_01
Streamlit UI:
streamlit run apps/streamlit_app.py
Systemd unit template:
deploy/systemd/agentic-health.service
Snapshot environment for reproducibility:
bash scripts/snapshot_env.sh
Run tests:
pytest -q
python -m eval.smoke_test
Agent implementations:
- Vision agent: optional Hugging Face image-classification path with timeout and heuristic fallback.
- EHR agent: optional Hugging Face zero-shot path with timeout and rule-based fallback.
- Literature agent: embedding retrieval path with timeout and TF-IDF fallback.
- Genomics agent: deterministic robust CSV/QC baseline retained.
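The timeout-plus-fallback pattern shared by the vision, EHR, and literature agents can be sketched as follows. This is an illustrative version under stated assumptions, not the repository's code: `run_with_fallback`, `slow_model`, and `keyword_rules` are hypothetical names, and the timeout value stands in for INFERENCE_TIMEOUT_SECONDS.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def run_with_fallback(primary, fallback, payload, timeout_seconds):
    """Try primary within the timeout; on timeout or error, use fallback and record why."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(primary, payload)
    try:
        return {"result": future.result(timeout=timeout_seconds), "path": "model"}
    except FutureTimeout:
        reason = "timeout"
    except Exception as exc:
        reason = f"error: {exc}"
    finally:
        # Do not block on a stuck worker; a running thread cannot be force-killed.
        pool.shutdown(wait=False)
    return {"result": fallback(payload), "path": "fallback", "reason": reason}


def slow_model(text):
    # Hypothetical stand-in for a model call that exceeds the time budget.
    time.sleep(0.5)
    return "model-label"


def keyword_rules(text):
    # Hypothetical rule-based fallback.
    return "flagged" if "mass" in text else "unremarkable"


outcome = run_with_fallback(slow_model, keyword_rules, "2 cm mass noted", timeout_seconds=0.05)
print(outcome)
```

Recording which path produced the result, and the reason for any fallback, gives a verifier or audit trail enough information to tell model-backed output from heuristic output. Note that a thread-based timeout only stops waiting; the abandoned call keeps running until it finishes, so a process-based worker may be preferable for heavy models.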