A probabilistic prescribing model that predicts intravenous-to-oral (IV-to-oral) antibiotic switching. It uses Neural Processes (ConvGNP), gradient-boosted decision trees (GBDT) and logistic regression, and baseline models, all trained on MIMIC-IV healthcare time-series data.
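To make the role of the baselines concrete, here is a minimal sketch of a generic repeat-last (last observation carried forward) forecaster, the kind of model the baseline repeat-last command suggests by its name. It is not this project's implementation. The function name, the use of None for missing observations, and the 1.0/0.0 IV/oral encoding in the example are all hypothetical.

```python
from typing import Optional, Sequence


def repeat_last_forecast(history: Sequence[Optional[float]], horizon: int) -> list[float]:
    """Forecast `horizon` future steps by repeating the most recent observed value.

    Hypothetical helper for illustration. None marks a time step with no
    observation; such steps are skipped when looking for the value to carry forward.
    """
    if horizon < 0:
        raise ValueError("horizon must be non-negative")
    # Walk backwards through the history to find the latest actual observation.
    for value in reversed(history):
        if value is not None:
            return [value] * horizon
    raise ValueError("history contains no observed values")


# Hypothetical encoding: 1.0 = on IV antibiotics, 0.0 = switched to oral, None = not recorded.
print(repeat_last_forecast([1.0, 1.0, None, 0.0, None], horizon=3))  # [0.0, 0.0, 0.0]
```

A baseline this simple gives the probabilistic models a floor to beat on the same data.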
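This project uses uv for Python package management. Install uv, clone the repository, then install the dependencies (including all optional extras) and the pre-commit hooks:

curl -LsSf https://astral.sh/uv/install.sh | sh
git clone https://github.com/SAFEHR-data/ivos-model.git
cd ivos-model
uv sync --all-extras
uv run pre-commit install

Everything else runs through the primitivo-model CLI:

# Show CLI help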
uv run primitivo-model --help
# Train models
uv run primitivo-model nps train --route simple-charts-dev --data-source mimic4
uv run primitivo-model tabular train-gbdt --route simple-charts-dev --data-source mimic4
uv run primitivo-model baseline repeat-last --route simple-charts-dev --data-source mimic4
# Process MIMIC data
uv run primitivo-model mimic process --smoke-test
# Show configuration
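uv run primitivo-model config

Run the tests (the whole suite, or a single file verbosely), the linter and formatter, and all pre-commit hooks:

uv run pytest
uv run pytest tests/test_criteria.py -v
uv run ruff check .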
uv run ruff format .
uv run pre-commit run --all-files

All experiments log to a local mlruns/ directory.
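To compare runs programmatically, the supported route is mlflow.search_runs, which returns runs as a pandas DataFrame. If you only need metric values and want to avoid that dependency, the sketch below reads the file store directly. It assumes MLflow's default file-based layout (mlruns/<experiment_id>/<run_id>/metrics/<metric_name>, one "<timestamp> <value> <step>" line per logged value), which is an internal format that may change between MLflow versions. The helper names are hypothetical.

```python
from pathlib import Path


def read_metric_history(run_dir: Path) -> dict[str, list[tuple[int, float]]]:
    """Return {metric_name: [(step, value), ...]} for one run directory.

    Assumes MLflow's file-store layout, mlruns/<experiment_id>/<run_id>/metrics/<name>,
    with one "<timestamp_ms> <value> <step>" line per logged value. A missing step
    field is treated as step 0. Nested metric names (containing "/") are skipped.
    """
    history: dict[str, list[tuple[int, float]]] = {}
    metrics_dir = run_dir / "metrics"
    if not metrics_dir.is_dir():
        return history
    for metric_file in sorted(metrics_dir.iterdir()):
        if not metric_file.is_file():
            continue  # a nested metric name becomes a sub-directory; not handled here
        points: list[tuple[int, int, float]] = []
        for line in metric_file.read_text().splitlines():
            fields = line.split()
            if len(fields) < 2:
                continue  # ignore blank or malformed lines
            timestamp, value = int(fields[0]), float(fields[1])
            step = int(fields[2]) if len(fields) > 2 else 0
            points.append((step, timestamp, value))
        if points:
            # Order by step, breaking ties by logging time, then drop the timestamp.
            history[metric_file.name] = [(step, value) for step, _, value in sorted(points)]
    return history


def latest_metrics(mlruns: Path = Path("mlruns")) -> dict[str, dict[str, float]]:
    """Map each run ID to the value logged at the highest step of each of its metrics."""
    summary: dict[str, dict[str, float]] = {}
    for run_dir in sorted(mlruns.glob("*/*")):
        history = read_metric_history(run_dir)
        if history:
            summary[run_dir.name] = {name: points[-1][1] for name, points in history.items()}
    return summary
```

Calling latest_metrics() from the project root gives a quick per-run summary; anything beyond that (parameters, tags, filtering) is better served by the MLflow API.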
To browse runs in the MLflow UI:

mlflow ui --port 5000

The models are trained on MIMIC-IV, which requires credentialed access via PhysioNet. To prepare the database:
- Download the CSV files (requires a PhysioNet account with a signed data use agreement):
  wget -r -N -c -np --user YOUR_USERNAME --ask-password \
    https://physionet.org/files/mimiciv/3.1/
- Build the DuckDB database using mimic-code:
  git clone https://github.com/MIT-LCP/mimic-code
  # Follow the instructions at:
  # https://github.com/MIT-LCP/mimic-code/tree/main/mimic-iv/buildmimic/duckdb
  Note for v3.1: CSV parsing requires a small fix. On line 115 of the build script, change:
    COPY $TABLE_NAME FROM '$FILE' (HEADER);
  to:
    COPY $TABLE_NAME FROM '$FILE' (DELIMITER ',', HEADER, ESCAPE '"');
  See mimic-code#1881 for details.
- Validate the database:
  duckdb mimic4.db < mimic-code/mimic-iv/postgres/validate.sql
- Process the database into the model format, pointing DATA_ROOT at the directory that contains mimic4/mimic4.db:
  export DATA_ROOT=/path/to/your/data
  uv run primitivo-model mimic process
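Editing the build script by hand is easy to forget when rebuilding, so the v3.1 COPY fix from the build step can also be scripted. The sketch below is a hypothetical helper, not part of mimic-code or this project. It replaces the exact COPY statement quoted in that step and refuses to run if the statement appears zero times or more than once (for example, because the script has already been patched or has changed upstream).

```python
from pathlib import Path

# The statement to replace and its replacement, exactly as given for MIMIC-IV v3.1.
OLD_COPY = "COPY $TABLE_NAME FROM '$FILE' (HEADER);"
NEW_COPY = "COPY $TABLE_NAME FROM '$FILE' (DELIMITER ',', HEADER, ESCAPE '\"');"


def patch_copy_statement(script_text: str) -> str:
    """Return the build script text with the v3.1 CSV-parsing fix applied."""
    count = script_text.count(OLD_COPY)
    if count != 1:
        raise ValueError(f"expected exactly one COPY statement to patch, found {count}")
    # Indentation around the statement is preserved because only the statement is replaced.
    return script_text.replace(OLD_COPY, NEW_COPY)


def patch_file(path: Path) -> None:
    """Apply the fix to a build script on disk (hypothetical convenience wrapper)."""
    path.write_text(patch_copy_statement(path.read_text()))
```

For example, patch_file(Path("mimic-code/mimic-iv/buildmimic/duckdb/import_duckdb.sh")); that script path is an assumption about the mimic-code checkout, so check it against the build instructions before running.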
By default DATA_ROOT is the data/ directory in the project root.
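The lookup rule above is simple enough to reproduce in a pre-flight check, for example before launching a long training run. This is a minimal sketch that assumes only the documented behaviour (DATA_ROOT if set, otherwise data/ under the project root); resolve_mimic_db and require_mimic_db are hypothetical helpers, not functions exported by primitivo-model.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_mimic_db(project_root: Path, env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the expected location of mimic4/mimic4.db.

    Follows the documented rule: use $DATA_ROOT when it is set, otherwise
    <project_root>/data. An empty DATA_ROOT is treated as unset (an assumption).
    """
    env = os.environ if env is None else env
    data_root = env.get("DATA_ROOT")
    base = Path(data_root) if data_root else project_root / "data"
    return base / "mimic4" / "mimic4.db"


def require_mimic_db(project_root: Path, env: Optional[Mapping[str, str]] = None) -> Path:
    """Like resolve_mimic_db, but fail early if the database has not been built yet."""
    db_path = resolve_mimic_db(project_root, env)
    if not db_path.is_file():
        raise FileNotFoundError(
            f"MIMIC-IV DuckDB database not found at {db_path}; "
            "build it first or set DATA_ROOT to the directory that contains mimic4/mimic4.db"
        )
    return db_path
```

Passing env explicitly keeps the check testable; in normal use it falls back to the real environment.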
The Radix dataset used in this project is not publicly available.