Synthetic Data
Testing healthcare applications requires data, but using real patient data is risky and heavily regulated. DHTI solves this by making it easy to generate realistic synthetic data.
You can use Large Language Models (LLMs) to generate free-text synthetic data conformant to specific schemas or instructions.
npx dhti-cli synthetic [INPUT] [OUTPUT] [PROMPT]
Flags:
- -r, --maxRecords: Number of records to generate.
- -m, --maxCycles: Maximum number of cycles for iterative generation.
- -i, --inputField and -o, --outputField: JSON fields to target for input and output, respectively.
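The field flags are easier to reason about with a concrete model of a record. Below is a minimal Python sketch of one plausible reading, not dhti-cli's implementation. It assumes that INPUT is a JSON array of objects, that the value under `--inputField` is sent to the LLM together with PROMPT, that the generated text is stored under `--outputField`, and that `--maxRecords` caps how many records are produced. `synthesize` and the `generate` callable (a stand-in for the LLM call) are hypothetical names, and `--maxCycles` is not modeled.

```python
import json
from typing import Callable

# Hypothetical stand-in for an LLM call: (prompt, source_text) -> generated text.
Generator = Callable[[str, str], str]


def synthesize(
    records: list[dict],
    prompt: str,
    generate: Generator,
    *,
    input_field: str,
    output_field: str,
    max_records: int,
) -> list[dict]:
    """Return at most max_records records with output_field filled in.

    Assumption (not documented dhti-cli behavior): each record's input_field
    holds the source text, and the generated text goes into output_field.
    """
    results = []
    for record in records[:max_records]:
        source = str(record.get(input_field, ""))
        new_record = dict(record)  # shallow copy so the caller's records are not mutated
        new_record[output_field] = generate(prompt, source)
        results.append(new_record)
    return results


if __name__ == "__main__":
    notes = [{"summary": "65F, type 2 diabetes"}, {"summary": "40M, asthma"}]
    # Toy generator so the sketch runs without an LLM.
    fake_llm = lambda p, s: f"{p}: {s}"
    out = synthesize(
        notes,
        "Write a discharge note",
        fake_llm,
        input_field="summary",
        output_field="note",
        max_records=1,
    )
    print(json.dumps(out))
```

Keeping the generator as a parameter isolates the record bookkeeping from whichever model is used, so the mapping logic can be checked offline with a fake generator as above.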
For generating complete patient records (FHIR bundles) with realistic histories, use Synthea.
See the Synthea page for detailed instructions on generating and uploading cohorts.
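Before uploading a generated cohort, a quick inventory of what the bundles contain can catch surprises. The following is a minimal Python sketch, assuming the cohort is Synthea's FHIR JSON output (one `Bundle` per patient, typically written to `output/fhir/`). The function names and the default directory are illustrative, not part of DHTI.

```python
import json
import sys
from collections import Counter
from pathlib import Path


def count_resources(bundle: dict) -> Counter:
    """Count the resources in one FHIR Bundle, keyed by resourceType."""
    return Counter(
        entry["resource"]["resourceType"]
        for entry in bundle.get("entry", [])
        if "resource" in entry
    )


def summarize_dir(directory: Path) -> Counter:
    """Sum resource counts over every *.json file in directory that is a Bundle."""
    total = Counter()
    for path in sorted(directory.glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        if data.get("resourceType") == "Bundle":  # skip non-Bundle JSON files
            total += count_resources(data)
    return total


if __name__ == "__main__":
    # Assumed location: Synthea usually writes FHIR bundles to output/fhir/.
    target = Path(sys.argv[1] if len(sys.argv) > 1 else "output/fhir")
    for resource_type, n in summarize_dir(target).most_common():
        print(f"{resource_type}: {n}")
```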
For researchers who need de-identified real hospital data, DHTI also supports loading the MIMIC-IV demo dataset. This gives you access to rich ICU data structures for testing complex clinical scenarios.
Note: MIMIC data is de-identified but derived from real events, offering a different kind of realism compared to Synthea's fully generated histories.