| 1 | +# DuckDB |
| 2 | + |
| 3 | +The script in this folder creates the schema for MIMIC-III and |
| 4 | +loads the data into the appropriate tables for |
| 5 | +[DuckDB](https://duckdb.org/). |
| 6 | +DuckDB, like SQLite, is serverless and |
| 7 | +stores all information in a single file. |
| 8 | +Unlike SQLite, which is an OLTP database, |
| 9 | +DuckDB is an OLAP database and is therefore optimized for analytical queries. |
| 10 | +As a result, researchers using MIMIC-III with DuckDB |
| 11 | +should see faster analytical queries than with SQLite. |
| 12 | +To learn more, read DuckDB's ["Why DuckDB"](https://duckdb.org/docs/why_duckdb) |
| 13 | +page. |
| 14 | + |
| 15 | +Loading MIMIC-III into a DuckDB database |
| 16 | +requires only: |
| 17 | +1. an installation of DuckDB, and |
| 18 | +2. a POSIX-compliant terminal shell, |
| 19 | + which is available by default on macOS, Linux, and BSD. |
| 20 | + |
| 21 | +To use these instructions on Windows, |
| 22 | +you need a Unix command line environment, |
| 23 | +which you can obtain by installing either |
| 24 | +[Windows Subsystem for Linux](https://docs.microsoft.com/en-us/windows/wsl/install-win10) |
| 25 | +or [Cygwin](https://www.cygwin.com/). |
| 26 | + |
| 27 | +## Set-up |
| 28 | + |
| 29 | +### Quick overview |
| 30 | + |
| 31 | +1. [Install](https://duckdb.org/docs/installation/) the CLI version of DuckDB |
| 32 | +2. [Download](https://physionet.org/content/mimiciii/1.4/) the MIMIC-III files |
| 33 | +3. Create a DuckDB database and load the data |
| 34 | + |
| 35 | +### Install DuckDB |
| 36 | + |
| 37 | +Follow the instructions on the DuckDB website to |
| 38 | +[install](https://duckdb.org/docs/installation/) |
| 39 | +the CLI version of DuckDB. |
| 40 | + |
| 41 | +Place the `duckdb` binary in a directory on your `PATH`, |
| 42 | +e.g. `/usr/local/bin`. |
| 43 | + |
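To confirm that the binary is actually reachable on your `PATH` before running the import script, a check along these lines can be run in the terminal. It is a minimal sketch: `require_cmd` is a hypothetical helper, not part of this repository, and it relies on the DuckDB CLI's `-version` option to print the installed version.

```sh
#!/bin/sh
# Hypothetical helper (not part of this repository):
# succeed only if the named command can be found on PATH.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found $1 at $(command -v "$1")"
  else
    echo "error: $1 not found on PATH" >&2
    return 1
  fi
}

# Check for the DuckDB CLI and, if present, print its version.
if require_cmd duckdb; then
  duckdb -version
fi
```

If the error message appears, the folder holding `duckdb` is not on your `PATH` yet.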
| 44 | +### Download MIMIC-III files |
| 45 | + |
| 46 | +[Download](https://physionet.org/content/mimiciii/1.4/) |
| 47 | +the CSV files for MIMIC-III by any method you wish. |
| 48 | + |
| 49 | +The instructions assume the CSV files are in a folder structured as follows: |
| 50 | + |
| 51 | +``` |
| 52 | +mimic_data_dir |
| 53 | + ADMISSIONS.csv.gz |
| 54 | + ... |
| 55 | +``` |
| 56 | + |
| 57 | +The CSV files can be uncompressed (end in `.csv`) or compressed (end in `.csv.gz`). |
| 58 | + |
| 59 | +The easiest way to download them is to open a terminal and run: |
| 60 | + |
| 61 | +```sh |
| 62 | +wget -r -N -c -np -nH --cut-dirs=1 --user YOURUSERNAME --ask-password https://physionet.org/files/mimiciii/1.4/ |
| 63 | +``` |
| 64 | + |
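| 65 | +Replace `YOURUSERNAME` with your PhysioNet username. |
| 66 | + |
| 67 | +Because `-nH` drops the `physionet.org` host directory and `--cut-dirs=1` drops the leading `files` directory, your `mimic_data_dir` will be `mimiciii/1.4`. |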
| 68 | + |
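PhysioNet projects generally publish a `SHA256SUMS.txt` file listing a checksum for each file. Assuming the `wget` command above downloaded it into `mimiciii/1.4`, with paths relative to that folder, you can check the download for corruption before loading. This minimal sketch assumes GNU coreutils' `sha256sum`; on macOS, `shasum -a 256 -c SHA256SUMS.txt` is the usual equivalent. The helper name `verify_checksums` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper (not part of this repository): verify every file
# listed in DIR/SHA256SUMS.txt. Assumes GNU coreutils' sha256sum.
verify_checksums() {
  dir="$1"
  if [ ! -f "$dir/SHA256SUMS.txt" ]; then
    echo "error: $dir/SHA256SUMS.txt not found" >&2
    return 1
  fi
  # Subshell keeps the caller's working directory unchanged;
  # --quiet prints only the files that fail verification.
  (cd "$dir" && sha256sum --quiet -c SHA256SUMS.txt)
}

# Usage: verify_checksums mimiciii/1.4 && echo "all checksums match"
```

A non-zero exit status means at least one file is missing or corrupted and should be downloaded again (the `-c` flag passed to `wget` resumes partial downloads).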
| 69 | +### Create a DuckDB database and load the data |
| 70 | + |
| 71 | +The final step is to create a DuckDB database and |
| 72 | +load the data into it. |
| 73 | + |
| 74 | +You can do all of this with one shell script, `import_duckdb.sh`, |
| 75 | +located in this repository. |
| 76 | + |
| 77 | +Its help output is shown below: |
| 78 | + |
| 79 | +```sh |
| 80 | +$ ./import_duckdb.sh -h |
| 81 | +./import_duckdb.sh: |
| 82 | +USAGE: ./import_duckdb.sh mimic_data_dir [output_db] |
| 83 | +WHERE: |
| 84 | + mimic_data_dir directory that contains csv.gz or csv files |
| 85 | + output_db: optional filename for duckdb file (default: mimic3.db) |
| 86 | +$ |
| 87 | +``` |
| 88 | + |
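Because a load can run for hours, it can help to confirm first that `mimic_data_dir` actually holds the CSV files. The sketch below assumes the layout described earlier, with files such as `ADMISSIONS.csv` or `ADMISSIONS.csv.gz` directly inside the folder; `has_admissions` and `count_csv_files` are hypothetical helpers, not part of `import_duckdb.sh`.

```sh
#!/bin/sh
# Hypothetical helper (not part of this repository): succeed if DIR
# contains the ADMISSIONS table as ADMISSIONS.csv or ADMISSIONS.csv.gz.
has_admissions() {
  dir="$1"
  if [ -f "$dir/ADMISSIONS.csv" ] || [ -f "$dir/ADMISSIONS.csv.gz" ]; then
    return 0
  fi
  echo "error: no ADMISSIONS.csv or ADMISSIONS.csv.gz in $dir" >&2
  return 1
}

# Hypothetical helper: print how many .csv or .csv.gz files are
# directly inside DIR (subfolders are not searched).
count_csv_files() {
  find "$1" -maxdepth 1 -type f \( -name '*.csv' -o -name '*.csv.gz' \) | wc -l
}

# Usage: has_admissions mimiciii/1.4 && count_csv_files mimiciii/1.4
```

Checking for one known table and counting the rest is a cheap sanity check; it does not validate file contents.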
| 89 | +Here is an example invocation that creates the database in the default file, `mimic3.db`: |
| 90 | + |
| 91 | +```sh |
| 92 | +$ ./import_duckdb.sh mimiciii/1.4 |
| 93 | + |
| 94 | +... output removed |
| 95 | +Successfully finished loading data into mimic3.db. |
| 96 | + |
| 97 | +$ ls -lh mimic3.db |
| 98 | +-rw-rw-r--. 1 myuser mygroup 26G Jan 25 16:11 mimic3.db |
| 99 | +``` |
| 100 | + |
| 101 | +The script prints its progress as it goes. |
| 102 | +Be patient: loading can take anywhere from minutes to hours, |
| 103 | +depending on your computer's configuration. |
| 104 | + |
| 105 | +## Help |
| 106 | + |
| 107 | +Please see the [issues page](https://github.com/MIT-LCP/mimic-iii/issues) to discuss other issues you may be having. |