Commit b9ed7a3

Merge pull request #1689 from MIT-LCP/duckdb_concepts
Add duckdb build/concepts and use SQLGlot to convert BigQuery SQL into other dialects
2 parents 8cb6028 + 1dfa41c commit b9ed7a3

File tree

163 files changed: +13606 -7860 lines changed


.github/workflows/psql.yml

Lines changed: 16 additions & 1 deletion
@@ -28,6 +28,11 @@ jobs:
       - name: Check out repository code
         uses: actions/checkout@v3
 
+      - name: Install Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.10'
+
       - name: Download demo data
         uses: ./.github/actions/download-demo
 
@@ -60,7 +65,7 @@ jobs:
           PGPASSWORD: postgres
           BUILDCODE_PATH: mimic-iv/buildmimic/postgres
 
-      - name: Build mimic-iv concepts
+      - name: mimic-iv/concepts psql build
         run: |
           psql -h $POSTGRES_HOST -U postgres -f postgres-functions.sql
           psql -h $POSTGRES_HOST -U postgres -f postgres-make-concepts.sql
@@ -69,6 +74,16 @@ jobs:
           POSTGRES_HOST: postgres
           PGPASSWORD: postgres
 
+      - name: mimic_utils - convert mimic-iv concepts to PostgreSQL and rebuild
+        run: |
+          pip install .
+          mimic_utils convert_folder mimic-iv/concepts mimic-iv/concepts_postgres --source_dialect bigquery --destination_dialect postgres
+          psql -h $POSTGRES_HOST -U postgres -f mimic-iv/concepts_postgres/postgres-make-concepts.sql
+        working-directory: ./
+        env:
+          POSTGRES_HOST: postgres
+          PGPASSWORD: postgres
+
       - name: Load ed data into PostgreSQL
         run: |
           echo "Loading data into psql."

README_mimic_utils.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+# mimic_utils package
+
+This package contains utilities for working with the MIMIC datasets.

mimic-iii/buildmimic/duckdb/README.md

Lines changed: 41 additions & 44 deletions
@@ -1,20 +1,51 @@
-# DuckDB
+# MIMIC-III in DuckDB
 
-The script in this folder creates the schema for MIMIC-IV and
+The scripts in this folder create the schema for MIMIC-III and
 loads the data into the appropriate tables for
 [DuckDB](https://duckdb.org/).
+
 DuckDB, like SQLite, is serverless and
 stores all information in a single file.
 Unlike SQLite, an OLTP database,
 DuckDB is an OLAP database, and therefore optimized for analytical queries.
-This will result in faster queries for researchers using MIMIC-IV
+This will result in faster queries for researchers using MIMIC-III
 with DuckDB compared to SQLite.
 To learn more, please read their ["why duckdb"](https://duckdb.org/docs/why_duckdb)
 page.
 
-The instructions to load MIMIC-III into a DuckDB
-only require:
-1. DuckDB to be installed and
+## Download MIMIC-III files
+
+[Download](https://physionet.org/content/mimiciii/1.4/)
+the CSV files for MIMIC-III by any method you wish.
+(These scripts should also work with the much smaller
+[demo version](https://physionet.org/content/mimiciii-demo/1.4/#files-panel)
+of the dataset.)
+
+The easiest way to download them is to open a terminal then run:
+
+```
+wget -r -N -c -np -nH --cut-dirs=1 --user YOURUSERNAME --ask-password https://physionet.org/files/mimiciii/1.4/
+```
+
+Replace `YOURUSERNAME` with your physionet username.
+
+The rest of these intructions assume the CSV files are in the folder structure as follows:
+
+```
+mimic_data_dir/
+    ADMISSIONS.csv.gz
+    CALLOUT.csv.gz
+    ...
+```
+
+By default, the above `wget` downloads the data into `mimiciii/1.4` (as we used `--cut-dirs=1` to remove the base folder). Thus, by default, `mimic_data_dir` is `mimiciii/1.4` (relative to the current folder). The CSV files can be uncompressed (end in `.csv`) or compressed (end in `.csv.gz`).
+
+
+## Shell script method (`import_duckdb.sh`)
+
+Using this script to load MIMIC-III into a DuckDB
+only requires:
+1. DuckDB to be installed (the `duckdb` executable must be in your PATH)
 2. Your computer to have a POSIX-compliant terminal shell,
 which is already found by default on any Mac OSX, Linux, or BSD installation.
 
@@ -24,14 +55,6 @@ which you can obtain by either installing
 [Windows Subsystem for Linux](https://docs.microsoft.com/en-us/windows/wsl/install-win10)
 or [Cygwin](https://www.cygwin.com/).
 
-## Set-up
-
-### Quick overview
-
-1. [Install](https://duckdb.org/docs/installation/) the CLI version of DuckDB
-2. [Download](https://physionet.org/content/mimiciii/1.4/) the MIMIC-III files
-3. Create DuckDB database and load data
-
 ### Install DuckDB
 
 Follow instructions on their website to
@@ -41,37 +64,10 @@ the CLI version of DuckDB.
 You will need to place the `duckdb` binary in a folder on your environment path,
 e.g. `/usr/local/bin`.
 
-### Download MIMIC-III files
-
-[Download](https://physionet.org/content/mimiciii/1.4/)
-the CSV files for MIMIC-III by any method you wish.
-
-The intructions assume the CSV files are in the folder structure as follows:
-
-```
-mimic_data_dir
-    ADMISSIONS.csv.gz
-    ...
-```
 
-The CSV files can be uncompressed (end in `.csv`) or compressed (end in `.csv.gz`).
+### Create DuckDB database and load data
 
-The easiest way to download them is to open a terminal then run:
-
-```
-wget -r -N -c -np -nH --cut-dirs=1 --user YOURUSERNAME --ask-password https://physionet.org/files/mimiciii/1.4/
-```
-
-Replace `YOURUSERNAME` with your physionet username.
-
-This will make you `mimic_data_dir` be `mimiciii/1.4`.
-
-# Create DuckDB database and load data
-
-The last step requires creating a DuckDB database and
-loading the data into it.
-
-You can do all of this will one shell script, `import_duckdb.sh`,
+You can do all of this with one shell script, `import_duckdb.sh`,
 located in this repository.
 
 See the help for it below:
@@ -102,6 +98,7 @@ The script will print out progress as it goes.
 Be patient, this can take minutes to hours to load
 depending on your computer's configuration.
 
+
 # Help
 
-Please see the [issues page](https://github.com/MIT-LCP/mimic-iii/issues) to discuss other issues you may be having.
+Please see the [issues page](https://github.com/MIT-LCP/mimic-code/issues) to discuss other issues you may be having.
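
The updated README states two prerequisites for the shell script method: the `duckdb` executable must be on the PATH, and `mimic_data_dir` must contain the tables as `.csv` or `.csv.gz` files. A minimal pre-flight sketch of those two checks is below. It is not part of this commit, and the function names `duckdb_on_path` and `find_mimic_csvs` are hypothetical; it only inspects the file system and does not call `import_duckdb.sh`.

```python
import shutil
from pathlib import Path


def duckdb_on_path():
    """Return True if a `duckdb` executable can be found on the PATH."""
    return shutil.which("duckdb") is not None


def find_mimic_csvs(data_dir):
    """Map table name -> file for each .csv or .csv.gz directly inside data_dir."""
    tables = {}
    for path in sorted(Path(data_dir).iterdir()):
        if not path.is_file():
            continue
        # Check the longer suffix first so "X.csv.gz" is not treated as "X.csv".
        for suffix in (".csv.gz", ".csv"):
            if path.name.endswith(suffix):
                table = path.name[: -len(suffix)]
                # If a table exists in both forms, keep the first file seen.
                tables.setdefault(table, path)
                break
    return tables
```

With the default `wget` layout described in the README, one might call `find_mimic_csvs("mimiciii/1.4")` and confirm that tables such as `ADMISSIONS` and `CALLOUT` appear among the keys before starting the load.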
