
Commit 189ded0

Merge pull request #1120 from MIT-LCP/postgres_concepts
Adds PostgreSQL concepts generated automatically
2 parents c8d5990 + 1e0b3ba commit 189ded0


67 files changed: 6,971 additions and 5 deletions

mimic-iv/buildmimic/postgres/README.md

Lines changed: 3 additions & 3 deletions
@@ -11,13 +11,13 @@ If following the tutorials, be sure to download the scripts locally and the MIMI
1111

1212
First ensure that Postgres is running on your computer. For installation instructions, see: [http://www.postgresql.org/download/](http://www.postgresql.org/download/)
1313

14-
Once Postgres is installed, clone the [mimic-iv](https://github.com/MIT-LCP/mimic-iv) repository into a local directory. We only need the contents of this directory, but it's useful to have the repository locally. You can clone the repository using the following command:
14+
Once Postgres is installed, clone the [mimic-code](https://github.com/MIT-LCP/mimic-code) repository into a local directory. We only need the contents of this directory, but it's useful to have the repository locally. You can clone the repository using the following command:
1515

1616
``` bash
17-
git clone https://github.com/MIT-LCP/mimic-iv.git
17+
git clone https://github.com/MIT-LCP/mimic-code.git
1818
```
1919

20-
Change to the `buildmimic/postgres/` directory. Create the schemas and tables with the following psql command. **This will delete any data present in the schemas.**
20+
Change to the `mimic-iv/buildmimic/postgres/` directory. Create the schemas and tables with the following psql command. **This will delete any data present in the schemas.**
2121

2222
```sh
2323
psql -f create.sql

mimic-iv/concepts/README.md

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
1+
# MIMIC-IV Concepts
2+
3+
This folder contains scripts to generate useful abstractions of raw MIMIC-IV data ("concepts").
4+
The scripts are written using the **BigQuery Standard SQL Dialect**. Concepts are categorized into folders where possible; otherwise they remain in the top-level directory. The [postgres](/mimic-iv/concepts/postgres) subfolder contains automatically generated PostgreSQL versions of these scripts; [see below for how these were generated](#generating-the-concepts-on-postgresql).
5+
6+
The concepts are organized into individual SQL scripts, with each script generating a table. The BigQuery `mimic_derived` dataset under `physionet-data` contains the concepts pregenerated. Access to this dataset is available to MIMIC-IV approved users: see the [cloud instructions](https://mimic.mit.edu/docs/gettingstarted/cloud/) on how to access MIMIC-IV on BigQuery (which includes the derived concepts).
7+
8+
* [List of the concept folders and their content](#concept-index)
9+
* [Generating the concept tables on BigQuery](#generating-the-concepts-on-bigquery)
10+
* [Generating the concept tables on PostgreSQL](#generating-the-concepts-on-postgresql)
11+
12+
## Concept Index
13+
14+
## Generating the concepts on BigQuery
15+
16+
Generating the concepts requires the [Google Cloud SDK](https://cloud.google.com/sdk) to be installed.
17+
A shell script, [make_concepts.sh](/mimic-iv/concepts/make_concepts.sh), is provided which iterates over each folder and creates a table with the same name as the concept file. Concept names have been chosen specifically to avoid collisions.
18+
19+
Generating a single concept can be done by calling the Google Cloud SDK as follows:
20+
21+
```sh
22+
bq query --use_legacy_sql=False --replace --destination_table=my_bigquery_dataset.age < demographics/age.sql
23+
```
24+
25+
In general the concepts may be generated in any order, except for the *first_day_sofa* and *kdigo_stages* tables, which depend on other tables.
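
The single-concept command above extends naturally to a whole folder. The following is a minimal sketch, not the actual `make_concepts.sh`: it only prints the `bq` command for each `.sql` file so the commands can be reviewed before anything is run. The function name `print_bq_commands` and the dataset name are hypothetical.

```sh
#!/bin/bash
# Hypothetical helper: print (do not run) one bq command per concept file.
# $1 = target BigQuery dataset, $2 = concept folder containing .sql files.
print_bq_commands() {
  local dataset="$1" folder="$2" fn tbl
  for fn in "${folder}"/*.sql; do
    # skip the unexpanded glob when the folder has no .sql files
    [ -e "${fn}" ] || continue
    # table name is the file name minus the .sql extension
    tbl="$(basename "${fn}" .sql)"
    echo "bq query --use_legacy_sql=False --replace --destination_table=${dataset}.${tbl} < ${fn}"
  done
}

# Example: list the commands for the demographics folder.
print_bq_commands my_bigquery_dataset demographics
```

Printing the commands first makes it easier to reorder them by hand for the tables that depend on other tables.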
26+
27+
## Generating the concepts on PostgreSQL
28+
29+
These instructions describe how to regenerate the [postgres](/mimic-iv/concepts/postgres) scripts from the BigQuery dialect scripts in the concepts folder.
30+
31+
* **If you just want to create PostgreSQL concepts for your installation of MIMIC-IV, go to the [postgres](/mimic-iv/concepts/postgres) subfolder**
32+
* If you would like to understand the process better, and possibly improve upon it, read on
33+
34+
Analogously to [MIMIC-III Concepts](https://github.com/MIT-LCP/mimic-code/tree/master/concepts), the SQL scripts here are written in BigQuery's Standard SQL syntax. The concepts have been carefully written to allow conversion to PostgreSQL, so that only the following changes are necessary to make them compatible with PostgreSQL:
35+
36+
* create postgres functions which emulate BigQuery functions
37+
* modify SQL scripts for incompatible syntax
38+
* run the modified SQL scripts and direct the output into tables in the PostgreSQL database
39+
40+
To do this, we have created a shell script, compatible with *nix and Mac OS X, which performs regular expression replacements on each script. To simplify the process for users, we output the automatically generated scripts to the [postgres](/mimic-iv/concepts/postgres) folder.
41+
Re-running this shell script can be done as follows:
42+
43+
1. Open a terminal in the `concepts` folder.
44+
2. Run [convert_bigquery_to_postgres.sh](convert_bigquery_to_postgres.sh).
45+
* e.g. `bash convert_bigquery_to_postgres.sh`
46+
* This script writes the converted scripts to the [postgres](/mimic-iv/concepts/postgres) subfolder after applying a few changes.
47+
* This also creates the `postgres-make-concepts.sql` script in the postgres subfolder.
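
To make the regular expression step concrete, here is a minimal sketch of two of the substitutions applied to a sample line. The expressions are simplified, portable variants written for this illustration (the conversion script's own expressions differ and rely on GNU sed features), and the helper name `convert_sql` and the sample query are hypothetical.

```sh
#!/bin/bash
# Simplified variants of two substitutions (assumptions for this sketch):
# 1. strip the backticked `physionet-data.` project prefix from table names
SCHEMA_RE='s/`physionet-data\.(mimic_core|mimic_icu|mimic_derived|mimic_hosp)\.([^`]+)`/\1.\2/g'
# 2. rewrite DATETIME_TRUNC(x, UNIT) as PostgreSQL's DATE_TRUNC('UNIT', x)
TRUNC_RE="s/DATETIME_TRUNC\(([^,]+), ?(DAY|HOUR|MINUTE|SECOND|YEAR)\)/DATE_TRUNC('\2', \1)/g"

# Hypothetical helper: convert BigQuery-dialect SQL on stdin to PostgreSQL.
convert_sql() {
  sed -E -e "${SCHEMA_RE}" -e "${TRUNC_RE}"
}

echo 'SELECT DATETIME_TRUNC(charttime, DAY) FROM `physionet-data.mimic_icu.chartevents`' | convert_sql
# prints: SELECT DATE_TRUNC('DAY', charttime) FROM mimic_icu.chartevents
```

The first capture group here uses `[^,]+` rather than a wildcard, so the match stops at the first comma; expressions nested inside the first argument that themselves contain commas would not be handled by this simplified form.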
48+
49+
### Known Problems
50+
51+
* [convert_bigquery_to_postgres.sh](convert_bigquery_to_postgres.sh) fails for [suspicion_of_infection](sepsis/suspicion_of_infection.sql) due to `, DATETIME_TRUNC(abx.starttime, DAY) AS antibiotic_date`. As a consequence also [sepsis3](sepsis/sepsis3.sql) fails.
52+
* The script runs repeatedly for subfolders `score` and `sepsis` to handle interdependencies between tables. The ordering of the concept scripts could be improved.
53+
* The regular expressions in [convert_bigquery_to_postgres.sh](convert_bigquery_to_postgres.sh) depend on the current SQL scripts and might fail when they are changed.
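
Because the conversion can fail silently when a script changes, it may help to scan the generated files for BigQuery-only function names that survived. This is a minimal sketch under stated assumptions: the helper name `find_unconverted` is hypothetical, and the two tokens it searches for are examples rather than an exhaustive list.

```sh
#!/bin/bash
# Hypothetical helper: list generated .sql files that still contain
# BigQuery-only function calls. The token list is an assumption.
find_unconverted() {
  local dir="$1"
  # nothing to report if the folder does not exist
  [ -d "${dir}" ] || return 0
  find "${dir}" -type f -name '*.sql' \
    -exec grep -lE 'DATETIME_TRUNC\(|GENERATE_ARRAY\(' {} + | sort
}

# Example: check the generated postgres subfolder.
find_unconverted postgres
```

Any file listed is a candidate for a closer look at the regular expressions that should have rewritten it.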
Lines changed: 126 additions & 0 deletions
@@ -0,0 +1,126 @@
1+
#!/bin/bash
2+
# This shell script converts BigQuery .sql files into PostgreSQL .sql files.
3+
4+
# String replacements are necessary for some queries.
5+
export REGEX_SCHEMA='s/`physionet-data.(mimic_core|mimic_icu|mimic_derived|mimic_hosp).(.+?)`/\1.\2/g'
6+
# Note that these expressions are very sensitive to changes; for example, adding whitespace after a comma can change their behavior.
7+
export REGEX_DATETIME_DIFF="s/DATETIME_DIFF\((.+?),\s?(.+?),\s?(DAY|MINUTE|SECOND|HOUR|YEAR)\)/DATETIME_DIFF(\1,\2,'\3')/g"
8+
export REGEX_DATETIME_TRUNC="s/DATETIME_TRUNC\((.+?),\s?(DAY|MINUTE|SECOND|HOUR|YEAR)\)/DATE_TRUNC('\2', \1)/g"
9+
# Add necessary quotes to INTERVAL, e.g. "INTERVAL 5 hour" to "INTERVAL '5' hour"
10+
export REGEX_INTERVAL="s/interval\s([[:digit:]]+)\s(hour|day|month|year)/INTERVAL '\1' \2/gI"
11+
# Add numeric cast to ROUND(), e.g. "ROUND(1.234, 2)" to "ROUND( CAST(1.234 as numeric), 2)".
12+
export PERL_REGEX_ROUND='s/ROUND\(((.|\n)*?)\, /ROUND\( CAST\( \1 as numeric\)\,/g'
13+
# Specific queries for some problems that arose with some files.
14+
export REGEX_INT="s/CAST\(hr AS INT64\)/CAST\(hr AS bigint\)/g"
15+
export REGEX_ARRAY="s/GENERATE_ARRAY\(-24, CEIL\(DATETIME\_DIFF\(it\.outtime_hr, it\.intime_hr, HOUR\)\)\)/ARRAY\(SELECT \* FROM generate\_series\(-24, CEIL\(DATETIME\_DIFF\(it\.outtime_hr, it\.intime_hr, HOUR\)\)\)\)/g"
16+
export REGEX_HOUR_INTERVAL="s/INTERVAL CAST\(hr AS INT64\) HOUR/interval \'1\' hour * CAST\(hr AS bigint\)/g"
17+
export REGEX_SECONDS="s/SECOND\)/\'SECOND\'\)/g"
18+
export CONNSTR='-U postgres -h localhost -p 5500 -d mimic-iv' # -d mimic
19+
20+
21+
# First, we re-create the postgres-make-concepts.sql file.
22+
echo "\echo ''" > postgres/postgres-make-concepts.sql
23+
24+
# Now we add some preamble for the user running the script.
25+
echo "\echo '==='" >> postgres/postgres-make-concepts.sql
26+
echo "\echo 'Beginning to create materialized views for MIMIC database.'" >> postgres/postgres-make-concepts.sql
27+
echo "\echo '"'Any notices of the form "NOTICE: materialized view "XXXXXX" does not exist" can be ignored.'"'" >> postgres/postgres-make-concepts.sql
28+
echo "\echo 'The scripts drop views before creating them, and these notices indicate nothing existed prior to creating the view.'" >> postgres/postgres-make-concepts.sql
29+
echo "\echo '==='" >> postgres/postgres-make-concepts.sql
30+
echo "\echo ''" >> postgres/postgres-make-concepts.sql
31+
32+
# reporting to stdout the folder being run
33+
echo -n "Dependencies:"
34+
35+
# output table creation calls to the make-concepts script
36+
echo "" >> postgres/postgres-make-concepts.sql
37+
echo "-- dependencies" >> postgres/postgres-make-concepts.sql
38+
39+
for dir_and_table in demographics.icustay_times demographics.weight_durations measurement.urine_output organfailure.kdigo_uo;
40+
do
41+
d=`echo ${dir_and_table} | cut -d. -f1`
42+
tbl=`echo ${dir_and_table} | cut -d. -f2`
43+
44+
# make the sub-folder for postgres if it does not exist
45+
mkdir -p "postgres/${d}"
46+
47+
# convert the bigquery script to psql and output it to the appropriate subfolder
48+
echo -n " ${d}.${tbl} .."
49+
echo "-- THIS SCRIPT IS AUTOMATICALLY GENERATED. DO NOT EDIT IT DIRECTLY." > "postgres/${d}/${tbl}.sql"
50+
echo "DROP TABLE IF EXISTS ${tbl}; CREATE TABLE ${tbl} AS " >> "postgres/${d}/${tbl}.sql"
51+
52+
# for two scripts, add a perl replace to cast rounded values as numeric
53+
if [[ "${tbl}" == "icustay_times" ]] || [[ "${tbl}" == "urine_output" ]]; then
54+
cat "${d}/${tbl}.sql" | sed -r -e "${REGEX_ARRAY}" | sed -r -e "${REGEX_HOUR_INTERVAL}" | sed -r -e "${REGEX_INT}" | sed -r -e "${REGEX_DATETIME_DIFF}" | sed -r -e "${REGEX_DATETIME_TRUNC}" | sed -r -e "${REGEX_SCHEMA}" | sed -r -e "${REGEX_INTERVAL}" | sed -r -e "${REGEX_SECONDS}" | perl -0777 -pe "${PERL_REGEX_ROUND}" >> "postgres/${d}/${tbl}.sql"
55+
else
56+
cat "${d}/${tbl}.sql" | sed -r -e "${REGEX_ARRAY}" | sed -r -e "${REGEX_HOUR_INTERVAL}" | sed -r -e "${REGEX_INT}" | sed -r -e "${REGEX_DATETIME_DIFF}" | sed -r -e "${REGEX_DATETIME_TRUNC}" | sed -r -e "${REGEX_SCHEMA}" | sed -r -e "${REGEX_INTERVAL}" | sed -r -e "${REGEX_SECONDS}" >> "postgres/${d}/${tbl}.sql"
57+
fi
58+
59+
# write out a call to this script in the make concepts file
60+
echo "\i ${d}/${tbl}.sql" >> postgres/postgres-make-concepts.sql
61+
done
62+
echo " done!"
63+
64+
# Iterate through each concept subfolder, and:
65+
# (1) apply the above regular expressions to update the script
66+
# (2) output to the postgres subfolder
67+
# (3) add a line to the postgres-make-concepts.sql script to generate this table
68+
69+
# order of the folders is important for a few tables here:
70+
# * scores (sofa et al) depend on labs, icustay_hourly
71+
# * sepsis depends on score (sofa.sql in particular)
72+
# * organfailure depends on measurement and firstday
73+
# the order *only* matters during the conversion step because our loop is
74+
# inserting table build commands into the postgres-make-concepts.sql file
75+
for d in demographics measurement comorbidity medication treatment firstday organfailure score sepsis;
76+
do
77+
mkdir -p "postgres/${d}"
78+
echo -n "${d}:"
79+
echo "" >> postgres/postgres-make-concepts.sql
80+
echo "-- ${d}" >> postgres/postgres-make-concepts.sql
81+
for fn in `ls $d`;
82+
do
83+
# only run SQL queries
84+
if [[ "${fn: -4}" == ".sql" ]]; then
85+
# table name is file name minus extension
86+
tbl="${fn::-4}"
87+
88+
# skip first_day_sofa as it depends on other firstday queries, we'll generate it later
89+
# we also skipped tables generated in the "Dependencies" loop above.
90+
if [[ "${tbl}" == "first_day_sofa" ]] || [[ "${tbl}" == "icustay_times" ]] || [[ "${tbl}" == "weight_durations" ]] || [[ "${tbl}" == "urine_output" ]] || [[ "${tbl}" == "kdigo_uo" ]] || [[ "${tbl}" == "sepsis3" ]]; then
91+
continue
92+
fi
93+
echo -n " ${tbl} .."
94+
echo "-- THIS SCRIPT IS AUTOMATICALLY GENERATED. DO NOT EDIT IT DIRECTLY." > "postgres/${d}/${tbl}.sql"
95+
echo "DROP TABLE IF EXISTS ${tbl}; CREATE TABLE ${tbl} AS " >> "postgres/${d}/${tbl}.sql"
96+
cat "${d}/${tbl}.sql" | sed -r -e "${REGEX_ARRAY}" | sed -r -e "${REGEX_HOUR_INTERVAL}" | sed -r -e "${REGEX_INT}" | sed -r -e "${REGEX_DATETIME_DIFF}" | sed -r -e "${REGEX_DATETIME_TRUNC}" | sed -r -e "${REGEX_SCHEMA}" | sed -r -e "${REGEX_INTERVAL}" | perl -0777 -pe "${PERL_REGEX_ROUND}" >> "postgres/${d}/${fn}"
97+
98+
echo "\i ${d}/${fn}" >> postgres/postgres-make-concepts.sql
99+
fi
100+
done
101+
echo " done!"
102+
done
103+
104+
# finally, generate first_day_sofa and sepsis3, which depend on concepts generated above
105+
echo "" >> postgres/postgres-make-concepts.sql
106+
echo "-- final tables dependent on previous concepts" >> postgres/postgres-make-concepts.sql
107+
108+
for dir_and_table in firstday.first_day_sofa sepsis.sepsis3
109+
do
110+
d=`echo ${dir_and_table} | cut -d. -f1`
111+
tbl=`echo ${dir_and_table} | cut -d. -f2`
112+
113+
# make the sub-folder for postgres if it does not exist
114+
mkdir -p "postgres/${d}"
115+
116+
# convert the bigquery script to psql and output it to the appropriate subfolder
117+
echo -n " ${d}.${tbl} .."
118+
echo "-- THIS SCRIPT IS AUTOMATICALLY GENERATED. DO NOT EDIT IT DIRECTLY." > "postgres/${d}/${tbl}.sql"
119+
echo "DROP TABLE IF EXISTS ${tbl}; CREATE TABLE ${tbl} AS " >> "postgres/${d}/${tbl}.sql"
120+
121+
cat "${d}/${tbl}.sql" | sed -r -e "${REGEX_ARRAY}" | sed -r -e "${REGEX_HOUR_INTERVAL}" | sed -r -e "${REGEX_INT}" | sed -r -e "${REGEX_DATETIME_DIFF}" | sed -r -e "${REGEX_DATETIME_TRUNC}" | sed -r -e "${REGEX_SCHEMA}" | sed -r -e "${REGEX_INTERVAL}" | sed -r -e "${REGEX_SECONDS}" >> "postgres/${d}/${tbl}.sql"
122+
123+
# write out a call to this script in the make concepts file
124+
echo "\i ${d}/${tbl}.sql" >> postgres/postgres-make-concepts.sql
125+
done
126+
echo " done!"
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
1+
# PostgreSQL concepts
2+
3+
This folder contains scripts to generate useful abstractions of raw MIMIC-IV data ("concepts"). The
4+
scripts are intended to be run against the MIMIC-IV data in a PostgreSQL database.
5+
6+
**Most of these scripts are automatically generated by a shell script in the parent folder.** If you would like to contribute a correction, please look at the conversion shell script to ensure you edit the right script!
7+
8+
To generate concepts, change to this directory and run `psql`. Then within psql, run:
9+
10+
```sql
11+
-- NOTE: many scripts *require* you to use mimic_derived as the schema for outputting concepts
12+
-- change the search path at your peril!
13+
set search_path to mimic_derived, mimic_core, mimic_hosp, mimic_icu, mimic_ed;
14+
\i postgres-functions.sql -- only needs to be run once
15+
\i postgres-make-concepts.sql
16+
```
17+
18+
... or, execute the SQL files in your GUI of choice.
