Commit 939ae2f

adds nw build postgres scripts.
1 parent c34baed commit 939ae2f

7 files changed

+819
-0
lines changed

nw/buildnw/postgres/README.md

Lines changed: 151 additions & 0 deletions
@@ -0,0 +1,151 @@
# Load NWICU into a PostgreSQL database

This directory contains scripts to create the schema and load the Northwestern ICU and Hospital data (NWICU) into PostgreSQL, following a structure similar to the MIMIC-IV build scripts.

## Quickstart

```sh
# clone repo
git clone https://github.com/MIT-LCP/mimic-code.git
cd mimic-code

# download NWICU data
wget -r -N -c -np --user <USERNAME> --ask-password https://physionet.org/files/nwicu-northwestern-icu/0.1.0/
# clean directory (run this command from the directory that contains physionet.org)
mv physionet.org/files/nwicu-northwestern-icu nwicu && rmdir physionet.org/files && rm physionet.org/robots.txt && rmdir physionet.org

# create database
createdb nw

# build and load NWICU tables
psql -d nw -f nw/buildnw/postgres/create.sql
psql -d nw -v ON_ERROR_STOP=1 -v nw_data_dir=nwicu/0.1.0 -f nw/buildnw/postgres/load_gz.sql
psql -d nw -f nw/buildnw/postgres/constraint.sql
psql -d nw -f nw/buildnw/postgres/index.sql
psql -d nw -f nw/buildnw/postgres/validate.sql
```

## Detailed guide

First ensure that PostgreSQL is running on your computer. For installation instructions, see: [http://www.postgresql.org/download/](http://www.postgresql.org/download/)

### Install PostgreSQL

**On macOS (using Homebrew):**

```sh
brew update
brew install postgresql
brew services start postgresql
```

To check which user is running the PostgreSQL service, use:

```sh
brew services list
```

The 'User' column shows the macOS account running PostgreSQL. This is usually the username you should use for database connections unless you have created a different PostgreSQL user.

**On Ubuntu/Debian:**

```sh
sudo apt update
sudo apt install postgresql postgresql-contrib
sudo service postgresql start
```

**On Windows:**

1. Download the installer from https://www.postgresql.org/download/windows/
2. Run the installer and follow the prompts to complete the installation.
3. Start the PostgreSQL service from the Start Menu or Services app.

For more details, see the [official PostgreSQL download page](https://www.postgresql.org/download/).

### Download NWICU data

Download the Northwestern ICU (NWICU) database
from [PhysioNet](https://physionet.org/content/nwicu-northwestern-icu/0.1.0/).
Replace `<USERNAME>` with your PhysioNet username; `wget` prompts for the password:

```sh
wget -r -N -c -np --user <USERNAME> --ask-password https://physionet.org/files/nwicu-northwestern-icu/0.1.0/
mv physionet.org/files/nwicu-northwestern-icu nwicu && rmdir physionet.org/files && rm physionet.org/robots.txt && rmdir physionet.org
```
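
A long recursive download can be interrupted and leave truncated archives behind. As a minimal sketch (the function name `check_gz_files` is hypothetical and not part of this repository), the following tests every `.csv.gz` file under a directory with `gzip -t` before any loading is attempted:

```sh
# Hypothetical helper, not part of the repository.
# Tests every .csv.gz file under the given directory with `gzip -t`.
# Prints the path of each corrupt file and returns non-zero if any is found.
check_gz_files() {
    status=0
    while IFS= read -r f; do
        # Skip the empty line produced when find matches nothing
        [ -n "$f" ] || continue
        if ! gzip -t "$f" 2>/dev/null; then
            echo "corrupt: $f"
            status=1
        fi
    done <<EOF
$(find "$1" -type f -name '*.csv.gz')
EOF
    return "$status"
}

# Example: check_gz_files nwicu/0.1.0
```

If a file is reported as corrupt, rerunning the `wget` command above should fetch it again, since `-c` and `-N` ask wget to continue partial downloads and skip files that are already up to date.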

### Specify a database for installation

Create the database if it does not already exist:

```sh
createdb nw
```

Set PostgreSQL environment variables:

You can use the provided script to set the environment variables for the current terminal session:

```sh
source postgres_env.sh
```

Before sourcing it, replace `your_user` and `your_password` in the script with your actual PostgreSQL username and password.

Instead of editing the script, you can pass your username and password as arguments:

```sh
source postgres_env.sh myuser mypassword nw localhost 5432
```
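
The contents of `postgres_env.sh` are not shown here. Purely as an illustration, a script with this interface could export the standard libpq variables that `psql` and `createdb` read. The function name `set_pg_env`, the argument order (inferred from the example above), and the defaults are assumptions, not the contents of the actual script:

```sh
# Hypothetical sketch; the real postgres_env.sh may differ.
# Usage: set_pg_env [user] [password] [database] [host] [port]
set_pg_env() {
    # PGUSER, PGPASSWORD, PGDATABASE, PGHOST and PGPORT are standard
    # libpq environment variables read by psql and createdb.
    export PGUSER="${1:-your_user}"
    export PGPASSWORD="${2:-your_password}"
    export PGDATABASE="${3:-nw}"
    export PGHOST="${4:-localhost}"
    export PGPORT="${5:-5432}"
}

# Example: set_pg_env myuser mypassword nw localhost 5432
```

The variables only last for the current shell session, which is why such a script has to be sourced rather than executed. For anything beyond local experiments, a `~/.pgpass` file is generally a safer place for the password than an environment variable.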

Once PostgreSQL is installed, clone the [mimic-code](https://github.com/MIT-LCP/mimic-code) repository into a local directory.

```sh
git clone https://github.com/MIT-LCP/mimic-code.git
```

Create the schemas and tables with the following psql command. **This will delete any data present in the schemas.** If you need to reload the data (for example, after running the load scripts more than once), rerun `create.sql` first.
It drops all existing tables and recreates them, giving you a clean slate before the data is reloaded.

```sh
psql -d nw -f nw/buildnw/postgres/create.sql
```

Afterwards, we need to load the NWICU files into the database. To do so, we'll specify the location of the local compressed CSV files.
Note that this assumes the folder structure is as follows:

```
nwicu_data_dir
    nw_hosp
        admissions.csv.gz
        patients.csv.gz
        ...
    nw_icu
        icustays.csv.gz
        ...
```

For example, if you downloaded and moved the files as above, your `nwicu_data_dir` would be `nwicu/0.1.0`, containing the subfolders `nw_hosp` and `nw_icu` with their respective compressed CSV files. This is the path passed to psql as the `nw_data_dir` variable.
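
The layout described above can be checked from the shell before loading. This is a minimal sketch: the function name `check_nwicu_layout` is hypothetical, and it only confirms that both subfolders exist and hold at least one `.csv.gz` file, not that every expected table is present:

```sh
# Hypothetical helper, not part of the repository.
# Confirms that the data directory contains nw_hosp/ and nw_icu/,
# each with at least one .csv.gz file.
check_nwicu_layout() {
    data_dir="$1"
    for sub in nw_hosp nw_icu; do
        if [ ! -d "$data_dir/$sub" ]; then
            echo "missing folder: $data_dir/$sub"
            return 1
        fi
        # ls fails when the glob matches nothing, i.e. no .csv.gz files are present
        if ! ls "$data_dir/$sub"/*.csv.gz >/dev/null 2>&1; then
            echo "no .csv.gz files in: $data_dir/$sub"
            return 1
        fi
    done
    echo "layout ok: $data_dir"
}

# Example: check_nwicu_layout nwicu/0.1.0
```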

Once you have verified your data is stored in this structure, run:

```sh
psql -d nw -v ON_ERROR_STOP=1 -v nw_data_dir=nwicu/0.1.0 -f nw/buildnw/postgres/load_gz.sql
```

After loading the data, we can enforce data integrity by adding primary keys, foreign keys, and other constraints.

```sh
psql -d nw -f nw/buildnw/postgres/constraint.sql
```

We can also improve query performance by creating indexes, which allow the database to quickly find and retrieve data, especially in large tables.

```sh
psql -d nw -f nw/buildnw/postgres/index.sql
```

To ensure the data was loaded correctly, we can run validation checks.

```sh
psql -d nw -f nw/buildnw/postgres/validate.sql
```
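
The five psql steps above can be chained so that a failure in one step stops the build. The wrapper below is only a sketch under stated assumptions: the function name `build_nw` and the `DRY_RUN` switch are hypothetical, and it applies `ON_ERROR_STOP=1` and `nw_data_dir` to every step, whereas the commands above set them only for the load (psql ignores variables a script does not use):

```sh
# Hypothetical wrapper around the build steps; not part of the repository.
# Usage: build_nw [database] [data_dir]
# Set DRY_RUN=1 to print the psql commands instead of running them.
build_nw() {
    db="${1:-nw}"
    data_dir="${2:-nwicu/0.1.0}"
    sql_dir="nw/buildnw/postgres"
    for step in create load_gz constraint index validate; do
        # Assumption: ON_ERROR_STOP=1 is passed to every step, not only the load.
        set -- psql -d "$db" -v ON_ERROR_STOP=1 -v nw_data_dir="$data_dir" \
            -f "$sql_dir/$step.sql"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$*"
        else
            "$@" || { echo "step failed: $step"; return 1; }
        fi
    done
}

# Example (run from the root of the cloned repository):
#   DRY_RUN=1 build_nw nw nwicu/0.1.0
```

With `ON_ERROR_STOP` set, psql exits with a non-zero status as soon as a statement in the script fails, which is what lets the wrapper stop early. Without it, psql keeps executing the remaining statements after an error.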

nw/buildnw/postgres/constraint.sql

Lines changed: 220 additions & 0 deletions
@@ -0,0 +1,220 @@
---------------------------
---------------------------
-- Creating Primary Keys --
---------------------------
---------------------------

----------
-- hosp --
----------

-- admissions

ALTER TABLE nw_hosp.admissions DROP CONSTRAINT IF EXISTS admissions_pk CASCADE;
ALTER TABLE nw_hosp.admissions
ADD CONSTRAINT admissions_pk
  PRIMARY KEY (hadm_id);

-- patients

ALTER TABLE nw_hosp.patients DROP CONSTRAINT IF EXISTS patients_pk CASCADE;
ALTER TABLE nw_hosp.patients
ADD CONSTRAINT patients_pk
  PRIMARY KEY (subject_id);

-- d_icd_diagnoses

ALTER TABLE nw_hosp.d_icd_diagnoses DROP CONSTRAINT IF EXISTS d_icd_diagnoses_pk CASCADE;
ALTER TABLE nw_hosp.d_icd_diagnoses
ADD CONSTRAINT d_icd_diagnoses_pk
  PRIMARY KEY (icd_code, icd_version);

-- diagnoses_icd
-- note: no primary key is defined for this table; the two foreign keys
-- below are declared again in the foreign key section of this file

ALTER TABLE nw_hosp.diagnoses_icd DROP CONSTRAINT IF EXISTS diagnoses_icd_patients_fk CASCADE;
ALTER TABLE nw_hosp.diagnoses_icd
ADD CONSTRAINT diagnoses_icd_patients_fk
  FOREIGN KEY (subject_id)
  REFERENCES nw_hosp.patients (subject_id);

ALTER TABLE nw_hosp.diagnoses_icd DROP CONSTRAINT IF EXISTS diagnoses_icd_admissions_fk;
ALTER TABLE nw_hosp.diagnoses_icd
ADD CONSTRAINT diagnoses_icd_admissions_fk
  FOREIGN KEY (hadm_id)
  REFERENCES nw_hosp.admissions (hadm_id);

-- d_labitems

ALTER TABLE nw_hosp.d_labitems DROP CONSTRAINT IF EXISTS d_labitems_pk CASCADE;
ALTER TABLE nw_hosp.d_labitems
ADD CONSTRAINT d_labitems_pk
  PRIMARY KEY (itemid);

-- labevents

ALTER TABLE nw_hosp.labevents DROP CONSTRAINT IF EXISTS labevents_pk CASCADE;
ALTER TABLE nw_hosp.labevents
ADD CONSTRAINT labevents_pk
  PRIMARY KEY (labevent_id);

-- prescriptions

ALTER TABLE nw_hosp.prescriptions DROP CONSTRAINT IF EXISTS prescriptions_pk CASCADE;
ALTER TABLE nw_hosp.prescriptions
ADD CONSTRAINT prescriptions_pk
  PRIMARY KEY (pharmacy_id, drug);

-- emar

ALTER TABLE nw_hosp.emar DROP CONSTRAINT IF EXISTS emar_pk CASCADE;
ALTER TABLE nw_hosp.emar
ADD CONSTRAINT emar_pk
  PRIMARY KEY (emar_id);

---------
-- icu --
---------

-- icustays

ALTER TABLE nw_icu.icustays DROP CONSTRAINT IF EXISTS icustays_pk CASCADE;
ALTER TABLE nw_icu.icustays
ADD CONSTRAINT icustays_pk
  PRIMARY KEY (stay_id);

-- d_items

ALTER TABLE nw_icu.d_items DROP CONSTRAINT IF EXISTS d_items_pk CASCADE;
ALTER TABLE nw_icu.d_items
ADD CONSTRAINT d_items_pk
  PRIMARY KEY (itemid, label);

---------------------------
---------------------------
-- Creating Foreign Keys --
---------------------------
---------------------------

----------
-- hosp --
----------

-- admissions

ALTER TABLE nw_hosp.admissions DROP CONSTRAINT IF EXISTS admissions_patients_fk;
ALTER TABLE nw_hosp.admissions
ADD CONSTRAINT admissions_patients_fk
  FOREIGN KEY (subject_id)
  REFERENCES nw_hosp.patients (subject_id);

-- diagnoses_icd

ALTER TABLE nw_hosp.diagnoses_icd DROP CONSTRAINT IF EXISTS diagnoses_icd_patients_fk;
ALTER TABLE nw_hosp.diagnoses_icd
ADD CONSTRAINT diagnoses_icd_patients_fk
  FOREIGN KEY (subject_id)
  REFERENCES nw_hosp.patients (subject_id);

ALTER TABLE nw_hosp.diagnoses_icd DROP CONSTRAINT IF EXISTS diagnoses_icd_admissions_fk;
ALTER TABLE nw_hosp.diagnoses_icd
ADD CONSTRAINT diagnoses_icd_admissions_fk
  FOREIGN KEY (hadm_id)
  REFERENCES nw_hosp.admissions (hadm_id);

-- labevents

ALTER TABLE nw_hosp.labevents DROP CONSTRAINT IF EXISTS labevents_patients_fk;
ALTER TABLE nw_hosp.labevents
ADD CONSTRAINT labevents_patients_fk
  FOREIGN KEY (subject_id)
  REFERENCES nw_hosp.patients (subject_id);

ALTER TABLE nw_hosp.labevents DROP CONSTRAINT IF EXISTS labevents_d_labitems_fk;
ALTER TABLE nw_hosp.labevents
ADD CONSTRAINT labevents_d_labitems_fk
  FOREIGN KEY (itemid)
  REFERENCES nw_hosp.d_labitems (itemid);

-- prescriptions

ALTER TABLE nw_hosp.prescriptions DROP CONSTRAINT IF EXISTS prescriptions_patients_fk;
ALTER TABLE nw_hosp.prescriptions
ADD CONSTRAINT prescriptions_patients_fk
  FOREIGN KEY (subject_id)
  REFERENCES nw_hosp.patients (subject_id);

ALTER TABLE nw_hosp.prescriptions DROP CONSTRAINT IF EXISTS prescriptions_admissions_fk;
ALTER TABLE nw_hosp.prescriptions
ADD CONSTRAINT prescriptions_admissions_fk
  FOREIGN KEY (hadm_id)
  REFERENCES nw_hosp.admissions (hadm_id);

-- emar

ALTER TABLE nw_hosp.emar DROP CONSTRAINT IF EXISTS emar_patients_fk;
ALTER TABLE nw_hosp.emar
ADD CONSTRAINT emar_patients_fk
  FOREIGN KEY (subject_id)
  REFERENCES nw_hosp.patients (subject_id);

ALTER TABLE nw_hosp.emar DROP CONSTRAINT IF EXISTS emar_admissions_fk;
ALTER TABLE nw_hosp.emar
ADD CONSTRAINT emar_admissions_fk
  FOREIGN KEY (hadm_id)
  REFERENCES nw_hosp.admissions (hadm_id);

---------
-- icu --
---------

-- icustays

ALTER TABLE nw_icu.icustays DROP CONSTRAINT IF EXISTS icustays_patients_fk;
ALTER TABLE nw_icu.icustays
ADD CONSTRAINT icustays_patients_fk
  FOREIGN KEY (subject_id)
  REFERENCES nw_hosp.patients (subject_id);

ALTER TABLE nw_icu.icustays DROP CONSTRAINT IF EXISTS icustays_admissions_fk;
ALTER TABLE nw_icu.icustays
ADD CONSTRAINT icustays_admissions_fk
  FOREIGN KEY (hadm_id)
  REFERENCES nw_hosp.admissions (hadm_id);

-- chartevents

ALTER TABLE nw_icu.chartevents DROP CONSTRAINT IF EXISTS chartevents_patients_fk;
ALTER TABLE nw_icu.chartevents
ADD CONSTRAINT chartevents_patients_fk
  FOREIGN KEY (subject_id)
  REFERENCES nw_hosp.patients (subject_id);

ALTER TABLE nw_icu.chartevents DROP CONSTRAINT IF EXISTS chartevents_admissions_fk;
ALTER TABLE nw_icu.chartevents
ADD CONSTRAINT chartevents_admissions_fk
  FOREIGN KEY (hadm_id)
  REFERENCES nw_hosp.admissions (hadm_id);

ALTER TABLE nw_icu.chartevents DROP CONSTRAINT IF EXISTS chartevents_icustays_fk;
ALTER TABLE nw_icu.chartevents
ADD CONSTRAINT chartevents_icustays_fk
  FOREIGN KEY (stay_id)
  REFERENCES nw_icu.icustays (stay_id);

-- procedureevents

ALTER TABLE nw_icu.procedureevents DROP CONSTRAINT IF EXISTS procedureevents_patients_fk;
ALTER TABLE nw_icu.procedureevents
ADD CONSTRAINT procedureevents_patients_fk
  FOREIGN KEY (subject_id)
  REFERENCES nw_hosp.patients (subject_id);

ALTER TABLE nw_icu.procedureevents DROP CONSTRAINT IF EXISTS procedureevents_admissions_fk;
ALTER TABLE nw_icu.procedureevents
ADD CONSTRAINT procedureevents_admissions_fk
  FOREIGN KEY (hadm_id)
  REFERENCES nw_hosp.admissions (hadm_id);

ALTER TABLE nw_icu.procedureevents DROP CONSTRAINT IF EXISTS procedureevents_icustays_fk;
ALTER TABLE nw_icu.procedureevents
ADD CONSTRAINT procedureevents_icustays_fk
  FOREIGN KEY (stay_id)
  REFERENCES nw_icu.icustays (stay_id);
