
Commit 7cf79ac

update for v2.0
1 parent d63d65d commit 7cf79ac

1 file changed: +22 -27 lines changed


mimic-iv/buildmimic/bigquery/README.md

Lines changed: 22 additions & 27 deletions
@@ -1,6 +1,6 @@
 # Loading MIMIC-IV to BigQuery

-**YOU DO NOT NEED TO INSTALL MIMIC-IV YOURSELF!** MIMIC-IV has been loaded onto BigQuery by the LCP, and is available for credentialed researchers to access. If you are credentialed, then you may be granted access MIMIC-IV on BigQuery instantly by following the [cloud configuration tutorial](https://mimic-iv.mit.edu/docs/access/cloud/).
+**YOU DO NOT NEED TO INSTALL MIMIC-IV YOURSELF!** MIMIC-IV has been loaded onto BigQuery by the LCP, and is available for credentialed researchers to access. If you are credentialed, then you may be granted access to MIMIC-IV on BigQuery instantly by following the [cloud configuration tutorial](https://mimic.mit.edu/docs/gettingstarted/cloud/).

 The following instructions are provided for transparency and were used to create the current copy of MIMIC-IV on BigQuery.

@@ -38,43 +38,39 @@ gcloud init

 ---

-## STEP 3: Verify you can access the MIMIC-IV files on Google Cloud Storage
+## STEP 3: Download the MIMIC-IV files

-### A) Check the content of the bucket.
+Download the MIMIC-IV dataset files. The easiest way to download them is to open a terminal and run:

-```sh
-gsutil ls gs://mimiciv-1.0.physionet.org
 ```
-
-It should list a zip file, and some auxiliary files associated with the project (SHA256SUMS.txt).
-
-```sh
-gs://mimiciv-1.0.physionet.org/mimic-iv-1.0.zip
+wget -r -N -c -np --user YOURUSERNAME --ask-password https://physionet.org/files/mimiciv/2.0/
 ```

-Download and extract the zip file locally. Then, upload the resultant folders (`core`, `hosp`, and `icu`) to a GCP bucket of your choice:
+Replace `YOURUSERNAME` with your PhysioNet username.
+
+Then, from the `physionet.org/files/mimiciv/2.0/` folder that wget creates, upload the `hosp` and `icu` folders to a GCP bucket of your choice:

 ```sh
 bucket="mimic-data"

-unzip mimic-iv-1.0.zip
-gsutil -m cp -r core hosp icu gs://$bucket/v1.0/
+gsutil -m cp -r hosp icu gs://$bucket/v2.0/
 ```

 ## STEP 4: Create a new BigQuery dataset

-### A) Create a new dataset for MIMIC-IV version 1.0
+### A) Create a new dataset for MIMIC-IV version 2.0

-In this example, we have chosen **mimic4_v1_0** as the dataset name.
+In this example, we have chosen **mimic4_v2_0** as the dataset prefix for the ICU/hosp modules.

 ```sh
-bq mk --dataset --data_location US --description "MIMIC-IV version 1.0" mimic4_v1_0
+bq mk --dataset --data_location US --description "MIMIC-IV version 2.0 ICU data" mimic4_v2_0_icu
+bq mk --dataset --data_location US --description "MIMIC-IV version 2.0 hosp data" mimic4_v2_0_hosp
 ```

 ### B) Check the status of the dataset created

 ```sh
-bq show mimic4_v1_0
+bq show mimic4_v2_0_hosp
 ```

 ---
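Before the upload step, it can help to confirm that the download actually produced both module folders. The sketch below is a dry run under stated assumptions: it assumes wget's default recursive layout (files saved under `physionet.org/files/mimiciv/2.0/` relative to where wget was run), `check_mimic_download` is a hypothetical helper name, and it only prints the `gsutil` command instead of running it.

```sh
#!/bin/sh
# Hypothetical pre-upload check for the MIMIC-IV v2.0 download.
# Assumes wget's default recursive layout: ./physionet.org/files/mimiciv/2.0/
check_mimic_download() {
  base_dir="$1"
  for module in hosp icu; do
    # Each module folder must exist and contain at least one .csv.gz file;
    # ls fails (non-zero) when the glob matches nothing or the folder is missing.
    if ! ls "$base_dir/$module"/*.csv.gz >/dev/null 2>&1; then
      echo "no .csv.gz files found in $base_dir/$module" >&2
      return 1
    fi
  done
  return 0
}

bucket="mimic-data"
download_dir="physionet.org/files/mimiciv/2.0"
if check_mimic_download "$download_dir"; then
  # Print (do not run) the upload step, executed from inside the download folder.
  echo "cd $download_dir && gsutil -m cp -r hosp icu gs://$bucket/v2.0/"
fi
```

Keeping the check separate from the upload means a partial or interrupted download is caught before anything is copied to the bucket.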
@@ -101,13 +97,12 @@ BigQuery schemas are provided in this GitHub repository. Download the table sche

 ## STEP 6: Create tables and load the compressed files

-### A) Create a script file (ex: upload_mimic4_v1_0.sh) and copy the code below.
+### A) Create a script file (ex: upload_mimic4_v2_0.sh) and copy the code below.

 You will need to change the **schema_local_folder** to match the path to the schemas on your local machine.

 Note also that the below assumes the following dataset structure:

-* <dataset_prefix>_core
 * <dataset_prefix>_icu
 * <dataset_prefix>_hosp

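The load script in STEP 6 writes each table to `${dataset_prefix}_${module}.$tablename`, so the datasets created in STEP 4 have to follow the same `<dataset_prefix>_<module>` pattern. A minimal dry-run sketch, assuming the `mimic4_v2_0` prefix used in STEP 4, that prints one `bq mk` command per module (the helper name `print_mk_commands` is hypothetical, and the `--description` text from STEP 4 is left out for brevity):

```sh
#!/bin/sh
# Dry run: print one bq mk command per MIMIC-IV v2.0 module.
# Dataset names follow the <dataset_prefix>_<module> pattern the load script expects.
dataset_prefix="mimic4_v2_0"

print_mk_commands() {
  for module in hosp icu; do
    printf 'bq mk --dataset --data_location US %s_%s\n' "$dataset_prefix" "$module"
  done
}

print_mk_commands
```

Piping the output into `sh` (`print_mk_commands | sh`) would run the commands; printing them first makes it easy to check the dataset names against the structure listed above.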

@@ -118,25 +113,25 @@ If you would like all tables on the same dataset, you should modify the below sc

 # Initialize parameters
 bucket="mimic-data" # we chose this bucket earlier when uploading data
-dataset_prefix="mimic"
-schema_local_folder="/home/alistairewj/mimic-iv/v1.0/schemas"
+dataset_prefix="mimic4_v2_0"
+schema_local_folder="$HOME/mimic-code/mimic-iv/buildmimic/bigquery/schemas"

 # Get the list of files in the bucket

-for module in core hosp icu;
+for module in hosp icu;
 do
-FILES=$(gsutil ls gs://$bucket/v1.0/$module/*.csv.gz)
+FILES=$(gsutil ls gs://$bucket/v2.0/$module/*.csv.gz)

 for file in $FILES
 do

-# Extract the table name from the file path (ex: gs://mimic4_v1_0/ADMISSIONS.csv.gz)
+# Extract the table name from the file path (ex: gs://mimic-data/v2.0/hosp/admissions.csv.gz)
 base=${file##*/} # remove path
 filename=${base%.*} # remove .gz
 tablename=${filename%.*} # remove .csv

 # Create table and populate it with data from the bucket
-echo bq load --allow_quoted_newlines --skip_leading_rows=1 --source_format=CSV --replace ${dataset_prefix}_${module}.$tablename gs://$bucket/v1.0/$module/$tablename.csv.gz $schema_local_folder/$module/$tablename.json
+bq load --allow_quoted_newlines --skip_leading_rows=1 --source_format=CSV --replace ${dataset_prefix}_${module}.$tablename gs://$bucket/v2.0/$module/$tablename.csv.gz $schema_local_folder/$module/$tablename.json

 # Check for error
 if [ $? -eq 0 ];then
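The three parameter expansions in the loop strip the directory (`##*/` removes the longest prefix ending in `/`), then `.gz`, then `.csv` (each `%.*` removes the shortest trailing `.suffix`). A small sketch that wraps them in a hypothetical `table_from_uri` function and prints the destination table and schema path the script would pass to `bq load`. The object path is an illustrative example following the `gs://$bucket/v2.0/$module/` layout, and the schema folder uses `$HOME` because a `~` inside double quotes is stored literally and is not expanded when the variable is used later.

```sh
#!/bin/sh
# Hypothetical helper mirroring the parameter expansions in the load script.
table_from_uri() {
  base=${1##*/}          # drop everything up to the last "/", e.g. admissions.csv.gz
  filename=${base%.*}    # drop the shortest ".*" suffix (.gz), e.g. admissions.csv
  echo "${filename%.*}"  # drop .csv, e.g. admissions
}

dataset_prefix="mimic4_v2_0"
module="hosp"
# $HOME expands inside double quotes; a leading ~ there would be stored literally.
schema_local_folder="$HOME/mimic-code/mimic-iv/buildmimic/bigquery/schemas"
file="gs://mimic-data/v2.0/hosp/admissions.csv.gz"  # illustrative object path

tablename=$(table_from_uri "$file")
echo "destination table: ${dataset_prefix}_${module}.$tablename"
echo "schema file:       $schema_local_folder/$module/$tablename.json"
```

Running it prints `destination table: mimic4_v2_0_hosp.admissions` followed by the schema path under your home directory.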
@@ -155,7 +150,7 @@ This code will get the list of files in the bucket, and for each file, it will e
 ### B) Set the CHMOD to allow the file as executable (ex: 755), and execute the script file

 ```sh
-./upload_mimic4_v1_0.sh
+./upload_mimic4_v2_0.sh
 ```

 ### C) Results of the upload process
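Step B asks for mode 755 before running the script, but its code block only shows the call itself. A minimal sketch, using a hypothetical `run_upload_script` wrapper, that sets the mode and then runs the script; it defaults to the `upload_mimic4_v2_0.sh` name used in this commit:

```sh
#!/bin/sh
# Hypothetical wrapper: give the upload script mode 755, then execute it.
run_upload_script() {
  script="${1:-upload_mimic4_v2_0.sh}"
  if [ ! -f "$script" ]; then
    echo "script not found: $script" >&2
    return 1
  fi
  chmod 755 "$script"  # rwxr-xr-x: owner rwx, group and others r-x
  case "$script" in
    */*) "$script" ;;    # a path was given; run it as-is
    *) "./$script" ;;    # a bare file name; run it from the current directory
  esac
}
```

Called with no argument, `run_upload_script` does the same as `chmod 755 upload_mimic4_v2_0.sh` followed by `./upload_mimic4_v2_0.sh`, but stops early with a message if the file is missing. Its exit status is the script's own.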
@@ -254,7 +249,7 @@ We can test a successful build by running a check query.

 ```sh
 bq query --use_legacy_sql=False 'select CASE WHEN count(*) = 383220 THEN True ELSE
-False end AS check from `mimic4_v1_0.patients`'
+False end AS check from `mimic4_v2_0_hosp.patients`'
 ```

 This verifies we have the expected row count in the patients table. It's further possible to check the row counts of the other tables by comparing to the already existing MIMIC-IV BigQuery dataset available on `physionet-data`.
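The same check can be reused for other tables with a small helper that builds the query from a table name and an expected count. This is a sketch: `check_query` is a hypothetical name, the expected count 383220 is copied from the query above, and the table is written as `mimic4_v2_0_hosp.patients` on the assumption that `patients` lives in the `<dataset_prefix>_hosp` dataset created in STEP 4. The `bq query` call is left commented out because it needs an authenticated Cloud SDK.

```sh
#!/bin/sh
# Hypothetical helper: build the row-count check query for one table.
#   $1 = fully qualified table (dataset.table), $2 = expected row count
check_query() {
  printf 'select CASE WHEN count(*) = %s THEN True ELSE False end AS check from `%s`\n' "$2" "$1"
}

query=$(check_query "mimic4_v2_0_hosp.patients" 383220)
echo "$query"
# bq query --use_legacy_sql=False "$query"
```

Expected counts for other tables would have to come from the existing `physionet-data` copy mentioned above rather than being hard-coded from memory.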
