Commit 1894109

committed
migrate getting started content
1 parent 2309fc3 commit 1894109

File tree

11 files changed

+486
-64
lines changed

jekyll-site/_docs/gettingstarted.md

Lines changed: 0 additions & 64 deletions
This file was deleted.
Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,18 @@
1+
---
2+
title: "AWS"
3+
layout: default
4+
parent: "Cloud"
5+
grand_parent: "Getting Started"
6+
nav_order: 4
7+
description: "Currently MIMIC-III is available on AWS but MIMIC-IV is not. MIMIC-IV will be released on AWS in the future."
8+
---
9+
10+
# AWS
11+
12+
Recently, the MIT Laboratory of Computational Physiology (LCP) started hosting the MIMIC-III dataset on the AWS cloud through the AWS Public Dataset program. You can now use the MIMIC-III dataset via S3 without having to download, copy, or pay to store it. Instead, you can analyze the MIMIC-III dataset in the AWS Cloud using AWS services like Amazon EC2, Athena, AWS Lambda, or Amazon EMR. AWS Cloud availability enables quicker and cheaper research into the dataset.
13+
14+
Services like Athena also offer you new analytical approaches to the MIMIC-III dataset. Using Athena, you can execute standard SQL queries against MIMIC-III without first loading the data into a database. Because you can reference the MIMIC-III dataset hosted by MIT LCP in Amazon S3, your analyses always reference the most recent version of the MIMIC-III dataset. Live hosting reduces upfront time and effort, eliminates data synchronization issues, improves data analysis, and reduces overall study costs.
15+
16+
Once you have successfully requested access to MIMIC-III on AWS, you can follow the instructions linked below. These instructions initialize and execute an entire study performed on MIMIC-III using a hosted Jupyter notebook service on AWS.
17+
18+
https://aws.amazon.com/blogs/big-data/perform-biomedical-informatics-without-a-database-using-mimic-iii-data-and-amazon-athena/
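As a rough illustration of running SQL against the hosted data without loading a database, the sketch below assembles an Athena request and submits it with `boto3`. The database name `mimiciii`, the table `patients`, the region, and the S3 output location are assumptions made for illustration and are not taken from this page; the linked instructions describe the actual setup.

```python
def build_athena_request(sql: str, database: str, output_location: str) -> dict:
    """Assemble keyword arguments for Athena's start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def submit_query(request: dict, region: str = "us-east-1") -> str:
    """Submit the query and return its execution ID (needs boto3 and AWS credentials)."""
    import boto3  # third-party; imported lazily so the builder works without it

    client = boto3.client("athena", region_name=region)
    response = client.start_query_execution(**request)
    return response["QueryExecutionId"]


# Hypothetical names: adjust the database, table, and bucket to your own setup.
request = build_athena_request(
    sql="SELECT COUNT(*) AS n_patients FROM patients",
    database="mimiciii",
    output_location="s3://your-results-bucket/athena/",
)
```

Calling `submit_query(request)` only works once your AWS account has been granted access and Athena has been configured; Athena writes the results to the S3 output location you supply.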
Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
1+
---
2+
title: "BigQuery"
3+
layout: default
4+
parent: "Cloud"
5+
grand_parent: "Getting Started"
6+
nav_order: 3
7+
description: "Access MIMIC IV on BigQuery."
8+
---
9+
10+
# BigQuery
11+
12+
BigQuery is a columnar, distributed relational database management system. BigQuery accesses only the columns specified in the query, making it ideal for data analysis workflows. [Read more about BigQuery in Google's cloud documentation](https://cloud.google.com/bigquery/).
13+
14+
First, ensure you have been provisioned access to MIMIC-III or MIMIC-IV on BigQuery. See the [cloud page for instructions](../../cloud). Once you have access to MIMIC on BigQuery, it's worthwhile to "pin" the dataset so that it appears in the BigQuery web tool.
15+
16+
1. Go to the BigQuery console: http://console.cloud.google.com/bigquery
17+
2. If you haven't created a BigQuery project previously you will be asked to do so. You will need to enter information to pay for the cost of queries. For more details see: https://cloud.google.com/resource-manager/docs/creating-managing-projects
18+
3. On the left sidebar, next to "Explorer", click "+ ADD DATA". Click "Star a project by name".
19+
4. Type `physionet-data` as the project name and confirm.
20+
5. In the sidebar on the left, you should now see the `physionet-data` project. Click the arrow to the left of `physionet-data` to expand the project.
21+
6. You should now see a number of datasets. Which datasets you see depends on the access provisioned to you.
22+
* At a minimum, you will see the demo datasets: `eicu_crd_demo` and `mimiciii_demo`.
23+
* If you have successfully requested access to MIMIC-III, you will additionally see `mimiciii_clinical`, `mimiciii_notes`, and `mimiciii_derived`.
24+
* If you have successfully requested access to MIMIC-IV, you will additionally see `mimiciv_icu` and `mimiciv_hosp`.
25+
* If you have successfully requested access to MIMIC-IV-ED, you will additionally see `mimiciv_ed`.
26+
* If you have successfully requested access to MIMIC-IV-Note, you will additionally see `mimiciv_note`.
27+
28+
You are now ready to query the data! Try a simple query in the main dialogue box, while logged in under your project that pays for queries.
29+
30+
```sql
SELECT *
FROM `physionet-data.mimiciv_hosp.patients`
WHERE subject_id < 10000100
ORDER BY subject_id
```
36+
37+
The query should return some data, and your browser window should be similar to the below:
38+
39+
![Example output for the query](/img/cloud/bq/example_query.png)
40+
41+
At this point you are ready to use MIMIC on BigQuery!
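The same query can also be submitted outside the web console. The following is a minimal sketch, assuming the `google-cloud-bigquery` Python package is installed and that you are authenticated with the Google account linked to PhysioNet; the function names and the `billing_project` argument (your own project that pays for queries) are hypothetical and introduced only for illustration.

```python
def build_patients_query(max_subject_id: int = 10000100) -> str:
    """Return the example query from this page as a string."""
    return (
        "SELECT *\n"
        "FROM `physionet-data.mimiciv_hosp.patients`\n"
        f"WHERE subject_id < {int(max_subject_id)}\n"
        "ORDER BY subject_id"
    )


def run_query(sql: str, billing_project: str) -> list:
    """Run the query, billing it to your own project, and return rows as dicts."""
    from google.cloud import bigquery  # third-party; imported lazily

    client = bigquery.Client(project=billing_project)
    return [dict(row.items()) for row in client.query(sql).result()]


sql = build_patients_query()
```

For example, `run_query(sql, billing_project="my-project")` would bill the query to a project named `my-project`, which is a placeholder for your own project ID.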
42+
43+
A tutorial on using BigQuery to query MIMIC-III is available [here](/docs/iii/tutorials/intro-to-mimic-iii-bq).
44+
45+
Note that we have a number of pre-generated "views" of the data. These are available in the `mimiciv_derived` dataset which you are free to query. All code used to generate these views has been made openly available on the [MIMIC-IV code repository](https://github.com/MIT-LCP/mimic-iv/).
46+
47+
If you are having issues, see the [Troubleshooting section](#troubleshooting).
48+
49+
## Troubleshooting
50+
51+
### I get a pop-up about Terms of Service
52+
53+
![Agree to the terms of service](/img/cloud/bq/agree_tos.png)
54+
55+
You will need to agree to all GCP Terms of Service and adhere to their terms in order to use the data on BigQuery.
56+
57+
### When I go to BigQuery, it asks me to create a project
58+
59+
![Create a project on GCP](/img/cloud/bq/create_project.png)
60+
61+
Almost all of your interactions with GCP are associated with a *project*. Importantly, all billing for your usage must be allotted to a single project.
62+
In order to use BigQuery you must have an active project associated with your account. BigQuery offers a $300 free trial for first-time users.
63+
64+
Create a project and select it as your active project. If you've done this correctly, then the top bar of the Google console page should stop saying "Select a project", and instead show your project name. For example, in the image below, I have selected the project `alistairewj`, which is now the active project:
65+
66+
![Example of a working active project](/img/cloud/bq/active_project.png)
67+
68+
69+
### I can only see `eicu_crd_demo` and `mimiciii_demo`
70+
71+
These datasets are fully public, so the implication is that you have not been granted access to the full versions of the databases.
72+
Please double check that you have (1) entered your cloud information into your PhysioNet profile, verifying any e-mails as needed, and (2) requested access to the specific cloud project on its respective PhysioNet project page.
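To narrow down which access request is missing, the dataset lists from this page can be written out as a small lookup. This is only a sketch: the dataset names are the ones listed above, while the `EXPECTED_DATASETS` table and the `missing_datasets` helper are hypothetical names introduced here.

```python
# Datasets each access grant should reveal, as listed earlier on this page.
EXPECTED_DATASETS = {
    "demo": {"eicu_crd_demo", "mimiciii_demo"},
    "MIMIC-III": {"mimiciii_clinical", "mimiciii_notes", "mimiciii_derived"},
    "MIMIC-IV": {"mimiciv_icu", "mimiciv_hosp"},
    "MIMIC-IV-ED": {"mimiciv_ed"},
    "MIMIC-IV-Note": {"mimiciv_note"},
}


def missing_datasets(visible, requested):
    """Map each requested project to the expected datasets that are not visible."""
    missing = {}
    for project in requested:
        absent = EXPECTED_DATASETS[project] - set(visible)
        if absent:
            missing[project] = sorted(absent)
    return missing


# Example: only the public demo datasets are visible after requesting MIMIC-IV.
result = missing_datasets({"eicu_crd_demo", "mimiciii_demo"}, ["MIMIC-IV"])
```

A non-empty result for a project suggests that the access request for that project has not gone through yet.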
73+
74+
### I want to ask a question about MIMIC or raise an issue on the MIMIC Code Repository
75+
76+
If none of the above have the answer, feel free to [raise an issue](https://github.com/MIT-LCP/mimic-code/issues) or [ask for advice](https://github.com/MIT-LCP/mimic-code/discussions) in the MIMIC repository.
Lines changed: 39 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,39 @@
1+
---
2+
title: "Google Cloud Storage"
3+
layout: default
4+
parent: "Cloud"
5+
grand_parent: "Getting Started"
6+
nav_order: 5
7+
description: "Access the MIMIC datasets on Google Cloud Storage (GCS)."
8+
---
9+
10+
# Google Cloud Storage
11+
12+
The MIMIC databases are hosted on Google Cloud Storage (GCS). Access to the bucket, if available, can be requested via the PhysioNet project page. See [this page for more detail on this process](/docs/gettingstarted).
13+
14+
Datasets available on GCS include:
15+
16+
- [MIMIC-III](https://physionet.org/content/mimiciii/)
17+
- [MIMIC-IV](https://physionet.org/content/mimiciv/)
18+
- [MIMIC-IV-CXR](https://physionet.org/content/mimic-cxr/)\*\*
19+
- [MIMIC-IV-ED](https://physionet.org/content/mimic-iv-ed)
20+
21+
22+
\*\* MIMIC-IV-CXR is over 4.7 TB, almost entirely due to the size of the DICOMs.
23+
Users should strongly consider *not* downloading the data, and instead using it within Google Cloud Platform (GCP), which we support natively.
24+
GCP does not charge for data transfer within a region in GCP ([see this page for more details about network charges](https://cloud.google.com/storage/pricing#network-pricing)).
25+
26+
## Downloading a dataset using `gsutil`
27+
28+
`gsutil` is a command-line tool for interacting with object stores.
29+
You'll need to install `gsutil` locally and authenticate with the same Google account you have linked to your PhysioNet account.
30+
31+
See their instruction page for details on the install and configuration process: https://cloud.google.com/storage/docs/quickstart-gsutil
32+
33+
Once you have `gsutil` installed and authenticated, you can download a dataset using the `gsutil -m cp` command, where `-m` requests multiprocessing. For example, you can download MIMIC-III as follows:
34+
35+
```
gsutil -m cp -r gs://mimiciii-1.4.physionet.org ./
```
38+
39+
... which will download all the data (~7 GB) in the MIMIC-III project to your local folder.
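If the download is part of a scripted workflow, the command above can be wrapped in Python. This is a minimal sketch that assumes `gsutil` is already installed and authenticated; the helper names `build_download_command` and `download` are hypothetical.

```python
import shutil
import subprocess


def build_download_command(bucket: str, destination: str = "./") -> list:
    """Return the gsutil invocation that recursively copies a bucket."""
    return ["gsutil", "-m", "cp", "-r", f"gs://{bucket}", destination]


def download(bucket: str, destination: str = "./") -> None:
    """Run the copy, failing early if gsutil is not on the PATH."""
    if shutil.which("gsutil") is None:
        raise RuntimeError("gsutil not found; install and authenticate it first")
    subprocess.run(build_download_command(bucket, destination), check=True)


# Same bucket as in the command above.
command = build_download_command("mimiciii-1.4.physionet.org")
```

Passing the command as a list avoids shell quoting issues, and `check=True` raises an error if `gsutil` exits with a failure status.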
Lines changed: 36 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,36 @@
1+
---
2+
title: "Cloud"
3+
layout: default
4+
parent: "Getting Started"
5+
has_children: true
6+
nav_order: 1
7+
description: "Access MIMIC on the Cloud"
8+
---
9+
10+
# Cloud
11+
12+
MIMIC-III and MIMIC-IV are made available via PhysioNet: [MIMIC-III](https://physionet.org/content/mimiciii/), [MIMIC-IV](https://physionet.org/content/mimiciv/). Beyond directly downloading the dataset from PhysioNet, there are a few mechanisms for accessing the data:
13+
14+
* Accessing the data in BigQuery
15+
* Accessing the data on AWS (MIMIC-III)
16+
* Accessing the data in a Google Cloud storage bucket (MIMIC-IV)
17+
18+
We **highly** recommend using MIMIC in BigQuery for the following reasons:
19+
20+
* No setup required
21+
* Updates will be integrated into BigQuery when they are available
22+
* Derived concepts from the [MIMIC code repository](https://www.github.com/MIT-LCP/mimic-code) are precomputed and available on the `mimic_derived` dataset
23+
24+
If absolutely necessary, it is possible to download the data from a Google cloud bucket; instructions are provided at the end of the document.
25+
Please do keep in mind that the PhysioNet team is covering the cost of downloading the dataset.
26+
27+
## Accessing data on the cloud
28+
29+
There are three steps to accessing data on the cloud:
30+
31+
1. Link your cloud account to your PhysioNet profile
32+
2. Request access to the cloud resource
33+
3. Log in to the appropriate service and navigate to the resource
34+
35+
These steps assume you are already credentialed and have signed the data use agreement for MIMIC-IV.
36+
If you have not, read this page for instructions on gaining access to [MIMIC](/docs/gettingstarted).
Lines changed: 48 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,48 @@
1+
---
2+
title: "Linking your cloud account"
3+
layout: default
4+
parent: "Cloud"
5+
grand_parent: "Getting Started"
6+
nav_order: 1
7+
description: "How to connect your PhysioNet account to a cloud account."
8+
---
9+
10+
# Linking your cloud account
11+
12+
MIMIC is available for use via two cloud platforms: Google Cloud Platform (GCP) and Amazon Web Services (AWS). Access to these services is directly controlled via your PhysioNet account.
13+
14+
We will assume you are a credentialed user on PhysioNet and have signed the MIMIC data use agreement.
15+
16+
## Adding cloud credentials
17+
18+
Go to your PhysioNet profile page.
19+
20+
![Profile page on PhysioNet](/img/cloud/profile.png)
21+
22+
Click "Emails":
23+
24+
![Navigate to the e-mails page](/img/cloud/emails.png)
25+
26+
For GCP access, ensure that one of your e-mails is a Google account. This can either be a gmail account (as in the picture), or a G Suite account if your organization is a member of G Suite. You can add an e-mail at the bottom of the page:
27+
28+
![Navigate to the Cloud page](/img/cloud/add_email.png)
29+
30+
You will need to verify your e-mail address before continuing (note: e-mail addresses are only used for GCP access, and not for AWS access).
31+
32+
Once you have a verified e-mail address ready, navigate to the "Cloud" page on PhysioNet.
33+
34+
![Navigate to the Cloud page](/img/cloud/cloud_page.png)
35+
36+
You should see two options on this page: one for GCP, and one for AWS.
37+
38+
![Profile cloud credentials](/img/cloud/credentials.png)
39+
40+
For GCP, click the drop down menu and set your GCP e-mail to the Google account you provided in the earlier step.
41+
42+
For AWS, add your AWS canonical ID. This is *not your e-mail*. It is a numeric identifier that can be found in your AWS cloud profile. [Click here to go to your AWS profile page](https://console.aws.amazon.com/billing/home?#/account). Then look for your "Account Id":
43+
44+
![AWS ID](/img/cloud/aws/aws_id.png)
45+
46+
## Linked account
47+
48+
Once you have successfully linked a cloud account to your PhysioNet profile, the next step is to request access to [MIMIC](/docs/gettingstarted/cloud/request) for your cloud account.
Lines changed: 53 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,53 @@
1+
---
2+
title: "Accessing MIMIC-IV on the cloud"
3+
layout: default
4+
parent: "Cloud"
5+
grand_parent: "Getting Started"
6+
nav_order: 2
7+
description: "How to grant your linked cloud account access to MIMIC."
8+
---
9+
10+
# Accessing MIMIC-IV on the cloud
11+
12+
Now that your cloud credentials are available in PhysioNet, you can request access to databases within those cloud systems.
13+
Cloud access to each PhysioNet project, such as MIMIC-IV and MIMIC-III, is managed independently. You must request access to the cloud systems via their project pages (access is provisioned instantly for credentialed users who have signed the DUA).
14+
15+
For MIMIC-III, go to the [MIMIC-III PhysioNet project page](https://physionet.org/content/mimiciii).
16+
For MIMIC-IV, go to the [MIMIC-IV PhysioNet project page](https://physionet.org/content/mimiciv).
17+
18+
Once there, scroll to the bottom to the "Files" section.
19+
*If* the page shows a restricted-access warning, you need to get access to [MIMIC](/docs/gettingstarted).
20+
Otherwise, you should see the following:
21+
22+
![Methods for accessing MIMIC-IV](/img/cloud/mimiciv_files.png)
23+
24+
The following describes the access options listed above in the order they are listed:
25+
26+
1. Download the ZIP file
27+
* This downloads the data directly from the PhysioNet servers.
28+
2. Request access using Google BigQuery (**Cloud**)
29+
* This option adds the Google e-mail in your PhysioNet account to a BigQuery access list
30+
* This is required in order to use the data in BigQuery.
31+
3. Add your AWS account ID to the access list for AWS (currently only available for MIMIC-III) (**Cloud**)
32+
* This is necessary in order to access the data via AWS services. For information on how to use AWS, we [recommend reading this tutorial](https://aws.amazon.com/blogs/big-data/perform-biomedical-informatics-without-a-database-using-mimic-iii-data-and-amazon-athena/).
33+
4. Request access to the files using Google Cloud Storage Browser (only available for MIMIC-IV)(**Cloud**)
34+
* This option adds your Google e-mail in your PhysioNet account to the GCP access list
35+
* This is required in order to download the data from a storage bucket on GCP.
36+
5. Download the files using your terminal
37+
* Provides a command for downloading the data from PhysioNet as individual CSV files using `wget` (when compared to the image above, your command will have a distinct username).
38+
39+
For example, if you are interested in accessing MIMIC-IV on BigQuery, you would click "Request access using Google BigQuery". The page should provide you a green notification indicating you have been provided access.
40+
41+
![Access granted to Google Cloud Platform's BigQuery service](/img/cloud/bq_provisioned.png)
42+
43+
You will receive an e-mail detailing instructions for how to access MIMIC on BigQuery. Alternatively, instructions are also provided on the [BigQuery page](../bigquery).
44+
45+
## Using data on the cloud
46+
47+
Once you have been granted access to a cloud resource, the next step is to navigate to that resource in the cloud.
48+
49+
* For the GCP Storage Bucket, click the link e-mailed to you.
50+
* For BigQuery, see the [BigQuery page](../bigquery). You may also be interested in the querying tutorial on BigQuery for [MIMIC-III](/docs/iii/tutorials/intro-to-mimic-iii-bq/) or [MIMIC-IV](/docs/iv/tutorials/bigquery/).
51+
* AWS access is currently unavailable for MIMIC-IV, but planned.
52+
53+
Once you have access to MIMIC, we highly recommend you read the respective database introduction: [MIMIC-III](/docs/iii), [MIMIC-IV](/docs/iv/).
