
Commit 0f89822

Merge pull request #1455 from MIT-LCP/bm/m4-sqlite-no-sqlalchemy
Avoid unnecessary dependency on sqlalchemy
2 parents 1ff562b + 1825ff3 commit 0f89822
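
The change rests on pandas accepting a plain `sqlite3` connection in `DataFrame.to_sql`, so no sqlalchemy engine or connection string is needed for SQLite. A minimal sketch of that behaviour, assuming an in-memory database and a made-up three-row CSV (`CSV_TEXT` and the table names are hypothetical, and `index=False` is a choice of this sketch; the script in this commit keeps the default, which also writes the DataFrame index as a column):

```python
import io
import sqlite3

import pandas as pd

# Hypothetical three-row CSV standing in for one of the demo files.
CSV_TEXT = "subject_id,gender\n1,F\n2,M\n3,F\n"

connection = sqlite3.connect(":memory:")
try:
    # Small file: read everything, then write the table in one call.
    df = pd.read_csv(io.StringIO(CSV_TEXT))
    df.to_sql("patients", connection, index=False)

    # Large file: read in chunks and append each chunk to the same table.
    for chunk in pd.read_csv(io.StringIO(CSV_TEXT), chunksize=2):
        chunk.to_sql("patients_chunked", connection, if_exists="append", index=False)

    whole = connection.execute("SELECT COUNT(*) FROM patients").fetchone()[0]
    chunked = connection.execute("SELECT COUNT(*) FROM patients_chunked").fetchone()[0]
finally:
    connection.close()

print(whole, chunked)
```

Both tables should end up with the same number of rows, which is the property the chunked branch of the import script relies on.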

3 files changed: +58 -20 lines changed


.github/workflows/sqlite.yml

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+name: sqlite demo db build
+on:
+  pull_request_review:
+    types: [submitted]
+
+jobs:
+  mimic-iv-sqlite:
+    # only run if PR is approved
+    if: github.event.review.state == 'approved'
+    runs-on: ubuntu-20.04
+
+    steps:
+      - name: Check out repository code
+        uses: actions/checkout@v3
+
+      - name: Set up Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: '3.10'
+
+      - name: Python dependencies
+        run: |
+          pip install pandas
+
+      - name: Download demo data
+        uses: ./.github/actions/download-demo
+        with:
+          gcp-project-id: ${{ secrets.GCP_PROJECT_ID }}
+          gcp-sa-key: ${{ secrets.GCP_SA_KEY }}
+
+      - name: Load icu/hosp data into SQLite
+        run: |
+          echo "Running SQLite build."
+          python ${BUILDCODE_PATH}/import.py
+
+          echo `md5sum mimic4.db`
+
+        env:
+          BUILDCODE_PATH: mimic-iv/buildmimic/sqlite
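
The last workflow step only prints an MD5 checksum of `mimic4.db`. For a local run, a slightly richer check could look like the sketch below: it computes the same kind of checksum and also lists the tables with their row counts. The helper names `md5_of_file` and `table_row_counts` are hypothetical and are not part of the repository.

```python
import hashlib
import sqlite3
from contextlib import closing


def md5_of_file(path, block_size=1 << 20):
    """Return the hex MD5 digest of a file, read in fixed-size blocks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def table_row_counts(path):
    """Map each table in the SQLite file to its row count."""
    with closing(sqlite3.connect(path)) as connection:
        names = [
            row[0]
            for row in connection.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Names come from sqlite_master; double any quotes before embedding them.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = connection.execute(
                "SELECT COUNT(*) FROM {}".format(quoted)
            ).fetchone()[0]
        return counts
```

After a build, `md5_of_file("mimic4.db")` and `table_row_counts("mimic4.db")` could be printed side by side; an empty dictionary would suggest that no CSV files were found by the import script.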

mimic-iv/buildmimic/sqlite/README.md

Lines changed: 1 addition & 3 deletions
@@ -15,9 +15,7 @@ into memory. It only needs three things to run:
 `import.py` is a python script. It requires the following to run:

 1. Python 3 installed
-2. SQLite
-3. [pandas](https://pandas.pydata.org/)
-4. [sqlalchemy](https://www.sqlalchemy.org/)
+2. [pandas](https://pandas.pydata.org/)

 ## Step 1: Download the CSV or CSV.GZ files.
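
The shortened requirements list can be checked before running the import. The sketch below is one possible preflight check, not something shipped with the repository; `check_requirements` is a hypothetical helper. It assumes a standard CPython build, where the `sqlite3` module is part of the standard library, which is presumably why SQLite no longer needs its own entry in the list.

```python
import importlib.util
import sqlite3
import sys


def check_requirements():
    """Report whether each requirement from the README list looks satisfied."""
    return {
        "python3": sys.version_info.major >= 3,
        # The import above already succeeded, so the bundled SQLite library is usable.
        "sqlite3": bool(sqlite3.sqlite_version),
        # find_spec returns None when the package cannot be found.
        "pandas": importlib.util.find_spec("pandas") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print("{}: {}".format(name, "ok" if ok else "MISSING"))
```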

Lines changed: 18 additions & 17 deletions
@@ -1,4 +1,5 @@
 import os
+import sqlite3
 import sys

 from glob import glob
@@ -7,28 +8,28 @@
 DATABASE_NAME = "mimic4.db"
 THRESHOLD_SIZE = 5 * 10**7
 CHUNKSIZE = 10**6
-CONNECTION_STRING = "sqlite:///{}".format(DATABASE_NAME)

 if os.path.exists(DATABASE_NAME):
     msg = "File {} already exists.".format(DATABASE_NAME)
     print(msg)
     sys.exit()

-for f in glob("**/*.csv*", recursive=True):
-    print("Starting processing {}".format(f))
-    folder, filename = os.path.split(f)
-    tablename = filename.lower()
-    if tablename.endswith('.gz'):
-        tablename = tablename[:-3]
-    if tablename.endswith('.csv'):
-        tablename = tablename[:-4]
-    if os.path.getsize(f) < THRESHOLD_SIZE:
-        df = pd.read_csv(f)
-        df.to_sql(tablename, CONNECTION_STRING)
-    else:
-        # If the file is too large, let's do the work in chunks
-        for chunk in pd.read_csv(f, chunksize=CHUNKSIZE, low_memory=False):
-            chunk.to_sql(tablename, CONNECTION_STRING, if_exists="append")
-    print("Finished processing {}".format(f))
+with sqlite3.Connection(DATABASE_NAME) as connection:
+    for f in glob("**/*.csv*", recursive=True):
+        print("Starting processing {}".format(f))
+        folder, filename = os.path.split(f)
+        tablename = filename.lower()
+        if tablename.endswith('.gz'):
+            tablename = tablename[:-3]
+        if tablename.endswith('.csv'):
+            tablename = tablename[:-4]
+        if os.path.getsize(f) < THRESHOLD_SIZE:
+            df = pd.read_csv(f)
+            df.to_sql(tablename, connection)
+        else:
+            # If the file is too large, let's do the work in chunks
+            for chunk in pd.read_csv(f, chunksize=CHUNKSIZE, low_memory=False):
+                chunk.to_sql(tablename, connection, if_exists="append")
+        print("Finished processing {}".format(f))

 print("Should be all done!")
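
Two details of the new loop may be worth spelling out. First, the table name is the lower-cased file name with any `.gz` and `.csv` suffixes removed. Second, using a `sqlite3` connection as a context manager commits on success and rolls back on an exception, but it does not close the connection; `sqlite3.connect` is the documented way to open one. The sketch below illustrates both points under those assumptions. `table_name_for` is a hypothetical helper, and wrapping the connection in `contextlib.closing` is a suggestion of this sketch, not what the commit does.

```python
import os
import sqlite3
from contextlib import closing


def table_name_for(path):
    """Derive a table name from a CSV path, mirroring the suffix handling in the diff."""
    name = os.path.basename(path).lower()
    if name.endswith(".gz"):
        name = name[:-3]
    if name.endswith(".csv"):
        name = name[:-4]
    return name


# closing() guarantees close(); the inner "with connection" only commits or rolls back.
with closing(sqlite3.connect(":memory:")) as connection:
    with connection:
        connection.execute("CREATE TABLE demo (value INTEGER)")
        connection.execute("INSERT INTO demo VALUES (1)")
    row_count = connection.execute("SELECT COUNT(*) FROM demo").fetchone()[0]

print(table_name_for("hosp/Admissions.csv.gz"), row_count)  # prints "admissions 1"
```

For a one-shot script that exits right after the loop, leaving the connection open is harmless in practice; the explicit close matters more if the loading code is reused inside a longer-running process.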
