
Commit 9e8ca76

Add mongo_to_pg Python script and notebook
1 parent 9f0d580 commit 9e8ca76

5 files changed

+1728
-0
lines changed

docs/mongo-to-postgres/.gitignore

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
1+
*.json
2+
*.csv
3+
.ipynb_checkpoints/
4+
venv/
5+

docs/mongo-to-postgres/README.md

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
1+
# CWL Viewer MongoDB migration to PostgreSQL
2+
3+
This directory holds documentation that may be useful for other CWL Viewer
4+
developers (and probably not for users).
5+
6+
## Spring Boot & Hibernate upgrade
7+
8+
Issue: <https://github.com/common-workflow-language/cwlviewer/issues/254>
9+
10+
## Jupyter Notebook, `mongo_to_pg.ipynb`
11+
12+
Issue: <https://github.com/common-workflow-language/cwlviewer/issues/395>
13+
14+
After upgrading the Spring Boot and Hibernate code, we were left with the
15+
production MongoDB database in AWS. Even though we could create a new
16+
environment with PostgreSQL, we still needed to migrate the production
17+
database.
18+
19+
The first step was to obtain a sample of the data exported from the
20+
system. Rather than relying on someone with access to MongoDB in AWS,
21+
the first attempt used the existing `dump.sh` script (`load.py` is
22+
much slower).
23+
24+
With the data on disk, a Jupyter Notebook was created to process it with
25+
Pandas and produce a CSV that PostgreSQL can load with `COPY`, importing
26+
the data directly into the `workflow` table. The only other table in
27+
CWL Viewer is `queued_workflow`, which only holds data for workflows being
28+
processed and can probably be ignored.
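
As a rough illustration of the export-to-CSV step, here is a minimal sketch. It is not the Notebook's code: the Notebook uses Pandas, while this sketch uses only the standard library so it runs without extra dependencies. It assumes the dump contains one JSON document per line and that `_id` may be in MongoDB Extended JSON form. The column names (`id`, `label`, `doc`) and the `documents_to_csv` helper are hypothetical, and the real `workflow` schema may differ.

```python
import csv
import io
import json

# Hypothetical column names; the real `workflow` table schema is not shown here.
COLUMNS = ["id", "label", "doc"]


def documents_to_csv(json_lines, out):
    """Write one CSV row per MongoDB document (one JSON object per input line)."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(COLUMNS)
    count = 0
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        doc = json.loads(line)
        # Assumption: _id is either a plain value or Extended JSON {"$oid": "..."}.
        raw_id = doc.pop("_id", None)
        doc_id = raw_id.get("$oid") if isinstance(raw_id, dict) else raw_id
        label = doc.pop("label", "")
        # Remaining fields are kept as a JSON string, e.g. for a json/jsonb column.
        writer.writerow([doc_id, label, json.dumps(doc, sort_keys=True)])
        count += 1
    return count


# Made-up sample documents, only to exercise the function.
sample = [
    '{"_id": {"$oid": "5f1e"}, "label": "hello", "steps": 2}',
    "",
    '{"_id": "abc", "label": "other"}',
]
buffer = io.StringIO()
rows_written = documents_to_csv(sample, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

A file written this way could then be loaded with something like `COPY workflow (id, label, doc) FROM '/path/to/file.csv' WITH (FORMAT csv, HEADER true)`, adjusted to the actual column list.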
29+
30+
The code of the Notebook is in this directory, and can be used to review
31+
what was done, or to modify it for other use cases. The Python script with
32+
the same name (but `.py` extension) was created based on the Notebook, to
33+
be used from the command line.
34+
35+
To install the dependencies needed by both the Notebook and the Python
36+
script, run `pip install -r requirements.txt` in a virtual
37+
environment.
