|
| 1 | +# CWL Viewer MongoDB migration to PostgreSQL |
| 2 | + |
| 3 | +This directory holds documentation that may be useful for other CWL Viewer |
| 4 | +developers (probably not for users). |
| 5 | + |
| 6 | +## Spring Boot & Hibernate upgrade |
| 7 | + |
| 8 | +Issue: <https://github.com/common-workflow-language/cwlviewer/issues/254> |
| 9 | + |
| 10 | +## Jupyter Notebook, `mongo_to_pg.ipynb` |
| 11 | + |
| 12 | +Issue: <https://github.com/common-workflow-language/cwlviewer/issues/395> |
| 13 | + |
| 14 | +After upgrading the Spring Boot and Hibernate code, we were left with the |
| 15 | +production MongoDB database in AWS. Even though we could create a new |
| 16 | +environment with PostgreSQL, we still needed to migrate the production |
| 17 | +database. |
| 18 | + |
| 19 | +The first step to work on the issue was to obtain a sample of the data |
| 20 | +exported from the system. Instead of relying on someone being able to |
| 21 | +access MongoDB in AWS, the first try was via the existing `dump.sh` script |
| 22 | +(`load.py` is much slower). |
| 23 | + |
| 24 | +With the data on disk, a Jupyter Notebook was created to process the data |
| 25 | +with Pandas, and produce a CSV to be `COPY`ed by PostgreSQL, importing |
| 26 | +the data directly into the `workflow` table (the only other table in |
| 27 | +CWL Viewer is `queued_workflow`, but it only holds data for workflows being |
| 28 | +processed, so it can probably be ignored). |
| 29 | + |
| 30 | +The code of the Notebook is in this directory, and can be used to review |
| 31 | +what was done, or to modify it for other use cases. The Python script with |
| 32 | +the same name (but `.py` extension) was created based on the Notebook, to |
| 33 | +be used from the command line. |
| 34 | + |
| 35 | +To install the dependencies needed to run both the Notebook and the |
| 36 | +Python script, run `pip install -r requirements.txt` in a virtual |
| 37 | +environment. |
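The conversion step described above (exported MongoDB documents turned into a CSV that PostgreSQL can `COPY` into the `workflow` table) can be sketched roughly as follows. This is not the Notebook's code: it uses only the Python standard library instead of Pandas to stay dependency-free, it assumes the export is one JSON document per line with Extended JSON `{"$oid": ...}` identifiers, and the column names in `COLUMNS` are hypothetical placeholders rather than the real `workflow` schema.

```python
import csv
import io
import json

# Hypothetical column names; the real `workflow` table schema may differ.
COLUMNS = ["id", "label", "inputs"]


def flatten(value):
    """Turn one MongoDB field value into a CSV cell."""
    if value is None:
        return ""  # empty unquoted cell, read as NULL by COPY in CSV mode
    if isinstance(value, dict) and set(value) == {"$oid"}:
        return value["$oid"]  # Extended JSON ObjectId -> plain string
    if isinstance(value, (dict, list)):
        return json.dumps(value, sort_keys=True)  # nested data -> JSON text
    return value


def documents_to_csv(lines, out, columns=COLUMNS):
    """Write one CSV row per JSON document; return the number of rows."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)  # header row, skipped by COPY ... HEADER true
    count = 0
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines in the export
        doc = json.loads(line)
        doc["id"] = doc.pop("_id", None)  # assumed mapping of _id to id
        writer.writerow([flatten(doc.get(col)) for col in columns])
        count += 1
    return count


if __name__ == "__main__":
    sample = ['{"_id": {"$oid": "abc123"}, "label": "hello", "inputs": {"a": 1}}']
    buffer = io.StringIO()
    documents_to_csv(sample, buffer)
    print(buffer.getvalue(), end="")
```

A file produced this way could then be loaded from `psql` with something like `\copy workflow FROM 'workflow.csv' WITH (FORMAT csv, HEADER true)`, where the file name is again only an example.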