## Testing

For local development, you will have to create a `secrets.env` file in the root of the repo and add to it the MongoDB
password and the UTA Postgres database connection string (see the UTA section below for details):

```
MONGODB_READONLY_PASSWORD=...
UTA_DATABASE_URL=...
```
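
If you need these values in your current shell for ad-hoc commands, one way (a convenience sketch, not something the
test setup requires) is to source the file with `allexport` enabled so every assignment is exported:

```shell
$ set -a && source secrets.env && set +a
$ # The variables are now visible to child processes, e.g.:
$ python3 -c 'import os; print("MONGODB_READONLY_PASSWORD set:", "MONGODB_READONLY_PASSWORD" in os.environ)'
```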

Then, you will need to run `fetch_utilities_data.sh` in a terminal to fetch the required data files:

```shell
$ ./fetch_utilities_data.sh
```

To run the [integration tests](https://github.com/FHIR/genomics-operations/tree/main/tests), you can use the VS Code
Testing functionality, which should discover them automatically. You can also run `python3 -m pytest` from the terminal
to execute them all.
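
While iterating on a change, it can be quicker to run only part of the suite; the standard pytest selectors work here
(the module name and expression below are only illustrations, use whichever tests you are working on):

```shell
$ python3 -m pytest -k "find_subject_variants" -x   # run tests matching an expression, stop at the first failure
$ python3 -m pytest tests/test_app.py -vv           # run a single test module with verbose output
```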

Additionally, since the tests run against the MongoDB database, if you need to update the test data in this repo, you
can run `OVERWRITE_TEST_EXPECTED_DATA=true python3 -m pytest` from the terminal and then create a pull request with the
changes.
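
A typical flow for refreshing the expected data looks like this (the branch name is just an example):

```shell
$ git checkout -b update-test-expected-data
$ OVERWRITE_TEST_EXPECTED_DATA=true python3 -m pytest   # rewrites the expected output files
$ git diff                                              # review the regenerated data before committing
$ git add -A && git commit -m "Update test expected data"
```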

## Heroku Deployment

Currently, there are two environments running in Heroku:
- Dev: <https://fhir-gen-ops-dev-ca42373833b6.herokuapp.com/>
- Prod: <https://fhir-gen-ops.herokuapp.com/>

Pull requests will trigger a deployment to the dev environment automatically after being merged.

The ["Manual Deployment"](https://github.com/FHIR/genomics-operations/actions/workflows/manual_deployment.yml) workflow
can be used to deploy code to either the `dev` or `prod` environment. To do so, select "Run workflow", ignore the
"Use workflow from" dropdown which lists the branches in the current repo (it cannot be disabled or removed), and then
select the environment, the branch and the repository. By default, the `https://github.com/FHIR/genomics-operations`
repo is specified, but you can replace it with any fork.

Deployments to the prod environment can only be triggered manually from the `main` branch of the repo using the Manual
Deployment workflow.
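
If you prefer the GitHub CLI over the web UI, a dispatch along these lines should also work; note that the input names
are not documented here, so treat the ones below as guesses and check `manual_deployment.yml` for the real ones:

```shell
$ # The -f input names below are illustrative; see manual_deployment.yml for the inputs it actually defines.
$ gh workflow run manual_deployment.yml --repo FHIR/genomics-operations \
    -f environment=dev -f branch=main -f repository=FHIR/genomics-operations
```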

### Heroku Stack

Make sure that the Python version under [`runtime.txt`](./runtime.txt) is
[supported](https://devcenter.heroku.com/articles/python-support#supported-runtimes) by the
[Heroku stack](https://devcenter.heroku.com/articles/stack) that is currently running in each environment.
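
With the Heroku CLI you can compare the pinned runtime against the stack each app is on (the dev app name below is a
guess; use the names shown in the Heroku dashboard):

```shell
$ cat runtime.txt                     # Python version pinned for the buildpack
$ heroku stack --app fhir-gen-ops     # the currently selected stack is marked with an asterisk
$ heroku stack --app fhir-gen-ops-dev
```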

## UTA Database

The Biocommons [hgvs](https://github.com/biocommons/hgvs) library, which is used for variant parsing, validation and
normalisation, requires access to a copy of the [UTA](https://github.com/biocommons/uta) Postgres database.

We have provisioned a Heroku Postgres instance in the Prod environment which contains the data imported from a database
dump as described [here](https://github.com/biocommons/uta#installing-from-database-dumps).

We define a `UTA_DATABASE_SCHEMA` environment variable in the [`.env`](.env) file which contains the name of the
currently imported database schema.
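
With the schema used as an example elsewhere in this document, the entry in [`.env`](.env) would look like this:

```
UTA_DATABASE_SCHEMA=uta_20240523b
```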

### Database import procedure (it takes about 30 minutes)

- Go to the UTA dump download site (http://dl.biocommons.org/uta/) and get the latest `<UTA_SCHEMA>.pgd.gz` file.
- Go to https://dashboard.heroku.com/apps/fhir-gen-ops/resources and click on the "Heroku Postgres" instance (it will
  open a new window)
- Go to the Settings tab
- Click "View Credentials"
- Use the fields from this window to fill in the variables below

```shell
$ POSTGRES_HOST="<Heroku Postgres Host>"
$ POSTGRES_DATABASE="<Heroku Postgres Database>"
$ POSTGRES_USER="<Heroku Postgres User>"
$ export PGPASSWORD="<Heroku Postgres Password>"
$ UTA_SCHEMA="<UTA Schema>" # Specify the UTA schema of the UTA dump you downloaded (example: uta_20240523b)
$ gzip -cdq ${UTA_SCHEMA}.pgd.gz | grep -v '^GRANT USAGE ON SCHEMA .* TO anonymous;$' | grep -v '^ALTER .* OWNER TO uta_admin;$' | psql -U ${POSTGRES_USER} -1 -v ON_ERROR_STOP=1 -d ${POSTGRES_DATABASE} -h ${POSTGRES_HOST} -Eae
```

Note: The `grep -v` commands are required because the Heroku Postgres instance doesn't allow us to create a new role.

Once complete, make sure you update the `UTA_DATABASE_SCHEMA` environment variable in the [`.env`](.env) file and commit
it.
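
For example, after importing the `uta_20240523b` dump (branch and commit message are just illustrations):

```shell
$ # Edit .env so that UTA_DATABASE_SCHEMA points at the newly imported schema, then:
$ git add .env
$ git commit -m "Update UTA_DATABASE_SCHEMA to uta_20240523b"
```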

### Connection string

The connection string for this database can be found in the same Heroku Postgres Settings tab under "View Credentials".
It is pre-populated in the Heroku runtime under the `UTA_DATABASE_URL` environment variable. Additionally, we set the
same `UTA_DATABASE_URL` environment variable in GitHub so the CI can use this database when running the tests.
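
If the value ever needs to be rotated, it can be updated through the repository settings UI or with the GitHub CLI;
whether it is stored as an Actions secret or a variable is repository configuration, so the command below is only one
of the two possibilities:

```shell
$ gh secret set UTA_DATABASE_URL --repo FHIR/genomics-operations   # prompts for the new value
```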

For local development, set `UTA_DATABASE_URL` to the Heroku Postgres connection string in the `secrets.env` file.
Alternatively, you can set it to
`postgresql://anonymous:anonymous@uta.biocommons.org/uta` if you'd like to use the HGVS
public instance.

### Testing the database

```shell
$ source secrets.env
$ pgcli "${UTA_DATABASE_URL}"
> set schema '<UTA Schema>'; # Specify the UTA schema of the UTA dump you downloaded (example: uta_20240523b)
> select count(*) from alembic_version
union select count(*) from associated_accessions
union select count(*) from exon
union select count(*) from exon_aln
union select count(*) from exon_set
union select count(*) from gene
union select count(*) from meta
union select count(*) from origin
union select count(*) from seq
union select count(*) from seq_anno
union select count(*) from transcript
union select count(*) from translation_exception;
```

### Update utilities data

The RefSeq metadata from the UTA database needs to be in sync with the RefSeq data which is available for the Seqfetcher
Utility endpoint. Currently, this is stored in GitHub as release artifacts. Similarly, the PyARD SQLite database is also
stored as a release artifact.

To update the RefSeq data and PyARD database, you will have to run `./utilities/pack_utilities_data.py`. Here is a
step-by-step guide on how to do this:

```shell
$ mkdir seqrepo
$ cd seqrepo
$ python3 -m venv .venv
$ . .venv/bin/activate
$ pip install setuptools==75.7.0
$ pip install biocommons.seqrepo==0.6.9
$ # See https://github.com/biocommons/biocommons.seqrepo/issues/171 for a bug that's causing issues with the builtin
$ # rsync on OSX. This is OSX-specific; on Linux, rsync should be available from the standard package managers.
$ brew install rsync
$ # Fetch the seqrepo data (should take about 16 minutes)
$ seqrepo --rsync-exe /opt/homebrew/bin/rsync -r . pull --update-latest
$ # If you get a "Permission denied" error, you can run the following command (using the temp directory which
$ # got created):
$ # > chmod +w 2024-02-20.r4521u5y && mv 2024-02-20.r4521u5y 2024-02-20 && ln -s 2024-02-20 latest
$
$ # Exit the venv and cd to the genomics-operations repo.
$
$ # Pack the utilities data (should take about 25 minutes)
$ python ./utilities/pack_utilities_data.py
```

You should see a warning in the output log if the current `PYARD_DATABASE_VERSION` is outdated, and you can change
`PYARD_DATABASE_VERSION` in the `.env` file if you wish to switch to the latest version that is printed in this log.
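
In other words, if the log reports a newer PyARD database version, update the corresponding line in `.env`
(placeholder value shown):

```
PYARD_DATABASE_VERSION=<latest version printed in the pack_utilities_data.py log>
```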

Now you should set a new value for `UTILITIES_DATA_VERSION` in the `.env` file, create a new branch and commit this
change in it. Then also create a git tag for this commit with the `UTILITIES_DATA_VERSION` value and push it to GitHub
along with the branch. This tag can then be used to create a new [release](https://github.com/FHIR/genomics-operations/releases).
Inside this release, you need to attach all the `*.tar.gz` files from the `./tmp` folder which was created after
`pack_utilities_data.py` ran successfully.
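
Concretely, the steps map to something like the following (the branch name and commit message are only illustrations,
and the release can equally be created through the GitHub UI):

```shell
$ NEW_VERSION="<new UTILITIES_DATA_VERSION value>"
$ git checkout -b update-utilities-data
$ # Edit .env so that UTILITIES_DATA_VERSION is set to ${NEW_VERSION}, then:
$ git commit -am "Update utilities data to ${NEW_VERSION}"
$ git tag "${NEW_VERSION}" && git push origin HEAD "${NEW_VERSION}"
$ gh release create "${NEW_VERSION}" ./tmp/*.tar.gz --notes "Utilities data ${NEW_VERSION}"
```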

Once the release is published, create a PR from this new branch and merge it.

Finally, in order to validate the new release locally, run `fetch_utilities_data.sh` to recreate the `data`
directory (delete it first if you already have it).
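
That is:

```shell
$ rm -rf data               # remove the previously fetched data, if any
$ ./fetch_utilities_data.sh
$ python3 -m pytest         # optionally re-run the tests against the freshly fetched data
```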