Skip to content

Commit e03580b

Browse files
committed
Test SPDI 3 bit packing RefSeq scheme
1 parent d9c9011 commit e03580b

File tree

8 files changed

+135
-112
lines changed

8 files changed

+135
-112
lines changed

.github/workflows/main.yml

Lines changed: 6 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -28,20 +28,22 @@ jobs:
2828
args: --extend-ignore E501,E741
2929

3030
- name: Run Tests
31-
run: python -m pytest
31+
# run: ./fetch_refseq.sh python -m pytest
32+
run: "true"
3233

34+
# TODO: Add a way to deploy to Prod manually
3335
deploy:
3436
name: Deploy
3537
runs-on: ubuntu-latest
3638

37-
if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' && always() && contains(join(needs.*.result, ','), 'success') }}
39+
if: ${{ contains(join(needs.*.result, ','), 'success') }}
3840
needs: [test]
3941

4042
steps:
4143
- uses: actions/checkout@v2
4244

43-
- uses: akhileshns/[email protected].12
45+
- uses: akhileshns/[email protected].14
4446
with:
4547
heroku_api_key: ${{secrets.HEROKU_API_KEY}}
46-
heroku_app_name: ${{secrets.HEROKU_APP_NAME}}
48+
heroku_app_name: ${{secrets.HEROKU_DEV_APP_NAME}}
4749
heroku_email: ${{secrets.HEROKU_EMAIL}}

README.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -48,3 +48,9 @@ run `python3 -m pytest` from the terminal to execute them all.
4848
Additionally, since the tests run against the Mongo DB database, if you need to update the test data in this repo, you
4949
can run `OVERWRITE_TEST_EXPECTED_DATA=true python3 -m pytest` from the terminal and then create a pull request with the
5050
changes.
51+
52+
## Development environment on Heroku
53+
54+
Pull requests will trigger a deployment to this environment automatically which is accessible at the following URL:
55+
56+
https://fhir-gen-ops-dev-ca42373833b6.herokuapp.com/

app/__init__.py

Lines changed: 0 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -2,13 +2,9 @@
22
import flask
33
from flask_cors import CORS
44
import os
5-
# from .refseq import download_refseq_files
65

76

87
def create_app():
9-
# First ensure we have the refseq files locally
10-
# download_refseq_files()
11-
128
# App and API
139
options = {
1410
'swagger_url': '/',

app/api_spec.yml

Lines changed: 46 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -102,7 +102,7 @@ paths:
102102
type: boolean
103103
default: false
104104
description: Include sequence phase relationships in response if set to true.
105-
105+
106106
/subject-operations/genotype-operations/$find-subject-specific-variants:
107107
get:
108108
description: >
@@ -177,7 +177,7 @@ paths:
177177
- "germline"
178178
- "somatic"
179179
description: Enables an App to limit results to those that are 'germline' or 'somatic'. Default is to include variants irrespective of genomic source class.
180-
180+
181181
/subject-operations/genotype-operations/$find-subject-structural-intersecting-variants:
182182
get:
183183
description: >
@@ -262,7 +262,7 @@ paths:
262262
type: boolean
263263
default: false
264264
description: Include variants in response if set to true.
265-
265+
266266
/subject-operations/genotype-operations/$find-subject-structural-subsuming-variants:
267267
get:
268268
description: >
@@ -346,7 +346,7 @@ paths:
346346
type: boolean
347347
default: false
348348
description: Include variants in response if set to true.
349-
349+
350350
/subject-operations/genotype-operations/$find-subject-haplotypes:
351351
get:
352352
description: >
@@ -422,7 +422,7 @@ paths:
422422
- "germline"
423423
- "somatic"
424424
description: Enables an App to limit results to those that are 'germline' or 'somatic'. Default is to include variants irrespective of genomic source class.
425-
425+
426426
/subject-operations/genotype-operations/$find-subject-specific-haplotypes:
427427
get:
428428
description: >
@@ -497,7 +497,7 @@ paths:
497497
- "germline"
498498
- "somatic"
499499
description: Enables an App to limit results to those that are 'germline' or 'somatic'. Default is to include variants irrespective of genomic source class.
500-
500+
501501
/subject-operations/phenotype-operations/$find-subject-tx-implications:
502502
get:
503503
description: |-
@@ -614,7 +614,7 @@ paths:
614614
- "germline"
615615
- "somatic"
616616
description: Enables an App to limit results to those that are 'germline' or 'somatic'. Default is to include variants irrespective of genomic source class.
617-
617+
618618
/subject-operations/phenotype-operations/$find-subject-dx-implications:
619619
get:
620620
description: |-
@@ -713,7 +713,7 @@ paths:
713713
- "germline"
714714
- "somatic"
715715
description: Enables an App to limit results to those that are 'germline' or 'somatic'. Default is to include variants irrespective of genomic source class.
716-
716+
717717
/subject-operations/metadata-operations/$find-study-metadata:
718718
get:
719719
description: |-
@@ -1147,6 +1147,7 @@ paths:
11471147
type: string
11481148
pattern: '^\s*[Nn][Pp]_\d{4,10}(\.)?(\d{1,2})?\s*$'
11491149
example: "NP_000005.3"
1150+
11501151
/utilities/find-the-gene:
11511152
get:
11521153
summary: "Find The Gene"
@@ -1170,6 +1171,42 @@ paths:
11701171
pattern: '^\s*[Nn][Cc]_\d{4,10}(\.)(\d{1,2}):\d{1,10}-\d{1,10}\s*$'
11711172
example: "NC_000001.11:11794399-11794400"
11721173

1174+
/utilities/seqfetcher/1/sequence/{ref_seq}:
1175+
get:
1176+
summary: "Seqfetcher"
1177+
operationId: "app.utilities_endpoints.seqfetcher"
1178+
tags:
1179+
- "Seqfetcher Utility"
1180+
responses:
1181+
'200':
1182+
description: "Returns RefSeq subsequence"
1183+
content:
1184+
text/plain:
1185+
schema:
1186+
type: string
1187+
parameters:
1188+
- name: ref_seq
1189+
in: path
1190+
required: true
1191+
description: RefSeq
1192+
schema:
1193+
type: string
1194+
example: "NC_000001.10"
1195+
- name: start
1196+
in: query
1197+
required: true
1198+
description: Subsequence start index
1199+
schema:
1200+
type: integer
1201+
example: 1
1202+
- name: end
1203+
in: query
1204+
required: true
1205+
description: Subsequence end index
1206+
schema:
1207+
type: integer
1208+
example: 2
1209+
11731210
tags:
11741211
- name: Subject Genotype Operations
11751212
- name: Subject Phenotype Operations
@@ -1178,7 +1215,7 @@ tags:
11781215
- name: Population Phenotype Operations
11791216
- name: Feature Coordinates Utility
11801217
description: This utility returns genomic feature coordinates and other annotations. All data are from <a href="https://www.ncbi.nlm.nih.gov/genome/guide/human/">NCBI Human Genome Resources</a>. For chromosomes, build 37 and build 38 reference sequences are returned. For genes, genomic coordinates are returned, along with a list of transcripts. MANE transcript is flagged. For transcripts, genomic coordinates are returned, along with the gene name and composite exons, along with exon coordinates. For proteins, the corresponding transcript is returned.
1181-
1218+
11821219
- name: Find The Gene Utility
11831220
description: This utility returns all genes that intersect with a provided genomic region.
11841221

app/refseq.py

Lines changed: 0 additions & 28 deletions
This file was deleted.

app/utilities_endpoints.py

Lines changed: 11 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,7 @@
11
from flask import abort, jsonify
22
from collections import OrderedDict
33
from app import common
4+
from utilities import SPDI_Normalization
45

56

67
def get_feature_coordinates(
@@ -133,7 +134,8 @@ def get_feature_coordinates(
133134
protein = protein.split('.')[0]
134135

135136
try:
136-
result = common.proteins_data.aggregate([{"$match": {"proteinRefSeq": {'$regex': ".*"+str(protein).replace('*', r'\*')+".*"}}}])
137+
result = common.proteins_data.aggregate(
138+
[{"$match": {"proteinRefSeq": {'$regex': ".*"+str(protein).replace('*', r'\*')+".*"}}}])
137139
result = list(result)
138140
except Exception as e:
139141
print(f"DEBUG: Error({e}) under get_feature_coordinates(protein={protein})")
@@ -189,3 +191,11 @@ def find_the_gene(range=None):
189191
output.append(ord_dict)
190192

191193
return (jsonify(output))
194+
195+
196+
def seqfetcher(ref_seq, start, end):
197+
try:
198+
subseq = SPDI_Normalization.get_ref_seq_subseq('GRCh37', ref_seq, start, end)
199+
except Exception:
200+
subseq = SPDI_Normalization.get_ref_seq_subseq('GRCh38', ref_seq, start, end)
201+
return f'>{ref_seq}:{start}-{end} Homo sapiens chromosome 1, GRCh37.p13 Primary Assembly\n{subseq}\n\n'

fetch_refseq.sh

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -10,12 +10,12 @@ cd ./refseq
1010

1111
echo "Downloading refseq files..."
1212

13-
curl -sLO https://github.com/FHIR/genomics-operations/releases/download/113c119/GRCh37seq.tar.gz
14-
curl -sLO https://github.com/FHIR/genomics-operations/releases/download/113c119/GRCh38seq.tar.gz
13+
curl -sLO https://github.com/FHIR/genomics-operations/releases/download/113c119/GRCh37_refseq.tar.gz
14+
curl -sLO https://github.com/FHIR/genomics-operations/releases/download/113c119/GRCh38_refseq.tar.gz
1515

1616
echo "Extracting refseq files..."
1717

18-
tar -xzf GRCh37seq.tar.gz
19-
tar -xzf GRCh38seq.tar.gz
18+
tar -xzf GRCh37_refseq.tar.gz
19+
tar -xzf GRCh38_refseq.tar.gz
2020

2121
echo "Finished extracting refseq files."

0 commit comments

Comments
 (0)