
Commit dfe26bc

feat: Add is_exonic field to genomic breakpoints (#326)
1 parent 9aec17e commit dfe26bc


14 files changed, with 158 additions and 26 deletions


notebooks/README.md

Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
 # Notebooks
 
-* `fusion_evidence_matching.ipynb`: Demonstrates evidence matching workflow between assayed fusions (i.e. fusions from patient samples) and categorical fusions (i.e. fusions from genomic knowledgebases such as CIViC). The example queried fusions in this notebook are KIF5B::RET and EML4::ALK. Fusions are matched at the gene symbol, transcript accession, exon number, exon offset, and genomic breakpoint for each transcript segment, along with the linker sequence joining the two segments (if present). The highest possible match score is 11 points, and the categorical fusions with the highest score are returned in the matching algorithm along with their associated score.
+* `fusion_evidence_matching.ipynb`: Demonstrates evidence matching workflow between assayed fusions (i.e. fusions from patient samples) and categorical fusions (i.e. fusions from genomic knowledgebases such as CIViC). The example queried fusions in this notebook are EML4::ALK and BCR::ABL1. Fusions are matched at the gene symbol, transcript accession, exon number, exon offset, and genomic breakpoint for each transcript segment, along with the linker sequence joining the two segments (if present). Matching fusions are returned according to a prioritization scale that is described [in the FUSOR wiki](https://github.com/cancervariants/fusor/wiki/Fusion-Match-Classes).

notebooks/evidence_matching/fusion_evidence_matching.ipynb

Lines changed: 103 additions & 8 deletions
@@ -179,14 +179,7 @@
 "metadata": {},
 "source": [
 "The output above lists all possible categorical fusions with EML4 and ALK \n",
-"as a partner. For the EML4::ALK fusion, we would expect a match for the \n",
-"EML4(entrez:27436)::ALK(entrez:238) fusion, as this fusion describes the joining \n",
-"of exon 13 of EML4 with exon 20 of ALK, which also describes the assayed fusion. \n",
-"Note that the other EML4::ALK categorical fusions indicate the joining of exons \n",
-"that do not match the queried assayed fusion, so these would be returned as\n",
-"lower priority matches. v::ALK(entrez:238) would also be a match as this \n",
-"fusion describes the joining of exon 20 for the ALK transcript which\n",
-"matches the assayed fusion."
+"as a partner that could be returned from the CIViC knowledgebase. "
 ]
 },
 {
@@ -210,19 +203,117 @@
 "text": [
 "WARNING:cool_seq_tool.mappers.exon_genomic_coords:48406078 on NC_000023.11 occurs more than 150 bp outside the exon boundaries of the NM_005636.4 transcript, indicating this may not be a chimeric transcript junction and is unlikely to represent a contiguous coding sequence. Confirm that the genomic position 48406078 is being used to represent transcript junction and not DNA breakpoint.\n",
 "ERROR:fusor.harvester:Cannot translate fusion: v::PDGFRB(entrez:5159) due to the following reason: Translation cannot proceed as GRCh37 transcripts and exons lacks genomic breakpoints\n",
+"Traceback (most recent call last):\n",
+" File \"/Users/rsjxa001/fusion_project/fusor/src/fusor/harvester.py\", line 412, in load_records\n",
+" cat_fusion = await self.translator.translate(civic=fusion)\n",
+" ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
+" File \"/Users/rsjxa001/fusion_project/fusor/src/fusor/translator.py\", line 984, in translate\n",
+" raise ValueError(msg)\n",
+"ValueError: Translation cannot proceed as GRCh37 transcripts and exons lacks genomic breakpoints\n",
 "ERROR:fusor.harvester:Cannot translate fusion: SQSTM1(entrez:8878)::NTRK1(entrez:4914) due to the following reason: Translation cannot proceed as GRCh37 transcripts and exons lacks genomic breakpoints\n",
+"Traceback (most recent call last):\n",
+" File \"/Users/rsjxa001/fusion_project/fusor/src/fusor/harvester.py\", line 412, in load_records\n",
+" cat_fusion = await self.translator.translate(civic=fusion)\n",
+" ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
+" File \"/Users/rsjxa001/fusion_project/fusor/src/fusor/translator.py\", line 984, in translate\n",
+" raise ValueError(msg)\n",
+"ValueError: Translation cannot proceed as GRCh37 transcripts and exons lacks genomic breakpoints\n",
 "ERROR:fusor.harvester:Cannot translate fusion: v::NTRK3(entrez:4916) due to the following reason: Translation cannot proceed as GRCh37 transcripts and exons lacks genomic breakpoints\n",
+"Traceback (most recent call last):\n",
+" File \"/Users/rsjxa001/fusion_project/fusor/src/fusor/harvester.py\", line 412, in load_records\n",
+" cat_fusion = await self.translator.translate(civic=fusion)\n",
+" ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
+" File \"/Users/rsjxa001/fusion_project/fusor/src/fusor/translator.py\", line 984, in translate\n",
+" raise ValueError(msg)\n",
+"ValueError: Translation cannot proceed as GRCh37 transcripts and exons lacks genomic breakpoints\n",
 "ERROR:fusor.harvester:Cannot translate fusion: v::RET(entrez:5979) due to the following reason: Translation cannot proceed as GRCh37 transcripts and exons lacks genomic breakpoints\n",
+"Traceback (most recent call last):\n",
+" File \"/Users/rsjxa001/fusion_project/fusor/src/fusor/harvester.py\", line 412, in load_records\n",
+" cat_fusion = await self.translator.translate(civic=fusion)\n",
+" ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
+" File \"/Users/rsjxa001/fusion_project/fusor/src/fusor/translator.py\", line 984, in translate\n",
+" raise ValueError(msg)\n",
+"ValueError: Translation cannot proceed as GRCh37 transcripts and exons lacks genomic breakpoints\n",
 "ERROR:fusor.harvester:Cannot translate fusion: ATP1B1(entrez:481)::NRG1(entrez:3084) due to the following reason: Translation cannot proceed as GRCh37 transcripts and exons lacks genomic breakpoints\n",
+"Traceback (most recent call last):\n",
+" File \"/Users/rsjxa001/fusion_project/fusor/src/fusor/harvester.py\", line 412, in load_records\n",
+" cat_fusion = await self.translator.translate(civic=fusion)\n",
+" ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
+" File \"/Users/rsjxa001/fusion_project/fusor/src/fusor/translator.py\", line 984, in translate\n",
+" raise ValueError(msg)\n",
+"ValueError: Translation cannot proceed as GRCh37 transcripts and exons lacks genomic breakpoints\n",
 "ERROR:fusor.harvester:Cannot translate fusion: SDC4(entrez:6385)::NRG1(entrez:3084) due to the following reason: Translation cannot proceed as GRCh37 transcripts and exons lacks genomic breakpoints\n",
+"Traceback (most recent call last):\n",
+" File \"/Users/rsjxa001/fusion_project/fusor/src/fusor/harvester.py\", line 412, in load_records\n",
+" cat_fusion = await self.translator.translate(civic=fusion)\n",
+" ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
+" File \"/Users/rsjxa001/fusion_project/fusor/src/fusor/translator.py\", line 984, in translate\n",
+" raise ValueError(msg)\n",
+"ValueError: Translation cannot proceed as GRCh37 transcripts and exons lacks genomic breakpoints\n",
 "ERROR:fusor.harvester:Cannot translate fusion: FGFR1OP2(entrez:26127)::FGFR1(entrez:2260) due to the following reason: Translation cannot proceed as GRCh37 transcripts and exons lacks genomic breakpoints\n",
+"Traceback (most recent call last):\n",
+" File \"/Users/rsjxa001/fusion_project/fusor/src/fusor/harvester.py\", line 412, in load_records\n",
+" cat_fusion = await self.translator.translate(civic=fusion)\n",
+" ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
+" File \"/Users/rsjxa001/fusion_project/fusor/src/fusor/translator.py\", line 984, in translate\n",
+" raise ValueError(msg)\n",
+"ValueError: Translation cannot proceed as GRCh37 transcripts and exons lacks genomic breakpoints\n",
 "ERROR:fusor.harvester:Cannot translate fusion: GOPC(entrez:57120)::ROS1(entrez:6098) due to the following reason: Translation cannot proceed as GRCh37 transcripts and exons lacks genomic breakpoints\n",
+"Traceback (most recent call last):\n",
+" File \"/Users/rsjxa001/fusion_project/fusor/src/fusor/harvester.py\", line 412, in load_records\n",
+" cat_fusion = await self.translator.translate(civic=fusion)\n",
+" ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
+" File \"/Users/rsjxa001/fusion_project/fusor/src/fusor/translator.py\", line 984, in translate\n",
+" raise ValueError(msg)\n",
+"ValueError: Translation cannot proceed as GRCh37 transcripts and exons lacks genomic breakpoints\n",
 "ERROR:fusor.harvester:Cannot translate fusion: TPM3(entrez:7170)::NTRK1(entrez:4914) due to the following reason: Translation cannot proceed as GRCh37 transcripts and exons lacks genomic breakpoints\n",
+"Traceback (most recent call last):\n",
+" File \"/Users/rsjxa001/fusion_project/fusor/src/fusor/harvester.py\", line 412, in load_records\n",
+" cat_fusion = await self.translator.translate(civic=fusion)\n",
+" ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
+" File \"/Users/rsjxa001/fusion_project/fusor/src/fusor/translator.py\", line 984, in translate\n",
+" raise ValueError(msg)\n",
+"ValueError: Translation cannot proceed as GRCh37 transcripts and exons lacks genomic breakpoints\n",
 "ERROR:fusor.harvester:Cannot translate fusion: RCSD1(entrez:92241)::ABL2(entrez:27) due to the following reason: Translation cannot proceed as GRCh37 transcripts and exons lacks genomic breakpoints\n",
+"Traceback (most recent call last):\n",
+" File \"/Users/rsjxa001/fusion_project/fusor/src/fusor/harvester.py\", line 412, in load_records\n",
+" cat_fusion = await self.translator.translate(civic=fusion)\n",
+" ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
+" File \"/Users/rsjxa001/fusion_project/fusor/src/fusor/translator.py\", line 984, in translate\n",
+" raise ValueError(msg)\n",
+"ValueError: Translation cannot proceed as GRCh37 transcripts and exons lacks genomic breakpoints\n",
 "ERROR:fusor.harvester:Cannot translate fusion: v::TFE3(entrez:7030) due to the following reason: Translation cannot proceed as GRCh37 transcripts and exons lacks genomic breakpoints\n",
+"Traceback (most recent call last):\n",
+" File \"/Users/rsjxa001/fusion_project/fusor/src/fusor/harvester.py\", line 412, in load_records\n",
+" cat_fusion = await self.translator.translate(civic=fusion)\n",
+" ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
+" File \"/Users/rsjxa001/fusion_project/fusor/src/fusor/translator.py\", line 984, in translate\n",
+" raise ValueError(msg)\n",
+"ValueError: Translation cannot proceed as GRCh37 transcripts and exons lacks genomic breakpoints\n",
 "ERROR:fusor.harvester:Cannot translate fusion: TCF3(entrez:6929)::PBX1(entrez:5087) due to the following reason: Translation cannot proceed as GRCh37 transcripts and exons lacks genomic breakpoints\n",
+"Traceback (most recent call last):\n",
+" File \"/Users/rsjxa001/fusion_project/fusor/src/fusor/harvester.py\", line 412, in load_records\n",
+" cat_fusion = await self.translator.translate(civic=fusion)\n",
+" ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
+" File \"/Users/rsjxa001/fusion_project/fusor/src/fusor/translator.py\", line 984, in translate\n",
+" raise ValueError(msg)\n",
+"ValueError: Translation cannot proceed as GRCh37 transcripts and exons lacks genomic breakpoints\n",
 "ERROR:fusor.harvester:Cannot translate fusion: v::NUTM1(entrez:256646) due to the following reason: Translation cannot proceed as GRCh37 transcripts and exons lacks genomic breakpoints\n",
+"Traceback (most recent call last):\n",
+" File \"/Users/rsjxa001/fusion_project/fusor/src/fusor/harvester.py\", line 412, in load_records\n",
+" cat_fusion = await self.translator.translate(civic=fusion)\n",
+" ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
+" File \"/Users/rsjxa001/fusion_project/fusor/src/fusor/translator.py\", line 984, in translate\n",
+" raise ValueError(msg)\n",
+"ValueError: Translation cannot proceed as GRCh37 transcripts and exons lacks genomic breakpoints\n",
 "ERROR:fusor.harvester:Cannot translate fusion: ENST00000275493.7(EGFR):e.24::ENST00000267868.8(RAD51):e.4 due to the following reason: Translation cannot proceed as GRCh37 transcripts and exons lacks genomic breakpoints\n",
+"Traceback (most recent call last):\n",
+" File \"/Users/rsjxa001/fusion_project/fusor/src/fusor/harvester.py\", line 412, in load_records\n",
+" cat_fusion = await self.translator.translate(civic=fusion)\n",
+" ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
+" File \"/Users/rsjxa001/fusion_project/fusor/src/fusor/translator.py\", line 984, in translate\n",
+" raise ValueError(msg)\n",
+"ValueError: Translation cannot proceed as GRCh37 transcripts and exons lacks genomic breakpoints\n",
 "WARNING:fusor.harvester:15 fusion(s) were dropped during translation\n",
 "WARNING:fusor.models:Cached fusions file already exists. Overwriting with new file\n"
 ]
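In the updated output, each `ERROR:fusor.harvester:...` line is now followed by a full traceback. That is how Python's standard `logging` module renders a record logged with `logger.exception(...)` (or `exc_info=True`) inside an `except` block. We have not checked the harvester source, so treat this as an assumption. The minimal sketch below reproduces the pattern. It uses a stand-in `translate` function and a hypothetical logger name, `harvester_sketch`; it is not FUSOR's code. The formatter string is logging's default `BASIC_FORMAT`, which produces the `LEVEL:logger:message` prefix seen in the notebook.

```python
import io
import logging

# Capture records in memory using logging's default "LEVEL:name:message" layout.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(logging.BASIC_FORMAT))

logger = logging.getLogger("harvester_sketch")  # hypothetical logger name
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo out of the root logger


def translate(fusion: str) -> str:
    """Stand-in translator that always rejects the record, like the GRCh37 case."""
    msg = "Translation cannot proceed as GRCh37 transcripts and exons lacks genomic breakpoints"
    raise ValueError(msg)


def load_records(fusions: list) -> list:
    """Translate each fusion, logging failures (with traceback) instead of raising."""
    translated = []
    for fusion in fusions:
        try:
            translated.append(translate(fusion))
        except ValueError as e:
            # logger.exception logs at ERROR level and appends the active traceback.
            logger.exception(
                "Cannot translate fusion: %s due to the following reason: %s", fusion, e
            )
    dropped = len(fusions) - len(translated)
    if dropped:
        logger.warning("%d fusion(s) were dropped during translation", dropped)
    return translated


load_records(["v::RET(entrez:5979)"])
output = stream.getvalue()
print(output)
```

Switching `logger.exception` to `logger.error` in the sketch drops the traceback and leaves only the one-line `ERROR:` message, which is the shape of the output before this change.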
@@ -324,6 +415,7 @@
 " 'code': 'HGNC:1316'}},\n",
 " 'elementGenomicEnd': {'id': 'ga4gh:SL.PQzV-kfeCQ4MBmxD5mSHqZmId3I_f-Ib',\n",
 " 'type': 'SequenceLocation',\n",
+" 'extensions': [{'name': 'is_exonic', 'value': True}],\n",
 " 'digest': 'PQzV-kfeCQ4MBmxD5mSHqZmId3I_f-Ib',\n",
 " 'sequenceReference': {'id': 'refseq:NC_000002.12',\n",
 " 'type': 'SequenceReference',\n",
@@ -341,6 +433,7 @@
 " 'code': 'HGNC:427'}},\n",
 " 'elementGenomicStart': {'id': 'ga4gh:SL.Eu_igVd9zOahn3tFN-pyxtphUmrSlRAh',\n",
 " 'type': 'SequenceLocation',\n",
+" 'extensions': [{'name': 'is_exonic', 'value': True}],\n",
 " 'digest': 'Eu_igVd9zOahn3tFN-pyxtphUmrSlRAh',\n",
 " 'sequenceReference': {'id': 'refseq:NC_000002.12',\n",
 " 'type': 'SequenceReference',\n",
@@ -397,6 +490,7 @@
 " 'code': 'HGNC:1316'}},\n",
 " 'elementGenomicEnd': {'id': 'ga4gh:SL.-IgV899bqi2cN3ugOPhJG_ZAbuUgrN7N',\n",
 " 'type': 'SequenceLocation',\n",
+" 'extensions': [{'name': 'is_exonic', 'value': True}],\n",
 " 'digest': '-IgV899bqi2cN3ugOPhJG_ZAbuUgrN7N',\n",
 " 'sequenceReference': {'id': 'refseq:NC_000002.12',\n",
 " 'type': 'SequenceReference',\n",
@@ -414,6 +508,7 @@
 " 'code': 'HGNC:427'}},\n",
 " 'elementGenomicStart': {'id': 'ga4gh:SL.Eu_igVd9zOahn3tFN-pyxtphUmrSlRAh',\n",
 " 'type': 'SequenceLocation',\n",
+" 'extensions': [{'name': 'is_exonic', 'value': True}],\n",
 " 'digest': 'Eu_igVd9zOahn3tFN-pyxtphUmrSlRAh',\n",
 " 'sequenceReference': {'id': 'refseq:NC_000002.12',\n",
 " 'type': 'SequenceReference',\n",
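The new `is_exonic` extension presumably records whether a genomic breakpoint falls inside an exon. Relatedly, the cool-seq-tool WARNING at the top of the notebook output flags a position lying more than 150 bp outside exon boundaries. The sketch below shows one plausible way to compute both signals from a list of exon intervals. The half-open coordinate convention, the `Exon` class and `classify_breakpoint` are all assumptions for illustration, not cool-seq-tool's implementation. The 150 bp default is taken only from the warning text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    """Genomic span of one exon as a half-open [start, end) interval (assumed convention)."""

    start: int
    end: int


def classify_breakpoint(pos: int, exons: list, max_distance: int = 150) -> dict:
    """Report whether `pos` lies in an exon and its distance to the nearest exon edge.

    Hypothetical helper: `max_distance` mirrors the 150 bp figure in the
    cool-seq-tool warning text, not a documented parameter of that library.
    """
    if not exons:
        raise ValueError("at least one exon is required")
    for exon in exons:
        if exon.start <= pos < exon.end:
            return {"is_exonic": True, "distance": 0, "outside_threshold": False}
    # Outside every exon: measure to the closest first or last exon base.
    distance = min(min(abs(pos - e.start), abs(pos - (e.end - 1))) for e in exons)
    return {
        "is_exonic": False,
        "distance": distance,
        "outside_threshold": distance > max_distance,
    }


exons = [Exon(100, 200), Exon(1000, 1100)]
print(classify_breakpoint(150, exons))  # inside the first exon
print(classify_breakpoint(500, exons))  # intronic, 301 bp from the nearest exon base
```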

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ dependencies = [
 "biocommons.seqrepo",
 "gene-normalizer ~=0.10.0",
 "civicpy ~=5.0",
-"cool-seq-tool ~=0.14.0"
+"cool-seq-tool ~=0.14.5"
 ]
 dynamic=["version"]
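The `pyproject.toml` change tightens the cool-seq-tool pin from `~=0.14.0` to `~=0.14.5`. Under PEP 440, `~=X.Y.Z` is a compatible-release clause equivalent to `>=X.Y.Z, ==X.Y.*`. The new pin therefore still admits any later 0.14.x release, but it now rejects 0.14.0 through 0.14.4 and, as before, anything from 0.15 on. The stdlib-only sketch below checks that rule for plain numeric versions. It deliberately ignores pre-, post-, dev- and local-version segments; `packaging.specifiers.SpecifierSet` is the complete implementation.

```python
def parse_release(version: str) -> tuple:
    """Parse a purely numeric release such as '0.14.5' (no pre/post/dev/local parts)."""
    return tuple(int(part) for part in version.split("."))


def satisfies_compatible_release(candidate: str, spec: str) -> bool:
    """Check `candidate` against the PEP 440 clause `~=spec` (numeric releases only).

    `~=X.Y.Z` means `>= X.Y.Z` and `== X.Y.*`: dropping the spec's last
    component gives the prefix that the candidate must match exactly.
    """
    spec_parts = parse_release(spec)
    if len(spec_parts) < 2:
        raise ValueError("~= requires at least two release components")
    cand = parse_release(candidate)
    prefix = spec_parts[:-1]
    # Zero-pad both sides so that '0.15' compares like '0.15.0'.
    width = max(len(cand), len(spec_parts))
    padded_cand = cand + (0,) * (width - len(cand))
    padded_spec = spec_parts + (0,) * (width - len(spec_parts))
    return padded_cand[: len(prefix)] == prefix and padded_cand >= padded_spec


for version in ("0.14.0", "0.14.5", "0.14.9", "0.15.0"):
    print(version, satisfies_compatible_release(version, "0.14.0"),
          satisfies_compatible_release(version, "0.14.5"))
```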

src/fusor/examples/bcr_abl1.json

Lines changed: 4 additions & 2 deletions
@@ -25,7 +25,8 @@
 "type": "SequenceReference",
 "refgetAccession": "SQ.7B7SHsmchAR0dFcDCuSFjJAo7tX87krQ"
 },
-"end": 23254162
+"end": 23254162,
+"extensions": [{"name": "is_exonic", "value": true}]
 }
 },
 {
@@ -60,7 +61,8 @@
 "type": "SequenceReference",
 "refgetAccession": "SQ.KEO-4XBcm1cxeo_DIQ8_ofqGUkp4iZhI"
 },
-"start": 130853890
+"start": 130853890,
+"extensions": [{"name": "is_exonic", "value": true}]
 }
 }
 ],
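In the example JSON files, the flag sits in the `extensions` list of each breakpoint `SequenceLocation` (`elementGenomicStart` or `elementGenomicEnd`) as a `{"name": "is_exonic", "value": true}` pair. Consumers therefore have to search that list by name rather than read a fixed key. Below is a minimal sketch of such a lookup that walks an arbitrarily nested record. The trimmed `EXAMPLE` document and its `structure` key are assumptions standing in for a real file such as `bcr_abl1.json`, and both helper functions are hypothetical.

```python
import json
from typing import Any, Iterator, Optional, Tuple

# Hypothetical, trimmed-down record: only enough nesting to show where the
# new extension sits. The real example files contain many more fields.
EXAMPLE = """
{
  "structure": [
    {"elementGenomicEnd": {"type": "SequenceLocation", "end": 23254162,
                           "extensions": [{"name": "is_exonic", "value": true}]}},
    {"elementGenomicStart": {"type": "SequenceLocation", "start": 130853890,
                             "extensions": [{"name": "is_exonic", "value": true}]}}
  ]
}
"""

BREAKPOINT_KEYS = ("elementGenomicStart", "elementGenomicEnd")


def extension_value(location: dict, name: str, default: Any = None) -> Any:
    """Return the value of the first extension called `name`, else `default`."""
    for ext in location.get("extensions") or []:
        if ext.get("name") == name:
            return ext.get("value")
    return default


def breakpoints(node: Any) -> Iterator[Tuple[str, Optional[int], Optional[int], Any]]:
    """Yield (key, start, end, is_exonic) for every breakpoint location found."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key in BREAKPOINT_KEYS and isinstance(value, dict):
                yield key, value.get("start"), value.get("end"), extension_value(value, "is_exonic")
            else:
                yield from breakpoints(value)
    elif isinstance(node, list):
        for item in node:
            yield from breakpoints(item)


data = json.loads(EXAMPLE)
for key, start, end, exonic in breakpoints(data):
    print(key, start, end, exonic)
```

Returning `None` when the extension is absent keeps older records, written before this change, readable without special-casing.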

src/fusor/examples/bcr_abl1_expanded.json

Lines changed: 4 additions & 2 deletions
@@ -203,7 +203,8 @@
 "type": "SequenceReference",
 "refgetAccession": "SQ.7B7SHsmchAR0dFcDCuSFjJAo7tX87krQ"
 },
-"end": 23254162
+"end": 23254162,
+"extensions": [{"name": "is_exonic", "value": true}]
 }
 },
 {
@@ -427,7 +428,8 @@
 "type": "SequenceReference",
 "refgetAccession": "SQ.KEO-4XBcm1cxeo_DIQ8_ofqGUkp4iZhI"
 },
-"start": 130853890
+"start": 130853890,
+"extensions": [{"name": "is_exonic", "value": true}]
 }
 }
 ],

src/fusor/examples/tpm3_itd.json

Lines changed: 4 additions & 2 deletions
@@ -25,7 +25,8 @@
 "type": "SequenceReference",
 "refgetAccession": "SQ.Ya6Rs7DHhDeg7YaOSg1EoNi3U_nQ9SvO"
 },
-"start": 154170399
+"start": 154170399,
+"extensions": [{"name": "is_exonic", "value": true}]
 }
 },
 {
@@ -52,7 +53,8 @@
 "type": "SequenceReference",
 "refgetAccession": "SQ.Ya6Rs7DHhDeg7YaOSg1EoNi3U_nQ9SvO"
 },
-"start": 154170399
+"start": 154170399,
+"extensions": [{"name": "is_exonic", "value": true}]
 }
 }
 ]

src/fusor/examples/tpm3_ntrk1.json

Lines changed: 4 additions & 2 deletions
@@ -25,7 +25,8 @@
 "type": "SequenceReference",
 "refgetAccession": "SQ.Ya6Rs7DHhDeg7YaOSg1EoNi3U_nQ9SvO"
 },
-"start": 154170399
+"start": 154170399,
+"extensions": [{"name": "is_exonic", "value": true}]
 }
 },
 {
@@ -52,7 +53,8 @@
 "type": "SequenceReference",
 "refgetAccession": "SQ.Ya6Rs7DHhDeg7YaOSg1EoNi3U_nQ9SvO"
 },
-"start": 156874570
+"start": 156874570,
+"extensions": [{"name": "is_exonic", "value": true}]
 }
 }
 ],

src/fusor/examples/tpm3_pdgfrb.json

Lines changed: 8 additions & 4 deletions
@@ -27,7 +27,8 @@
 "type": "SequenceReference",
 "refgetAccession": "SQ.Ya6Rs7DHhDeg7YaOSg1EoNi3U_nQ9SvO"
 },
-"end": 154192135
+"end": 154192135,
+"extensions": [{"name": "is_exonic", "value": true}]
 },
 "elementGenomicEnd": {
 "id": "ga4gh:SL.Lnne0bSsgjzmNkKsNnXg98FeJSrDJuLb",
@@ -38,7 +39,8 @@
 "type": "SequenceReference",
 "refgetAccession": "SQ.Ya6Rs7DHhDeg7YaOSg1EoNi3U_nQ9SvO"
 },
-"start": 154170399
+"start": 154170399,
+"extensions": [{"name": "is_exonic", "value": true}]
 }
 },
 {
@@ -67,7 +69,8 @@
 "type": "SequenceReference",
 "refgetAccession": "SQ.aUiQCzCPZ2d0csHbMSbh2NzInhonSXwI"
 },
-"end": 150126614
+"end": 150126614,
+"extensions": [{"name": "is_exonic", "value": true}]
 },
 "elementGenomicEnd": {
 "id": "ga4gh:SL.YJ7cXnaG52dPiu7uR60yUneqHaKHEkLP",
@@ -78,7 +81,8 @@
 "type": "SequenceReference",
 "refgetAccession": "SQ.aUiQCzCPZ2d0csHbMSbh2NzInhonSXwI"
 },
-"start": 150117617
+"start": 150117617,
+"extensions": [{"name": "is_exonic", "value": true}]
 }
 }
 ],
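Every example-file hunk in this commit makes the same edit: it appends an `extensions` entry named `is_exonic` to a breakpoint `SequenceLocation`. Below is a hypothetical helper for applying that edit to JSON loaded with the `json` module. It is an illustration only, not how the commit's files were actually produced. It updates an existing entry instead of appending a duplicate, so re-running it over already-migrated files is harmless.

```python
import copy
import json
from typing import Any


def with_extension(location: dict, name: str, value: Any) -> dict:
    """Return a copy of `location` with extension `name` set to `value`.

    Adds the entry if it is missing; otherwise updates the first entry with
    that name, so applying the helper twice gives the same result as once.
    """
    updated = copy.deepcopy(location)  # never mutate the caller's dict
    extensions = updated.setdefault("extensions", [])
    for ext in extensions:
        if ext.get("name") == name:
            ext["value"] = value
            break
    else:
        extensions.append({"name": name, "value": value})
    return updated


loc = {"type": "SequenceLocation", "start": 150117617}
once = with_extension(loc, "is_exonic", True)
twice = with_extension(once, "is_exonic", True)
print(json.dumps(twice))
```

Python dicts preserve insertion order, so `json.dumps` places the new `extensions` key after `start`/`end`, which matches the layout of the updated example files.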
