feat!: add starting_assembly as argument when using genomic_to_tx_segment #391
Merged
21 commits
2f36fdb
Add work to support residue coords + refactor offset calculation
jarbesfeld 49ef3df
Fix docstrings
jarbesfeld fd0e88a
Merge branch 'main' into issue-383
jarbesfeld 99beca0
Refactor genomic_to_tx_segment
jarbesfeld 310a16c
Merge branch 'main' into issue-383
jarbesfeld fc54151
Add translate_identifier
jarbesfeld 6774927
Fix liftover issues
jarbesfeld 9bc2c31
Store requested changes
jarbesfeld 3a7c914
Add support for providing assembly
jarbesfeld 6bad2f9
Merge branch 'main' into issue-390
jarbesfeld 4ab9fa0
Merge branch 'main' into issue-390
jarbesfeld 18f74d9
Add more comments
jarbesfeld 232673d
Correct docstring in _validate_gene_coordinates
jarbesfeld 1c03cac
Update assembly parameter, add chromosome as an optional parameter
jarbesfeld 1aff449
Resolve merge conflicts
jarbesfeld 4b336ba
Ensure merge conflicts are resolved
jarbesfeld 44e1f34
Add docstring, change liftover data access
jarbesfeld 0734fbd
Remove duplicate error statement
jarbesfeld 39b4a75
Update logic for genomic accession validation
jarbesfeld 06244a9
Update docstring
jarbesfeld 2f6b224
Indicate that accession version will be ignored
jarbesfeld
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -413,6 +413,7 @@ async def genomic_to_tx_segment( | |
| transcript: str | None = None, | ||
| gene: str | None = None, | ||
| coordinate_type: CoordinateType = CoordinateType.INTER_RESIDUE, | ||
| starting_assembly: Assembly = Assembly.GRCH38, | ||
| ) -> GenomicTxSegService: | ||
| """Get transcript segment data for genomic data, lifted over to GRCh38. | ||
|
|
||
|
|
@@ -441,7 +442,9 @@ async def genomic_to_tx_segment( | |
| used. | ||
| :param genomic_ac: Genomic accession (e.g. ``NC_000001.11``). If not provided, | ||
| must provide ``chromosome``. If ``chromosome`` is also provided, | ||
| ``genomic_ac`` will be used. | ||
| ``genomic_ac`` will be used. If the genomic accession is from GRCh37, it | ||
| will be lifted over to GRCh38 and the original accession version will be | ||
| ignored | ||
| :param seg_start_genomic: Genomic position where the transcript segment starts | ||
| :param seg_end_genomic: Genomic position where the transcript segment ends | ||
| :param transcript: The transcript to use. If this is not given, we will try the | ||
|
|
@@ -452,6 +455,8 @@ async def genomic_to_tx_segment( | |
| value is provided. | ||
| :param coordinate_type: Coordinate type for ``seg_start_genomic`` and | ||
| ``seg_end_genomic``. Expects inter-residue coordinates by default | ||
| :param starting_assembly: The assembly that the supplied coordinate comes from. Set to | ||
| GRCh38 by default. Will attempt to liftover if starting assembly is GRCh37 | ||
| :return: Genomic data (inter-residue coordinates) | ||
| """ | ||
| errors = [] | ||
|
|
@@ -477,6 +482,7 @@ async def genomic_to_tx_segment( | |
| gene=gene, | ||
| is_seg_start=True, | ||
| coordinate_type=coordinate_type, | ||
| starting_assembly=starting_assembly, | ||
| ) | ||
| if start_tx_seg_data.errors: | ||
| return _return_service_errors(start_tx_seg_data.errors) | ||
|
|
@@ -497,6 +503,7 @@ async def genomic_to_tx_segment( | |
| gene=gene, | ||
| is_seg_start=False, | ||
| coordinate_type=coordinate_type, | ||
| starting_assembly=starting_assembly, | ||
| ) | ||
| if end_tx_seg_data.errors: | ||
| return _return_service_errors(end_tx_seg_data.errors) | ||
|
|
@@ -727,6 +734,7 @@ async def _genomic_to_tx_segment( | |
| gene: str | None = None, | ||
| is_seg_start: bool = True, | ||
| coordinate_type: CoordinateType = CoordinateType.INTER_RESIDUE, | ||
| starting_assembly: Assembly = Assembly.GRCH38, | ||
| ) -> GenomicTxSeg: | ||
| """Given genomic data, generate a boundary for a transcript segment. | ||
|
|
||
|
|
@@ -743,7 +751,8 @@ async def _genomic_to_tx_segment( | |
| If ``genomic_ac`` is also provided, ``genomic_ac`` will be used. | ||
| :param genomic_ac: Genomic accession (e.g. ``NC_000001.11``). If not provided, | ||
| must provide ``chromosome``. If ``chromosome`` is also provided, ``genomic_ac`` | ||
| will be used. | ||
| will be used. If the genomic accession is from GRCh37, it will be lifted | ||
| over to GRCh38 and the original accession version will be ignored | ||
| :param transcript: The transcript to use. If this is not given, we will try the | ||
| following transcripts: MANE Select, MANE Clinical Plus, Longest Remaining | ||
| Compatible Transcript | ||
|
|
@@ -752,6 +761,8 @@ async def _genomic_to_tx_segment( | |
| ``False`` if ``genomic_pos`` is where the transcript segment ends. | ||
| :param coordinate_type: Coordinate type for ``seg_start_genomic`` and | ||
| ``seg_end_genomic``. Expects inter-residue coordinates by default | ||
| :param starting_assembly: The assembly that the supplied coordinate comes from. Set to | ||
| GRCh38 by default. Will attempt to liftover if starting assembly is GRCh37 | ||
| :return: Data for a transcript segment boundary (inter-residue coordinates) | ||
| """ | ||
| params = {key: None for key in GenomicTxSeg.model_fields} | ||
|
|
@@ -770,8 +781,10 @@ async def _genomic_to_tx_segment( | |
| ) | ||
|
|
||
| if genomic_ac: | ||
| genomic_ac_validation = await self.uta_db.validate_genomic_ac(genomic_ac) | ||
| if not genomic_ac_validation: | ||
| grch38_ac = await self.uta_db.get_newest_assembly_ac(genomic_ac) | ||
| if grch38_ac: | ||
| genomic_ac = grch38_ac[0] | ||
|
Member
Is it worth adding something to the docstrings (for both the private and public methods) noting that the accession version will be ignored for GRCh37?
||
| else: | ||
| return GenomicTxSeg( | ||
| errors=[f"Genomic accession does not exist in UTA: {genomic_ac}"] | ||
| ) | ||
|
|
@@ -784,13 +797,17 @@ async def _genomic_to_tx_segment( | |
| ) | ||
| genomic_ac = genomic_acs[0] | ||
|
|
||
| # Always liftover to GRCh38 | ||
| genomic_ac, genomic_pos, err_msg = await self._get_grch38_ac_pos( | ||
| genomic_ac, | ||
| genomic_pos, | ||
| ) | ||
| if err_msg: | ||
| return GenomicTxSeg(errors=[err_msg]) | ||
| # Liftover to GRCh38 if the provided assembly is GRCh37 | ||
| if starting_assembly == Assembly.GRCH37: | ||
| genomic_pos = await self._get_grch38_pos( | ||
| genomic_ac, genomic_pos, chromosome=chromosome if chromosome else None | ||
| ) | ||
| if not genomic_pos: | ||
| return GenomicTxSeg( | ||
| errors=[ | ||
| f"Lifting over {genomic_pos} on {genomic_ac} from {Assembly.GRCH37.value} to {Assembly.GRCH38.value} was unsuccessful." | ||
| ] | ||
| ) | ||
|
|
||
| # Select a transcript if not provided | ||
| if not transcript: | ||
|
|
@@ -903,59 +920,28 @@ async def _genomic_to_tx_segment( | |
| ), | ||
| ) | ||
|
|
||
| async def _get_grch38_ac_pos( | ||
| async def _get_grch38_pos( | ||
| self, | ||
| genomic_ac: str, | ||
| genomic_pos: int, | ||
| grch38_ac: str | None = None, | ||
| ) -> tuple[str | None, int | None, str | None]: | ||
| chromosome: str | None = None, | ||
| ) -> int | None: | ||
| """Get GRCh38 genomic representation for accession and position | ||
|
|
||
| :param genomic_ac: RefSeq genomic accession (GRCh37 or GRCh38 assembly) | ||
| :param genomic_pos: Genomic position on ``genomic_ac`` | ||
| :param grch38_ac: A valid GRCh38 genomic accession for ``genomic_ac``. If not | ||
| provided, will attempt to retrieve associated GRCh38 accession from UTA. | ||
| :return: Tuple containing GRCh38 accession, GRCh38 position, and error message | ||
| if unable to get GRCh38 representation | ||
| :param genomic_pos: A genomic coordinate in GRCh37 | ||
| :param genomic_ac: The genomic accession in GRCh38 | ||
| :param chromosome: The chromosome that genomic_pos occurs on | ||
| :return: The genomic coordinate in GRCh38 | ||
| """ | ||
| # Validate accession exists | ||
| if not grch38_ac: | ||
| grch38_ac = await self.uta_db.get_newest_assembly_ac(genomic_ac) | ||
| if not grch38_ac: | ||
| return None, None, f"Unrecognized genomic accession: {genomic_ac}." | ||
|
|
||
| grch38_ac = grch38_ac[0] | ||
|
|
||
| if grch38_ac != genomic_ac: | ||
| # Ensure genomic_ac is GRCh37 | ||
| if not chromosome: | ||
| chromosome, _ = self.seqrepo_access.translate_identifier( | ||
| genomic_ac, Assembly.GRCH37.value | ||
| genomic_ac, target_namespaces=Assembly.GRCH38.value | ||
| ) | ||
| if not chromosome: | ||
| _logger.warning( | ||
| "SeqRepo could not find associated %s assembly for genomic accession %s.", | ||
| Assembly.GRCH37.value, | ||
| genomic_ac, | ||
| ) | ||
| return ( | ||
| None, | ||
| None, | ||
| f"`genomic_ac` must use {Assembly.GRCH37.value} or {Assembly.GRCH38.value} assembly.", | ||
| ) | ||
| chromosome = chromosome[-1].split(":")[-1] | ||
| liftover_data = self.liftover.get_liftover( | ||
| chromosome, genomic_pos, Assembly.GRCH38 | ||
| ) | ||
| if liftover_data is None: | ||
| return ( | ||
| None, | ||
| None, | ||
| f"Lifting over {genomic_pos} on {genomic_ac} from {Assembly.GRCH37.value} to {Assembly.GRCH38.value} was unsuccessful.", | ||
| ) | ||
| genomic_pos = liftover_data[1] | ||
| genomic_ac = grch38_ac | ||
|
|
||
| return genomic_ac, genomic_pos, None | ||
| liftover_data = self.liftover.get_liftover( | ||
| chromosome, genomic_pos, Assembly.GRCH38 | ||
| ) | ||
| return liftover_data[1] if liftover_data else None | ||
|
|
||
| async def _validate_gene_coordinates( | ||
| self, | ||
|
|
||
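The control flow this diff introduces can be sketched in isolation as follows. This is a minimal, self-contained illustration and not the library's implementation: `to_grch38`, `VALID_ACS`, `NEWEST_AC`, and `TOY_LIFTOVER` are hypothetical names, the lookup tables are toy stand-ins for UTA and the liftover data, and the `Assembly` enum values are assumed. It shows the two steps from the diff: fall back to the newest accession when validation fails, then lift the position over only when `starting_assembly` is GRCh37.

```python
from enum import Enum


class Assembly(str, Enum):
    """Stand-in for the library's Assembly enum (member values are assumed)."""

    GRCH37 = "GRCh37"
    GRCH38 = "GRCh38"


# Toy data standing in for UTA lookups and the liftover service (hypothetical).
VALID_ACS = {"NC_000001.11"}
NEWEST_AC = {"NC_000001.10": "NC_000001.11"}
TOY_LIFTOVER = {100: 250}  # made-up GRCh37 position -> GRCh38 position


def to_grch38(genomic_ac, genomic_pos, starting_assembly=Assembly.GRCH38):
    """Return (accession, position, error), mirroring the flow in the diff."""
    # Step 1: if the accession does not validate, try its newest assembly version.
    if genomic_ac not in VALID_ACS:
        newest = NEWEST_AC.get(genomic_ac)
        if newest is None:
            return None, None, f"Genomic accession does not exist in UTA: {genomic_ac}"
        genomic_ac = newest

    # Step 2: lift the position over only when the caller says it is GRCh37.
    if starting_assembly == Assembly.GRCH37:
        lifted = TOY_LIFTOVER.get(genomic_pos)
        if lifted is None:
            # The original position is still available here for the message.
            return (
                None,
                None,
                f"Lifting over {genomic_pos} on {genomic_ac} from "
                f"{Assembly.GRCH37.value} to {Assembly.GRCH38.value} was unsuccessful.",
            )
        genomic_pos = lifted

    return genomic_ac, genomic_pos, None
```

With the default `starting_assembly`, the position passes through unchanged; passing `Assembly.GRCH37` triggers the liftover step. The sketch stores the liftover result in a separate name (`lifted`) so that a failed liftover can still report the original position. In the diff as displayed above, the result is assigned back to `genomic_pos` before the error message is formatted, so that message would appear to contain `None` rather than the input position.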