Add include_version parameter to import_gencode and remove_y_par to add_gencode_transcript_annotations #806
Merged
jkgoodrich merged 5 commits into main (Sep 24, 2025)
…ion. Also include a remove_y_par parameter to fix cases where transcript_id is found twice on PAR regions (chrX and chrY).
ch-kr (Contributor) reviewed on Sep 24, 2025 and left a comment:
a couple documentation suggestions and one question
Co-authored-by: Katherine Chao <kchao@broadinstitute.org>
ch-kr approved these changes on Sep 24, 2025
Summary
This PR updates the GENCODE import and transcript annotation functionality to better handle transcript ID versions and PAR region duplicates.
Changes
1. Update import_gencode to keep full transcript ID including version if requested
- When include_version=True, the function now preserves the full gene_id and transcript_id, including version numbers, in separate fields (gene_id_version and transcript_id_version)
2. Fix add_gencode_transcript_annotations to include start and end position
- The annotations parameter in add_gencode_transcript_annotations now includes start_position and end_position by default
3. Add remove_y_par parameter to handle PAR region transcript ID duplicates
- Adds a remove_y_par parameter (default: True) to add_gencode_transcript_annotations
- Uses the transcript_id_version field to identify Y_PAR transcripts (those ending with "Y_PAR")
Technical Details
- The remove_y_par functionality requires that the input GENCODE table includes the transcript_id_version field, which is available when import_gencode is called with include_version=True
Impact
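As a rough illustration of the behavior described under Changes and Technical Details, the sketch below mimics the two steps on plain Python dicts. It is not the code from this PR: the helper names (annotate_ids, drop_y_par), the record layout, and the toy IDs are all hypothetical, and the real functions operate on a GENCODE table rather than a list of dicts. The "Y_PAR" suffix is taken from the PR text and is exposed as a parameter, since the exact suffix should be checked against the GENCODE release in use.

```python
from typing import Dict, List

# Toy records (hypothetical IDs). The same transcript appears on chrX and,
# with a PAR suffix after the version, on chrY.
TOY_RECORDS: List[Dict[str, str]] = [
    {"contig": "chrX", "transcript_id": "TX0001.1"},
    {"contig": "chrY", "transcript_id": "TX0001.1_Y_PAR"},
    {"contig": "chr1", "transcript_id": "TX0002.3"},
]


def annotate_ids(
    records: List[Dict[str, str]], include_version: bool = False
) -> List[Dict[str, str]]:
    """Strip the version from transcript_id; optionally keep the full ID.

    With include_version=True the original, versioned ID is preserved in a
    separate transcript_id_version field, mirroring the PR description.
    """
    out = []
    for rec in records:
        new = dict(rec)
        if include_version:
            new["transcript_id_version"] = rec["transcript_id"]
        # Everything after the first "." (version and any suffix) is dropped.
        new["transcript_id"] = rec["transcript_id"].split(".", 1)[0]
        out.append(new)
    return out


def drop_y_par(
    records: List[Dict[str, str]], suffix: str = "Y_PAR"
) -> List[Dict[str, str]]:
    """Remove records whose versioned transcript ID ends with the PAR suffix.

    Requires transcript_id_version, because the suffix is lost once the
    version has been stripped from transcript_id.
    """
    if any("transcript_id_version" not in r for r in records):
        raise ValueError(
            "transcript_id_version is missing; annotate with include_version=True"
        )
    return [r for r in records if not r["transcript_id_version"].endswith(suffix)]


if __name__ == "__main__":
    annotated = annotate_ids(TOY_RECORDS, include_version=True)
    for rec in drop_y_par(annotated):
        print(rec)
```

Without the filtering step, the stripped transcript_id "TX0001" appears twice (chrX and chrY), which is the duplicate the remove_y_par parameter is meant to address. The ValueError in the sketch reflects the stated requirement that the versioned field be present; how the actual function reacts when the field is absent is not stated in this PR summary.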