Commit 0a3dddb

Enhance filter_gff function to include parent IDs for improved filtering of gene models
1 parent aa7f72e commit 0a3dddb

1 file changed: +4 -1 lines changed

bin/Filter.py

Lines changed: 4 additions & 1 deletion
@@ -156,9 +156,12 @@ def filter_genes(tpm_cutoff, cov_cutoff, blast_pident, blast_qcovs, rex_pident,

 def filter_gff(gff_data, keep):
     gff_data["transcript_id"] = gff_data[8].apply(getGeneId)
+    gff_data["parent_id"] = gff_data[8].apply(getParent)
     names = keep['New_ID']
+    # Add gene IDs (without .t1 suffix) to match gene features
     names = pd.concat([names, names.str.split(".").str.get(0)])
-    to_keep = gff_data["transcript_id"].isin(names)
+    # Keep rows where either ID or Parent matches the keep list
+    to_keep = gff_data["transcript_id"].isin(names) | gff_data["parent_id"].isin(names)
     gff_keep = gff_data[to_keep]
     gff_discard = gff_data[~to_keep]
     return gff_keep, gff_discard
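
To illustrate why matching on Parent matters, here is a minimal sketch that runs the patched logic on a toy GFF3 table. The bodies of getGeneId and getParent are not shown in this diff, so the versions below are hypothetical stand-ins that simply read the ID and Parent attributes from column 8. The helper get_attr, the toy rows, and the .copy() call are likewise assumptions made for the example.

```python
import re
import pandas as pd

# Hypothetical helper (not in the repository): read one key from a
# GFF3 attribute string such as "ID=g1.t1;Parent=g1".
def get_attr(attributes, key):
    match = re.search(r"(?:^|;)" + re.escape(key) + r"=([^;]+)", attributes)
    return match.group(1) if match else None

# Stand-ins for the repository's helpers; the real ones may differ.
def getGeneId(attributes):
    return get_attr(attributes, "ID")

def getParent(attributes):
    return get_attr(attributes, "Parent")

def filter_gff(gff_data, keep):
    # Mirrors the patched function; copies first so the toy input is untouched.
    gff_data = gff_data.copy()
    gff_data["transcript_id"] = gff_data[8].apply(getGeneId)
    gff_data["parent_id"] = gff_data[8].apply(getParent)
    names = keep["New_ID"]
    # Add gene IDs (transcript ID up to the first ".") to match gene features
    names = pd.concat([names, names.str.split(".").str.get(0)])
    # Keep rows where either ID or Parent matches the keep list
    to_keep = gff_data["transcript_id"].isin(names) | gff_data["parent_id"].isin(names)
    return gff_data[to_keep], gff_data[~to_keep]

# Toy table: only the feature type (column 2) and attributes (column 8).
gff = pd.DataFrame({
    2: ["gene", "mRNA", "exon", "gene", "mRNA", "exon"],
    8: [
        "ID=g1",
        "ID=g1.t1;Parent=g1",
        "ID=g1.t1.exon1;Parent=g1.t1",
        "ID=g2",
        "ID=g2.t1;Parent=g2",
        "ID=g2.t1.exon1;Parent=g2.t1",
    ],
})
keep = pd.DataFrame({"New_ID": ["g1.t1"]})

gff_keep, gff_discard = filter_gff(gff, keep)
print(gff_keep[[2, 8]])
```

In this toy input the exon's own ID (g1.t1.exon1) is not in the keep list, so an ID-only match would discard it. Matching on Parent retains it alongside its gene and mRNA rows. Whether real annotation files name child features this way depends on the tool that produced them.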
