Commit 8cd6a13

j23414 and joverlee521 committed
newreference: If gene is given but not found in genbank, error out
Currently, if the --gene option is used and the gene name is not found, the script silently falls back to the entire genome, which may cause undesired behavior. This commit changes the script to error out when the gene is not found in the GenBank file, since that indicates the gene name may be misspelled or the user may be using the wrong GenBank file. If the --gene option is not used, the script continues to process the entire genome as expected.

Co-authored-by: Jover Lee <joverlee521@gmail.com>
1 parent 570daa6 commit 8cd6a13
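
The behavior described in the commit message can be sketched without Biopython as follows. This is a minimal illustration under stated assumptions, not the script's actual code: `find_gene_span` and the `(type, name, start, end)` feature tuples are hypothetical stand-ins for the GenBank features the script iterates over.

```python
import sys


def find_gene_span(features, gene):
    """Return (start, end) for `gene`, or (None, None) when no gene is requested.

    `features` is a hypothetical list of (type, name, start, end) tuples
    standing in for GenBank features; it is not the Biopython API.
    """
    start = end = None
    for ftype, name, fstart, fend in features:
        if ftype in ("gene", "CDS") and name == gene:
            start, end = fstart, fend

    # A gene was requested but never matched: fail loudly instead of
    # silently falling back to the entire genome.
    if gene is not None and start is None and end is None:
        print(
            f"ERROR: No '{gene}' was found under 'gene' or 'CDS' features.",
            file=sys.stderr,
        )
        sys.exit(1)

    return start, end
```

With `gene=None` the function returns `(None, None)`, which corresponds to the "process the entire genome" case; a misspelled gene name exits with status 1.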

1 file changed, +7 -0 lines changed


scripts/newreference.py

Lines changed: 7 additions & 0 deletions
@@ -3,6 +3,7 @@
 from Bio.SeqFeature import SeqFeature, FeatureLocation, Seq
 import shutil
 import argparse
+import sys
 
 def new_reference(referencefile, outgenbank, outfasta, gene):
     ref = SeqIO.read(referencefile, "genbank")
@@ -17,6 +18,12 @@ def new_reference(referencefile, outgenbank, outfasta, gene):
                 startofgene = int(list(feature.location)[0])
                 endofgene = int(list(feature.location)[-1])+1
 
+    # If the user provides --gene 'some name' and it is not found, print an error and exit.
+    # If --gene is not provided, continue and use the entire genome.
+    if(gene is not None and startofgene is None and endofgene is None):
+        print(f"ERROR: No '{gene}' was found under 'gene' or 'CDS' features in the GenBank file.", file=sys.stderr)
+        sys.exit(1)
+
     record = ref[startofgene:endofgene]
     source_feature = SeqFeature(FeatureLocation(start=0, end=len(record)), type='source',
                                 qualifiers=ref_source_feature.qualifiers)
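
Why the guard has to come before `record = ref[startofgene:endofgene]` can be seen with a plain Python sequence. The sketch below assumes that `startofgene` and `endofgene` are initialized to `None` earlier in the function (not shown in this diff) and that the reference record slices like an ordinary Python sequence; the string `genome` is a hypothetical stand-in for the record.

```python
genome = "ATGCATGCAT"  # hypothetical stand-in for the reference record

# Values as they would remain if no feature matched the requested gene.
startofgene = None
endofgene = None

# Slicing with None bounds is the same as genome[:], i.e. the whole sequence,
# so an unmatched gene name silently yields the full genome.
whole = genome[startofgene:endofgene]

# With real coordinates, only the requested region is kept.
region = genome[2:6]
```

Because the `None:None` slice raises no error, the only place to catch an unmatched `--gene` value is an explicit check before slicing.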
