Enforce start codon + fix bug explicitly setting training data on command line#72

Open
LanderDC wants to merge 13 commits into althonos:main from LanderDC:main

@LanderDC

Hi,

This PR addresses #64. While working on it, I also found a bug when specifying a custom training file for ORF prediction on a new sequence: even if a training file is specified, pyrodigal retrains anyway and overwrites the loaded training info. The relevant code:

    if args.p == "single":
        # use the same interleaving logic as Prodigal
        sequences = list(parse(input_file))
        training_info = gene_finder.train(
            *(seq.seq for seq in sequences),
            force_nonsd=args.n,
            translation_table=args.g
        )

Fixed with:

    if args.p == "single" and training_info is None:
        ...
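
To illustrate the intent of the guard, here is a minimal, self-contained sketch. It does not use pyrodigal itself: `resolve_training_info` and `fake_train` are hypothetical stand-ins I made up for this example, and the only thing taken from the PR is the `training_info is None` condition.

```python
def resolve_training_info(mode, training_info, train):
    """Return the training info to use for gene finding.

    `train` is a zero-argument callable standing in for the real training
    call; it is only invoked in single mode when nothing was loaded.
    """
    if mode == "single" and training_info is None:
        training_info = train()
    return training_info


# Hypothetical stand-in that records how often training was triggered.
calls = []

def fake_train():
    calls.append("train")
    return "trained-from-input"


# A training file was supplied: it is kept and no retraining happens.
kept = resolve_training_info("single", "loaded-from-file", fake_train)

# No training file was supplied: training runs exactly once.
fresh = resolve_training_info("single", None, fake_train)
```

With the original condition (`mode == "single"` only), the first call would have replaced `"loaded-from-file"` with freshly trained data.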

What has changed:

  • The -c argument now takes one of the options "none" (default), "start" or "both". For backwards compatibility, specifying only -c is equivalent to -c both, which requires both a start and a stop codon to be present.
  • The closed argument of the GeneFinder class should now be a list of two boolean values. Again for backwards compatibility, if only one value is given, the new closed_start and closed_stop arguments both take that value.
  • Updated docs
  • Added a test for the enforced start codon and changed test_extract_edge_start(), because the extract function no longer takes the closed argument
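
The backwards-compatible behaviour described in the first two points can be sketched roughly as follows. This is an illustration only, not the code in this PR: `build_parser`, `normalize_closed` and `CLOSED_BY_MODE` are hypothetical names, and the mapping from "start" to (closed start, open stop) is my reading of the option names, so treat it as an assumption.

```python
import argparse


def build_parser():
    """Parser with an optional-value -c flag (hypothetical sketch)."""
    parser = argparse.ArgumentParser(prog="sketch")
    parser.add_argument(
        "-c",
        nargs="?",                         # the value after -c is optional
        choices=("none", "start", "both"),
        const="both",                      # bare -c behaves like -c both
        default="none",                    # flag absent: nothing enforced
    )
    return parser


# Assumed mapping from the -c mode to (closed_start, closed_stop).
CLOSED_BY_MODE = {
    "none": (False, False),
    "start": (True, False),
    "both": (True, True),
}


def normalize_closed(closed):
    """Accept a single boolean or a pair and return (closed_start, closed_stop)."""
    if isinstance(closed, bool):
        # Old-style single value applies to both ends.
        return closed, closed
    closed_start, closed_stop = closed
    return bool(closed_start), bool(closed_stop)


parser = build_parser()
absent = parser.parse_args([]).c
bare = parser.parse_args(["-c"]).c
explicit = parser.parse_args(["-c", "start"]).c
```

Using `nargs="?"` together with `const` is what lets existing command lines that pass a bare -c keep working while new ones can name a mode explicitly.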
