
ValueError: sequence must be at least 20000 characters (17046 found) #229

@abrozzi

Dear all,

I ran:

chewBBACA.py AlleleCall -i ./data/raw/fasta -g ./gene_schema -o ./data/processed/gene --cpu 14

on 6285 Klebsiella genome assemblies.

I got the following error:


 CDS prediction
================
Predicting CDSs for 6285 inputs...
 [=====               ] 28% 27%
Error on predict_genome_genes:
Traceback (most recent call last):
  File "/home/user/anaconda3/envs/chewie/lib/python3.11/site-packages/CHEWBBACA/utils/multiprocessing_operations.py", line 42, in function_helper
    results = input_args[-1](*input_args[0:-1])
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/envs/chewie/lib/python3.11/site-packages/CHEWBBACA/utils/gene_prediction.py", line 217, in predict_genome_genes
    current_gene_finder = train_gene_finder(current_gene_finder,
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/envs/chewie/lib/python3.11/site-packages/CHEWBBACA/utils/gene_prediction.py", line 75, in train_gene_finder
    gene_finder.train(*sequences, translation_table=translation_table)
  File "pyrodigal/lib.pyx", line 5528, in pyrodigal.lib.GeneFinder.train
ValueError: sequence must be at least 20000 characters (17046 found)
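The traceback shows pyrodigal's GeneFinder.train rejecting a training input of 17046 characters against a 20000-character minimum. chewBBACA passes *sequences to train (line 75 of gene_prediction.py above), so one plausible reading, which is an assumption and not confirmed here, is that a single input's contigs add up to fewer than 20000 characters. The stdlib-only sketch below lists every file in the input directory whose total sequence length is under that minimum, to help locate the offending assembly among the 6285. The script name check_lengths.py and the helper names are hypothetical, and it assumes the directory holds only FASTA files.

```python
import sys
from pathlib import Path

# Minimum length reported by pyrodigal's GeneFinder.train in the traceback above.
MIN_TRAINING_LENGTH = 20000


def record_lengths(lines):
    """Return a list of (record_id, length) pairs from FASTA-formatted lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep the first whitespace-delimited token of the header as the ID.
            fields = line[1:].split()
            records.append([fields[0] if fields else "", 0])
        elif records:
            # Sequence lines add to the length of the most recent record.
            records[-1][1] += len(line)
    return [(rid, length) for rid, length in records]


def too_short_for_training(fasta_dir, min_length=MIN_TRAINING_LENGTH):
    """Return (file name, total length) for each file whose records sum to < min_length."""
    flagged = []
    for path in sorted(Path(fasta_dir).iterdir()):
        if not path.is_file():
            continue
        with path.open() as handle:
            total = sum(length for _, length in record_lengths(handle))
        if total < min_length:
            flagged.append((path.name, total))
    return flagged


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_lengths.py ./data/raw/fasta
    for name, total in too_short_for_training(sys.argv[1]):
        print(f"{name}\t{total}")
```

Each flagged file is printed with its total length, tab-separated, so a file totalling 17046 would stand out directly.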

Should I pre-process the FASTA files and filter out contigs of 20000 characters or fewer?
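If filtering contigs by length is the route taken, a minimal stdlib sketch is shown below: it reads one FASTA file, drops every contig of 20000 characters or fewer, and writes the rest to a new file. The helper names, the script name filter_contigs.py and its argument order are hypothetical. One caveat, resting on the same assumption that the 20000-character minimum applies to the combined length of the sequences passed to train: removing contigs lowers that total, so this filter alone may not prevent the error for an input like the 17046-character one above.

```python
import sys


def read_fasta(lines):
    """Yield (header, sequence) tuples from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_short_contigs(records, threshold=20000):
    """Keep only records longer than threshold, i.e. drop contigs <= threshold."""
    return [(h, s) for h, s in records if len(s) > threshold]


def format_fasta(records, width=80):
    """Render records as FASTA text, wrapping sequences at `width` characters."""
    lines = []
    for header, seq in records:
        lines.append(f">{header}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage: python filter_contigs.py input.fasta output.fasta 20000
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        kept = drop_short_contigs(read_fasta(src), int(sys.argv[3]))
        dst.write(format_fasta(kept))
```

Headers are kept verbatim (everything after ">"), so contig identifiers in the filtered files match the originals.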

Best,
Alex

Metadata

Assignees

Labels

Status: In Progress (has been assigned and is being worked on)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests