Releases: dportik/SuperCRUNCH
v1.3.2
v1.3.1
v1.3.0
Changes in v1.3.0:
- Added a `conda` environment recipe for SuperCRUNCH, allowing easy installation of all requirements except MACSE.
- `Parse_Loci.py`: Added a new feature that allows a negative term to be added to the loci search terms, which excludes a record if a match is found. For example, adding the negative term `pseudogene` will exclude all records containing that word, even if they match the other abbreviation or description terms. This requires a four-column search terms file, where the fourth column is the negative term (`N/A` in this column indicates no negative term should be used). This module was made backwards-compatible with the three-column search terms file: if a fourth column is not present, the `N/A` is automatically generated.
- `Filter_Seqs_and_Species.py`: Added the `--accessions_include` flag. This points to a text file of accession numbers (one per line). When used with the `--seq_selection oneseq` option, if an accession included in the list is found in the available seqs for a taxon and gene, it must be selected. This is not just an "allowed list": it overrides other selection settings, such as length. Also added the `--accessions_exclude` flag, which points to a text file of accession numbers (one per line). These accessions will NEVER be selected; they are removed from all searches. This is the equivalent of including a "blocked list".
- `Taxon_Assessment.py`: Altered the SQL search query for "unmatched" taxa to avoid hitting the maximum SQL variable limit. Also, it now invokes the `SeqIO.index_db()` method for sequence files >5GB rather than the `SeqIO.index()` method; `SeqIO.index_db()` is much more memory efficient for big data and is already used in `Parse_Loci.py`.
- `Cluster_Blast_Extract.py`: Added a feature to remove problematic long sequences if they somehow end up in the main cluster of sequences for a gene. The new filter removes all seqs that are more than 1.3x the length of the 95th percentile of all lengths.
- Added a new `Remove_Long_Accessions.py` module, which can filter a downloaded GenBank fasta file to remove extremely long sequences (>150kb). This will eliminate whole genome sequencing records, which are not useful for SuperCRUNCH.
- Updated recognition for file extensions produced by updated blastn tools (`.ndb`, `.not`, `.ntf`, `.nto`).
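
The negative-term behavior described for `Parse_Loci.py` can be illustrated with a minimal sketch. This is not the module's actual code: the function names `read_search_terms` and `record_matches` are hypothetical, the file is assumed to be tab-delimited with columns in the order locus name, abbreviation, description, negative term, and matching is simplified to case-insensitive substring tests.

```python
import csv
import io


def read_search_terms(handle):
    """Parse search terms, padding three-column rows with N/A (assumed layout)."""
    loci = []
    for row in csv.reader(handle, delimiter="\t"):
        row = [cell.strip() for cell in row]
        if not any(row):
            continue  # skip blank lines
        if len(row) == 3:
            row.append("N/A")  # backwards-compatible default for old files
        if len(row) != 4:
            raise ValueError("expected 3 or 4 columns, got {}".format(len(row)))
        name, abbrev, descr, negative = row
        loci.append({
            "name": name,
            "abbrev": abbrev,
            "descr": descr,
            # N/A means "no negative term" for this locus
            "negative": None if negative == "N/A" else negative,
        })
    return loci


def record_matches(description, locus):
    """Return True if a record description matches the locus terms.

    A negative-term hit rejects the record even when the abbreviation
    or description term also matches.
    """
    text = description.lower()
    if locus["negative"] and locus["negative"].lower() in text:
        return False
    return locus["abbrev"].lower() in text or locus["descr"].lower() in text


# Hypothetical search terms: one four-column row, one three-column row.
terms = "CYTB\tcytb\tcytochrome b\tpseudogene\n12S\t12s\t12s ribosomal rna\n"
loci = read_search_terms(io.StringIO(terms))
keep = record_matches("Anolis sp. cytochrome b (cytb) gene, partial cds", loci[0])
drop = record_matches("Anolis sp. cytochrome b pseudogene", loci[0])
```

Here the first record is kept and the second is rejected, because the negative term takes precedence over the other two columns.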
Release for Zenodo archiving
v1.2.1: added interleaved nexus output.
Release 1.2
- Version 1.2:
- Made all modules compatible with Python 2.7 and Python 3.7.
- SQL now implemented in `Parse_Loci.py` (up to 30x speedup!), `Filter_Seqs_and_Species.py` (3x speedup), and `Taxon_Assessment.py` (3x speedup).
- Added output directory specification to all modules.
- Two trimming modules now included: `Trim_Alignments_Trimal.py` and `Trim_Alignments_Custom.py`. The `Trim_Alignments_Custom.py` module allows finding start and stop block positions, and row-wise (internal) sliding window trimming based on divergence.
- Added new module `Filter_Fasta_by_Min_Seqs.py` to filter fasta files using a minimum number of sequences.
- Output directory structures improved for all modules.
- Added `--quiet` option to `Filter_Seqs_and_Species.py` for less output on screen (useful when processing large numbers of loci).
- Added option `--numerical` to `Fasta_Get_Taxa.py` to allow non-alphabetical identifiers for subspecies/trinomial name combinations. This allows museum, field, or numerical codes to be discovered.
- Re-ordered tasks in `Cluster_Blast_Extract.py` to allow completion of all steps for one fasta file before moving to next fasta file in sequence.
- Added multithreading for BLAST searches and new `--bp_bridge` flag for coordinate merging in `Cluster_Blast_Extract.py` and `Reference_Blast_Extract.py`.
- Remove empty fasta files sometimes produced by `Coding_Translation_Tests.py`.
- Complete code re-write for `Align.py`, `Cluster_Blast_Extract.py`, `Filter_Seqs_and_Species.py`, `Parse_Loci.py`, `Taxon_Assessment.py`.
- Module `Relabel_Fasta.py` is now `Fasta_Relabel_Seqs.py`.
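
As an illustration of the coordinate merging that the `--bp_bridge` flag refers to, the sketch below merges BLAST hit intervals that overlap or are separated by a small gap. The exact semantics in SuperCRUNCH are not stated in these notes, so this assumes that two intervals are joined when the next start minus the previous end is at most `bp_bridge`. The function name `merge_coordinates` is hypothetical.

```python
def merge_coordinates(intervals, bp_bridge=0):
    """Merge (start, end) intervals that overlap or lie within bp_bridge bases.

    Assumption: the gap between two intervals is measured as
    next_start - previous_end, and gaps <= bp_bridge are bridged.
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] <= bp_bridge:
            # Extend the current block; max() handles nested intervals
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(pair) for pair in merged]


# Hypothetical hit coordinates for one sequence.
hits = [(150, 300), (1, 100), (500, 600)]
bridged = merge_coordinates(hits, bp_bridge=100)
strict = merge_coordinates(hits, bp_bridge=0)
```

With a bridge of 100 the first two hits are joined across their 50 bp gap, while the 200 bp gap before the third hit is left open. With a bridge of 0 all three hits stay separate.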
Release 1.1
- Version 1.1:
- Added multithreading option for MAFFT and Clustal-O in `Align.py`
- Added multithreading option for MAFFT in `Adjust_Direction.py`
- Added arg to specify output directory for `Concatenation.py`
- Corrected output column labeling in label key output files from `Relabel_Fasta.py`
- Added gappyout option for trimming with trimAl in `Trim_Alignments.py`
- Output sequences failing similarity searches to own file in `Cluster_Blast_Extract.py` and `Reference_Blast_Extract.py`
- Updated documentation on wiki pages
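
For context on the multithreading options, the sketch below shows how thread counts are typically passed to the two aligners from Python. It relies on MAFFT's `--thread` and Clustal Omega's `--threads` command-line options. The helper names and the remaining arguments (`--auto`, `--force`) are assumptions for illustration, not the commands that `Align.py` actually builds.

```python
import subprocess


def mafft_command(fasta_in, threads=1):
    """Build a MAFFT command line; MAFFT writes the alignment to stdout."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return ["mafft", "--thread", str(threads), "--auto", fasta_in]


def clustalo_command(fasta_in, fasta_out, threads=1):
    """Build a Clustal Omega command line with an explicit output file."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return ["clustalo", "-i", fasta_in, "-o", fasta_out,
            "--threads={}".format(threads), "--force"]


def run_mafft(fasta_in, fasta_out, threads=1):
    """Run MAFFT (must be installed and on PATH) and save its stdout."""
    with open(fasta_out, "w") as handle:
        subprocess.run(mafft_command(fasta_in, threads),
                       stdout=handle, check=True)
```

Passing the command as a list avoids shell quoting problems with file names, and `check=True` raises an error if the aligner exits with a failure status.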
initial release
Initial release of SuperCRUNCH.