BURST (Better Ultrafast Reconciliation of Sequences Tool) Documentation

Overview

BURST is an optimal, high-speed pairwise sequence aligner specialized in aligning many NGS short reads against large reference databases. It is designed to return mathematically optimal alignments while remaining fast, which makes it particularly useful for metagenomic studies and other large-scale sequence analysis projects.

Version

This documentation covers BURST version 1.0+.

Features

Optimal end-to-end alignment of variable-length short reads (up to a few thousand bases) against arbitrary reference sequences
Gapped alignment support
Multiple alignment modes:
- BEST: Report first best match by hybrid BLAST id
- ALLPATHS: Report all ties with the same error profile
- CAPITALIST: Minimize set of references AND interpolate taxonomy (default)
- FORAGE: Report all matches above specified threshold
- ANY: Report any valid hit above specified threshold
Optional optimal LCA taxonomy assignment with customizable confidence cutoff
Full IUPAC ambiguous base support in queries and references
Accelerator mode for faster alignment using k-mer hashing
Fingerprinting for additional filtering of potential matches
Support for reverse complement alignment
Database creation and management tools
Multithreading support for improved performance
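
The IUPAC ambiguity and reverse complement features listed above can be illustrated with a short Python sketch. It is not BURST's implementation: the function names are hypothetical, and the matching rule (two symbols match when their base sets overlap) is an assumption, since this documentation does not state the exact rule BURST applies. The code table itself is the standard IUPAC nucleotide alphabet.

```python
# Standard IUPAC nucleotide codes mapped to the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement of every code, obtained by complementing each base in its set.
COMPLEMENT = str.maketrans("ACGTURYSWKMBDHVN", "TGCAAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, keeping ambiguity codes meaningful."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def bases_compatible(a: str, b: str) -> bool:
    """Assumed rule: two symbols match if they share at least one base."""
    return bool(set(IUPAC[a.upper()]) & set(IUPAC[b.upper()]))


def count_mismatches(query: str, reference: str) -> int:
    """Count incompatible positions between two equal-length sequences."""
    if len(query) != len(reference):
        raise ValueError("sequences must have the same length")
    return sum(not bases_compatible(q, r) for q, r in zip(query, reference))
```

Searching both strands, as --forwardreverse does, amounts to comparing a reference against both the query and `reverse_complement(query)`.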

Usage

burst {options}

Basic parameters:

--references (-r) <name>: FASTA/edx DB of reference sequences [required]
--accelerator (-a) <name>: Creates/uses a helper DB (acc/acx) [optional]
--queries (-q) <name>: FASTA file of queries to search [required if aligning]
--output (-o) <name>: Blast6/edb file for output alignments/database [required]

Behavior parameters:

--forwardreverse (-fr): Also search the reverse complement of queries
--whitespace (-w): Write full query names in output (include whitespace)
--xalphabet (-x): Allow any alphabet and disable ambiguity matching
--nwildcard (-y): Allow N,X to match anything (in query and reference)
--taxonomy (-b) <name>: Taxonomy map (to interpolate, use -m CAPITALIST)
--mode (-m) <name>: Pick an alignment reporting mode (BEST, ALLPATHS, CAPITALIST, FORAGE, ANY)

Performance parameters:

--dbpartition (-dp) <int>: Split DB making into chunks (lossy)
--taxacut (-bc) <num>: Taxonomy cutoff: allow 1/<num> rank discord OR % conf
--taxa_ncbi (-bn): Assume NCBI header format '>xxx|accsn...' for taxonomy
--skipambig (-sa): Do not consider highly ambiguous queries (5+ ambigs)
--taxasuppress (-bs) [STRICT]: Suppress taxonomic specificity by %ID
--id (-i) <decimal>: Target minimum similarity (range 0-1)
--threads (-t) <int>: How many logical processors to use
--shear (-s) [len]: Shear references longer than [len] bases
--fingerprint (-f): Use sketch fingerprints to precheck matches (or cluster db)
--prepass (-p) [speed]: Use ultra-heuristic pre-matching
--heuristic (-hr): Allow relaxed comparison of low-id matches
--noprogress: Suppress progress indicator
--qbunch (-qb) <int>: Pack QBUNCH with queries divergent
--qbunch_max (-qm) <int>: Max size of QBUNCH
--quickforage (-qf): Output FORAGE'd results inline
--cache (-c) <int>: Performance tweaking parameter
--latency (-l) <int>: Performance tweaking parameter

Alignment Modes

BEST: Reports the first best match based on hybrid BLAST id.
ALLPATHS: Reports all ties with the same error profile.
CAPITALIST: Minimizes the set of references and interpolates taxonomy (default mode).
FORAGE: Reports all matches above the specified threshold.
ANY: Reports any valid hit above the specified threshold.
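
The difference between the reporting modes is easiest to see on a toy list of candidate hits. The sketch below is a conceptual illustration only, not BURST's algorithm: the `Hit` type and function names are hypothetical, ties are judged by identity alone rather than by a full error profile, and CAPITALIST is left out because its reference-set minimization is not specified in this documentation.

```python
from typing import List, NamedTuple, Optional


class Hit(NamedTuple):
    reference: str
    identity: float  # fraction in the range 0-1, like the --id option


def report_best(hits: List[Hit], min_id: float) -> Optional[Hit]:
    """BEST-like: the first hit that reaches the highest identity."""
    best = None
    for hit in hits:
        if hit.identity >= min_id and (best is None or hit.identity > best.identity):
            best = hit
    return best


def report_allpaths(hits: List[Hit], min_id: float) -> List[Hit]:
    """ALLPATHS-like: every hit tied with the best one (identity only)."""
    best = report_best(hits, min_id)
    if best is None:
        return []
    return [h for h in hits if h.identity == best.identity]


def report_forage(hits: List[Hit], min_id: float) -> List[Hit]:
    """FORAGE-like: every hit at or above the threshold."""
    return [h for h in hits if h.identity >= min_id]


def report_any(hits: List[Hit], min_id: float) -> Optional[Hit]:
    """ANY-like: the first hit at or above the threshold."""
    return next((h for h in hits if h.identity >= min_id), None)


# Made-up candidate hits for one query.
hits = [Hit("refA", 0.98), Hit("refB", 0.99), Hit("refC", 0.99), Hit("refD", 0.95)]
```

With a threshold of 0.97, the BEST-like function returns refB, the ALLPATHS-like function returns refB and refC, the FORAGE-like function returns refA, refB, and refC, and the ANY-like function returns refA.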

Database Creation

BURST can create custom databases for faster alignment:

burst -r input_references.fasta -d [DNA|RNA|QUICK] [max_query_length] -o output_database.edx

Options:

DNA/RNA: Creates a full database
QUICK: Creates a faster, but potentially less sensitive database
max_query_length: Optional parameter to specify the maximum expected query length

Accelerator Creation

Note: BURST's accelerator formats are hard-coded for prefixes of either size 12 or size 15. The prefix size of the build you are running is shown in the BURST help string. The size-12 prefix uses less memory but is slower, so it is the better fit for marker gene analysis.

To create an accelerator file for even faster alignments:

burst -r input_references.fasta -d [options] -a output_accelerator.acx -o output_database.edx
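
One way to see why the prefix size affects memory is to count how many distinct prefixes each size allows. The Python sketch below assumes a 2-bit encoding per base and one table slot per possible prefix. Both are illustrative assumptions, not a description of the acc/acx file layout, and the function names are hypothetical.

```python
from typing import Optional

# Assumed 2-bit code per unambiguous base.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_prefix(seq: str, k: int) -> Optional[int]:
    """Pack the first k bases into an integer, or None if that is impossible."""
    if len(seq) < k:
        return None
    value = 0
    for base in seq[:k].upper():
        if base not in CODE:  # ambiguous bases cannot be packed into 2 bits
            return None
        value = (value << 2) | CODE[base]
    return value


def prefix_table_slots(k: int) -> int:
    """Number of distinct k-base prefixes over a 4-letter alphabet."""
    return 4 ** k


print(prefix_table_slots(12))  # 16777216
print(prefix_table_slots(15))  # 1073741824
```

Under these assumptions a size-15 table has 64 times as many possible slots as a size-12 table, which is consistent with the memory difference described in the note.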

Input File Requirements

Reference Sequences

FASTA format
Can be provided as raw FASTA or as a pre-built BURST database (.edx)

Query Sequences

FASTA or FASTQ format
Gzipped input supported
Maximum sequence length: 100MB (configurable)

Taxonomy Mapping (optional)

Tab-delimited text file
Columns: sequence name, taxonomy string
Taxonomy strings are semicolon-delimited
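
The mapping format described above can be read with a few lines of Python. The sketch also includes a plain lowest common ancestor (the longest lineage prefix shared by all inputs) to show what a semicolon-delimited lineage makes possible. It does not reproduce BURST's LCA with a confidence cutoff, and the function names and sample lineages are made up for illustration.

```python
import io
from typing import Dict, Iterable, List, TextIO


def read_taxonomy_map(handle: TextIO) -> Dict[str, List[str]]:
    """Map each sequence name to its lineage, split on semicolons."""
    taxonomy = {}
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue
        name, _, lineage = line.partition("\t")
        taxonomy[name] = [rank.strip() for rank in lineage.split(";") if rank.strip()]
    return taxonomy


def lowest_common_ancestor(lineages: Iterable[List[str]]) -> List[str]:
    """Return the leading ranks on which every lineage agrees."""
    lineages = list(lineages)
    if not lineages:
        return []
    shared = []
    for ranks in zip(*lineages):  # stops at the shortest lineage
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return shared


# Made-up example data in the two-column, tab-delimited layout.
example = io.StringIO(
    "seq1\tk__Bacteria;p__Firmicutes;g__Bacillus\n"
    "seq2\tk__Bacteria;p__Firmicutes;g__Clostridium\n"
)
taxonomy = read_taxonomy_map(example)
print(";".join(lowest_common_ancestor(taxonomy.values())))  # k__Bacteria;p__Firmicutes
```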

Output Format

BURST outputs alignments in a modified BLAST-6 column format:

Query sequence name
Reference sequence name
Percent identity
Alignment length
Number of mismatches
Number of gap openings
Query start position
Query end position
Subject start position
Subject end position
E-value (set to -1 in BURST)
Bit score (used for other purposes in BURST)
Taxonomy (LCA-based, if provided and using CAPITALIST mode)
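
The column order above maps directly onto a small record type. The following Python sketch assumes the file is tab-delimited, as standard BLAST-6 output is. Because the E-value and bit score columns carry different meanings in BURST, the sketch keeps them as raw strings rather than guessing a numeric type. The names `Alignment` and `read_alignments` are hypothetical, and the sample rows are made up.

```python
import io
from typing import Iterator, NamedTuple, Optional, TextIO


class Alignment(NamedTuple):
    query: str
    reference: str
    percent_identity: float
    alignment_length: int
    mismatches: int
    gap_openings: int
    query_start: int
    query_end: int
    subject_start: int
    subject_end: int
    evalue: str      # kept raw: set to -1 in BURST
    bit_score: str   # kept raw: used for other purposes in BURST
    taxonomy: Optional[str]  # present only when taxonomy was assigned


def read_alignments(handle: TextIO) -> Iterator[Alignment]:
    """Yield one Alignment per non-empty, tab-delimited line."""
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or truncated lines
        yield Alignment(
            fields[0], fields[1], float(fields[2]),
            *(int(value) for value in fields[3:10]),
            fields[10], fields[11],
            fields[12] if len(fields) > 12 else None,
        )


# Made-up rows: one with a taxonomy column, one without.
sample = io.StringIO(
    "read1\trefA\t98.5\t200\t3\t0\t1\t200\t10\t209\t-1\t0\tk__Bacteria;p__Firmicutes\n"
    "read2\trefB\t97.0\t150\t4\t1\t1\t150\t5\t154\t-1\t0\n"
)
rows = list(read_alignments(sample))
```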

Performance Considerations

Use the accelerator (-a) option for faster alignments on large databases
Increase the number of threads (-t) to utilize multiple CPU cores
Adjust the cache (-c) and latency (-l) parameters for fine-tuning performance
Use the fingerprint (-f) option for additional filtering of potential matches
Consider using the prepass (-p) option for ultra-fast, heuristic pre-matching

Limitations

Local alignment is not supported (only end-to-end alignment)
Custom scoring matrices are not implemented
Paired-end unstitched alignments are not directly supported

Error Handling

BURST includes basic error checking for input file formats and command-line arguments. It prints an error message and exits when it encounters a problem such as a malformed input file or an invalid option.

Examples

Basic alignment:

burst -r references.fasta -q queries.fasta -o alignments.b6 -i 0.97

Create a database and accelerator:

burst -r references.fasta -d DNA 320 -a references.acx -o references.edx

Align using a pre-built database with taxonomy:

burst -r references.edx -a references.acx -q queries.fasta -b taxonomy.txt -o alignments.b6 -m CAPITALIST
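
When BURST is driven from a pipeline, the command lines above can be assembled programmatically. The Python sketch below uses only flags documented in the Usage section. The helper name `build_burst_command` and the assumption that the executable is called `burst` and is on the PATH are illustrative. The function only builds the argument list, so it can be inspected before anything is run.

```python
import shlex
from typing import List, Optional

MODES = {"BEST", "ALLPATHS", "CAPITALIST", "FORAGE", "ANY"}


def build_burst_command(
    references: str,
    queries: str,
    output: str,
    identity: Optional[float] = None,
    accelerator: Optional[str] = None,
    taxonomy: Optional[str] = None,
    mode: Optional[str] = None,
    threads: Optional[int] = None,
    forward_reverse: bool = False,
    executable: str = "burst",  # assumed executable name
) -> List[str]:
    """Build an alignment command from the documented options."""
    if mode is not None and mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    if identity is not None and not 0 <= identity <= 1:
        raise ValueError("identity must be in the range 0-1")
    cmd = [executable, "-r", references]
    if accelerator:
        cmd += ["-a", accelerator]
    cmd += ["-q", queries]
    if taxonomy:
        cmd += ["-b", taxonomy]
    cmd += ["-o", output]
    if mode:
        cmd += ["-m", mode]
    if identity is not None:
        cmd += ["-i", str(identity)]
    if threads is not None:
        cmd += ["-t", str(threads)]
    if forward_reverse:
        cmd.append("-fr")
    return cmd


cmd = build_burst_command("references.fasta", "queries.fasta", "alignments.b6", identity=0.97)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with file names.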

Conclusion

BURST provides a powerful and flexible tool for optimal sequence alignment, particularly suited for metagenomic studies and large-scale sequence analysis projects. Its various output options and optimization features make it suitable for a wide range of applications, from simple best-hit reporting to complex taxonomic assignment tasks.
