[sambamba view] Filter expression syntax
Sambamba-view supports custom filtering of alignment records. This wiki page describes the syntax of filter expressions, which the user provides with the --filter (-F) command-line option. Fields and flags are described in the SAM specification.
A filter expression consists of basic conditions linked by the logical operators and, or, and not, with parentheses for grouping where needed.
A basic condition tests a single record field, tag, or flag.
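You can use the comparison operators ==, !=, >, <, >=, and <= for both integers and strings. The examples on this page also use =~ to match a string against a regular expression written between slashes.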
Strings are delimited by single quotes. If you need a single quote inside a string, escape it with a backslash (\).
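Because string literals use single quotes, wrapping the whole expression in double quotes on the command line keeps the quoting simple. The following is a minimal sketch, assuming a POSIX shell; input.bam is a hypothetical file name, and the sambamba call runs only if both the tool and the file are present.

```shell
# Double quotes keep the single-quoted literal and the backslash escape intact.
filter="read_name == 'abc\'def'"
printf '%s\n' "$filter"

# Hypothetical input file; skipped when sambamba or the file is missing.
if command -v sambamba >/dev/null 2>&1 && [ -f input.bam ]; then
    sambamba view -F "$filter" input.bam
fi
```

Inside double quotes a POSIX shell leaves \' untouched, so sambamba receives the backslash escape exactly as written.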
Reduce the BAM file to a BAM file containing only reads on chr2, the second reference sequence listed in the SAM header (ref_id is zero-based, so the second reference has ref_id 1):
sambamba view -F "ref_id==1" -f bam HG01375.mapped.ILLUMINA.bwa.CLM.low_coverage.20120522.bam > HG01375.mapped.ILLUMINA.bwa.CLM.low_coverage.20120522_chr2.bam
Show all reads whose names start with ERR:
sambamba view -F "read_name =~ /^ERR/" HG01375.mapped.ILLUMINA.bwa.CLM.low_coverage.20120522_chr1.bam
Example of a compound expression:
mapping_quality >= 30 and ([RG] =~ /^abcd/ or [NM] == 7)
Example of a string containing an escaped single quote:
read_name == 'abc\'def'
The following flag names are recognized:
- paired
- proper_pair
- unmapped
- mate_is_unmapped
- reverse_strand
- mate_is_reverse_strand
- first_of_pair
- second_of_pair
- secondary_alignment
- failed_quality_control
- duplicate
- supplementary
- chimeric
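Flag names combine with and, or, and not like any other condition. As a sketch, assuming a POSIX shell (the choice of flags to exclude is only an illustration), a loop can join several names into one negated group:

```shell
# Join flag names with " or " so they can sit inside a single not (...).
joined=""
for flag in unmapped duplicate failed_quality_control secondary_alignment; do
    # Prepend "<previous> or " only when something has already been collected.
    joined="${joined:+$joined or }$flag"
done

filter="not ($joined)"
printf '%s\n' "$filter"
```

The resulting string can then be passed to sambamba view as -F "$filter".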
Example of a flag condition:
not (unmapped or mate_is_unmapped) and first_of_pair
Conditions for integer and string fields are supported.
List of integer fields:
- ref_id
- position
- mapping_quality
- sequence_length
- mate_ref_id
- mate_position
- template_length
List of string fields:
- read_name
- sequence
- cigar
- strand ('+'/'-')
- ref_name
- mate_ref_name
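Integer and string fields can be mixed in one expression, and shell variables can supply the values. A minimal sketch, assuming a POSIX shell; the variable names ref and min_mapq and their values are hypothetical:

```shell
ref="chr2"     # hypothetical reference name; compared as a string, so it is single-quoted below
min_mapq=30    # hypothetical threshold; compared as an integer, so it stays unquoted below

# Double quotes let the shell expand $ref and $min_mapq while keeping the inner single quotes.
filter="ref_name == '$ref' and mapping_quality >= $min_mapq and strand == '+'"
printf '%s\n' "$filter"
```

String fields such as ref_name and strand take quoted literals, while integer fields such as mapping_quality take bare numbers.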
Example of a field condition:
ref_id == 3 and mapping_quality >= 50 and sequence_length >= 80
Tags are denoted by their names in square brackets, for instance [RG] or [Q2]. They support conditions for both integers and strings; the tag must hold a value of the corresponding type.
To filter on the presence of a particular tag, compare it with the special null value:
[RG] != null and [AM] == 37
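Tag conditions contain square brackets, which the shell treats as glob characters when unquoted, so the expression should always be quoted. A minimal sketch, assuming a POSIX shell; the tag choice and the threshold are illustrative only:

```shell
# Single quotes work here because the expression contains no string literal.
filter='[RG] != null and [NM] <= 2'
printf '%s\n' "$filter"
```

As with the other expressions, the string can be passed to sambamba view as -F "$filter".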