Skip to content

bug (?): unexpected results with enforced interval size greater than default #162

@eboyden

Description

@eboyden

When I aligned a pair of fastqs using BWA-MEM2, Bowtie2, or SNAP, and an enforced interval size of 0-500, I got similar results. When I aligned with BWA-MEM2 or Bowtie2 and an enforced interval size of 50-500, the results didn't change much; very few on-target alignments had intervals <50 bp. But when I aligned with SNAP and an enforced (using -fs) interval size of 50-500, many fewer reads aligned. Furthermore, here is an example of a read pair that no longer aligned with a minimum interval of 50 bp, despite the fact that it's a 120 bp properly paired interval:

MN01688:12:000H3MG5N:1:23106:10622:10676        99      chr1    215878739       70      119M    =       215878739       120     AAAAATCATAGTCACCTTCTCTTACCTCAAATTAGGTCCATTTGGCTTGGATGGTGGTTGCCAAGAAATCACAACATATGATTCACTTAGTGGAATCACAGACAATGGGCCAACATTCT FFFFFFFFFFFFFFFFFFFFFFFFFFFFAFFFFFFFFFFFFFFFFAFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFAFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF LB:Z:lb        MC:Z:120M       MD:Z:119        PG:Z:SNAP       RG:Z:rg   PL:Z:ILLUMINA   NM:i:0  SM:Z:sm        MQ:i:70 UQ:i:0  QS:i:4389       PU:Z:pu   ms:i:4389
MN01688:12:000H3MG5N:1:23106:10622:10676        147     chr1    215878739       70      120M    =       215878739       -120    AAAAATCATAGTCACCTTCTCTTACCTCAAATTAGGTCCATTTGGCTTGGATGGTGGTTGCCAAGAAATCACAACATATGATTCACTTAGTGGAATCACAGACAATGGGCCAACATTCTG        FFFFFFFFFFFFFFFFFFFF=FFFFFFFFFFAFFFFFFFFFFFFFFFFFFFFFFFFFF/FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF        LB:Z:lb        MC:Z:119M       MD:Z:120        PG:Z:SNAP       RG:Z:rg   PL:Z:ILLUMINA   NM:i:0  SM:Z:sm        MQ:i:70 UQ:i:0  QS:i:4388       PU:Z:pu   ms:i:4388

I also recapitulated this on the NA12878 data NIST7035_TAAGGCGA; the alignment rate fell from 97% to 59%. And it only seems to happen with the -fs option.

samtools flagstat reports:
-s 0 500:

39731146 + 0 in total (QC-passed reads + QC-failed reads)
39731146 + 0 primary
0 + 0 secondary
0 + 0 supplementary
3114693 + 0 duplicates
3114693 + 0 primary duplicates
39079698 + 0 mapped (98.36% : N/A)
39079698 + 0 primary mapped (98.36% : N/A)
39731146 + 0 paired in sequencing
19865573 + 0 read1
19865573 + 0 read2
38521349 + 0 properly paired (96.96% : N/A)
38615594 + 0 with itself and mate mapped
464104 + 0 singletons (1.17% : N/A)
19728 + 0 with mate mapped to a different chr
14333 + 0 with mate mapped to a different chr (mapQ>=5)

-s 0 500 -fs (similar results):

39731146 + 0 in total (QC-passed reads + QC-failed reads)
39731146 + 0 primary
0 + 0 secondary
0 + 0 supplementary
2899310 + 0 duplicates
2899310 + 0 primary duplicates
38627431 + 0 mapped (97.22% : N/A)
38627431 + 0 primary mapped (97.22% : N/A)
39731146 + 0 paired in sequencing
19865573 + 0 read1
19865573 + 0 read2
38533349 + 0 properly paired (96.99% : N/A)
38556972 + 0 with itself and mate mapped
70459 + 0 singletons (0.18% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

-s 50 500 (similar results):

39731146 + 0 in total (QC-passed reads + QC-failed reads)
39731146 + 0 primary
0 + 0 secondary
0 + 0 supplementary
3144369 + 0 duplicates
3144369 + 0 primary duplicates
39074551 + 0 mapped (98.35% : N/A)
39074551 + 0 primary mapped (98.35% : N/A)
39731146 + 0 paired in sequencing
19865573 + 0 read1
19865573 + 0 read2
23181753 + 0 properly paired (58.35% : N/A)
38607952 + 0 with itself and mate mapped
466599 + 0 singletons (1.17% : N/A)
333828 + 0 with mate mapped to a different chr
47673 + 0 with mate mapped to a different chr (mapQ>=5)

-s 50 500 -fs (abnormally low alignment rate):

39731146 + 0 in total (QC-passed reads + QC-failed reads)
39731146 + 0 primary
0 + 0 secondary
0 + 0 supplementary
1658040 + 0 duplicates
1658040 + 0 primary duplicates
23583000 + 0 mapped (59.36% : N/A)
23583000 + 0 primary mapped (59.36% : N/A)
39731146 + 0 paired in sequencing
19865573 + 0 read1
19865573 + 0 read2
23344882 + 0 properly paired (58.76% : N/A)
23512538 + 0 with itself and mate mapped
70462 + 0 singletons (0.18% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

For comparison:
Bowtie2 --very-fast-local -I 0 --no-discordant (default):

39731146 + 0 in total (QC-passed reads + QC-failed reads)
39731146 + 0 primary
0 + 0 secondary
0 + 0 supplementary
2509954 + 0 duplicates
2509954 + 0 primary duplicates
39685464 + 0 mapped (99.89% : N/A)
39685464 + 0 primary mapped (99.89% : N/A)
39731146 + 0 paired in sequencing
19865573 + 0 read1
19865573 + 0 read2
39514079 + 0 properly paired (99.45% : N/A)
39653236 + 0 with itself and mate mapped
32228 + 0 singletons (0.08% : N/A)
13340 + 0 with mate mapped to a different chr
8318 + 0 with mate mapped to a different chr (mapQ>=5)

Bowtie2 --very-fast-local -I 50 --no-discordant:

39731146 + 0 in total (QC-passed reads + QC-failed reads)
39731146 + 0 primary
0 + 0 secondary
0 + 0 supplementary
2510277 + 0 duplicates
2510277 + 0 primary duplicates
39685267 + 0 mapped (99.88% : N/A)
39685267 + 0 primary mapped (99.88% : N/A)
39731146 + 0 paired in sequencing
19865573 + 0 read1
19865573 + 0 read2
39382600 + 0 properly paired (99.12% : N/A)
39652842 + 0 with itself and mate mapped
32425 + 0 singletons (0.08% : N/A)
19408 + 0 with mate mapped to a different chr
8593 + 0 with mate mapped to a different chr (mapQ>=5)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions