@@ -18,18 +18,25 @@ have identical sequences and are mapped to the same genomic location
1818chooses a random one to be the representative of the original DNA
1919sequence.
2020
21+ *Note* As of dnmtools v1.2.5, the option to use read sequences
22+ when deciding if two reads are duplicates has been removed. When
23+ analyzing bisulfite sequencing reads, comparing sequences risks
24+ introducing bias in downstream analyses. Also as of v1.2.5, the
25+ test for sorted order of reads can no longer be disabled. Empirical
26+ tests showed very little speed improvement from disabling this test.
27+
2128The `uniq` command can take reads sorted by (chrom, start, end,
2229strand). If the reads in the input file are not sorted, run the
2330following sort command using [samtools](https://samtools.github.io):
2431
2532``` shell
26- $ samtools sort -O sam -o input-sorted.sam input.sam
33+ $ samtools sort -o reads_sorted.bam reads.bam
2734```
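
Because `uniq` tests that its input is sorted, it can help to check the order before running it. The following is a minimal sketch, not part of dnmtools or samtools: `check_sorted` is a hypothetical helper that reads SAM records on standard input and compares only the chromosome and start columns, so it is a simplification of the (chrom, start, end, strand) order described above.

```shell
# Hypothetical helper (not part of dnmtools or samtools): reads SAM records
# on stdin and exits non-zero if a start position decreases within a
# chromosome. Simplification: only column 3 (RNAME) and column 4 (POS)
# are compared; end and strand are ignored.
check_sorted() {
  awk -F'\t' '
    /^@/ { next }                            # skip SAM header lines
    $3 == chrom && $4 + 0 < pos { exit 1 }   # position went backwards
    { chrom = $3; pos = $4 + 0 }             # remember the previous record
  '
}
```

For a BAM file, records could be piped in with `samtools view reads_sorted.bam | check_sorted`, since `samtools view` prints records as SAM text.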
2835
2936Next, execute the following command to remove duplicate reads:
3037
3138``` shell
32- $ dnmtools uniq -S duplicate-removal-stats.txt input-sorted.sam out-sorted.sam
39+ $ dnmtools uniq -S duplicate-removal-stats.txt reads_sorted.bam reads_uniq.bam
3340```
3441
3542## Options
@@ -47,30 +54,26 @@ Output a histogram of duplication frequencies into the specified file
4754for library complexity analysis.
4855
4956``` txt
50- -s, -seq
51- ```
52- Use the sequences of the reads to distinguish duplicates. This is not
53- often recommended.
54-
55- ``` txt
56- -A, -all-cytosines
57+ -B, -bam
5758```
58- Use all cytosines when comparing reads based on sequence (default:
59- only use CpG sites). Only applies if `-s` (above) is used.
59+ Write the output in BAM format. This is an explicit option to help
60+ prevent accidentally writing BAM to the terminal or through a pipe
61+ that expects plain text, e.g., SAM.
6062
6163``` txt
62- -D, -disable
64+ -stdout
6365```
64- Disable testing if the reads are sorted by chromosome and
65- position. This can be faster and is fine if you know your reads are
66- sorted.
66+ Write the output to standard output. This is not done by default,
67+ even when no output file is given, because of the danger of
68+ unexpectedly writing BAM to the terminal or through a pipe. BAM
69+ output can still be redirected or piped, but the `-stdout` argument
70+ is required.
6771
6872``` txt
6973 -s, -seed
7074```
71- Random number seed. Which read to keep, among duplicates, is chosen
72- randomly (default: 408). This option is typically only used for
73- testing.
75+ Random number seed. Affects which read is kept among duplicates. The
76+ default seed is 408. This option is typically only used for testing.
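
To illustrate how a seed makes the choice among duplicates reproducible, here is a minimal sketch in shell and awk. It is not the dnmtools implementation: `pick_representatives` is a hypothetical function, and the five-column tab-separated input (chrom, start, end, strand, name) is an assumed simplification of real alignment records.

```shell
# Hypothetical sketch, not the dnmtools implementation. Input lines are
# tab-separated "chrom start end strand name", already sorted so that
# duplicates are adjacent. One line per location is kept, chosen uniformly
# at random; the seed (default 408) makes the choice repeatable.
pick_representatives() {
  awk -F'\t' -v seed="${1:-408}" '
    BEGIN { srand(seed) }
    {
      key = $1 FS $2 FS $3 FS $4
      if (NR > 1 && key != prev) { print kept; n = 0 }  # group ended
      n++
      if (rand() * n < 1) kept = $0   # keep the n-th duplicate with probability 1/n
      prev = key
    }
    END { if (NR > 0) print kept }    # flush the last group
  '
}
```

With the same seed and the same awk implementation, repeated runs keep the same reads; a different seed may keep a different read from each group of duplicates.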
7477
7578``` txt
7679 -v, -verbose