Commit 69eaf65

Updating documentation for uniq to reflect changes to that command ahead of v1.2.5
1 parent ad9505d commit 69eaf65

1 file changed: +21 -18 lines changed

docs/content/uniq.md

Lines changed: 21 additions & 18 deletions
@@ -18,18 +18,25 @@ have identical sequences and are mapped to the same genomic location
 chooses a random one to be the representative of the original DNA
 sequence.
 
+*Note* As of dnmtools v1.2.5, the option to use the sequence of reads
+when deciding if two reads are duplicates has been removed. In the
+context of analyzing bisulfite sequencing reads, this has the danger
+of introducing bias in downstream analyses. Also, in the same version
+the test for sorted order of reads cannot be disabled. Empirical tests
+showed very little improvement to speed when disabling this test.
+
 The `uniq` command can take reads sorted by (chrom, start, end,
 strand). If the reads in the input file are not sorted, run the
 following sort command using [samtools](https://samtools.github.io):
 
 ```shell
-$ samtools sort -O sam -o input-sorted.sam input.sam
+$ samtools sort -o reads_sorted.bam reads.bam
 ```
 
 Next, execute the following command to remove duplicate reads:
 
 ```shell
-$ dnmtools uniq -S duplicate-removal-stats.txt input-sorted.sam out-sorted.sam
+$ dnmtools uniq -S duplicate-removal-stats.txt reads_sorted.bam reads_uniq.bam
 ```
 
 ## Options
## Options
@@ -47,30 +54,26 @@ Output a histogram of duplication frequencies into the specified file
 for library complexity analysis.
 
 ```txt
--s, -seq
-```
-Use the sequences of the reads to distinguish duplicates. This is not
-often recommended.
-
-```txt
--A, -all-cytosines
+-B, -bam
 ```
-Use all cytosines when comparing reads based on sequence (default:
-only use CpG sites). Only applies if `-s` (above) is used.
+The output is in BAM format. This is an option to help prevent
+accidentally writing BAM format to the terminal or through a pipe that
+expects plain text, e.g., SAM.
 
 ```txt
--D, -disable
+-stdout
 ```
-Disable testing if the reads are sorted by chromosome and
-position. This can be faster and is fine if you know your reads are
-sorted.
+Write the output to standard out. This is not done by default even
+without an output file given, because of the danger of writing BAM to
+the terminal or through a pipe unexpectedly. It is possible to write
+BAM redirected or through a pipe, but the `-stdout` argument is
+required.
 
 ```txt
 -s, -seed
 ```
-Random number seed. Which read to keep, among duplicates, is chosen
-randomly (default: 408). This option is typically only used for
-testing.
+Random number seed. Affects which read is kept among duplicates. The
+default seed is 408. This option is typically only used for testing.
 
 ```txt
 -v, -verbose
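
As a rough sketch of how the updated commands in this diff might be combined, the hypothetical `dedup_bam` wrapper below sorts a BAM file and then writes de-duplicated BAM through a shell redirect. It assumes that `-B` and `-stdout` can be given together with no output file argument, as the new option text suggests. The function name, the temporary file name, and that exact invocation are illustrative assumptions, not part of the commit.

```shell
# Hypothetical helper (not part of dnmtools): sort a BAM file, then remove
# duplicate reads and write BAM output through a redirect.
# Usage: dedup_bam <input.bam> <output.bam>
dedup_bam() {
    local in_bam="$1" out_bam="$2"
    # Temporary sorted file; the naming scheme is an arbitrary choice.
    local sorted_bam="${out_bam%.bam}.sorted.tmp.bam"

    # uniq expects sorted input, so sort first.
    # Assumption: -B selects BAM output and -stdout permits writing it
    # to a redirect when no output file is named.
    samtools sort -o "$sorted_bam" "$in_bam" &&
        dnmtools uniq -B -stdout "$sorted_bam" > "$out_bam"
    local status=$?

    # Clean up the intermediate file whether or not the run succeeded.
    rm -f "$sorted_bam"
    return "$status"
}
```

Calling `dedup_bam reads.bam reads_uniq.bam` would, under these assumptions, leave only `reads_uniq.bam` behind. The function returns the exit status of the sort-and-uniq step, so a failed sort is not masked by the cleanup.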
