Commit 7d1738b

docs/content/counts-nano.md: adding first docs for counts-nano
1 parent 1ae2816 commit 7d1738b

1 file changed: +104 -0 lines changed
docs/content/counts-nano.md

Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
# counts-nano - compute single-site methylation from nanopore data

## Synopsis
```console
$ dnmtools counts-nano [OPTIONS] -c <chroms> <input.bam>
```

## Description

The `counts-nano` command, introduced in v1.5.0, is designed specifically to
generate DNMTools [counts](../counts) format files from nanopore data called
for the `5mCG_5hmCG` modification. Currently this is only supported for
methylation and hydroxymethylation called at CpG sites.

More documentation will come as this tool evolves, but for now:

- Most behavior is very similar to what you will find from [counts](../counts).
- Mutation information is not estimated by `counts-nano`.
- Currently this only works for CpG sites and when the only modified sites are
  marked as `C+m?` or `C+h?` in the `MM` field of each BAM/SAM read record.
- The first 6 columns of the output are the same as explained in the
  [counts](../counts) format, except that the fraction in the 5th column
  combines 5mC and 5hmC. The 7th column is for 5hmC alone and the 8th is for
  5mC alone.
- The methylation levels will not result in integer values when multiplied by
  the number of reads, because probabilities on modifications are used. The
  methylation level for each site is an expected value (the best estimate we
  can make) and does not use arbitrary cutoffs.
- Other commands in DNMTools have been modified to use this form of expected
  methylation level. They behave as previously for bisulfite sequencing data,
  but have updated behavior when the data is from nanopore. The user does not
  need to specify the technology used.
- Some commands need the `-relaxed` flag to work with the additional
  columns in the output from `counts-nano` compared with `counts`. For
  commands without this option, simply run `cut -f1-6` on the output of
  `counts-nano` to remove the extra columns.
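
The point about expected values can be illustrated with a small Python sketch. It assumes, purely for illustration, that each read covering a site contributes one probability for 5mC and one for 5hmC, and that the reported levels are the per-site means of those probabilities. The function name `expected_levels` and the probabilities are made up; this is not the actual `counts-nano` implementation.

```python
def expected_levels(read_probs):
    """Average per-read modification probabilities at one CpG site.

    read_probs: list of (p_5mC, p_5hmC) pairs, one pair per covering read
    (hypothetical input; the real tool reads these from BAM/SAM records).
    Returns (combined, hydroxy, methyl, n_reads), in the spirit of
    columns 5, 7, 8 and 6 of the output described above.
    """
    n_reads = len(read_probs)
    if n_reads == 0:
        return 0.0, 0.0, 0.0, 0
    methyl = sum(p_m for p_m, _ in read_probs) / n_reads
    hydroxy = sum(p_h for _, p_h in read_probs) / n_reads
    return methyl + hydroxy, hydroxy, methyl, n_reads


# Made-up probabilities for four reads covering one site.
probs = [(0.5, 0.25), (1.0, 0.0), (0.0, 0.0), (0.5, 0.25)]
combined, hydroxy, methyl, n_reads = expected_levels(probs)
print(combined, hydroxy, methyl, n_reads)
print(combined * n_reads)  # need not be a whole number
```

With these inputs the combined level is 0.625 over 4 reads, so the level multiplied by the number of reads is 2.5 rather than an integer. No per-read cutoff is applied at any point.
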
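
For scripts that post-process the output in Python rather than the shell, the following sketch does the same job as `cut -f1-6`: it keeps the first six tab-separated fields of each line. The helper name `strip_extra_columns` and the example record are hypothetical, and the record assumes the usual six-column counts layout followed by the two extra `counts-nano` columns.

```python
def strip_extra_columns(line, keep=6):
    """Keep only the first `keep` tab-separated fields of one output line.

    Lines without tabs (for example '#' comment lines) pass through
    unchanged, which matches the default behavior of `cut -f1-6`.
    """
    fields = line.rstrip("\n").split("\t")
    return "\t".join(fields[:keep])


# Made-up 8-column record in the style described above.
nano_line = "chr1\t100\t+\tCpG\t0.75\t8\t0.25\t0.5\n"
print(strip_extra_columns(nano_line))
```
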
## Options

```txt
-o, -output
```

Output file name. The default is to write output to the terminal,
which is not useful unless commands are piped.

```txt
-c, -chrom
```

Reference genome file, which must be in FASTA format. This is
required.

```txt
-t, -threads
```

The number of threads to use. Threads are used to decompress BAM input and to
compress gzip output, so this is most helpful when the input is BAM (it does
not help for SAM) and the output is to be zipped (see `-z` below). If only one
of these conditions holds, using more threads can still help. Because most
computation in `counts-nano` is processing reads sequentially, using too many
threads will have diminishing returns.

```txt
-z, -zip
```

The output should be zipped (in gzip format). This is not deduced from the
filename, so specifying this argument should be accompanied by using a `.gz`
filename suffix for the output.

```txt
-n, -cpg-only
```

Print only CpG context cytosines. This significantly reduces the output size
in most genomes. Note that using this option does not merge data as symmetric
CpGs.

```txt
-sym
```

This will turn on `-n, -cpg-only` automatically and will output symmetric CpG
sites. Each site includes the counts from both strands, and its methylation
levels are a (weighted) average of both strands.
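
As a rough illustration of what a weighted average over both strands can look like, the Python sketch below merges the two strands of one CpG by weighting each strand's level by its read count. The choice of read counts as weights and the name `merge_symmetric` are assumptions made for this example, not a statement of how `counts-nano` computes it.

```python
def merge_symmetric(level_pos, reads_pos, level_neg, reads_neg):
    """Combine the + and - strand entries of one symmetric CpG site.

    Returns (merged_level, total_reads), where the merged level is the
    read-count-weighted mean of the two strand levels (an assumption).
    """
    total = reads_pos + reads_neg
    if total == 0:
        return 0.0, 0
    merged = (level_pos * reads_pos + level_neg * reads_neg) / total
    return merged, total


# Made-up values: 6 reads at level 0.5 on '+', 2 reads at level 1.0 on '-'.
print(merge_symmetric(0.5, 6, 1.0, 2))
```

Weighting by read count makes the merged level equal to what would be obtained by pooling all reads from both strands before averaging.
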

```txt
-H, -header
```

Add a header to the output file to identify the reference genome. This will be
in the form of "comment" lines beginning with `#`. This is not required for most
downstream processing, but is used by commands that check for consistency with
a reference genome.
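
Custom downstream scripts need to account for these comment lines. The Python sketch below separates lines beginning with `#` from data records. The helper name `split_header` and the example lines are hypothetical, and the actual content of the header lines is not specified here.

```python
def split_header(lines):
    """Split output lines into ('#' comment lines, data records)."""
    header, records = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore empty lines
        if line.startswith("#"):
            header.append(line)
        else:
            records.append(line)
    return header, records


# Made-up example: one comment line followed by one data record.
example = ["# hypothetical header line\n", "chr1\t100\t+\tCpG\t0.75\t8\n"]
header, records = split_header(example)
print(len(header), len(records))
```
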

```txt
-v, -verbose
```

Report more information while the program is running.

```txt
-progress
```

Show progress while the program is running.
