# counts-nano - compute single-site methylation from nanopore data

## Synopsis
```console
$ dnmtools counts-nano [OPTIONS] -c <chroms> <input.bam>
```

## Description

The `counts-nano` command, introduced in v1.5.0, is designed specifically to
generate DNMTools [counts](../counts) format files from nanopore data with
modifications called for `5mCG_5hmCG`. Currently it supports only
methylation and hydroxymethylation called at CpG sites.

More documentation will come as this tool evolves, but for now:

- Most behavior is very similar to that of [counts](../counts).
- Mutation information is not estimated by `counts-nano`.
- Currently this only works for CpG sites, and only when the sole modifications
  recorded in the `MM` tag of each BAM/SAM read record are `C+m?` or `C+h?`.
- The first 6 columns of the output are the same as in the
  [counts](../counts) format, except that the methylation level in the 5th
  column combines 5mC and 5hmC. The 7th column is the level of 5hmC alone and
  the 8th is the level of 5mC alone.
- Methylation levels will generally not give integers when multiplied by the
  number of reads. Modification probabilities are used, so the methylation
  level at each site is an expected value (the best estimate we can make)
  rather than the result of an arbitrary cutoff.
- Other commands in DNMTools have been modified to use this form of expected
  methylation level. They behave as before for bisulfite sequencing data, but
  have updated behavior when the data come from nanopore. The user does not
  need to specify the technology used.
- Some commands need the `-relaxed` flag to work with the additional columns
  in the output of `counts-nano` compared with `counts`. For commands without
  this option, use `cut -f1-6` on the output of `counts-nano` to remove the
  extra columns.
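
The expected-value levels described above can be illustrated with a short Python sketch. It assumes that each read contributes one probability of 5mC and one of 5hmC at a site, for example decoded from the `ML` tag that accompanies `MM`. It also assumes that a site's level is the mean of those probabilities over reads. This is one natural reading of "expected value", not necessarily the exact computation inside `counts-nano`. The SAM tags specification says an `ML` byte N covers probabilities from N/256 to (N+1)/256. Taking the midpoint of that range is a further assumption. The names `ReadCall`, `ml_byte_to_prob` and `expected_levels` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReadCall:
    """Hypothetical per-read call at one CpG site."""
    p_5mc: float   # probability that this read's cytosine is 5mC
    p_5hmc: float  # probability that this read's cytosine is 5hmC


def ml_byte_to_prob(value: int) -> float:
    """Map an ML byte (0-255) to the midpoint of its probability bin.

    Byte N covers [N/256, (N+1)/256); using the midpoint is an assumption.
    """
    if not 0 <= value <= 255:
        raise ValueError("ML values are unsigned bytes")
    return (value + 0.5) / 256.0


def expected_levels(calls: list[ReadCall]) -> tuple[int, float, float, float]:
    """Return (n_reads, level_5mc_plus_5hmc, level_5hmc, level_5mc).

    Each level is the mean probability over reads, i.e. the expected
    fraction of modified reads. No per-read cutoff is applied.
    """
    n = len(calls)
    if n == 0:
        return 0, 0.0, 0.0, 0.0
    m = sum(c.p_5mc for c in calls) / n
    h = sum(c.p_5hmc for c in calls) / n
    return n, m + h, h, m
```

With three reads whose 5mC probabilities are 0.9, 0.2 and 0.6, the 5mC level is about 0.567. Multiplying by 3 reads gives 1.7, not a whole number of methylated reads.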

## Options

```txt
-o, -output
```

Output file name. By default, output is written to the terminal (standard
output), which is not useful unless the output is piped into another command.

```txt
-c, -chrom
```

Reference genome file, which must be in FASTA format. This option is
required.

```txt
-t, -threads
```

The number of threads to use. This is most helpful when the input is BAM
(not SAM) and the output is compressed (see `-z` below), because these threads
are used to decompress BAM input and to compress gzip output. If only one of
these conditions holds, using more threads can still help. Because most of the
computation in `counts-nano` processes reads sequentially, adding more threads
gives diminishing returns.

```txt
-z, -zip
```

Compress the output in gzip format. Compression is not inferred from the
output filename, so when using this option you should also give the output
file a `.gz` suffix.

```txt
-n, -cpg-only
```

Print only cytosines in CpG context. This significantly reduces the output
size for most genomes. Note that this option does not merge the two strands
of each CpG into a single symmetric site; use `-sym` for that.
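
The strand-specific meaning of "CpG context" can be illustrated with a minimal Python sketch. It assumes 0-based positions on the reference and uses hypothetical helper names (`is_cpg`, `cpg_sites`). Each CpG dinucleotide yields two cytosines, one per strand, which is why `-n` alone still reports the strands separately.

```python
def is_cpg(ref: str, pos: int, strand: str) -> bool:
    """True if the cytosine at 0-based `pos` on `strand` is in CpG context.

    + strand: reference C followed by G.
    - strand: reference G (a C on the reverse strand) preceded by C.
    Soft-masked (lowercase) bases are treated like uppercase.
    """
    base = ref[pos].upper()
    if strand == "+":
        return base == "C" and pos + 1 < len(ref) and ref[pos + 1].upper() == "G"
    if strand == "-":
        return base == "G" and pos > 0 and ref[pos - 1].upper() == "C"
    raise ValueError("strand must be '+' or '-'")


def cpg_sites(ref: str) -> list[tuple[int, str]]:
    """List every (position, strand) cytosine in CpG context, in order."""
    return [(i, s) for i in range(len(ref)) for s in "+-" if is_cpg(ref, i, s)]
```

For the sequence `ACGTTCGA`, the two CpGs give four sites: positions 1 and 5 on `+`, and positions 2 and 6 on `-`.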

```txt
-sym
```

Output symmetric CpG sites, combining the two strands of each CpG into one
site. This automatically turns on `-n, -cpg-only`. The counts for each site
include the reads from both strands, and each methylation level is a weighted
average of the levels on the two strands.
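
The strand merge can be sketched in Python under stated assumptions. The `+` row sits at position i and its `-` partner at i + 1, and the merged site keeps the `+` position. Each level is averaged with the read counts on each strand as weights. These weights and the `SiteRow`/`merge_symmetric` names are assumptions for illustration, not the documented internals of `-sym`. The fields follow the output column order described above.

```python
from dataclasses import dataclass


@dataclass
class SiteRow:
    chrom: str
    pos: int           # position of the cytosine on its strand
    strand: str        # "+" or "-"
    context: str       # e.g. "CpG"
    level: float       # column 5: 5mC + 5hmC
    n_reads: int       # column 6
    level_5hmc: float  # column 7
    level_5mc: float   # column 8


def merge_symmetric(pos_row: SiteRow, neg_row: SiteRow) -> SiteRow:
    """Combine the + strand C at i with the - strand C at i + 1."""
    if pos_row.strand != "+" or neg_row.strand != "-":
        raise ValueError("expected a '+' row and its '-' partner")
    if pos_row.chrom != neg_row.chrom or neg_row.pos != pos_row.pos + 1:
        raise ValueError("rows are not the two halves of one CpG")
    n = pos_row.n_reads + neg_row.n_reads

    def avg(a: float, b: float) -> float:
        # Read-count-weighted average (an assumption of this sketch);
        # a site with no reads on either strand gets level 0.0.
        return (a * pos_row.n_reads + b * neg_row.n_reads) / n if n else 0.0

    return SiteRow(
        pos_row.chrom, pos_row.pos, "+", pos_row.context,
        avg(pos_row.level, neg_row.level),
        n,
        avg(pos_row.level_5hmc, neg_row.level_5hmc),
        avg(pos_row.level_5mc, neg_row.level_5mc),
    )
```

Weighting by reads means a strand covered by three reads contributes three times as much as a strand covered by one. This matches treating the merged site as if all reads had been observed at a single position.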

```txt
-H, -header
```

Add a header to the output file to identify the reference genome. This will be
in the form of "comment" lines beginning with `#`. This is not required for most
downstream processing, but is used by commands that check for consistency with
a reference genome.
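
Downstream scripts can read this output with a minimal Python reader that skips the `#` header lines added by `-H` and opens gzip output written with `-z`. The sketch assumes whitespace-separated columns in the order described earlier: the 6 counts columns, then the 5hmC and 5mC levels. It decides on decompression from a `.gz` suffix. `open_counts` and `parse_counts_nano` are hypothetical names.

```python
import gzip
from collections.abc import Iterable, Iterator


def open_counts(path: str):
    """Open output as text, decompressing when the name ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def parse_counts_nano(lines: Iterable[str]) -> Iterator[tuple]:
    """Yield one tuple per site, skipping '#' header lines and blank lines.

    Columns 1-6 follow the counts format; columns 7 and 8 (5hmC and 5mC
    levels), when present, are appended as floats.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        chrom, pos, strand, context = fields[:4]
        level, n_reads = float(fields[4]), int(fields[5])
        extra = tuple(float(x) for x in fields[6:8])
        yield (chrom, int(pos), strand, context, level, n_reads) + extra
```

Files produced by `counts` have only six columns, so the same reader returns 6-tuples for them.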

```txt
-v, -verbose
```

Report more information while the program is running.

```txt
-progress
```

Show progress while the program is running.