Commit 61bf519

roi: adding some updates to the docs to ensure users know that the numbers of sites in the output are determined by the content of the counts input file, and not by the reference genome
1 parent ca7c708 commit 61bf519

1 file changed: +17 -4 lines changed

docs/content/roi.md

Lines changed: 17 additions & 4 deletions
@@ -2,7 +2,7 @@
 
 ## Synopsis
 ```shell
-$ dnmtools roi [OPTIONS] <intervals.bed> <input.meth>
+$ dnmtools roi [OPTIONS] <intervals.bed> <input.counts>
 ```
 
 ## Description
@@ -17,15 +17,18 @@ found in the documentation for the `levels` command.
 
 The `roi` command requires two input files. The first is a
 sorted [counts output file](../counts),
-i.e. `input.meth` in the example above. This file provides data for
+i.e. `input.counts` in the example above. This file provides data for
 every site, either a cytosine or CpG, that is of interest. The second
 input file (`intervals.bed`) specifies the genomic intervals in which
 methylation statistics should be summarized. If either file is not
 sorted by (chrom,end,start,strand) it can be sorted using the
 following command:
 ```shell
-$ LC_ALL=C sort -k 1,1 -k 3,3n -k 2,2n -k 6,6 -o input-sorted.meth input.meth
+$ LC_ALL=C sort -k 1,1 -k 3,3n -k 2,2n -k 6,6 -o input-sorted.counts input.counts
 ```
+Note: As of v1.4.0, the sorted order of chromosomes/targets within these
+files is not important, but the sites within each chromosome must
+still be sorted.
 
 The intervals must be specified as a BED format file, and these can be
 sorted using [bedtools
@@ -35,9 +38,19 @@ formats: (1) 6-column BED format, which may have more than 6 columns,
 but requires the first 6 columns to match the specification, or (2)
 3-column BED format.
 
+*An important note about the input files:* several aspects of the
+output for `roi` depend on the number of sites within each region of
+interest. If the `.counts` file provided as input does not have all
+the sites you might expect, for example if it is missing sites that
+have been excluded from some earlier step in your pipeline, then the
+results will be affected. We hope to make `roi` more robust to this
+issue in the future, for example by accepting some information about
+the reference genome to ensure that the numbers of sites are as
+expected by the user.
+
 From there, the `roi` command can be run as follows:
 ```shell
-$ dnmtools roi -o output.bed regions.bed input-sorted.meth
+$ dnmtools roi -o output.bed regions.bed input-sorted.counts
 ```
 
 The default output format is a 6-column BED format file, with the
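
The note added in this commit says that `roi` output depends on which sites are actually present in the `.counts` file. One way to see how many sites each interval would receive is to count them directly with standard tools before running `roi`. The sketch below is illustrative only: `count_sites_per_interval` is a hypothetical helper, not part of dnmtools. It assumes that the first two columns of the counts file are chromosome and position, that the BED intervals are 0-based and half-open, and that the interval file is small enough to hold in memory.

```shell
# count_sites_per_interval: hypothetical helper, NOT part of dnmtools.
# Usage: count_sites_per_interval <intervals.bed> <input.counts>
# Assumptions: counts columns 1-2 are chrom and position; BED columns 1-3
# are chrom, start, end (0-based, half-open). Every site is compared against
# every interval, so this is only suitable for a quick sanity check.
count_sites_per_interval() {
  awk -v OFS='\t' '
    # First file: remember each interval.
    NR == FNR { chrom[FNR] = $1; start[FNR] = $2 + 0; stop[FNR] = $3 + 0; n = FNR; next }
    # Second file: add the site to every interval that contains it.
    {
      pos = $2 + 0
      for (i = 1; i <= n; i++)
        if ($1 == chrom[i] && pos >= start[i] && pos < stop[i]) count[i]++
    }
    # Print chrom, start, end, and the number of sites found (0 if none).
    END { for (i = 1; i <= n; i++) print chrom[i], start[i], stop[i], count[i] + 0 }
  ' "$1" "$2"
}
```

Running `count_sites_per_interval regions.bed input-sorted.counts` prints one line per interval with the site count in the last column. Intervals with fewer sites than expected suggest that sites were dropped earlier in the pipeline.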
