Commit 7539f2c

Merge pull request #148 from smithlabcode/roi-docs-update
roi: update to docs
2 parents 96dc147 + 61bf519 commit 7539f2c

1 file changed

+17
-4
lines changed

docs/content/roi.md

Lines changed: 17 additions & 4 deletions
@@ -2,7 +2,7 @@
 
 ## Synopsis
 ```shell
-$ dnmtools roi [OPTIONS] <intervals.bed> <input.meth>
+$ dnmtools roi [OPTIONS] <intervals.bed> <input.counts>
 ```
 
 ## Description
@@ -17,15 +17,18 @@ found in the documentation for the `levels` command.
 
 The `roi` command requires two input files. The first is a
 sorted [counts output file](../counts),
-i.e. `input.meth` in the example above. This file provides data for
+i.e. `input.counts` in the example above. This file provides data for
 every site, either a cytosine or CpG, that is of interest. The second
 input file (`intervals.bed`) specifies the genomic intervals in which
 methylation statistics should be summarized. If either file is not
 sorted by (chrom,end,start,strand) it can be sorted using the
 following command:
 ```shell
-$ LC_ALL=C sort -k 1,1 -k 3,3n -k 2,2n -k 6,6 -o input-sorted.meth input.meth
+$ LC_ALL=C sort -k 1,1 -k 3,3n -k 2,2n -k 6,6 -o input-sorted.counts input.counts
 ```
+Note: As of v1.4.0, the sorted order of chromosomes/targets within these
+files is not important, but the sites within each chromosome must
+still be sorted.
 
 The intervals must be specified as a BED format file, and these can be
 sorted using [bedtools
@@ -35,9 +38,19 @@ formats: (1) 6-column BED format, which may have more than 6 columns,
 but requires the first 6 columns to match the specification, or (2)
 3-column BED format.
 
+*An important note about the input files:* several aspects of the
+output for `roi` depend on the number of sites within each region of
+interest. If the `.counts` file provided as input does not have all
+the sites you might expect, for example if it is missing sites that
+have been excluded from some earlier step in your pipeline, then the
+results will be affected. We hope to make `roi` more robust to this
+issue in the future, for example by accepting some information about
+the reference genome to ensure that the numbers of sites are as
+expected by the user.
+
 From there, the `roi` command can be run as follows:
 ```shell
-$ dnmtools roi -o output.bed regions.bed input-sorted.meth
+$ dnmtools roi -o output.bed regions.bed input-sorted.counts
 ```
 
 The default output format is a 6-column BED format file, with the
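
An aside on the sorting requirement described in this diff: before running `roi`, it may help to confirm that a file is already in the expected order rather than re-sorting it unconditionally. The sketch below reuses the sort keys from the command in the diff and adds `-c`, the standard `sort` option that checks order without writing any output. The toy file and its contents are hypothetical. The check is also stricter than the v1.4.0 note requires, since it enforces chromosome order as well as the order of sites within each chromosome.

```shell
# Hypothetical two-line, 6-column BED file: the second interval ends
# before the first, so the file is NOT sorted by (chrom,end,start,strand).
tmp=$(mktemp)
printf 'chr1\t100\t200\ta\t0\t+\nchr1\t50\t150\tb\t0\t+\n' > "$tmp"

# sort -c only checks the order; it exits non-zero at the first
# out-of-order line and writes nothing to the file.
if LC_ALL=C sort -c -k 1,1 -k 3,3n -k 2,2n -k 6,6 "$tmp" 2>/dev/null; then
  status="sorted"
else
  status="unsorted"
fi
echo "$status"

rm -f "$tmp"
```

If the check reports `unsorted`, the sort command shown in the diff (with `-o`) can be used to produce a sorted copy.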
