33## Synopsis
44
55``` shell
6- $ dnmtools format [OPTIONS] -f <mapper> <input.sam>
6+ $ dnmtools format [OPTIONS] -f <mapper> <input.bam> [output.bam]
77```
88
99## Description
@@ -16,12 +16,29 @@ important to quantify methylation, as fragments that overlap must
1616count the overlapping bases only once and must be treated as
1717originating from the same allele. These can be ensured by merging them
1818into a single entry. SAM/BAM files generated by abismal, Bismark and
19- BSMAP can be formatted using the ` format ` command. An example use of
20- this command to format a mapped reads file is:
19+ BSMAP can be formatted using the ` format ` command.
20+
21+ An example use of this command to format a mapped reads file is:
2122``` shell
22- $ dnmtools format -f abismal -o input-formatted.sam input.sam
23+ $ dnmtools format -f abismal input.bam output.sam
2324```
2425Above, the file ` input.sam ` would have been generated by ` abismal ` .
26+ The file `output.sam` is the output, and an output file is required
27+ here unless the `-stdout` argument is specified (see below). Another
28+ example:
29+ ``` shell
30+ $ dnmtools format -f abismal -t 8 -B input.bam output.bam
31+ ```
32+ This will use 8 threads because of the ` -t 8 ` and will produce output
33+ in BAM format because of the ` -B ` flag (not the filename of the
34+ output).
35+
36+ *Note* As of dnmtools v1.2.5, there is no longer a "buffer size"
37+ argument, which introduced arbitrary behavior. Now `format` assumes
38+ reads are sorted by read name, which should ensure that mates in
39+ paired-end sequencing are consecutive in the file. No "buffer" is
40+ needed, and input that does not conform to this assumption is
41+ detected more easily.
2542
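A rough way to check the sorted-by-name assumption before running `format` is sketched below. The helper `names_are_grouped` is hypothetical (it is not part of dnmtools), and it assumes both mates carry an identical read name in column 1 of the SAM text, which may not hold if the mapper keeps end-specific suffixes.

```shell
# Hypothetical helper, not part of dnmtools.
# Reads SAM text on stdin and succeeds (exit 0) only if every read name
# (QNAME, column 1) occupies one contiguous run of lines.
# Assumption: mates share exactly the same read name.
names_are_grouped() {
  awk -F '\t' '
    /^@/ { next }                        # skip SAM header lines
    $1 != prev {
      if ($1 in seen) { bad = 1; exit }  # name reappeared after a gap
      seen[$1] = 1
      prev = $1
    }
    END { exit bad }                     # 0 if grouped, 1 otherwise
  '
}
```

Assuming samtools is installed, `samtools view -h input.bam | names_are_grouped` returns a non-zero status when a read name reappears after a gap. Running `samtools sort -n` on the input is one common way to group reads by name.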
2643## Options
2744
@@ -32,33 +49,63 @@ This option indicates the format of the input SAM file, corresponding
3249to the mapper that generated it (options: abismal, bsmap, bismark).
3350
3451``` txt
35- -o, -output
52+ -t, -threads
53+ ```
54+ The number of threads to use. These threads are used for I/O and are
55+ most helpful when both the input and the output are in BAM format.
57+
58+ ``` txt
59+ -B, -bam
3660```
37- The name of the output file. The output will be in SAM format. By
38- default this is standard output.
61+ The output is in BAM format. This is an option to help prevent
62+ accidentally writing BAM format to the terminal or through a pipe that
63+ expects plain text, e.g., SAM.
3964
4065``` txt
41- -s, -suffix
66+ -stdout
67+ ```
68+ Write the output to standard output. This is not done by default, even
69+ when no output file is given, because of the danger of writing BAM to
70+ the terminal or through a pipe unexpectedly. BAM output can still be
71+ redirected or sent through a pipe, but the `-stdout` argument is
72+ required.
73+
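The interaction between an output file, `-B`, and `-stdout` can be sketched as a small dry-run helper. The function `format_cmd` is hypothetical (it is not part of dnmtools) and only prints a command line built from the flags described in this document. It does not run anything, and the argument order simply follows the examples above.

```shell
# Hypothetical dry-run helper, not part of dnmtools.
# Prints a `dnmtools format` command line that requests BAM output.
#   $1 = mapper, $2 = input file, $3 = output file (optional)
format_cmd() {
  mapper=$1
  input=$2
  output=${3:-}
  if [ -n "$output" ]; then
    # Output file given: -B selects BAM, regardless of the file name.
    echo "dnmtools format -f $mapper -B $input $output"
  else
    # No output file: standard output must be requested with -stdout.
    echo "dnmtools format -f $mapper -B -stdout $input"
  fi
}

format_cmd abismal input.bam output.bam
format_cmd abismal input.bam
```

Printing the command before running it is a cautious choice for pipelines, since it makes it easy to confirm that BAM is never sent to a terminal by accident.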
74+ ``` txt
75+ -s, -suff
4276```
4377The length of the suffix for read names, which indicates whether the
44- read is from end 1 or end 2 (default: 1).
78+ read is from end 1 or end 2 for paired-end reads. If this is not
79+ specified, but the data is paired end (i.e., the flag ` -single-end ` is
80+ not used; see below), then the length of this suffix is inferred.
81+
82+ ``` txt
83+ -single-end
84+ ```
85+ Using this argument tells `format` not to look for mates to merge as a
86+ single fragment. The default assumption is that data is paired-end
87+ and that mates are consecutive in the input.
4588
4689``` txt
4790-L, -max-frag
4891```
4992The maximum allowed insert size in base-pairs (default:
50- 10000). Normally this parameter is set at the mapping step, but
51- `format` can also reject reads that are in opposing strands in the
52- same chromosome but map more than "max-frag" bases apart.
93+ unlimited). Normally this parameter is determined during read mapping,
94+ but `format` can also reject reads that are in opposing strands in the
95+ same chromosome but map more than this many bases apart.
5396
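To get a feel for how many records a given limit would affect, the input can be inspected before formatting. The sketch below is not how `format` itself applies `-max-frag`. The helper `count_long_templates` is hypothetical, and it assumes the mapper fills in the template length field (TLEN, column 9 of a SAM record).

```shell
# Hypothetical helper, not part of dnmtools.
# Reads SAM text on stdin and prints the number of records whose absolute
# template length (TLEN, column 9) exceeds the limit given as $1.
# Note: this counts records, so both mates of a long pair are counted.
count_long_templates() {
  max_frag=$1
  awk -F '\t' -v max="$max_frag" '
    /^@/ { next }                  # skip SAM header lines
    {
      tlen = $9 + 0
      if (tlen < 0) tlen = -tlen   # TLEN is signed; compare its magnitude
      if (tlen > max) n++
    }
    END { print n + 0 }
  '
}
```

Assuming samtools is installed, `samtools view input.bam | count_long_templates 10000` prints how many records have a template length above 10000 bases.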
5497``` txt
55- -B, -buf-size
98+ -F, -force
5699```
57- Maximum buffer size (default: 10000). This is the maximum
58- number of reads retained before mates are no longer seeked for
59- reads. If more than "buf-size" reads are not a proper mate of a given
60- read, the read is printed as-is and reported as single-end. This value
61- has no effect if the input is single-end.
100+ This option "forces" the ` format ` command to process paired-end reads
101+ even if it is unable to detect mates. Without this argument, failure
102+ to detect mates will cause ` format ` to terminate. This option is
103+ useful, for example, if the reads were paired-end, but the second
104+ end is of such low quality that only reads from the first end were
105+ mapped. In a data analysis pipeline, it might not be apparent that one
106+ of two ends failed entirely, so providing this option can help. If you
107+ are only analyzing a small number of data sets, you probably want to
108+ be made aware of this problem rather than force it to be ignored.
62109
63110``` txt
64111-v, -verbose