Commit 8fa79e3

Merge branch 'ar/changelog-051rc1' into 'master'
Prepare docs 0.5.1-rc1
See merge request machine-learning/modkit!306
2 parents 2f06f02 + d013434 commit 8fa79e3

8 files changed: +243 -97 lines changed

CHANGELOG.md

Lines changed: 12 additions & 0 deletions
@@ -4,6 +4,18 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [v0.5.1-rc1]
+### Fixes
+- [pileup] Fixes bug where N_diff was incorrectly summed when `--combine-strands` was used. (N.B. %-methylation was still correct).
+- Fixes bug where fallback threshold was incorrect. Will now be highest observed explicit canonical probability, was highest bin.
+- Fixes conversion to bigWig when chromosomes aren't sorted exactly the way that `sort` would do it.
+### Adds
+- [cli] Adds shell completions command, thanks to @killercup
+- [dmr] Adds 2OMe mod codes
+- [validate] Adds better logging when NM tag is missing or reads fail.
+- Allows bgzip-compressed FASTA references across all commands
+- [bedmethyl, tobigwig] Allow using header from modBAM instead of sizes.
+
 ## [v0.5.0]
 ### Adds
 - [open-chromatin] Adds open chromatin prediction subcommand for 6mA MTase-treated DNA
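
The bigWig fix in the changelog entry above concerns bedMethyl input whose chromosomes are not ordered the way `sort` would order them. As a rough illustration only, here is a minimal Python sketch that checks a bedMethyl stream for that ordering. It assumes that "the way `sort` would do it" means byte-wise lexicographic order (as with `LC_ALL=C sort`), which the changelog does not spell out. The function name `chroms_in_sort_order` is hypothetical and not part of modkit.

```python
from typing import Iterable, List


def chroms_in_sort_order(bedmethyl_lines: Iterable[str]) -> bool:
    """Return True if chromosome names appear in byte-wise lexicographic order.

    Assumption: this mirrors `LC_ALL=C sort` on the first column; modkit's
    own check may differ.
    """
    seen: List[str] = []
    for line in bedmethyl_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        chrom = line.split(None, 1)[0]
        # Record each run of a chromosome once; a chromosome that reappears
        # later is recorded again and therefore breaks the ordering.
        if not seen or seen[-1] != chrom:
            seen.append(chrom)
    encoded = [c.encode() for c in seen]
    return encoded == sorted(encoded)


# "Natural" order (chr1, chr2, chr10) is not the order `sort` produces.
natural = ["chr1\t0\t1", "chr2\t0\t1", "chr10\t0\t1"]
lexical = ["chr1\t0\t1", "chr10\t0\t1", "chr2\t0\t1"]
print(chroms_in_sort_order(natural), chroms_in_sort_order(lexical))
```

A check like this could help decide whether the `--force-chromosome-ordering` option described in the help text is applicable to a given input.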

book/src/advanced_usage.md

Lines changed: 53 additions & 23 deletions
@@ -2698,52 +2698,82 @@ Logging Options:
 Make a BigWig track from a bedMethyl file or stream. For details on the BigWig
 format see https://doi.org/10.1093/bioinformatics/btq351

-Usage: modkit bedmethyl tobigwig [OPTIONS] --sizes <CHROMSIZES> --mod-codes <MOD_CODES> <IN_BEDMETHYL> <OUT_FP>
+Usage: modkit bedmethyl tobigwig [OPTIONS] --mod-codes <MOD_CODES> <--sizes <CHROMSIZES>|--header <INPUT_BAM>> <IN_BEDMETHYL> <OUT_FP>

 Arguments:
-<IN_BEDMETHYL> Input bedmethyl, uncompressed, "-" or "stdin" indicates an
-input stream
-<OUT_FP> Output bigWig filename
+<IN_BEDMETHYL>
+Input bedmethyl, uncompressed, "-" or "stdin" indicates an input
+stream
+
+<OUT_FP>
+Output bigWig filename

 Options:
--g, --sizes <CHROMSIZES> A chromosome sizes file. Each line should be have
-a chromosome and its size in bases, separated by
-whitespace. A fasta index (.fai) works as well
--m, --mod-codes <MOD_CODES> Make a bigWig track where the values are the
-percent of bases with this modification, use
-multiple comma-separated codes to combine counts.
-For example --mod-code m makes a track of the 5mC
-percentages and --mod-codes h,m will make a track
-of the combined counts from 5hmC and 5mC.
-Combining counts for different primary bases will
-cause an error (e.g. --mod-codes a,h)
--h, --help Print help
+-g, --sizes <CHROMSIZES>
+A chromosome sizes file. Each line should be a chromosome and its size
+in bases, separated by whitespace. A fasta index (.fai) works as well.
+Use instead of the bam header
+
+-b, --header <INPUT_BAM>
+modBAM from which the pileup was generated. Chromosome sizes are
+gathered from the header. Use instead of the chromosome sizes file
+
+-m, --mod-codes <MOD_CODES>
+Make a bigWig track where the values are the percent of bases with
+this modification, use multiple comma-separated codes to combine
+counts. For example --mod-code m makes a track of the 5mC percentages
+and --mod-codes h,m will make a track of the combined counts from 5hmC
+and 5mC. Combining counts for different primary bases will cause an
+error (e.g. --mod-codes a,h will error)
+
+-h, --help
+Print help (see a summary with '-h')

 Output Options:
 --negative-strand-values
 Report the percentages on the negative strand as negative values. The
 data range will be [-100, 100]
+
 -z, --nzooms <NZOOMS>
-Set the maximum of zooms to create [default: 10]
+Set the maximum of zooms to create
+
+[default: 10]
+
 --zooms <ZOOMS>...
 Set the zoom resolutions to use (overrides the --nzooms argument)
+
 -u, --uncompressed
 Don't use compression
+
 --block-size <BLOCK_SIZE>
-Number of items to bundle in r-tree [default: 256]
+Number of items to bundle in r-tree
+
+[default: 256]
+
 --items-per-slot <ITEMS_PER_SLOT>
-Number of data points bundled at lowest level [default: 1024]
+Number of data points bundled at lowest level
+
+[default: 1024]

 Compute Options:
--t, --nthreads <NTHREADS> Set the number of threads to use. This tool will
-typically use ~225% CPU on a HDD. SDDs may be
-higher. (IO bound) [default: 6]
---inmemory Do not create temporary files for intermediate data
+-t, --nthreads <NTHREADS>
+Set the number of threads to use. This tool will typically use ~225%
+CPU on a HDD. SDDs may be higher. (IO bound)
+
+[default: 6]
+
+--inmemory
+Do not create temporary files for intermediate data
+
+--force-chromosome-ordering
+If input bedMethyl has sorting of the same scheme as `sort`, this
+option may speed up conversion

 Logging Options:
 --log-filepath <LOG_FILEPATH>
 Specify a file for debug logs to be written to, otherwise ignore them.
 Setting a file is recommended. (alias: log)
+
 --suppress-progress
 Hide the progress bar
 ```

docs/advanced_usage.html

Lines changed: 63 additions & 33 deletions
@@ -392,7 +392,7 @@ <h2 id="pileup"><a class="header" href="#pileup">pileup</a></h2>
 --filter-threshold 0.9 will specify a threshold value of 0.70 for
 adenine and 0.9 for all other base modification calls

---mod-thresholds &lt;MOD_THRESHOLDS&gt;
+--mod-threshold &lt;MOD_THRESHOLDS&gt;
 Specify a passing threshold to use for a base modification,
 independent of the threshold for the primary sequence base or the
 default. For example, to set the pass threshold for 5hmC to 0.8 use
@@ -585,7 +585,7 @@ <h2 id="adjust-mods"><a class="header" href="#adjust-mods">adjust-mods</a></h2>
 --filter-probs
 Filter out the lowest confidence base modification probabilities

---only-mapped
+--mapped-only
 Only use base modification probabilities from bases that are aligned
 when estimating the filter threshold (i.e. ignore soft-clipped, and
 inserted bases)
@@ -787,7 +787,7 @@ <h2 id="sample-probs"><a class="header" href="#sample-probs">sample-probs</a></h
 Only sample base modification probabilities that are aligned to the
 positions in this BED file. (alias: include-positions)

---only-mapped
+--mapped-only
 Only use base modification probabilities that are aligned (i.e. ignore
 soft-clipped, and inserted bases)

@@ -903,7 +903,7 @@ <h2 id="summary"><a class="header" href="#summary">summary</a></h2>
 --filter-threshold 0.9 will specify a threshold value of 0.70 for
 adenine and 0.9 for all other base modification calls

---mod-thresholds &lt;MOD_THRESHOLDS&gt;
+--mod-threshold &lt;MOD_THRESHOLDS&gt;
 Specify a passing threshold to use for a base modification,
 independent of the threshold for the primary sequence base or the
 default. For example, to set the pass threshold for 5hmC to 0.8 use
@@ -940,7 +940,7 @@ <h2 id="summary"><a class="header" href="#summary">summary</a></h2>
 Only summarize base modification probabilities that are aligned to the
 positions in this BED file. (alias: include-positions)

---only-mapped
+--mapped-only
 Only use base modification probabilities that are aligned (i.e. ignore
 soft-clipped, and inserted bases)

@@ -1375,7 +1375,7 @@ <h2 id="pileup-hemi"><a class="header" href="#pileup-hemi">pileup-hemi</a></h2>
 --filter-threshold 0.9 will specify a threshold value of 0.70 for
 adenine and 0.9 for all other base modification calls

---mod-thresholds &lt;MOD_THRESHOLDS&gt;
+--mod-threshold &lt;MOD_THRESHOLDS&gt;
 Specify a passing threshold to use for a base modification,
 independent of the threshold for the primary sequence base or the
 default. For example, to set the pass threshold for 5hmC to 0.8 use
@@ -1513,10 +1513,10 @@ <h2 id="entropy"><a class="header" href="#entropy">entropy</a></h2>
 --filter-threshold &lt;FILTER_THRESHOLD&gt;
 Specify the filter threshold globally or for the canonical calls. When
 specified, base modification call probabilities will be required to be
-greater than or equal to this number. If `--mod-thresholds` is also
+greater than or equal to this number. If `--mod-threshold` is also
 specified, _this_ value will be used for canonical calls

---mod-thresholds &lt;MOD_THRESHOLDS&gt;
+--mod-threshold &lt;MOD_THRESHOLDS&gt;
 Specify a passing threshold to use for a base modification,
 independent of the threshold for the primary sequence base or the
 default. For example, to set the pass threshold for 5hmC to 0.8 use
@@ -2075,7 +2075,7 @@ <h2 id="extract-calls"><a class="header" href="#extract-calls">extract calls</a>
 --filter-threshold 0.9 will specify a threshold value of 0.70 for
 adenine and 0.9 for all other base modification calls

---mod-thresholds &lt;MOD_THRESHOLDS&gt;
+--mod-threshold &lt;MOD_THRESHOLDS&gt;
 Specify a passing threshold to use for a base modification,
 independent of the threshold for the primary sequence base or the
 default. For example, to set the pass threshold for 5hmC to 0.8 use
@@ -2827,52 +2827,82 @@ <h2 id="bedmethyl-tobigwig"><a class="header" href="#bedmethyl-tobigwig">bedmeth
 <pre><code class="language-text">Make a BigWig track from a bedMethyl file or stream. For details on the BigWig
 format see https://doi.org/10.1093/bioinformatics/btq351

-Usage: modkit bedmethyl tobigwig [OPTIONS] --sizes &lt;CHROMSIZES&gt; --mod-codes &lt;MOD_CODES&gt; &lt;IN_BEDMETHYL&gt; &lt;OUT_FP&gt;
+Usage: modkit bedmethyl tobigwig [OPTIONS] --mod-codes &lt;MOD_CODES&gt; &lt;--sizes &lt;CHROMSIZES&gt;|--header &lt;INPUT_BAM&gt;&gt; &lt;IN_BEDMETHYL&gt; &lt;OUT_FP&gt;

 Arguments:
-&lt;IN_BEDMETHYL&gt; Input bedmethyl, uncompressed, "-" or "stdin" indicates an
-input stream
-&lt;OUT_FP&gt; Output bigWig filename
+&lt;IN_BEDMETHYL&gt;
+Input bedmethyl, uncompressed, "-" or "stdin" indicates an input
+stream
+
+&lt;OUT_FP&gt;
+Output bigWig filename

 Options:
--g, --sizes &lt;CHROMSIZES&gt; A chromosome sizes file. Each line should be have
-a chromosome and its size in bases, separated by
-whitespace. A fasta index (.fai) works as well
--m, --mod-codes &lt;MOD_CODES&gt; Make a bigWig track where the values are the
-percent of bases with this modification, use
-multiple comma-separated codes to combine counts.
-For example --mod-code m makes a track of the 5mC
-percentages and --mod-codes h,m will make a track
-of the combined counts from 5hmC and 5mC.
-Combining counts for different primary bases will
-cause an error (e.g. --mod-codes a,h)
--h, --help Print help
+-g, --sizes &lt;CHROMSIZES&gt;
+A chromosome sizes file. Each line should be a chromosome and its size
+in bases, separated by whitespace. A fasta index (.fai) works as well.
+Use instead of the bam header
+
+-b, --header &lt;INPUT_BAM&gt;
+modBAM from which the pileup was generated. Chromosome sizes are
+gathered from the header. Use instead of the chromosome sizes file
+
+-m, --mod-codes &lt;MOD_CODES&gt;
+Make a bigWig track where the values are the percent of bases with
+this modification, use multiple comma-separated codes to combine
+counts. For example --mod-code m makes a track of the 5mC percentages
+and --mod-codes h,m will make a track of the combined counts from 5hmC
+and 5mC. Combining counts for different primary bases will cause an
+error (e.g. --mod-codes a,h will error)
+
+-h, --help
+Print help (see a summary with '-h')

 Output Options:
 --negative-strand-values
 Report the percentages on the negative strand as negative values. The
 data range will be [-100, 100]
+
 -z, --nzooms &lt;NZOOMS&gt;
-Set the maximum of zooms to create [default: 10]
+Set the maximum of zooms to create
+
+[default: 10]
+
 --zooms &lt;ZOOMS&gt;...
 Set the zoom resolutions to use (overrides the --nzooms argument)
+
 -u, --uncompressed
 Don't use compression
+
 --block-size &lt;BLOCK_SIZE&gt;
-Number of items to bundle in r-tree [default: 256]
+Number of items to bundle in r-tree
+
+[default: 256]
+
 --items-per-slot &lt;ITEMS_PER_SLOT&gt;
-Number of data points bundled at lowest level [default: 1024]
+Number of data points bundled at lowest level
+
+[default: 1024]

 Compute Options:
--t, --nthreads &lt;NTHREADS&gt; Set the number of threads to use. This tool will
-typically use ~225% CPU on a HDD. SDDs may be
-higher. (IO bound) [default: 6]
---inmemory Do not create temporary files for intermediate data
+-t, --nthreads &lt;NTHREADS&gt;
+Set the number of threads to use. This tool will typically use ~225%
+CPU on a HDD. SDDs may be higher. (IO bound)
+
+[default: 6]
+
+--inmemory
+Do not create temporary files for intermediate data
+
+--force-chromosome-ordering
+If input bedMethyl has sorting of the same scheme as `sort`, this
+option may speed up conversion

 Logging Options:
 --log-filepath &lt;LOG_FILEPATH&gt;
 Specify a file for debug logs to be written to, otherwise ignore them.
 Setting a file is recommended. (alias: log)
+
 --suppress-progress
 Hide the progress bar
 </code></pre>
@@ -2939,7 +2969,7 @@ <h2 id="modbam-check-tags"><a class="header" href="#modbam-check-tags">modbam ch
 Check tags on non-primary alignments as well. Keep in mind this may
 incur a double-counting of the read with its primary mapping

---only-mapped
+--mapped-only
 Only check alignments that are mapped

 --region &lt;REGION&gt;
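
The `--sizes` help text in the `tobigwig` section above describes the chromosome sizes file as one chromosome name and its size in bases per line, separated by whitespace, and notes that a FASTA index (`.fai`) also works. The following Python sketch shows why both inputs can be read the same way: only the first two whitespace-separated columns are used, and in a `.fai` those are the sequence name and its length. The function `parse_chrom_sizes` and the sample values are hypothetical and only illustrate the file layout; this is not modkit's implementation.

```python
import io
from typing import Dict, TextIO


def parse_chrom_sizes(handle: TextIO) -> Dict[str, int]:
    """Read 'name size' pairs, ignoring any extra columns (as in a .fai)."""
    sizes: Dict[str, int] = {}
    for lineno, line in enumerate(handle, start=1):
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        if len(fields) < 2:
            raise ValueError(f"line {lineno}: expected name and size, got {line!r}")
        sizes[fields[0]] = int(fields[1])
    return sizes


# Toy .fai-style content: name, length, offset, line bases, line width.
fai_text = "chrA\t1000\t6\t60\t61\nchrB\t250\t1030\t60\t61\n"
sizes = parse_chrom_sizes(io.StringIO(fai_text))
print(sizes)
```

The same function accepts a plain two-column sizes file, since the extra `.fai` columns are simply ignored.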

docs/filtering.html

Lines changed: 1 addition & 1 deletion
@@ -183,7 +183,7 @@ <h1 id="partitioning-pass-and-fail-base-modification-calls"><a class="header" hr
 For example to set a threshold for cytosine modifications at 0.8 and adenine modifications at 0.9 provide <code>--filter-threshold C:0.8 --filter-threshold A:0.9</code>.
 Pass threshold values per base modification can also be specified.
 For example, to specify a threshold for canonical adenine at 0.8 and 6mA at 0.9 use <code>--filter-threshold A:0.8 --mod-thresholds a:0.9</code>.
-Or to specify a threshold of 0.8 for 5mC, 0.9 for 5hmC, and 0.85 for canonical cytosine: <code>--filter-threshold C:0.85 --mod-thresholds m:0.8 --mod-thresholds h:0.9</code></p>
+Or to specify a threshold of 0.8 for 5mC, 0.9 for 5hmC, and 0.85 for canonical cytosine: <code>--filter-threshold C:0.85 --mod-threshold m:0.8 --mod-threshold h:0.9</code></p>
 <p>Keep in mind that the <code>--mod-threshold</code> option will treat <code>A</code>, <code>C</code>, <code>G</code>, and <code>T</code> and "any-mod" as per the <a href="https://samtools.github.io/hts-specs/SAMtags.pdf">specification</a>.</p>
 <h2 id="further-details"><a class="header" href="#further-details">Further details</a></h2>
 <ol>
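
The `KEY:VALUE` threshold arguments shown in the filtering documentation above (for example `--filter-threshold C:0.85 --mod-threshold m:0.8 --mod-threshold h:0.9`) can be modelled with a small Python sketch. The helper names (`parse_threshold`, `collect`, `pass_threshold`) are hypothetical, and the lookup order (modification-specific threshold first, then the primary-base threshold, then a global default) is only one reading of the help text, not a description of modkit's internals. The special handling of `A`, `C`, `G`, and `T` as "any-mod" codes is not modelled.

```python
from typing import Dict, Iterable, Optional, Tuple


def parse_threshold(spec: str) -> Tuple[Optional[str], float]:
    """Split 'C:0.85' into ('C', 0.85); a bare '0.9' yields (None, 0.9)."""
    key, sep, value = spec.rpartition(":")
    threshold = float(value)
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"threshold out of range in {spec!r}")
    return (key if sep else None), threshold


def collect(specs: Iterable[str]) -> Tuple[Dict[str, float], Optional[float]]:
    """Gather keyed thresholds plus an optional un-keyed (global) value."""
    keyed: Dict[str, float] = {}
    default: Optional[float] = None
    for spec in specs:
        key, threshold = parse_threshold(spec)
        if key is None:
            default = threshold
        else:
            keyed[key] = threshold
    return keyed, default


def pass_threshold(
    primary_base: str,
    mod_code: Optional[str],
    base_thresholds: Dict[str, float],
    mod_thresholds: Dict[str, float],
    default: Optional[float],
) -> Optional[float]:
    """Assumed lookup order: per-modification, then per-base, then default."""
    if mod_code is not None and mod_code in mod_thresholds:
        return mod_thresholds[mod_code]
    return base_thresholds.get(primary_base, default)


base_thresholds, default = collect(["C:0.85"])
mod_thresholds, _ = collect(["m:0.8", "h:0.9"])
for code in ("m", "h", None):
    print(code, pass_threshold("C", code, base_thresholds, mod_thresholds, default))
```

Here `None` as the modification code stands for a canonical call, which falls back to the threshold keyed by the primary base.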
