Commit 10d99bc

Merge branch 'ar/prepare-0-4-2-release' into 'master'

Prepare 0.4.2 release

See merge request machine-learning/modkit!238
2 parents d62c99b + b6cec1a commit 10d99bc

40 files changed, with 689 additions and 103 deletions

CHANGELOG.md

Lines changed: 8 additions & 0 deletions
@@ -4,6 +4,14 @@ All notable changes to this project will be documented in this file.
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+ ## [v0.4.2]
+ ### Adds
+ - [entropy] Entropy can now be calculated with multiple motifs and multiple modified primary bases.
+ - [adjust-mods, call-mods] Retain or remove base modification calls based on whether they match a sequence motif in the basecall sequence.
+ - [bedmethyl] Add command to merge bedMethyl files.
+ - [dmr] Add strand to DMR output.
+
+
  ## [v0.4.1]
  ### Adds
  - [docs] Fix documentation links

book/src/advanced_usage.md

Lines changed: 100 additions & 4 deletions
@@ -64,6 +64,7 @@ Commands:
  localize Investigate patterns of base modifications, by aggregating
  pileup counts "localized" around genomic features of interest
  stats Calculate base modification levels over entire regions
+ bedmethyl Utilities to work with bedMethyl files
  help Print this message or the help of the given subcommand(s)

  Options:
@@ -337,9 +338,6 @@ Arguments:
  one of `-` or `stdin` to specify a stream from standard output

  Options:
- --log-filepath <LOG_FILEPATH>
- Output debug logs to file at this path
-
  --ignore <IGNORE>
  Modified base code to ignore/remove, see
  https://samtools.github.io/hts-specs/SAMtags.pdf for details on the
@@ -438,11 +436,31 @@ Options:
  when estimating the filter threshold (i.e. ignore soft-clipped, and
  inserted bases)

+ --motif <MOTIF> <MOTIF>
+ Filter out any base modification call that isn't part of a basecall
+ sequence motif. This argument can be passed multiple times. Format is
+ <motif_sequence> <offset>. For example the argument to match CpG
+ dinucleotides is `--motif CG 0`, or to match CG[5mC]G the argument
+ would be `--motif CGCG 2`. Single bases can be used as motifs to keep
+ only base modification calls for a specific primary base, for example
+ `--motif C 0`
+
+ --cpg
+ Shorthand for --motif CG 0
+
+ --discard-motifs
+ Discard base modification calls that match the provided motifs
+ (instead of keeping them)
+
  --suppress-progress
  Hide the progress bar

  -h, --help
  Print help (see a summary with '-h')
+
+ Logging:
+ --log-filepath <LOG_FILEPATH>
+ Output debug logs to file at this path
  ```

  ## update-tags
@@ -851,6 +869,20 @@ Options:
  using this flag will keep only base modification calls in the first 4
  and last 8 bases

+ --motif <MOTIF> <MOTIF>
+ Filter out any base modification call that isn't part of a basecall
+ sequence motif This argument can be passed multiple times. Format is
+ <motif_sequence> <offset>. For example the argument to match CpG
+ dinucleotides is `--motif CG 0`, or to match CG[5mC]G the argument
+ would be `--motif CGCG 2`
+
+ --cpg
+ Shorthand for --motif CG 0
+
+ --discard-motifs
+ Discard base modification calls that match the provided motifs
+ (instead of keeping them)
+
  --output-sam
  Output SAM format instead of BAM

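
The `--motif <motif_sequence> <offset>` format and the `--discard-motifs` flag described in the help text above can be illustrated with a small sketch. This is not modkit's implementation: the function names `motif_positions` and `filter_calls` are hypothetical, motifs are matched as exact strings (no degenerate IUPAC codes), only the forward strand is considered, and positions are assumed to be 0-based indices into the basecall sequence.

```python
def motif_positions(sequence, motif, offset):
    """Return the 0-based indices of the base at `offset` in every motif match.

    Assumption: exact string matching, overlapping matches allowed.
    """
    positions = set()
    start = sequence.find(motif)
    while start != -1:
        positions.add(start + offset)
        start = sequence.find(motif, start + 1)
    return positions


def filter_calls(sequence, call_positions, motifs, discard=False):
    """Keep (or, with discard=True, drop) calls that sit on a motif position.

    `motifs` is a list of (motif_sequence, offset) pairs, mirroring repeated
    `--motif` arguments. Hypothetical helper, for illustration only.
    """
    on_motif = set()
    for motif, offset in motifs:
        on_motif |= motif_positions(sequence, motif, offset)
    if discard:
        return [p for p in call_positions if p not in on_motif]
    return [p for p in call_positions if p in on_motif]


if __name__ == "__main__":
    seq = "CACGTCGCG"
    calls = [0, 2, 7]  # positions of cytosines carrying a modification call
    print(filter_calls(seq, calls, [("CG", 0)]))                # like --cpg
    print(filter_calls(seq, calls, [("CG", 0)], discard=True))  # like --cpg --discard-motifs
    print(filter_calls(seq, calls, [("CGCG", 2)]))              # like --motif CGCG 2
```

In this sketch the offset selects which base inside the motif carries the call, so `("CGCG", 2)` targets the second cytosine, matching the `CG[5mC]G` example in the help text.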
@@ -1263,7 +1295,10 @@ Options:
  Respect soft masking in the reference FASTA

  --motif <MOTIF> <MOTIF>
- Motif to use for entropy calculation, default will be CpG
+ Motif to use for entropy calculation, multiple motifs can be used by
+ repeating this option. When multiple motifs are used that specify
+ different modified primary bases, all modification possibilities will
+ be used in the calculation

  --cpg
  Use CpG motifs. Short hand for --motif CG 0 --combine-strands
@@ -2372,3 +2407,64 @@ Options:
  -h, --help
  Print help
  ```
+
+ ## bedmethyl merge
+ ```text
+ Perform an outer join on two or more bedMethyl files, summing their counts for
+ records that overlap
+
+ Usage: modkit bedmethyl merge [OPTIONS] --out-bed <OUT_BED> --genome-sizes <GENOME_SIZES> [IN_BEDMETHYL] [IN_BEDMETHYL]...
+
+ Arguments:
+ [IN_BEDMETHYL] [IN_BEDMETHYL]...
+ Input bedMethyl table(s). Should be bgzip-compressed and have an
+ associated Tabix index. The tabix index will be assumed to be
+ $this_file.tbi
+
+ Options:
+ -o, --out-bed <OUT_BED>
+ Specify the output file to write the results table
+
+ -g, --genome-sizes <GENOME_SIZES>
+ TSV of genome sizes, should be <chrom>\t<size_in_bp>
+
+ --force
+ Force overwrite the output file
+
+ --with-header
+ Output a header with the bedMethyl
+
+ --mixed-delim
+ Output bedMethyl where the delimiter of columns past column 10 are
+ space-delimited instead of tab-delimited. This option can be useful
+ for some browsers and parsers that don't expect the extra columns of
+ the bedMethyl format
+
+ --chunk-size <CHUNK_SIZE>
+ Chunk size for how many start..end regions for each chromosome to
+ read. Larger values will lead to faster merging at the expense of
+ memory usage, while smaller values will be slower with lower memory
+ usage. This option will only impact large bedmethyl files
+
+ -i, --interval-size <INTERVAL_SIZE>
+ Interval chunk size in base pairs to process concurrently. Smaller
+ interval chunk sizes will use less memory but incur more overhead
+
+ [default: 100000]
+
+ --log-filepath <LOG_FILEPATH>
+ Specify a file to write debug logs to
+
+ -t, --threads <THREADS>
+ Number of threads to use
+
+ [default: 4]
+
+ --io-threads <IO_THREADS>
+ Number of tabix/bgzf threads to use
+
+ [default: 2]
+
+ -h, --help
+ Print help (see a summary with '-h')
+ ```
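
The "outer join ... summing their counts" behaviour described for `bedmethyl merge` can be sketched as follows. This is only an illustration under stated assumptions, not the tool's code: records are reduced to a key of `(chrom, start, end, mod_code, strand)` plus a dictionary of counts, "overlap" is taken to mean an identical key, and the count names `n_valid` and `n_mod` as well as the function `merge_tables` are hypothetical.

```python
from collections import defaultdict


def merge_tables(*tables):
    """Outer-join simplified bedMethyl tables, summing counts per record key.

    Each table maps (chrom, start, end, mod_code, strand) -> {count_name: int}.
    A key present in any input appears in the output (outer join); counts for
    keys shared between inputs are added together.
    """
    merged = defaultdict(lambda: defaultdict(int))
    for table in tables:
        for key, counts in table.items():
            for name, value in counts.items():
                merged[key][name] += value
    # Sort by key so the output is ordered by chromosome name and position.
    return {key: dict(merged[key]) for key in sorted(merged)}


if __name__ == "__main__":
    sample_a = {("chr1", 10, 11, "m", "+"): {"n_valid": 10, "n_mod": 4}}
    sample_b = {
        ("chr1", 10, 11, "m", "+"): {"n_valid": 5, "n_mod": 1},
        ("chr1", 20, 21, "m", "+"): {"n_valid": 3, "n_mod": 3},
    }
    for key, counts in merge_tables(sample_a, sample_b).items():
        print(key, counts)
```

Any derived column, such as a fraction modified, would need to be recomputed from the summed counts rather than averaged across inputs; that step is left out of the sketch.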

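The `--genome-sizes` file format (`<chrom>\t<size_in_bp>`) and the `--interval-size` option (default 100000) suggest that each chromosome is processed in fixed-width windows. A rough sketch of that partitioning is below; the helper names `parse_genome_sizes` and `intervals` are hypothetical, and how modkit actually schedules the windows is not shown in this commit.

```python
def parse_genome_sizes(text):
    """Parse '<chrom>\\t<size_in_bp>' lines into a dict of chromosome sizes."""
    sizes = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        chrom, size = line.split("\t")
        sizes[chrom] = int(size)
    return sizes


def intervals(sizes, interval_size=100_000):
    """Yield (chrom, start, end) half-open windows covering each chromosome."""
    for chrom, size in sizes.items():
        for start in range(0, size, interval_size):
            yield chrom, start, min(start + interval_size, size)


if __name__ == "__main__":
    sizes = parse_genome_sizes("chr1\t250000\nchr2\t100000\n")
    for window in intervals(sizes):
        print(window)
```

Smaller windows mean fewer records held in memory at once but more windows to schedule, which is consistent with the trade-off stated in the help text.
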
docs/404.html

Lines changed: 1 addition & 1 deletion
@@ -92,7 +92,7 @@
9292

9393
<nav id="sidebar" class="sidebar" aria-label="Table of contents">
9494
<div class="sidebar-scrollbox">
95-
<ol class="chapter"><li class="chapter-item expanded "><a href="quick_start.html"><strong aria-hidden="true">1.</strong> Quick Start guides</a></li><li><ol class="section"><li class="chapter-item expanded "><a href="intro_bedmethyl.html"><strong aria-hidden="true">1.1.</strong> Constructing bedMethyl tables</a></li><li class="chapter-item expanded "><a href="intro_pileup_hemi.html"><strong aria-hidden="true">1.2.</strong> Make hemi-methylation bedMethyl tables</a></li><li class="chapter-item expanded "><a href="intro_adjust.html"><strong aria-hidden="true">1.3.</strong> Updating and adjusting MM tags</a></li><li class="chapter-item expanded "><a href="intro_sample_probs.html"><strong aria-hidden="true">1.4.</strong> Inspecting base modification probabilities</a></li><li class="chapter-item expanded "><a href="intro_summary.html"><strong aria-hidden="true">1.5.</strong> Summarizing a modBAM</a></li><li class="chapter-item expanded "><a href="intro_stats.html"><strong aria-hidden="true">1.6.</strong> Calculating modification statistics in regions</a></li><li class="chapter-item expanded "><a href="intro_call_mods.html"><strong aria-hidden="true">1.7.</strong> Calling mods in a modBAM</a></li><li class="chapter-item expanded "><a href="intro_edge_filter.html"><strong aria-hidden="true">1.8.</strong> Removing modification calls at the ends of reads</a></li><li class="chapter-item expanded "><a href="intro_repair.html"><strong aria-hidden="true">1.9.</strong> Repair MM/ML tags on trimmed reads</a></li><li class="chapter-item expanded "><a href="intro_motif.html"><strong aria-hidden="true">1.10.</strong> Working with sequence motifs</a></li><li><ol class="section"><li class="chapter-item expanded "><a href="intro_motif_bed.html"><strong aria-hidden="true">1.10.1.</strong> Making a motif BED file</a></li><li class="chapter-item expanded "><a href="intro_find_motifs.html"><strong aria-hidden="true">1.10.2.</strong> Find highly modified motif sequences</a></li><li 
class="chapter-item expanded "><a href="evaluate_motif.html"><strong aria-hidden="true">1.10.3.</strong> Evaluate and refine a table of known motifs</a></li></ol></li><li class="chapter-item expanded "><a href="intro_extract.html"><strong aria-hidden="true">1.11.</strong> Extracting read information to a table</a></li><li class="chapter-item expanded "><a href="intro_localize.html"><strong aria-hidden="true">1.12.</strong> Investigating patterns with localise</a></li><li class="chapter-item expanded "><a href="intro_dmr.html"><strong aria-hidden="true">1.13.</strong> Perform differential methylation scoring</a></li><li class="chapter-item expanded "><a href="intro_validate.html"><strong aria-hidden="true">1.14.</strong> Validate ground truth results</a></li><li class="chapter-item expanded "><a href="intro_entropy.html"><strong aria-hidden="true">1.15.</strong> Calculating methylation entropy</a></li><li class="chapter-item expanded "><a href="intro_include_bed.html"><strong aria-hidden="true">1.16.</strong> Narrow output to specific positions</a></li></ol></li><li class="chapter-item expanded "><a href="advanced_usage.html"><strong aria-hidden="true">2.</strong> Extended subcommand help</a></li><li class="chapter-item expanded "><a href="troubleshooting.html"><strong aria-hidden="true">3.</strong> Troubleshooting</a></li><li class="chapter-item expanded "><a href="faq.html"><strong aria-hidden="true">4.</strong> Frequently asked questions</a></li><li class="chapter-item expanded "><a href="limitations.html"><strong aria-hidden="true">5.</strong> Current limitations</a></li><li class="chapter-item expanded "><a href="perf_considerations.html"><strong aria-hidden="true">6.</strong> Performance considerations</a></li><li class="chapter-item expanded "><a href="algo_details.html"><strong aria-hidden="true">7.</strong> Algorithm details</a></li><li><ol class="section"><li class="chapter-item expanded "><a href="filtering.html"><strong aria-hidden="true">7.1.</strong> 
Pass/fail base modification calls</a></li><li><ol class="section"><li class="chapter-item expanded "><a href="filtering_details.html"><strong aria-hidden="true">7.1.1.</strong> Threshold examples</a></li><li class="chapter-item expanded "><a href="filtering_numeric_details.html"><strong aria-hidden="true">7.1.2.</strong> Numeric details</a></li></ol></li><li class="chapter-item expanded "><a href="dmr_scoring_details.html"><strong aria-hidden="true">7.2.</strong> DMR model and scoring details</a></li><li class="chapter-item expanded "><a href="collapse.html"><strong aria-hidden="true">7.3.</strong> Ignoring and combining calls</a></li></ol></li></ol>
95+
<ol class="chapter"><li class="chapter-item expanded "><a href="quick_start.html"><strong aria-hidden="true">1.</strong> Quick Start guides</a></li><li><ol class="section"><li class="chapter-item expanded "><a href="intro_pileup.html"><strong aria-hidden="true">1.1.</strong> Constructing bedMethyl tables</a></li><li class="chapter-item expanded "><a href="intro_pileup_hemi.html"><strong aria-hidden="true">1.2.</strong> Make hemi-methylation bedMethyl tables</a></li><li class="chapter-item expanded "><a href="intro_adjust.html"><strong aria-hidden="true">1.3.</strong> Updating and adjusting MM tags</a></li><li class="chapter-item expanded "><a href="intro_sample_probs.html"><strong aria-hidden="true">1.4.</strong> Inspecting base modification probabilities</a></li><li class="chapter-item expanded "><a href="intro_summary.html"><strong aria-hidden="true">1.5.</strong> Summarizing a modBAM</a></li><li class="chapter-item expanded "><a href="intro_stats.html"><strong aria-hidden="true">1.6.</strong> Calculating modification statistics in regions</a></li><li class="chapter-item expanded "><a href="intro_call_mods.html"><strong aria-hidden="true">1.7.</strong> Calling mods in a modBAM</a></li><li class="chapter-item expanded "><a href="intro_edge_filter.html"><strong aria-hidden="true">1.8.</strong> Removing modification calls at the ends of reads</a></li><li class="chapter-item expanded "><a href="intro_repair.html"><strong aria-hidden="true">1.9.</strong> Repair MM/ML tags on trimmed reads</a></li><li class="chapter-item expanded "><a href="intro_motif.html"><strong aria-hidden="true">1.10.</strong> Working with sequence motifs</a></li><li><ol class="section"><li class="chapter-item expanded "><a href="intro_motif_bed.html"><strong aria-hidden="true">1.10.1.</strong> Making a motif BED file</a></li><li class="chapter-item expanded "><a href="intro_find_motifs.html"><strong aria-hidden="true">1.10.2.</strong> Find highly modified motif sequences</a></li><li 
class="chapter-item expanded "><a href="evaluate_motif.html"><strong aria-hidden="true">1.10.3.</strong> Evaluate and refine a table of known motifs</a></li></ol></li><li class="chapter-item expanded "><a href="intro_extract.html"><strong aria-hidden="true">1.11.</strong> Extracting read information to a table</a></li><li class="chapter-item expanded "><a href="intro_localize.html"><strong aria-hidden="true">1.12.</strong> Investigating patterns with localise</a></li><li class="chapter-item expanded "><a href="intro_dmr.html"><strong aria-hidden="true">1.13.</strong> Perform differential methylation scoring</a></li><li class="chapter-item expanded "><a href="intro_validate.html"><strong aria-hidden="true">1.14.</strong> Validate ground truth results</a></li><li class="chapter-item expanded "><a href="intro_entropy.html"><strong aria-hidden="true">1.15.</strong> Calculating methylation entropy</a></li><li class="chapter-item expanded "><a href="intro_include_bed.html"><strong aria-hidden="true">1.16.</strong> Narrow output to specific positions</a></li><li class="chapter-item expanded "><a href="intro_bedmethyl_merge.html"><strong aria-hidden="true">1.17.</strong> Merge multiple bedMethyl files</a></li></ol></li><li class="chapter-item expanded "><a href="advanced_usage.html"><strong aria-hidden="true">2.</strong> Extended subcommand help</a></li><li class="chapter-item expanded "><a href="troubleshooting.html"><strong aria-hidden="true">3.</strong> Troubleshooting</a></li><li class="chapter-item expanded "><a href="faq.html"><strong aria-hidden="true">4.</strong> Frequently asked questions</a></li><li class="chapter-item expanded "><a href="limitations.html"><strong aria-hidden="true">5.</strong> Current limitations</a></li><li class="chapter-item expanded "><a href="perf_considerations.html"><strong aria-hidden="true">6.</strong> Performance considerations</a></li><li class="chapter-item expanded "><a href="algo_details.html"><strong aria-hidden="true">7.</strong> 
Algorithm details</a></li><li><ol class="section"><li class="chapter-item expanded "><a href="filtering.html"><strong aria-hidden="true">7.1.</strong> Pass/fail base modification calls</a></li><li><ol class="section"><li class="chapter-item expanded "><a href="filtering_details.html"><strong aria-hidden="true">7.1.1.</strong> Threshold examples</a></li><li class="chapter-item expanded "><a href="filtering_numeric_details.html"><strong aria-hidden="true">7.1.2.</strong> Numeric details</a></li></ol></li><li class="chapter-item expanded "><a href="dmr_scoring_details.html"><strong aria-hidden="true">7.2.</strong> DMR model and scoring details</a></li><li class="chapter-item expanded "><a href="collapse.html"><strong aria-hidden="true">7.3.</strong> Ignoring and combining calls</a></li></ol></li></ol>
9696
</div>
9797
<div id="sidebar-resize-handle" class="sidebar-resize-handle">
9898
<div class="sidebar-resize-indicator"></div>
