@@ -64,6 +64,7 @@ Commands:
6464 localize Investigate patterns of base modifications, by aggregating
6565 pileup counts "localized" around genomic features of interest
6666 stats Calculate base modification levels over entire regions
67+ bedmethyl Utilities to work with bedMethyl files
6768 help Print this message or the help of the given subcommand(s)
6869
6970Options:
@@ -337,9 +338,6 @@ Arguments:
337338 one of `-` or `stdin` to specify a stream from standard input
338339
339340Options:
340- --log-filepath <LOG_FILEPATH>
341- Output debug logs to file at this path
342-
343341 --ignore <IGNORE>
344342 Modified base code to ignore/remove, see
345343 https://samtools.github.io/hts-specs/SAMtags.pdf for details on the
@@ -438,11 +436,31 @@ Options:
438436 when estimating the filter threshold (i.e. ignore soft-clipped and
439437 inserted bases)
440438
439+ --motif <MOTIF> <MOTIF>
440+ Filter out any base modification call that isn't part of a basecall
441+ sequence motif. This argument can be passed multiple times. Format is
442+ <motif_sequence> <offset>. For example, the argument to match CpG
443+ dinucleotides is `--motif CG 0`, or to match CG[5mC]G the argument
444+ would be `--motif CGCG 2`. Single bases can be used as motifs to keep
445+ only base modification calls for a specific primary base, for example
446+ `--motif C 0`.
447+
448+ --cpg
449+ Shorthand for --motif CG 0
450+
451+ --discard-motifs
452+ Discard base modification calls that match the provided motifs
453+ (instead of keeping them)
454+
441455 --suppress-progress
442456 Hide the progress bar
443457
444458 -h, --help
445459 Print help (see a summary with '-h')
460+
461+ Logging:
462+ --log-filepath <LOG_FILEPATH>
463+ Output debug logs to file at this path
446464```
447465
448466## update-tags
@@ -851,6 +869,20 @@ Options:
851869 using this flag will keep only base modification calls in the first 4
852870 and last 8 bases
853871
872+ --motif <MOTIF> <MOTIF>
873+ Filter out any base modification call that isn't part of a basecall
874+ sequence motif. This argument can be passed multiple times. Format is
875+ <motif_sequence> <offset>. For example, the argument to match CpG
876+ dinucleotides is `--motif CG 0`, or to match CG[5mC]G the argument
877+ would be `--motif CGCG 2`.
878+
879+ --cpg
880+ Shorthand for --motif CG 0
881+
882+ --discard-motifs
883+ Discard base modification calls that match the provided motifs
884+ (instead of keeping them)
885+
854886 --output-sam
855887 Output SAM format instead of BAM
856888
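The `--motif`, `--cpg` and `--discard-motifs` options can be pictured with a small Python sketch. It assumes exact (non-IUPAC) matching against a single read sequence in the same orientation as the modification calls, and models calls as a hypothetical `{read_position: probability}` dict. `motif_positions` and `filter_calls` are illustrative names, not modkit APIs, and modkit's real handling of strands and coordinates may differ.

```python
def motif_positions(seq, motif, offset):
    """0-based positions in `seq` of the base at `offset` within each exact match of `motif`."""
    hits = set()
    for start in range(len(seq) - len(motif) + 1):
        if seq[start:start + len(motif)] == motif:
            hits.add(start + offset)
    return hits


def filter_calls(seq, calls, motifs, discard=False):
    """Keep calls on motif positions, or drop them instead when `discard` is True.

    `calls` is a hypothetical {read_position: probability} mapping and `motifs`
    is a list of (motif_sequence, offset) pairs, one per repeated --motif option.
    """
    allowed = set()
    for motif, offset in motifs:
        allowed |= motif_positions(seq, motif, offset)
    # A call survives when "lies on a motif" differs from "discard mode".
    return {pos: p for pos, p in calls.items() if (pos in allowed) != discard}


read = "ACCGTCGCGA"
calls = {1: 0.7, 2: 0.9, 5: 0.1, 7: 0.8}  # one call on every C in the read
print(filter_calls(read, calls, [("CG", 0)]))                # {2: 0.9, 5: 0.1, 7: 0.8}
print(filter_calls(read, calls, [("CG", 0)], discard=True))  # {1: 0.7}
print(filter_calls(read, calls, [("CGCG", 2)]))              # {7: 0.8}
```

In this sketch `--cpg` corresponds to `motifs=[("CG", 0)]` and `--discard-motifs` to `discard=True`; the C at position 1 is not followed by a G, so it is the only call a CpG filter removes.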
@@ -1263,7 +1295,10 @@ Options:
12631295 Respect soft masking in the reference FASTA
12641296
12651297 --motif <MOTIF> <MOTIF>
1266- Motif to use for entropy calculation, default will be CpG
1298+ Motif to use for entropy calculation. Multiple motifs can be used by
1299+ repeating this option. When multiple motifs that specify different
1300+ modified primary bases are used, all modification possibilities will
1301+ be used in the calculation
12671302
12681303 --cpg
12691304 Use CpG motifs. Shorthand for --motif CG 0 --combine-strands
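Why combining motifs over different primary bases matters for entropy can be shown with a generic Shannon entropy over observed per-read patterns in one window. This is an assumption-laden sketch: the pattern encoding (`-` canonical, `m` 5mC, `a` 6mA) and `pattern_entropy` are hypothetical, and whether modkit uses exactly this definition (for example, whether it normalizes by window length) is not stated in the help text.

```python
import math
from collections import Counter


def pattern_entropy(patterns):
    """Shannon entropy, in bits, of the observed per-read patterns over one window.

    Each pattern has one character per motif position. The encoding is
    hypothetical: '-' canonical, 'm' 5mC, 'a' 6mA.
    """
    counts = Counter(patterns)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Window = one C from a CG motif followed by one A from an A-containing motif.
# Each position has two possible states, so four distinct patterns can occur.
print(pattern_entropy(["ma", "m-", "-a", "--"]))  # 2.0
print(pattern_entropy(["m-", "--"] * 2))          # 1.0
```

With a C position and an A position each able to be canonical or modified, four patterns are possible and the entropy can reach 2 bits; a window of C positions alone with the same two reads would be limited to the C states.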
@@ -2372,3 +2407,64 @@ Options:
23722407 -h, --help
23732408 Print help
23742409```
2410+
2411+ ## bedmethyl merge
2412+ ``` text
2413+ Perform an outer join on two or more bedMethyl files, summing their counts for
2414+ records that overlap
2415+
2416+ Usage: modkit bedmethyl merge [OPTIONS] --out-bed <OUT_BED> --genome-sizes <GENOME_SIZES> [IN_BEDMETHYL] [IN_BEDMETHYL]...
2417+
2418+ Arguments:
2419+ [IN_BEDMETHYL] [IN_BEDMETHYL]...
2420+ Input bedMethyl table(s). Should be bgzip-compressed and have an
2421+ associated Tabix index. The Tabix index is assumed to be
2422+ $this_file.tbi
2423+
2424+ Options:
2425+ -o, --out-bed <OUT_BED>
2426+ Specify the output file to write the results table to
2427+
2428+ -g, --genome-sizes <GENOME_SIZES>
2429+ TSV of genome sizes, should be <chrom>\t<size_in_bp>
2430+
2431+ --force
2432+ Force overwrite the output file
2433+
2434+ --with-header
2435+ Output a header with the bedMethyl
2436+
2437+ --mixed-delim
2438+ Output bedMethyl where columns past column 10 are space-delimited
2439+ instead of tab-delimited. This option can be useful for some
2440+ browsers and parsers that don't expect the extra columns of the
2441+ bedMethyl format
2442+
2443+ --chunk-size <CHUNK_SIZE>
2444+ Number of start..end regions to read at a time for each chromosome.
2445+ Larger values lead to faster merging at the expense of higher
2446+ memory usage, while smaller values are slower but use less memory.
2447+ This option only affects large bedMethyl files
2448+
2449+ -i, --interval-size <INTERVAL_SIZE>
2450+ Interval chunk size in base pairs to process concurrently. Smaller
2451+ interval chunk sizes will use less memory but incur more overhead
2452+
2453+ [default: 100000]
2454+
2455+ --log-filepath <LOG_FILEPATH>
2456+ Specify a file to write debug logs to
2457+
2458+ -t, --threads <THREADS>
2459+ Number of threads to use
2460+
2461+ [default: 4]
2462+
2463+ --io-threads <IO_THREADS>
2464+ Number of tabix/bgzf threads to use
2465+
2466+ [default: 2]
2467+
2468+ -h, --help
2469+ Print help (see a summary with '-h')
2470+ ```
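To illustrate the outer join that `modkit bedmethyl merge` describes, here is a hedged Python sketch over already-parsed rows. It assumes the 18-column modkit bedMethyl layout (counts in columns 10 and 12 to 18, percent modified in column 11, score equal to N_valid_cov) and treats records as overlapping only when chrom, start, end, modification code and strand are identical. The real command reads bgzip-compressed, Tabix-indexed files in chunks and may define overlap differently. `merge_bedmethyl` and `format_row` are hypothetical helpers, and `format_row` mimics `--mixed-delim` under the assumption that only the separators after column 10 become spaces.

```python
# Assumed modkit bedMethyl layout (0-based indices into the 18 fields):
# 0 chrom, 1 start, 2 end, 3 mod code, 4 score, 5 strand, 6-7 thick start/end,
# 8 color, 9 N_valid_cov, 10 percent modified, 11 N_mod, 12 N_canonical,
# 13 N_other_mod, 14 N_delete, 15 N_fail, 16 N_diff, 17 N_nocall
COUNT_COLS = (9, 11, 12, 13, 14, 15, 16, 17)


def merge_bedmethyl(*tables):
    """Outer-join parsed bedMethyl rows, summing counts of records with the same key."""
    merged = {}
    for table in tables:
        for fields in table:
            # Assumption: records "overlap" only when these five fields match.
            key = (fields[0], int(fields[1]), int(fields[2]), fields[3], fields[5])
            if key not in merged:
                merged[key] = list(fields)  # copy so the input rows are not mutated
                continue
            row = merged[key]
            for col in COUNT_COLS:
                row[col] = str(int(row[col]) + int(fields[col]))
    rows = []
    for key in sorted(merged):  # lexical chrom order, a simplification
        row = merged[key]
        valid = int(row[9])
        row[4] = str(valid)  # assumption: score mirrors N_valid_cov
        # Recompute percent modified from the summed counts.
        row[10] = f"{100 * int(row[11]) / valid:.2f}" if valid else "0.00"
        rows.append(row)
    return rows


def format_row(row, mixed_delim=False):
    """Tab-join all fields, or (like --mixed-delim) space-delimit columns past column 10."""
    if not mixed_delim:
        return "\t".join(row)
    return "\t".join(row[:10]) + " " + " ".join(row[10:])


a = [["chr1", "10", "11", "m", "5", "+", "10", "11", "255,0,0",
      "5", "60.00", "3", "2", "0", "0", "0", "0", "0"]]
b = [["chr1", "10", "11", "m", "5", "+", "10", "11", "255,0,0",
      "5", "20.00", "1", "4", "0", "0", "0", "0", "0"],
     ["chr1", "20", "21", "m", "4", "+", "20", "21", "255,0,0",
      "4", "25.00", "1", "3", "0", "0", "0", "0", "0"]]
for row in merge_bedmethyl(a, b):
    print(format_row(row, mixed_delim=True))
```

The record at chr1:10 appears in both inputs, so its counts are summed (N_valid_cov 10, N_mod 4) and its percent modified is recomputed as 40.00 rather than averaged; chr1:20 appears only in `b` and is kept unchanged, which is what makes the join an outer join.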
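The `--genome-sizes` and `--interval-size` options suggest that work is split per chromosome into fixed-size windows. The sketch below, with hypothetical helpers `read_genome_sizes` and `intervals`, parses a tab-separated `<chrom>` / `<size_in_bp>` file and yields half-open, 0-based windows of at most `interval_size` bp (100000 by default, matching the help text). How modkit schedules these windows across `--threads` is not described here and is not modeled. Extra columns are tolerated, so the first two columns of a `samtools faidx` `.fai` index (sequence name and length) would also parse.

```python
def read_genome_sizes(text):
    """Parse tab-separated chrom/size_in_bp lines into {chrom: size}; extra columns are ignored."""
    sizes = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        parts = line.split("\t")
        if len(parts) < 2:
            raise ValueError(f"line {lineno}: expected <chrom>\\t<size_in_bp>")
        sizes[parts[0]] = int(parts[1])
    return sizes


def intervals(sizes, interval_size=100_000):
    """Yield half-open, 0-based (chrom, start, end) windows of at most `interval_size` bp."""
    for chrom, size in sizes.items():
        for start in range(0, size, interval_size):
            yield chrom, start, min(start + interval_size, size)


sizes = read_genome_sizes("chr1\t250000\nchrM\t16569\n")
for window in intervals(sizes):
    print(window)
# ('chr1', 0, 100000)
# ('chr1', 100000, 200000)
# ('chr1', 200000, 250000)
# ('chrM', 0, 16569)
```

Smaller windows mean more, cheaper units of work, which mirrors the memory-versus-overhead trade-off the `--interval-size` help text describes.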