All notable changes to this project will be documented in this file. This project adheres to Semantic Versioning.
- Intermediate FASTQ files are now bgzip compressed to reduce storage requirements (#189).
- Colons are now used instead of commas to separate SNP alleles in microhap alleles (#192).
- Implemented Python 3.12 support by integrating the happer package and increasing the minimum version of the MicroHapDB dependency (#193).
- Updated working directory organization to provide additional structure (#194).
- Bug with handling marker vs. locus identifiers when running `mhpl8r seq` (#190).
- Bug with writing output to terminal for some commands (#191).
- Resolved a bug with the test suite (#187).
- Resolved a bug with Table 4.4 in the final report.
- Alignment gaps are now marked instead of ignored (#185).
- Tables were added to the final report to list markers with large numbers of discarded reads or reads containing gaps (#186).
- Updated the working directory structure so report is more easily shared (#181).
- Replaced BWA MEM with Minimap2 for read alignment (#182).
- Filtering of reads with too many ambiguous bases (#165).
- Filtering of reads below a minimum length (#173).
- Smart handling of multiple markers at a single locus implemented for `type` and `pipe` (#158).
- Replaced `parsers` module with a `MicrohapIndex` class and supporting classes (#158).
- Numerous updates to the `mhpl8r pipe` HTML report
    - Replaced read length distribution histograms with ridge plots (#167).
    - Replaced read QC donut plots with a stacked bar chart (#168, #177).
    - Replaced FastQC report links with MultiQC link (#169).
    - Revised report text, added figure captions and table titles (#172).
    - Streamlined the Python code responsible for generating the report (#175, #177, #180).
- GRCh38 coordinates are now mandatory in marker definitions (#176).
- Jinja2 sub-templates added to handle differences in the report for single-end versus paired-end reads (#179).
- Bug with inconsistent sorting of read counts for interlocus balance (#159).
- Bug with counting repetitive reads (#163).
- Typing rate display in report (#174).
- Resolved bug with lexicographical sorting of numeric data in `mhpl8r pipe` report (#156).
- Updated MANIFEST.in, setup.py (6bf4533).
- Profiles compatible with probgen programs now included in `mhpl8r pipe` output (#135).
- Haplotype call plots now included in the `mhpl8r pipe` HTML report (#136).
- New `offtarget` module to count reads that map to off-target loci in GRCh38 (#143, #153).
- Added typing rate and mapping rate information per marker to the main `mhpl8r pipe` HTML report (#146).
- Added marker detail page to the `mhpl8r pipe` HTML report (#146, #151).
- Implemented support for single-end reads with `mhpl8r pipe` (#147).
- Added donut plots to HTML report (#157).
- Exposed static and dynamic threshold configuration to `mhpl8r pipe` CLI (#135, #153).
- Updated plot colors in the `mhpl8r pipe` HTML report (#139).
- Updated `mhpl8r pipe` HTML report to conditionally plot read length histograms or tables depending on uniformity (#140).
- Bug with Snakefile not being included in package data.
- New API and CLI entry points for computing and visualizing heterozygote balance (#122, #131).
- New `typing_rate` method for the TypingResult class (#127).
- New API function for plotting distribution of read lengths (#128).
- New CLI entry point for downloading GRCh38 (#130).
- New end-to-end microhap analysis pipeline and report (#129, #132).
- Interlocus balance code updated to support generating high-resolution graphics and performing a chi-square goodness-of-fit test (#121, #131, #132).
- Bug with filtering/genotype calling for markers with no valid reads (#123).
- Set 0.01 as a more reasonable default frequency for rare alleles than 0.001 (#131, #132).
- New `--base-qual` parameter for `mhpl8r type` to set the minimum required base quality when iterating over reads in a pileup (#83).
- New `mhpl8r balance` subcommand for calculating and visualizing interlocus balance (#85).
- Users can now supply marker definitions, frequencies, and reference sequences as TSV/FASTA files instead of MicroHapDB references (#93).
- Configuration file examples in `microhapulator/data/configs/` (#105).
- New `mhpl8r filter` subcommand with support for marker-specific thresholds (#113, #114).
- New `mhpl8r convert` subcommand for converting genotype calls into a format compatible with probgen tools such as LRMix Studio and EuroForMix (#115).
- Updated mybinder demo (see #69, #110, #113).
- Simulated Illumina sequencing now uses 1 thread by default, which paradoxically leads to better performance (#71).
- Moved panel definition code out of the core code and into dedicated notebooks (#74).
- Replaced `MissingBAMIndexError` with BAM auto-indexing code (#78).
- Improved read names and choice of interleaved or paired output for `mhpl8r seq` (#80).
- Updated filtering of haplotype calls / typing results
    - Replaced `--threshold` argument with `--static` and `--dynamic`, disabled both by default (#82, #83).
    - Split `mhpl8r type` subcommand into `type` and `filter`, with `--static` and `--dynamic` arguments only relevant to the latter (#113).
- Changed the default pysam pileup `max_depth` parameter, overriding 8000 with 1e6 and exposing as a CLI parameter (#87, #113).
- Removed dependency on MicroHapDB for marker definitions, frequencies, and sequences (#93).
- Refactored CLI and Python API, adding new `microhapulator.api` module to serve as main entry point (#98, c98bf6c78ef4).
- Replaced the "ObservedProfile" terminology with the more appropriate "TypingResult" (#99).
- Documentation now uses Sphinx to render markdown as HTML (c98bf6c78e, #101, #102, #105, #106, #110).
- Updated JSON schema for simulated profiles and typing results (#109).
- Corrected a bug with Fastq headers in `mhpl8r seq` module (#71).
- Corrected a bug resulting from attempting to do set operations on `None` (#75).
- Corrected a bug with RMP implementation (#86).
- Set minimum versions for runtime dependencies (#97).
- Corrected the `--dynamic` filter to operate on total haplotype counts rather than average counts (#114).
Fix minor metadata typo
- New `contain` module for calculating the proportion of alleles from one sample present in another sample (see #41).
- New `--dry-run` mode for `sim` and `mixture` modules (see #41).
- New `prob` module for calculating likelihood ratio tests based on the random match probability (see #43).
- New `seq` module focused entirely on sequencing samples where profiles have already been simulated (see #45).
- New `mix` module for merging simulated profiles into a simulated mixture sample profile (see #45).
- New `unite` module for "mating" two profiles to create a simulated "offspring" profile (see #47).
- New `diff` module for showing the differences between two profiles (see #58, #60).
- Huge refactoring effort to accommodate recent changes to MicroHapDB's Python API (see #66).
- The `sim` module no longer performs simulated sequencing (now handled by new `seq` module) and instead focuses entirely on haplotype simulation (see #45).
- The `type` module now dynamically selects either an automatic threshold or a fixed threshold based on effective coverage (see #51, #61).
- Moved simulation scripts to notebook directory, reimplemented as a Snakemake workflow (see #50, #57, #59).
- Corrected a bug in the USA panel code ensuring that all loci have allele frequency data for all relevant populations (see #56).
- Corrected a problem with the `contrib` API that prevented it from working directly on `Profile` objects (see #70).
- Dropped the `mixture` module, whose functionality is now covered by the more granular `sim`, `mix`, and `seq` modules (see #45).
- Dropped the `LocusContext` class in favor of MicroHapDB's `TargetAmplicon` class designed for a similar purpose (see #67).
- New interactive demo using Jupyter notebook and mybinder.org (see #32).
- Scripts for simulating 8k individuals with demographics roughly matching those of the U.S.A. (see #13).
- Updated `dist` and `genotype` modules to better support equality and distance comparisons between simulated and observed genotypes (see #34).
- Simulated and observed genotype data now use a single unified JSON representation, enforced by JSON schema validation (see #38).
- New `getrefr` module for installing human reference genome (see #30).
- Improved error handling related to BAM index files (see #31).
- Replaced pip/PyPI installation instructions with bioconda installation instructions (see 1b61c59200).
- Corrected usage statement for `mhpl8r contrib` (see #31).
- mark failing test
- exclude `__init__.py` from test invocation
Updating file manifest, troubleshooting bioconda build.
Minor fix to setup.py for distribution purposes.
Initial release of MicroHapulator!
Command-line entry point: mhpl8r
Command-line operations:
- `contrib`: estimate number of contributors in a sample
- `dist`: compute naive Hamming distance between two sample genotypes
- `mixture`: simulate a multiple-contributor sample
- `refr`: retrieve reference genome cutouts for a specified microhap panel
- `sim`: simulate a single-contributor sample
- `type`: infer genotype from mapped reads