Releases: PacificBiosciences/trgt
TRGT v5.0.0
Summary of changes:
- Important changes:
- Extended support for phased BAM inputs (from tools such as HiPhase). TRGT can now output phased genotypes (e.g. 1|0) and phase sets using HP/PS tags, when sufficient haplotype support is available.
  - Note: the FORMAT field order remains unchanged after phasing; any downstream analysis that assumes positional indexing on genotypes must account for phasing.
- More robust and optimized VCF merging (merge subcommand).
  - Added a streaming merge mode that avoids loading tabix/CSI indices. This mode is especially useful for merging very large cohorts and can be enabled with the --no-index flag. It significantly reduces memory requirements, but it requires that all inputs have the exact same contig ordering.
  - Compressed VCF/BCF output can now be automatically indexed after merging by setting the --write-index flag.
  - Added the --threads flag to set the number of (de)compression threads for input/output VCF files.
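Because the streaming merge mode requires identical contig ordering across inputs, it can help to compare headers before merging. The sketch below is not part of TRGT; it assumes the VCF header text has already been read into strings, and the helper names (contig_order, same_contig_order) and the example headers are hypothetical.

```python
import re

# Matches VCF header lines such as "##contig=<ID=chr1,length=248956422>".
CONTIG_RE = re.compile(r"^##contig=<ID=([^,>]+)")


def contig_order(header_text):
    """Return contig IDs in the order they are declared in a VCF header."""
    order = []
    for line in header_text.splitlines():
        match = CONTIG_RE.match(line)
        if match:
            order.append(match.group(1))
    return order


def same_contig_order(headers):
    """Return True if every header declares the same contigs in the same order."""
    orders = [contig_order(h) for h in headers]
    return all(o == orders[0] for o in orders[1:])


# Hypothetical headers, for illustration only.
header_a = (
    "##fileformat=VCFv4.2\n"
    "##contig=<ID=chr1,length=248956422>\n"
    "##contig=<ID=chr2,length=242193529>\n"
)
header_b = (
    "##fileformat=VCFv4.2\n"
    "##contig=<ID=chr2,length=242193529>\n"
    "##contig=<ID=chr1,length=248956422>\n"
)

if not same_contig_order([header_a, header_b]):
    print("Contig order differs; streaming merge is not applicable to these inputs.")
```

Inputs that fail this check would need the default indexed merge, or their contigs reordered first.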
Contributors: Tom Mokveld, Guilherme De Sena Brandine, and Egor Dolzhenko
Note: TRGT is still in active development. Please file an issue or reach out by email with questions, bug reports, and feature suggestions.
Linux binary is available below.
TRGT v4.1.0
Summary of changes:
- Important changes:
- Experimental remote file support for most TRGT inputs (BAM/CRAM, reference FASTA, repeat catalogs, and VCF/BCF). See remote file documentation for more details.
- Improved consensus generation for long repeat expansions, as part of an ongoing refinement effort.
- CLI: improved terminal color detection and added a --color flag.
Remote file support
TRGT now includes experimental support for reading input files directly from remote locations. See remote file documentation for more details. This feature applies to most file inputs, such as the reference genome, reads (BAM/CRAM), repeat catalogs, and VCF/BCF files (except for the VCF files used for merging with the merge subcommand). You can now provide URLs with http://, https://, s3://, and gs:// schemes in addition to local file paths.
TRGT attempts to perform checks and issue warnings for common configuration issues, such as missing cloud credentials or TLS/CA certificates. To simplify secure remote connections (e.g., via https://, s3://, gs://), TRGT automatically searches for a system-wide CA certificate bundle in common locations and configures its use. If you need to use a different CA bundle or wish to disable this auto-detection, you can either set the CURL_CA_BUNDLE environment variable to your preferred file or set the TRGT_DISABLE_CA_AUTODETECT environment variable.
Note: This feature is experimental. Accessing large files over a network may significantly increase runtime depending on network bandwidth and latency. Please report any issues you encounter.
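As a rough illustration of the configuration described above, the following sketch classifies inputs by URL scheme and prepares an environment for launching TRGT from a wrapper script. It is not TRGT's own logic: the helper names are hypothetical, and the value "1" for TRGT_DISABLE_CA_AUTODETECT is an assumption, since the notes only say the variable should be set.

```python
import os
from urllib.parse import urlparse

# URL schemes listed in the release notes as supported for remote inputs.
REMOTE_SCHEMES = {"http", "https", "s3", "gs"}


def is_remote(path):
    """Return True if the path uses one of the documented remote schemes."""
    return urlparse(path).scheme.lower() in REMOTE_SCHEMES


def trgt_env(ca_bundle=None, disable_autodetect=False, base_env=None):
    """Build an environment mapping for a TRGT subprocess (hypothetical helper)."""
    env = dict(os.environ if base_env is None else base_env)
    if ca_bundle is not None:
        # Point TRGT at a specific CA bundle instead of the auto-detected one.
        env["CURL_CA_BUNDLE"] = ca_bundle
    if disable_autodetect:
        # Assumed value: the notes do not say which value is expected.
        env["TRGT_DISABLE_CA_AUTODETECT"] = "1"
    return env


# Hypothetical inputs, for illustration only.
inputs = ["https://example.org/reference.fa", "s3://bucket/sample.bam", "/data/catalog.bed"]
remote_inputs = [p for p in inputs if is_remote(p)]
```

The resulting mapping can be passed as the env argument of subprocess.run when invoking the binary.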
Improved consensus generation for long repeat expansions
As part of an ongoing effort, we have improved the consensus sequence generation leading to better accuracy for large repeat expansions. Notable improvements were observed in loci such as: RFC1, FGF14, CNBP, DMPK, C9ORF72, and TCF4.
Contributors: Tom Mokveld, Guilherme De Sena Brandine, and Egor Dolzhenko
TRGT v4.0.0
Summary of changes:
- Important changes:
- Implemented read streaming to reduce I/O operations, achieving 2-4x speed improvements for WGS data (speed-up varies by catalog density). The input catalog must be sorted by contig and start to achieve high performance
- Added the trgt deepdive subcommand for locus-specific downstream analysis. This command realigns reads to their consensus sequences and outputs FASTA (consensus sequences), BAM (realigned reads), and BED (annotation) files for detailed repeat analysis in tools like IGV. See deepdive documentation for more details
- Bug fix: Addressed non-deterministic behavior caused by inconsistent ordering of reads with identical start positions but different query names in sorted BAM files
- Bug fix: Fixed missing font in double arrow rendering
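The non-determinism fix listed above concerns reads that share a start position but differ in query name. The notes do not describe how TRGT resolves this; the sketch below only shows one common remedy, breaking ties on the query name so that any input order yields the same result. The Read type and its fields are hypothetical.

```python
from typing import NamedTuple


class Read(NamedTuple):
    """Minimal stand-in for an aligned read (hypothetical type)."""
    qname: str
    start: int


def deterministic_order(reads):
    """Sort by start position, breaking ties on query name."""
    return sorted(reads, key=lambda r: (r.start, r.qname))


# The same reads delivered in two different orders.
batch_1 = [Read("read_b", 100), Read("read_a", 100), Read("read_c", 90)]
batch_2 = [Read("read_a", 100), Read("read_c", 90), Read("read_b", 100)]

ordered_1 = deterministic_order(batch_1)
ordered_2 = deterministic_order(batch_2)
```

Both batches end up in the same order, so downstream steps see identical input regardless of how ties were arranged in the file.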
Performance improvements
In a benchmark run on sample NA12878, we observed speed improvements of 2-4x, depending on catalog density.
Runtime (solid) and peak‑RSS memory (dashed) for TRGT v3.0.0 versus v4.0.0 on NA12878 whole‑genome data (~100x). Each panel shows a different repeat catalog (≈900 k loci on the left, ≈8 M loci on the right); marker labels give v4/v3 speed‑up at the indicated thread count.
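The summary of changes notes that the catalog must be sorted by contig and start to benefit from read streaming. A minimal sketch of such a sort is shown here, assuming a tab-separated BED catalog already loaded as lines and assuming that "by contig" means the reference's contig order; the function name and the catalog entries are hypothetical.

```python
def sort_catalog(bed_lines, contig_order):
    """Sort BED lines by contig rank (position in contig_order), then by start.

    Raises KeyError if a line refers to a contig missing from contig_order.
    """
    rank = {contig: i for i, contig in enumerate(contig_order)}

    def key(line):
        fields = line.rstrip("\n").split("\t")
        return (rank[fields[0]], int(fields[1]))

    return sorted(bed_lines, key=key)


# Hypothetical catalog entries; the fourth column is a placeholder.
catalog = [
    "chr2\t500\t600\tID=locus_c",
    "chr1\t900\t950\tID=locus_b",
    "chr1\t100\t150\tID=locus_a",
]

sorted_catalog = sort_catalog(catalog, contig_order=["chr1", "chr2"])
```

Ranking by an explicit contig list avoids the lexicographic trap in which chr10 would sort before chr2.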
Deepdive subcommand
The trgt deepdive command enables detailed locus-specific analysis by realigning reads to consensus sequences. The example below shows allele-specific FMR1 methylation analysis in IGV, where the short allele is unmethylated while the expanded allele shows predominant methylation, a characteristic pattern in fragile X syndrome.
See the deepdive documentation for complete usage details.
Contributors: Tom Mokveld, Guilherme De Sena Brandine, and Egor Dolzhenko
TRGT v3.0.0
Summary of changes:
- Breaking change: Modified how TRGT detects repeat motifs
  - Repeat segmentation algorithm only matches perfect STR motifs (imperfections in VNTR motifs are still allowed)
  - More flexible motif matching that allows better detection and visualization of repeat interruptions
- Made a number of improvements to TRGT plots
  - The size scale is replaced with the overall allele length
  - More accurate motif matching in waterfall plots
  - Similar alignments are grouped together
  - The plots can be compressed vertically (useful for high-depth targeted data)
  - Easier to distinguish between mismatches and unsegmented regions
- Bug fix: Explicitly set reference from genome path for CRAM input, avoiding issues with moved/renamed references (thanks to @Han-Cao for reporting and @ASLeonard for suggesting a solution)
Notes
This release introduces significant changes to the repeat segmentation algorithm. While the previous versions of TRGT allow imperfections in both STR (<=6bp) and VNTR (>6bp) motifs, the new algorithm requires all STR motif matches to be perfect (imperfections in VNTR motifs are still allowed). For example, the allele CAG⋅CAG⋅CAG⋅CAA⋅CAG⋅CAG⋅CAG would have been previously reported to contain 7 CAGs (6 perfect copies and 1 imperfect copy). Now it is reported to contain 6 CAGs.
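The CAG example above can be reproduced with a toy counter. This is only an illustration of the difference between perfect and imperfect motif matching on a fixed reading frame; it is not TRGT's segmentation algorithm, and the function names are hypothetical.

```python
def split_units(allele, motif_len):
    """Cut the allele into consecutive, non-overlapping units of motif_len bases."""
    return [allele[i:i + motif_len] for i in range(0, len(allele), motif_len)]


def mismatches(unit, motif):
    """Count differing positions, treating any length difference as mismatches."""
    return sum(a != b for a, b in zip(unit, motif)) + abs(len(unit) - len(motif))


def count_motif(allele, motif, max_mismatches=0):
    """Count units that match the motif with at most max_mismatches differences."""
    units = split_units(allele, len(motif))
    return sum(1 for unit in units if mismatches(unit, motif) <= max_mismatches)


allele = "CAGCAGCAGCAACAGCAGCAG"  # CAG CAG CAG CAA CAG CAG CAG

imperfect_allowed = count_motif(allele, "CAG", max_mismatches=1)  # older behavior
perfect_only = count_motif(allele, "CAG")  # behavior described for v3.0.0
```

With one mismatch tolerated the CAA unit is counted as a CAG copy, giving 7; with perfect matching it is excluded, giving 6.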
Contributors: Tom Mokveld, Guilherme De Sena Brandine, and Egor Dolzhenko
TRGT v2.1.0
Summary of changes:
- Important changes:
- Extended the use of optimized aligners to the plotting (TRVZ) functionality
- Further optimized alignments, significantly improving overall performance for targeted datasets beyond the improvements introduced in 2.0, and cumulatively achieving ~2x speed-up for whole-genome sequencing data
Contributors: Tom Mokveld, Guilherme De Sena Brandine, and Egor Dolzhenko
TRGT v2.0.0
Summary of changes:
- Important changes:
- Introduced fine-grained parallelization using work-stealing, significantly increasing CPU utilization, resulting in speed-ups of up to 200x for targeted datasets
- All previous aligners used for genotyping have been replaced with significantly faster and more memory-efficient alternatives (WFA2-lib)
- When using the targeted data preset, reads with higher repeat purity between flanks are now given priority
- TRGT now includes a default monospace font, ensuring consistent text rendering during plotting even on systems without fonts in standard locations (thanks to @Fu-Yilei for reporting)
- Added CLI option --vcf_list to TRGT merge, allowing users to specify a file containing a list of VCF files. This option is mutually exclusive with the existing --vcfs flag
- Introduced CLI option --font-family for specifying custom font families when plotting
- Bug fix: the VCF QUAL field is now always set to "." rather than 0 (thanks to @zeeev for reporting)
- Bug fix: fixed handling of AL tags when loading spanning BAM files modified by samtools (thanks to @jrharting for reporting)
- Improved error handling and messaging during parsing of plotting inputs
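Since QUAL is now always written as the VCF missing value rather than 0, downstream parsers should not assume it is numeric. The following sketch, which is independent of TRGT and uses a hypothetical record, reads the sixth column of a VCF data line and maps "." to None.

```python
def parse_qual(vcf_line):
    """Return QUAL as a float, or None when the field holds the missing value '.'."""
    fields = vcf_line.rstrip("\n").split("\t")
    qual = fields[5]  # VCF columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, ...
    return None if qual == "." else float(qual)


# Hypothetical record, for illustration only.
record = "chr4\t3074876\t.\tC\tCCAG\t.\t.\tTRID=example"
qual = parse_qual(record)
```

Code that previously compared QUAL against a threshold would need to handle the None case explicitly.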
Contributors: Tom Mokveld, Guilherme De Sena Brandine, and Egor Dolzhenko
TRGT v1.5.1
Summary of changes:
- Fixed an issue that prevented extraction of CpG methylation from BAM records containing multiple base modifications.
Thanks to @jrharting for reporting it.
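For context, BAM records with multiple base modifications carry several semicolon-separated entries in the MM tag, as defined in the SAM optional fields specification. The sketch below shows one way to pick out the 5mC entry from such a tag value; it is not TRGT's implementation, the function names are hypothetical, and the tag values in the usage lines are made up.

```python
import re

# One MM entry: base, strand, modification code(s), optional mode flag, skip counts.
ENTRY_RE = re.compile(r"^([ACGTUN])([+-])([a-z]+|[0-9]+)([.?]?)((?:,[0-9]+)*)$")


def parse_mm(mm_value):
    """Split an MM tag value into one dict per modification entry."""
    entries = []
    for chunk in mm_value.split(";"):
        if not chunk:
            continue  # the value ends with ';', leaving an empty trailing chunk
        match = ENTRY_RE.match(chunk)
        if match is None:
            raise ValueError(f"malformed MM entry: {chunk!r}")
        base, strand, codes, mode, deltas = match.groups()
        skips = [int(d) for d in deltas.split(",") if d]
        entries.append(
            {"base": base, "strand": strand, "codes": codes, "mode": mode, "skips": skips}
        )
    return entries


def cpg_methylation_skips(mm_value):
    """Return skip counts of the first C+m (5mC) entry, or None if absent."""
    for entry in parse_mm(mm_value):
        if entry["base"] == "C" and entry["strand"] == "+" and "m" in entry["codes"]:
            return entry["skips"]
    return None


# 5hmC is listed before 5mC here; the 5mC entry is still found.
skips = cpg_methylation_skips("C+h,5,12,0;C+m,5,12,0;")
```

A parser that only inspects the first entry would miss 5mC whenever another modification is listed ahead of it.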
Contributors: Tom Mokveld, Guilherme De Sena Brandine, and Egor Dolzhenko
TRGT v1.5.0
Summary of change:
- Read clustering genotyper (--genotyper cluster) is now significantly faster at genotyping high-coverage repeat expansions; this may result in minor changes to consensus sequence and read assignment for highly mosaic repeats
Contributors: Tom Mokveld, Guilherme De Sena Brandine, and Egor Dolzhenko
TRGT v1.4.1
Summary of change:
- Bug fix: type correction of rq tag in BAM output
Contributors: Tom Mokveld, Guilherme De Sena Brandine, and Egor Dolzhenko
TRGT v1.4.0
Summary of changes:
- Parameters appropriate for targeted sequencing can now be set with the --preset targeted option
- Waterfall plots no longer panic when there are no reads in a locus
- Algorithmic changes to --genotyper cluster allow fewer reads to be assigned to an allele; this may result in minor changes to consensus sequence and read assignment
Contributors: Tom Mokveld, Guilherme De Sena Brandine, and Egor Dolzhenko
