Commit 859ffc4

Merge branch 'release/v5.4.0'
2 parents 9964d5b + d488988 commit 859ffc4

42 files changed, 23568 additions and 257 deletions

.github/workflows/func_tests.yml

Lines changed: 7 additions & 2 deletions
@@ -15,16 +15,21 @@ jobs:
 uses: actions/setup-python@v4
 with:
 python-version: 3.10.4
+- name: Upgrade pip
+run: |
+curl -sS https://bootstrap.pypa.io/get-pip.py -o get-pip.py
+python3 get-pip.py
+python3 -m pip install --upgrade pip
 - name: Cache python
 uses: actions/cache@v3
 with:
 path: ${{ env.pythonLocation }}
 key: ${{ env.pythonLocation }}-${{ hashFiles('setup.py') }}
 - name: Install dependencies
 run: |
-python3 -m pip install --upgrade pip setuptools
+python3 -m pip install --upgrade setuptools
 python3 -m pip install Cython pylint anybadge coverage
-python3 -m pip install .
+python3 -m pip install ./[bwa]
 - name: Running ssshtest
 run: |
 TMPDIR=`pwd` bash repo_utils/truvari_ssshtests.sh

Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ RUN wget https://mafft.cbrc.jp/alignment/software/mafft_7.505-1_amd64.deb \
 RUN python3 -m pip install --upgrade pip && \
 python3 -m pip install setproctitle pylint anybadge coverage && \
 python3 -m pip install --upgrade setuptools && \
-python3 -m pip install ./
+python3 -m pip install ./[bwa]
 
 WORKDIR /data
 

docs/Home.md

Lines changed: 20 additions & 26 deletions
@@ -3,34 +3,28 @@ The wiki holds documentation most relevant for develop. For information on a spe
 Citation:
 English, A.C., Menon, V.K., Gibbs, R.A. et al. Truvari: refined structural variant comparison preserves allelic diversity. Genome Biol 23, 271 (2022). https://doi.org/10.1186/s13059-022-02840-6
 
-# Before you start
-VCFs aren't always created with a strong adherence to the format's specification.
-
-Truvari expects input VCFs to be valid so that it will only output valid VCFs.
-
-We've developed a separate tool that runs multiple validation programs and standard VCF parsing libraries in order to validate a VCF.
-
-Run [this program](https://github.com/acenglish/usable_vcf) over any VCFs that are giving Truvari trouble.
-
-Furthermore, Truvari expects 'resolved' SVs (e.g. DEL/INS) and will not interpret BND signals across SVTYPEs (e.g. combining two BND lines to match a DEL call). A brief description of Truvari bench methodology is linked below.
-
-Finally, Truvari does not handle multi-allelic VCF entries and as of v4.0 will throw an error if multi-allelics are encountered. Please use `bcftools norm` to split multi-allelic entries.
-
 # Index
 
 - [[Updates|Updates]]
 - [[Installation|Installation]]
-- Truvari Commands:
-- [[anno|anno]]
-- [[bench|bench]]
-- [[collapse|collapse]]
-- [[consistency|consistency]]
-- [[divide|divide]]
-- [[ga4gh|ga4gh]]
-- [[phab|phab]]
-- [[refine|refine]]
-- [[segment|segment]]
-- [[stratify|stratify]]
-- [[vcf2df|vcf2df]]
 - [[Development|Development]]
-- [[Citations|Citations]]
+
+Truvari Commands:
+- Benchmarking
+- [[bench|bench]] - Performance metrics from comparison of two VCFs
+- [[refine|refine]] - Automated bench result refinement with phab
+- Merging
+- [[collapse|collapse]] Collapse redundant VCF entries
+- [[phab|phab]] Variant harmonization using MSA
+- Analysis
+- [[consistency|consistency]] Consistency report between multiple VCFs
+- [[stratify|stratify]] Count VCF entries inside BED regions
+- [[vcf2df|vcf2df]] Turn VCF into pandas DataFrame
+- [[stratp|Stratp-Test]] Stratification performance test
+- Annotation
+- [[anno|anno]] VCF Annotations
+- Misc
+- [[segment|segment]] Normalization of SVs into disjointed genomic regions
+- [[divide|divide]] Divide a VCF into independent shards
+- [[ga4gh|ga4gh]] Convert Truvari result to GA4GH
+

docs/Installation.md

Lines changed: 5 additions & 0 deletions
@@ -10,6 +10,11 @@ python3 -m pip install truvari==3.2.0
 ```
 See [pypi](https://pypi.org/project/Truvari/#history) for a history of all distributed releases.
 
+When using some annotations (e.g. `truvari anno remap`) The bwapy needs to be available. This can be installed via:
+```
+python3 -m pip install truvari[bwa]
+```
+
 Manual Installation
 ===================
 To build Truvari directly, clone the repository and switch to a specific tag.
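
Reviewer sketch (not part of the commit): since this release moves bwapy into the optional `bwa` extra, code that depends on it may want to check for the module before use. The snippet below is a minimal illustration under two assumptions: the importable module is named `bwapy`, and the helper names `has_optional_module` and `require_bwapy` are hypothetical, not Truvari functions.

```python
import importlib.util


def has_optional_module(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


def require_bwapy() -> None:
    """Raise a readable error when the optional dependency is missing.

    Assumption: the import name is `bwapy`; the install hint mirrors the
    command shown in docs/Installation.md.
    """
    if not has_optional_module("bwapy"):
        raise ImportError(
            "bwapy is not installed; try: python3 -m pip install truvari[bwa]"
        )
```

Calling `require_bwapy()` at the start of a bwapy-dependent code path would surface the install hint early rather than failing deep inside an import.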

docs/Updates.md

Lines changed: 14 additions & 2 deletions
@@ -1,5 +1,17 @@
-# Truvari 5.3
-*in progress*
+# Truvari 5.4.0
+*October 7, 2025*
+
+* New `stratp` command to automatically generate benchmark performance evaluation across stratifications
+* `truvari.VariantRecord.allele_freq_annos` now stores the results to speed up reuse in e.g. `collapse`.
+* LazyImporting for faster startup times
+* `collapse` now allows `--sizemax -1` to work with all large SVs easily.
+* New `collapse` argument `--fast-cluster` will dramatically speed up runtime when collapsing large (>100kbp) SVs
+* bwapy, which is a bother to install on macs, is now optional by default (#295)
+* `vcf2df --parquet` will write a parquet file, which is more stable across environments than the default joblib file.
+* Miscellaneous bug fixes (#288, #286, #284, #282, #275)
+
+# Truvari 5.3.0
+*April 21, 2025*
 
 * Fixed FP BNDs being dropped [details](https://github.com/ACEnglish/truvari/discussions/263).
 * Restore default `--sizemax` - Some callers make SVs that span the entire chromosome, which disrupts truvari's chunking strategy

docs/bench.md

Lines changed: 2 additions & 0 deletions
@@ -242,6 +242,8 @@ This VCF makes different results depending on the `--pick` parameter
 | ac | TP | TP | FP |
 | multi | TP | TP | TP |
 
+Note that multi-matching should be used with care. By allowing SVs to match multiple times, performance metrics become inflated in a way that’s misleading. Recall can exceed the number of calls made. Precision can be skewed if one baseline event explains many false calls. For example, if a single comparison SV matches to two baseline SVs, the caller only made one prediction, yet it’s getting credit for finding two events, thus inflating recall.
+
 --dup-to-ins
 ============
 
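
Reviewer sketch (not part of the commit): the multi-matching note added to docs/bench.md can be made concrete with a toy count. The numbers below are invented for illustration, and the formulas assume recall = TP-base / (TP-base + FN) and precision = TP-comp / (TP-comp + FP); the scenario is the one from the note, where a single comparison SV sits close enough to two baseline SVs to match both.

```python
def recall(tp_base: int, fn: int) -> float:
    """Fraction of baseline SVs that were matched."""
    return tp_base / (tp_base + fn)


def precision(tp_comp: int, fp: int) -> float:
    """Fraction of comparison calls that matched a baseline SV."""
    return tp_comp / (tp_comp + fp)


# Toy scenario (hypothetical counts): 2 baseline SVs, 1 comparison call.
# Single matching: the call is credited with one baseline SV, the other is a FN.
single_recall = recall(tp_base=1, fn=1)
# Multi matching: the same call is credited with both baseline SVs.
multi_recall = recall(tp_base=2, fn=0)
# The caller made one prediction either way, so precision does not change.
single_precision = precision(tp_comp=1, fp=0)
multi_precision = precision(tp_comp=1, fp=0)

print(f"single: recall={single_recall:.2f} precision={single_precision:.2f}")
print(f"multi:  recall={multi_recall:.2f} precision={multi_precision:.2f}")
```

Under these assumed counts, recall doubles between the two modes even though the caller's output is identical, which is the inflation the note warns about.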

imgs/coverage.svg

Lines changed: 2 additions & 2 deletions

pyproject.toml

Lines changed: 3 additions & 1 deletion
@@ -13,7 +13,6 @@ license = { text = "MIT" }
 dynamic = ["version"]
 requires-python = ">=3.8"
 dependencies = [
-"bwapy>=0.1.4",
 "edlib>=1.3.9",
 "intervaltree>=3.1",
 "joblib>=1.2.0",
@@ -27,6 +26,9 @@ dependencies = [
 "pywfa>=0.5.1",
 ]
 
+[project.optional-dependencies]
+bwa = ["bwapy>=0.1.4"]
+
 [project.scripts]
 truvari = "truvari.__main__:main"
 

repo_utils/answer_key/collapse/inputintragt_collapsed.vcf

Lines changed: 3 additions & 3 deletions
@@ -1577,13 +1577,13 @@ chr18 74756025 pbsv.INS.1213 C CCTCCCTCCCTTTCTTTCTTTTT 2 PASS IMPRECISE;SVTYPE=I
 chr18 74766009 pbsv.DEL.1214 CTGTGTGTGTGTGTGTGTGTG C 2 PASS SVTYPE=DEL;SVLEN=-20;SVANN=TANDEM;AC=2 GT:AD:DP:SAC:SUPP 1/1:0,11:11:0,0,3,8:4
 chr18 75067409 chr18-75067410-DEL-86;pbsv.DEL.1215 CGCTGGAAGCTCCACTGCCCTTTACAAGGTTCTATGAGCGCGGGGCTGGAAGCTCCACTGCCCTTTACAAGGTTCTATGAGCGCGGG C 4 PASS ID=chr18-75067410-DEL-86;SVTYPE=DEL;SVLEN=-86;TIG_REGION=h1tg006280l:236990-236990,h2tg003451l:128712-128712;QUERY_STRAND=+,-;HOM_REF=0,123;HOM_TIG=0,123;SVANN=TANDEM;AC=3;NumCollapsed=1;NumConsolidated=1;CollapseId=508.0 GT:AD:DP:SAC:DR:DV:GQ:SUPP 1/1:7,1:8:6,1,1,0:.:8:22:7
 chr18 75068024 chr18-75068025-DEL-50 AGAAGATGGCTAAAAGTACGCACAGGGAAGGGGAGCAGGCACTGGTGGATG A 4 . ID=chr18-75068025-DEL-50;SVTYPE=DEL;SVLEN=-50;TIG_REGION=h1tg006280l:237519-237519,h2tg003451l:128183-128183;QUERY_STRAND=+,-;HOM_REF=0,15;HOM_TIG=0,15;AC=2;NumCollapsed=1;NumConsolidated=1;CollapseId=509.0 GT:DR:DV:GQ:SUPP 1/1:.:7:19:3
-chr18 75105022 chr18-75105023-DEL-78;pbsv.DEL.1216 CGTGGAAGCTTTGCTGAAATGTCCTGCTTGTGTTTTACTCCGTGGCGAGCACAGCGTGCAGGTGCTCCGTGGAAGCTCA C 4 PASS ID=chr18-75105023-DEL-78;SVTYPE=DEL;SVLEN=-78;TIG_REGION=h2tg003451l:91227-91227;QUERY_STRAND=-;HOM_REF=0,10;HOM_TIG=0,10;SVANN=TANDEM;AC=2;NumCollapsed=2;NumConsolidated=1;CollapseId=510.0 GT:AD:DP:SAC:DR:DV:GQ:SUPP 0/1:6,5:11:2,4,1,4:8:3:7:7
+chr18 75105022 chr18-75105023-DEL-78;pbsv.DEL.1216 CGTGGAAGCTTTGCTGAAATGTCCTGCTTGTGTTTTACTCCGTGGCGAGCACAGCGTGCAGGTGCTCCGTGGAAGCTCA C 4 PASS ID=chr18-75105023-DEL-78;SVTYPE=DEL;SVLEN=-78;TIG_REGION=h2tg003451l:91227-91227;QUERY_STRAND=-;HOM_REF=0,10;HOM_TIG=0,10;SVANN=TANDEM;AC=2;NumCollapsed=2;NumConsolidated=1;CollapseId=510.0 GT:AD:DP:SAC:DR:DV:GQ:SUPP 0/1:6,5:11:2,4,1,4:9:2:5:5
 chr18 75108304 pbsv.DEL.1217 GCCTCACTCCCGCCAGCCCAGCAC G 2 PASS SVTYPE=DEL;SVLEN=-23;SVANN=TANDEM;AC=1 GT:AD:DP:SAC:SUPP 0/1:5,4:9:1,4,2,2:4
 chr18 75132968 pbsv.INS.DUP.1218 A AGATCCCCGCGGCTCAGTGG 0 PASS SVTYPE=INS;SVLEN=19;AC=2 GT:AD:DP:SAC:SUPP 1/1:0,9:9:0,0,0,9:4
 chr18 75246403 pbsv.INS.1219 G GGTGTGTGTGTGTGTGTGTGT 2 PASS SVTYPE=INS;SVLEN=20;AC=1 GT:AD:DP:SAC:SUPP 0/1:4,5:9:2,2,2,3:4
 chr18 75293906 pbsv.INS.1220 A AACTGAGACACTGGCTCCTCTGTGGGTGTGGAGAAAGAACTGG 2 PASS SVTYPE=INS;SVLEN=42;AC=2 GT:AD:DP:SAC:SUPP 1/1:0,9:9:0,0,4,5:4
-chr18 75299569 chr18-75299570-DEL-132;pbsv.DEL.1221 CCAGTGGACGGTGATCCATCGTTAAAGGACATGGTGAGCTTGCACCAGGCACTAGATAGCTGCCCCGCAGTGGACGGTGATCCATCGTTAAAGGACATGGTGAGCTTGCACCAGGCACTAGATAGCTGCCCCG C 4 PASS ID=chr18-75299570-DEL-132;SVTYPE=DEL;SVLEN=-132;TIG_REGION=h1tg006280l:469108-469108,h2tg015582l:3860-3860;QUERY_STRAND=+,-;HOM_REF=0,288;HOM_TIG=0,288;SVANN=TANDEM;AC=4;NumCollapsed=1;NumConsolidated=1;CollapseId=511.0 GT:AD:DP:SAC:DR:DV:GQ:SUPP 1/1:0,7:7:0,0,4,3:2:5:6:7
-chr18 75299831 Sniffles2.DEL.36CS11 CCGCAGTGGACGGTGATCCATCGTTAAAGGACATGGTGAGCTTGCACCAGGCACTAGATAGCTGCCCCGCAGTGGACGGTGATCCATCGTTAAAGGACATGGTGAGCTTGCACCAGGCACTAGATAGCTGCC C 0 PASS PRECISE;SVTYPE=DEL;SVLEN=-131;SUPPORT=2;COVERAGE=7,7,7,7,7;STRAND=+-;AF=0.286;STDEV_LEN=1.414;STDEV_POS=21.92;AC=1 GT:GQ:DR:DV:SUPP 0/1:6:5:2:2
+chr18 75299569 chr18-75299570-DEL-132;pbsv.DEL.1221 CCAGTGGACGGTGATCCATCGTTAAAGGACATGGTGAGCTTGCACCAGGCACTAGATAGCTGCCCCGCAGTGGACGGTGATCCATCGTTAAAGGACATGGTGAGCTTGCACCAGGCACTAGATAGCTGCCCCG C 4 PASS ID=chr18-75299570-DEL-132;SVTYPE=DEL;SVLEN=-132;TIG_REGION=h1tg006280l:469108-469108,h2tg015582l:3860-3860;QUERY_STRAND=+,-;HOM_REF=0,288;HOM_TIG=0,288;SVANN=TANDEM;AC=4;NumCollapsed=1;NumConsolidated=1;CollapseId=511.0 GT:AD:DP:SAC:DR:DV:GQ:SUPP 1/1:0,7:7:0,0,4,3:5:2:6:7
+chr18 75299570 Sniffles2.DEL.36BS11 CAGTGGACGGTGATCCATCGTTAAAGGACATGGTGAGCTTGCACCAGGCACTAGATAGCTGCCCCGCAGTGGACGGTGATCCATCGTTAAAGGACATGGTGAGCTTGCACCAGGCACTAGATAGCTGCCCCGC C 0 PASS PRECISE;SVTYPE=DEL;SVLEN=-132;SUPPORT=5;COVERAGE=7,7,7,7,7;STRAND=+-;AF=0.714;STDEV_LEN=0;STDEV_POS=0;AC=1 GT:GQ:DR:DV:SUPP 0/1:6:2:5:2
 chr18 75383945 pbsv.DEL.1222 GGAGAGAGAGAGAGAGAGAGAGAGAGAGAGA G 2 PASS SVTYPE=DEL;SVLEN=-30;SVANN=TANDEM;AC=2 GT:AD:DP:SAC:SUPP 1/1:0,9:9:0,0,2,7:4
 chr18 75461601 pbsv.INS.1223 T TTCCCTCCCTCCCTCCCTCCC 2 PASS SVTYPE=INS;SVLEN=20;AC=2 GT:AD:DP:SAC:SUPP 1/1:0,6:6:0,0,2,4:4
 chr18 75526339 pbsv.INS.1224 T TCACTGCCGTAACAAATGGGG 2 PASS SVTYPE=INS;SVLEN=20;SVANN=TANDEM;AC=1 GT:AD:DP:SAC:SUPP 0/1:1,3:4:0,1,3,0:4

repo_utils/answer_key/collapse/inputintragt_removed.vcf

Lines changed: 1 addition & 1 deletion
@@ -597,7 +597,7 @@ chr18 75067410 Sniffles2.DEL.367S11 GCTGGAAGCTCCACTGCCCTTTACAAGGTTCTATGAGCGCGGGG
 chr18 75068025 Sniffles2.DEL.368S11 GAAGATGGCTAAAAGTACGCACAGGGAAGGGGAGCAGGCACTGGTGGATGG G 0 PASS PRECISE;SVTYPE=DEL;SVLEN=-50;SUPPORT=7;COVERAGE=8,7,7,7,7;STRAND=+-;AF=1;STDEV_LEN=0;STDEV_POS=0;AC=2;PctSeqSimilarity=0.9902;PctSizeSimilarity=1;PctRecOverlap=0.9804;SizeDiff=0;StartDistance=-1;EndDistance=-1;TruScore=99;MatchId=509.0 GT:GQ:DR:DV ./.:.:.:. 1/1:19:0:7 ./.:.:.:.
 chr18 75104861 Sniffles2.DEL.369S11 GAAGCTCCGTGGAAGCTTGCTGAAATGTCCTGCTTGTGTTTTACTCCGTGGCGAGCACAGCGTGCAGGCGCTCCGTG G 0 GT PRECISE;SVTYPE=DEL;SVLEN=-76;SUPPORT=2;COVERAGE=9,11,11,11,11;STRAND=-;AF=0.182;STDEV_LEN=0.707;STDEV_POS=4.95;AC=0;PctSeqSimilarity=0.9808;PctSizeSimilarity=0.9744;PctRecOverlap=0;SizeDiff=2;StartDistance=161;EndDistance=163;TruScore=65;MatchId=510.0 GT:GQ:DR:DV ./.:.:.:. 0/0:5:9:2 ./.:.:.:.
 chr18 75105023 Sniffles2.DEL.36AS11 GTGGAAGCTTTGCTGAAATGTCCTGCTTGTGTTTTACTCCGTGGCGAGCACAGCGTGCAGGTGCTCCGTGGAAGCTCAG G 0 PASS PRECISE;SVTYPE=DEL;SVLEN=-78;SUPPORT=3;COVERAGE=10,11,11,11,11;STRAND=+-;AF=0.273;STDEV_LEN=0;STDEV_POS=0;AC=1;PctSeqSimilarity=0.9937;PctSizeSimilarity=1;PctRecOverlap=0.9873;SizeDiff=0;StartDistance=-1;EndDistance=-1;TruScore=99;MatchId=510.0 GT:GQ:DR:DV ./.:.:.:. 0/1:7:8:3 ./.:.:.:.
-chr18 75299570 Sniffles2.DEL.36BS11 CAGTGGACGGTGATCCATCGTTAAAGGACATGGTGAGCTTGCACCAGGCACTAGATAGCTGCCCCGCAGTGGACGGTGATCCATCGTTAAAGGACATGGTGAGCTTGCACCAGGCACTAGATAGCTGCCCCGC C 0 PASS PRECISE;SVTYPE=DEL;SVLEN=-132;SUPPORT=5;COVERAGE=7,7,7,7,7;STRAND=+-;AF=0.714;STDEV_LEN=0;STDEV_POS=0;AC=1;PctSeqSimilarity=1;PctSizeSimilarity=1;PctRecOverlap=0.9925;SizeDiff=0;StartDistance=-1;EndDistance=-1;TruScore=99;MatchId=511.0 GT:GQ:DR:DV ./.:.:.:. 0/1:6:2:5 ./.:.:.:.
+chr18 75299831 Sniffles2.DEL.36CS11 CCGCAGTGGACGGTGATCCATCGTTAAAGGACATGGTGAGCTTGCACCAGGCACTAGATAGCTGCCCCGCAGTGGACGGTGATCCATCGTTAAAGGACATGGTGAGCTTGCACCAGGCACTAGATAGCTGCC C 0 PASS PRECISE;SVTYPE=DEL;SVLEN=-131;SUPPORT=2;COVERAGE=7,7,7,7,7;STRAND=+-;AF=0.286;STDEV_LEN=1.414;STDEV_POS=21.92;AC=1;PctSeqSimilarity=0.9962;PctSizeSimilarity=0.9924;PctRecOverlap=0;SizeDiff=1;StartDistance=-262;EndDistance=-261;TruScore=66;MatchId=511.0 GT:GQ:DR:DV ./.:.:.:. 0/1:6:5:2 ./.:.:.:.
 chr18 75526515 pbsv.INS.1225 C CGGAATGGAAGTACATGGCAAGCTCACGGACCATGGGGGGACACAACCAAGTAGAGTCTGGGGAGGTTGGCTGGACGGGGCTGGGAAATGATAGAATTAGCAGAAACACGAT 2 PASS SVTYPE=INS;SVLEN=111;SVANN=TANDEM;AC=1;PctSeqSimilarity=0.9955;PctSizeSimilarity=0.991;PctRecOverlap=0;SizeDiff=-1;StartDistance=170;EndDistance=170;TruScore=66;MatchId=512.0 GT:AD:DP:SAC ./.:.:.:. ./.:.:.:. 0/1:1,3:4:0,1,3,0
 chr18 75526686 Sniffles2.INS.16CS11 G GGGAGGTTGGCTGGACGGGGCTGGGAAATGATAGAATTAGCAGAAACACGATGGAATGGAAGTACATGGCAAGCTCACGGACCATGGGGGGACACAACCAAGTAGAGTCT 3 PASS IMPRECISE;SVTYPE=INS;SVLEN=111;SUPPORT=3;COVERAGE=4,4,4,4,4;STRAND=+;AF=0.75;STDEV_LEN=0.577;STDEV_POS=98.15;SUPPORT_LONG=0;AC=1;PctSeqSimilarity=0.9955;PctSizeSimilarity=0.991;PctRecOverlap=0.991;SizeDiff=-1;StartDistance=-1;EndDistance=-1;TruScore=99;MatchId=512.0 GT:GQ:DR:DV ./.:.:.:. 0/1:1:1:3 ./.:.:.:.
 chr18 75712365 pbsv.DEL.1228 CAAAATGGCAGCTGCATGGCTGACTCTCAGATCCAAAATGGCTGCTGCATGGCCGACTCTCTCAGATCC C 2 PASS SVTYPE=DEL;SVLEN=-68;SVANN=TANDEM;AC=1;PctSeqSimilarity=0.9855;PctSizeSimilarity=1;PctRecOverlap=0.7971;SizeDiff=0;StartDistance=-14;EndDistance=-14;TruScore=92;MatchId=513.0 GT:AD:DP:SAC ./.:.:.:. ./.:.:.:. 0/1:4,2:6:1,3,2,0
