Commit 4fabd66

committed
Merge branch 'release/v4.2.2'
2 parents cf69e9a + eb47585 commit 4fabd66

File tree

173 files changed

+9294
-6504
lines changed

README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@
22
[![pylint](imgs/pylint.svg)](https://github.com/acenglish/truvari/actions/workflows/pylint.yml)
33
[![FuncTests](https://github.com/acenglish/truvari/actions/workflows/func_tests.yml/badge.svg?branch=develop&event=push)](https://github.com/acenglish/truvari/actions/workflows/func_tests.yml)
44
[![coverage](imgs/coverage.svg)](https://github.com/acenglish/truvari/actions/workflows/func_tests.yml)
5-
[![develop](https://img.shields.io/github/commits-since/acenglish/truvari/v4.2.0)](https://github.com/ACEnglish/truvari/compare/v4.2.0...develop)
5+
[![develop](https://img.shields.io/github/commits-since/acenglish/truvari/v4.2.1)](https://github.com/ACEnglish/truvari/compare/v4.2.1...develop)
66
[![Downloads](https://static.pepy.tech/badge/truvari)](https://pepy.tech/project/truvari)
77

88
![Logo](https://raw.githubusercontent.com/ACEnglish/truvari/develop/imgs/BoxScale1_DarkBG.png)

docs/v4.2.2/Citations.md

Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,30 @@
1+
# Citing Truvari
2+
3+
English, A.C., Menon, V.K., Gibbs, R.A. et al. Truvari: refined structural variant comparison preserves allelic diversity. Genome Biol 23, 271 (2022). https://doi.org/10.1186/s13059-022-02840-6
4+
5+
# Citations
6+
7+
List of publications using Truvari. Most of these are just pulled from a [Google Scholar Search](https://scholar.google.com/scholar?q=truvari). Please post in the [show-and-tell](https://github.com/spiralgenetics/truvari/discussions/categories/show-and-tell) to have your publication added to the list.
8+
* [A robust benchmark for detection of germline large deletions and insertions](https://www.nature.com/articles/s41587-020-0538-8)
9+
* [Leveraging a WGS compression and indexing format with dynamic graph references to call structural variants](https://www.biorxiv.org/content/10.1101/2020.04.24.060202v1.abstract)
10+
* [Duphold: scalable, depth-based annotation and curation of high-confidence structural variant calls](https://academic.oup.com/gigascience/article/8/4/giz040/5477467?login=true)
11+
* [Parliament2: Accurate structural variant calling at scale](https://academic.oup.com/gigascience/article/9/12/giaa145/6042728)
12+
* [Learning What a Good Structural Variant Looks Like](https://www.biorxiv.org/content/10.1101/2020.05.22.111260v1.full)
13+
* [Long-read trio sequencing of individuals with unsolved intellectual disability](https://www.nature.com/articles/s41431-020-00770-0)
14+
* [lra: A long read aligner for sequences and contigs](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009078)
15+
* [Samplot: a platform for structural variant visual validation and automated filtering](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02380-5)
16+
* [AsmMix: A pipeline for high quality diploid de novo assembly](https://www.biorxiv.org/content/10.1101/2021.01.15.426893v1.abstract)
17+
* [Accurate chromosome-scale haplotype-resolved assembly of human genomes](https://www.nature.com/articles/s41587-020-0711-0)
18+
* [Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome](https://www.nature.com/articles/s41587-019-0217-9)
19+
* [NPSV: A simulation-driven approach to genotyping structural variants in whole-genome sequencing data](https://academic.oup.com/bioinformatics/article-abstract/37/11/1497/5466452)
20+
* [SVIM-asm: structural variant detection from haploid and diploid genome assemblies](https://academic.oup.com/bioinformatics/article/36/22-23/5519/6042701?login=true)
21+
* [Readfish enables targeted nanopore sequencing of gigabase-sized genomes](https://www.nature.com/articles/s41587-020-00746-x)
22+
* [stLFRsv: A Germline Structural Variant Analysis Pipeline Using Co-barcoded Reads](https://internal-journal.frontiersin.org/articles/10.3389/fgene.2021.636239/full)
23+
* [Long-read-based human genomic structural variation detection with cuteSV](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02107-y)
24+
* [An international virtual hackathon to build tools for the analysis of structural variants within species ranging from coronaviruses to vertebrates](https://f1000research.com/articles/10-246)
25+
* [Paragraph: a graph-based structural variant genotyper for short-read sequence data](https://link.springer.com/article/10.1186/s13059-019-1909-7)
26+
* [Genome-wide investigation identifies a rare copy-number variant burden associated with human spina bifida](https://www.nature.com/articles/s41436-021-01126-9)
27+
* [TT-Mars: Structural Variants Assessment Based on Haplotype-resolved Assemblies](https://www.biorxiv.org/content/10.1101/2021.09.27.462044v1.abstract)
28+
* [An ensemble deep learning framework to refine large deletions in linked-reads](https://www.biorxiv.org/content/10.1101/2021.09.27.462057v1.abstract)
29+
* [MAMnet: detecting and genotyping deletions and insertions based on long reads and a deep learning approach](https://academic.oup.com/bib/advance-article-abstract/doi/10.1093/bib/bbac195/6587170)
30+
* [Automated filtering of genome-wide large deletions through an ensemble deep learning framework](https://www.sciencedirect.com/science/article/pii/S1046202322001712#b0110)

docs/v4.2.2/Development.md

Lines changed: 90 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,90 @@
1+
# Truvari API
2+
Many of the helper methods/objects are documented such that developers can reuse truvari in their own code. To see developer documentation, visit [readthedocs](https://truvari.readthedocs.io/en/latest/).
3+
4+
Documentation can also be seen using
5+
```python
6+
import truvari
7+
help(truvari)
8+
```
9+
10+
# docker
11+
12+
A Dockerfile exists to build an image of Truvari. To make a Docker image, clone the repository and run
13+
```bash
14+
docker build -t truvari .
15+
```
16+
17+
You can then run Truvari through docker using
18+
```bash
19+
docker run -v `pwd`:/data -it truvari
20+
```
21+
Here, `pwd` can be replaced with any directory you'd like to mount in the container at the path `/data/`, which is the working directory for the Truvari run. You can also pass parameters directly to the entry point:
22+
```bash
23+
docker run -v `pwd`:/data -it truvari anno svinfo -i example.vcf.gz
24+
```
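When the `docker run` call is scripted, the mount directory and the Truvari arguments can be assembled programmatically. The following is a minimal sketch, not part of Truvari: `docker_truvari_command` is a hypothetical helper, and the image name `truvari` is assumed to match the tag used in the `docker build` step above.

```python
import os
import shlex


def docker_truvari_command(host_dir, truvari_args):
    """Build the argv list for running Truvari in docker (hypothetical helper).

    host_dir is mounted at /data, the working directory of the Truvari run.
    """
    mount = f"{os.path.abspath(host_dir)}:/data"
    return ["docker", "run", "-v", mount, "-it", "truvari", *truvari_args]


cmd = docker_truvari_command(".", ["anno", "svinfo", "-i", "example.vcf.gz"])
# Print a shell-quoted version of the command for inspection
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once docker and the image are available.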
25+
26+
If you'd like to work interactively inside the docker container, for example to run the CI/CD scripts, override the entry point:
27+
```bash
28+
docker run -v `pwd`:/data --entrypoint /bin/bash -it truvari
29+
```
30+
You'll now be inside the container and can run FuncTests or run Truvari directly
31+
```bash
32+
bash repo_utils/truvari_ssshtests.sh
33+
truvari anno svinfo -i example.vcf.gz
34+
```
35+
36+
# CI/CD
37+
38+
These scripts help ensure the tool's quality. Extra dependencies must be installed in order to run Truvari's CI/CD scripts.
39+
40+
```bash
41+
pip install pylint anybadge coverage
42+
```
43+
44+
Check code formatting with
45+
```bash
46+
python repo_utils/pylint_maker.py
47+
```
48+
We use [autopep8](https://pypi.org/project/autopep8/) (via [vim-autopep8](https://github.com/tell-k/vim-autopep8)) for formatting.
49+
50+
Test the code and generate a coverage report with
51+
```bash
52+
bash repo_utils/truvari_ssshtests.sh
53+
```
54+
55+
Truvari uses GitHub Actions to perform these checks when new code is pushed to the repository. We've noticed that the actions sometimes hang through no fault of the code. If this happens, cancel and resubmit the job. Once FuncTests succeed, the workflow uploads an artifact of the `coverage html` report, which you can download to see a line-by-line accounting of test coverage.
56+
57+
# git flow
58+
59+
To organize the commits for the repository, we use [git-flow](https://danielkummer.github.io/git-flow-cheatsheet/). Therefore, `develop` is the default branch, the latest tagged release is on `master`, and new, in-development features are within `feature/<name>`.
60+
61+
When contributing to the code, be sure you're working off of develop and have run `git flow init`.
62+
63+
# versioning
64+
65+
Truvari uses [Semantic Versioning](https://semver.org/) and tries to stay compliant with [PEP440](https://peps.python.org/pep-0440/). As of v3.0.0, a single version is kept in the code under `truvari/__init__.__version__`. We try to keep the suffix `-dev` on the version in the develop branch. When cutting a new release, we may replace the suffix with `-rc` if we've built a release candidate that may need more testing/development. Once we've committed to a full release that will be pushed to PyPI, no suffix is placed on the version. If you install Truvari from the develop branch, the git repo hash is appended to the installed version, as well as '.uc' if there are uncommitted changes in the repo.
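The suffix rules can be illustrated with a small sketch. `describe_version` is a hypothetical helper, not part of Truvari's API, and because the exact layout of the appended hash is not specified here, it only checks for the `-dev`, `-rc`, and `.uc` markers anywhere in the string.

```python
def describe_version(version: str) -> dict:
    """Classify a Truvari-style version string by its suffix (hypothetical helper)."""
    if "-dev" in version:
        stage = "development"
    elif "-rc" in version:
        stage = "release candidate"
    else:
        stage = "release"
    # Assumption: '.uc' appears somewhere after the version when the install
    # came from a repo with uncommitted changes.
    return {"stage": stage, "uncommitted_changes": ".uc" in version}


for example in ("4.2.2", "4.3.0-dev", "4.2.2-rc"):
    print(example, describe_version(example))
```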
66+
67+
# docs
68+
69+
The GitHub wiki serves the documentation most relevant to the `develop` branch. When cutting a new release, we freeze and version the wiki's documentation with the helper utility `docs/freeze_wiki.sh`.
70+
71+
# Creating a release
72+
Follow these steps to create a release
73+
74+
0) Bump release version
75+
1) Run tests locally
76+
2) Update API Docs
77+
3) Change Updates Wiki
78+
4) Freeze the Wiki
79+
5) Ensure all code is checked in
80+
6) Do a [git-flow release](https://danielkummer.github.io/git-flow-cheatsheet/)
81+
7) Use github action to make a testpypi release
82+
8) Check test release
83+
```bash
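# Create and activate a clean virtual environment before installing the test release
python3 -m venv test_truvari
source test_truvari/bin/activate
python3 -m pip install --index-url https://test.pypi.org/simple --extra-index-url https://pypi.org/simple/ truvari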
```
87+
9) Use GitHub action to make a pypi release
88+
10) Download release-tarball.zip from step #9’s action
89+
11) Create release (include #9) from the tag
90+
12) Check out develop, bump to the next dev version, and update the README ‘commits since’ badge

docs/v4.2.2/Home.md

Lines changed: 35 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,35 @@
1+
The wiki holds documentation most relevant for develop. For information on a specific version of Truvari, see [`docs/`](https://github.com/spiralgenetics/truvari/tree/develop/docs)
2+
3+
Citation:
4+
English, A.C., Menon, V.K., Gibbs, R.A. et al. Truvari: refined structural variant comparison preserves allelic diversity. Genome Biol 23, 271 (2022). https://doi.org/10.1186/s13059-022-02840-6
5+
6+
# Before you start
7+
VCFs aren't always created with a strong adherence to the format's specification.
8+
9+
Truvari expects input VCFs to be valid so that it will only output valid VCFs.
10+
11+
We've developed a separate tool that runs multiple validation programs and standard VCF parsing libraries in order to validate a VCF.
12+
13+
Run [this program](https://github.com/acenglish/usable_vcf) over any VCFs that are giving Truvari trouble.
14+
15+
Furthermore, Truvari expects 'resolved' SVs (e.g. DEL/INS) and will not interpret BND signals across SVTYPEs (e.g. combining two BND lines to match a DEL call). A brief description of Truvari bench methodology is linked below.
16+
17+
Finally, Truvari does not handle multi-allelic VCF entries and as of v4.0 will throw an error if multi-allelics are encountered. Please use `bcftools norm` to split multi-allelic entries.
18+
19+
# Index
20+
21+
- [[Updates|Updates]]
22+
- [[Installation|Installation]]
23+
- Truvari Commands:
24+
- [[anno|anno]]
25+
- [[bench|bench]]
26+
- [[collapse|collapse]]
27+
- [[consistency|consistency]]
28+
- [[divide|divide]]
29+
- [[phab|phab]]
30+
- [[refine|refine]]
31+
- [[segment|segment]]
32+
- [[stratify|stratify]]
33+
- [[vcf2df|vcf2df]]
34+
- [[Development|Development]]
35+
- [[Citations|Citations]]

docs/v4.2.2/Installation.md

Lines changed: 56 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,56 @@
1+
Recommended
2+
===========
3+
For stable versions of Truvari, use pip
4+
```
5+
python3 -m pip install truvari
6+
```
7+
Specific versions can be installed via
8+
```
9+
python3 -m pip install truvari==3.2.0
10+
```
11+
See [pypi](https://pypi.org/project/Truvari/#history) for a history of all distributed releases.
12+
13+
Manual Installation
14+
===================
15+
To build Truvari directly, clone the repository and switch to a specific tag.
16+
```
git clone https://github.com/ACEnglish/truvari.git
cd truvari
git checkout tags/v3.0.0
python3 -m pip install .
```
21+
22+
To see a list of all available tags, run:
23+
```
24+
git tag -l
25+
```
26+
27+
If you have an older clone of the repository and don't see the version you're looking for in tags, make sure to pull the latest changes:
28+
```
29+
git pull
30+
git fetch --all --tags
31+
```
32+
33+
Mamba / Conda
34+
=============
35+
NOTE!! There is a very old version of Truvari on bioconda that - for unknown reasons - supersedes the newer, supported versions. Users may need to specify to conda which release to build. See [this ticket](https://github.com/ACEnglish/truvari/issues/130#issuecomment-1196607866) for details.
36+
37+
Truvari releases are automatically deployed to bioconda.
38+
Users can follow the [mamba installation instructions](https://mamba.readthedocs.io/en/latest/installation.html) to install mamba, a faster, conda-compatible package manager.
39+
40+
Create an environment with Truvari and its dependencies:
41+
```
42+
mamba create -c conda-forge -c bioconda -n truvari truvari
43+
```
44+
45+
Alternatively, see the [conda page](https://anaconda.org/bioconda/truvari) for details
46+
```
47+
conda install -c bioconda truvari
48+
```
49+
50+
Building from develop
51+
=====================
52+
The default branch is `develop`, which holds in-development changes. This is for developers or those wishing to try experimental features and is not recommended for production. Development is versioned higher than the most recent stable release with an added suffix (e.g. Current stable release is `3.0.0`, develop holds `3.1.0-dev`). If you'd like to install develop, repeat the steps above but without `git checkout tags/v3.0.0`. See [wiki](https://github.com/spiralgenetics/truvari/wiki/Development#git-flow) for details on how branching is handled.
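As an illustration of the numbering in the example above, the sketch below derives a develop version from a stable release. It assumes the minor component is always the one bumped, which is only inferred from the single `3.0.0` to `3.1.0-dev` example; `next_dev_version` is a hypothetical name.

```python
def next_dev_version(stable: str) -> str:
    """Return the develop version that follows a stable release.

    Assumption: develop bumps the minor component and resets the patch to 0.
    """
    major, minor, _patch = (int(part) for part in stable.split("."))
    return f"{major}.{minor + 1}.0-dev"


print(next_dev_version("3.0.0"))  # 3.1.0-dev
```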
53+
54+
Docker
55+
======
56+
See [Development](https://github.com/spiralgenetics/truvari/wiki/Development#docker) for details on building a docker container.

docs/v4.2.2/MatchIds.md

Lines changed: 74 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,74 @@
1+
MatchIds are used to tie base/comparison calls together in post-processing, for debugging or other exploration. MatchIds have the structure `{chunkid}.{callid}`. The chunkid is an id unique to each chunk of calls. All calls sharing a chunkid were within `--chunksize` distance of one another and were compared. The callid is unique to a call within a chunk for each VCF. Because `bench` processes two VCFs (the base and comparison VCFs), the `MatchId` has two values: the first is the base variant's MatchId and the second is the comparison variant's MatchId.
2+
3+
For `--pick single`, the two MatchIds will be identical in, for example, tp-base.vcf.gz and tp-comp.vcf.gz. However, for `--pick ac|multi`, it's possible to have cases such as one base variant matching multiple comparison variants. That would give us MatchIds like:
4+
5+
```
6+
# tp-base.vcf
7+
MatchId=4.0,4.1
8+
9+
# tp-comp.vcf
10+
MatchId=4.0,4.1
11+
MatchId=4.0,4.2
12+
```
13+
14+
This example tells us that the tp-comp variants are both pointing to `4.0` in tp-base. The tp-base variant has a higher match to the tp-comp `4.1` variant.
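The relationship in this example can be made explicit with a few lines of plain Python. This is only a sketch: it parses the raw `MatchId` strings exactly as written in the example above (a comma-separated pair), and `parse_match_id` is a hypothetical helper rather than a Truvari function.

```python
from collections import defaultdict


def parse_match_id(value: str) -> tuple:
    """Split a 'base,comp' MatchId string into its two ids (hypothetical helper)."""
    base_id, comp_id = value.split(",")
    return base_id, comp_id


# MatchId values copied from the example above
base_match_ids = ["4.0,4.1"]
comp_match_ids = ["4.0,4.1", "4.0,4.2"]

# Group the comparison variants by the base variant they point to
comp_by_base = defaultdict(list)
for value in comp_match_ids:
    base_id, comp_id = parse_match_id(value)
    comp_by_base[base_id].append(comp_id)

# For each base variant, the second id names its highest-scoring comparison match
best_comp = dict(parse_match_id(value) for value in base_match_ids)

for base_id, comp_ids in comp_by_base.items():
    print(base_id, "matched by", comp_ids, "best match:", best_comp.get(base_id))
```

Records without a match may carry missing values in this field; the sketch does not handle that case.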
15+
16+
One easy way to combine matched variants is to use `truvari vcf2df` to convert a benchmarking result to a pandas DataFrame and leverage pandas' merge operation. First, we convert the `truvari bench` result.
17+
18+
```bash
19+
truvari vcf2df --info --bench-dir bench_result/ data.jl
20+
```
21+
22+
Next, we combine rows of matched variants:
23+
```python
24+
import joblib
25+
import pandas as pd
26+
27+
# Load the data
28+
data = joblib.load("data.jl")
29+
30+
# Separate out the variants from the base VCF and add new columns of the base/comp ids
31+
base = data[data['state'].isin(['tpbase', 'fn'])].copy()
32+
base['base_id'] = base['MatchId'].apply(lambda x: x[0])
33+
base['comp_id'] = base['MatchId'].apply(lambda x: x[1])
34+
35+
# Separate out the variants from the comparison VCF and add new columns of the base/comp ids
36+
comp = data[data['state'].isin(['tp', 'fp'])].copy()
37+
comp['base_id'] = comp['MatchId'].apply(lambda x: x[0])
38+
comp['comp_id'] = comp['MatchId'].apply(lambda x: x[1])
39+
40+
# Merge the base/comparison variants
41+
combined = pd.merge(base, comp, left_on='base_id', right_on='comp_id', suffixes=('_base', '_comp'))
42+
43+
# How many comp variants matched to multiple base variants?
44+
counts1 = combined['base_id_comp'].value_counts()
45+
print('multi-matched comp count', (counts1 != 1).sum())
46+
47+
# How many base variants matched to multiple comp variants?
48+
counts2 = combined['comp_id_base'].value_counts()
49+
print('multi-matched base count', (counts2 != 1).sum())
50+
```
51+
52+
The `MatchId` is also used by `truvari collapse`. However, there are two differences. First, in the main `collapse` output, the relevant INFO field is named `CollapseId`. Second, because collapse only has a single input VCF, it is much easier to merge DataFrames. To merge the variants that collapse kept with those that it removed, we again need to convert the VCFs to DataFrames:
53+
54+
```bash
55+
truvari vcf2df -i kept.vcf.gz kept.jl
56+
truvari vcf2df -i removed.vcf.gz remov.jl
57+
```
58+
59+
Then we combine them:
60+
```python
61+
import joblib
62+
import pandas as pd
63+
64+
# Load the kept variants and set the index.
65+
kept = joblib.load("kept.jl").set_index('CollapseId')
66+
67+
# Load the removed variants and set the index.
68+
remov = joblib.load("remov.jl")
69+
remov['CollapseId'] = remov['MatchId'].apply(lambda x: x[0])
70+
remov.set_index('CollapseId', inplace=True)
71+
72+
# Join the two sets of variants
73+
result_df = kept.join(remov, how='right', rsuffix='_removed')
74+
```

docs/v4.2.2/Multi-allelic-VCFs.md

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,11 @@
1+
Truvari only compares the first alternate allele in VCFs. If a VCF contains multi-allelic sites such as:
2+
3+
```
4+
chr2 1948201 . T TACAACACGTACGATCAGTAGAC,TCAACACACAACACGTACGATCAGTAGAC ....
5+
```
6+
7+
Then pre-process the VCFs with bcftools:
8+
9+
```bash
10+
bcftools norm -m-any base_calls.vcf.gz | bgzip > base_calls_split.vcf.gz
11+
```
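To check whether a VCF still contains multi-allelic records before handing it to Truvari, the ALT column can be scanned for commas. The following is a minimal sketch using only the standard library; `find_multiallelic` is a hypothetical helper, and it assumes tab-delimited VCF records with ALT in the fifth column.

```python
def find_multiallelic(vcf_lines):
    """Yield (chrom, pos, alt) for records whose ALT column lists several alleles."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete VCF record
        chrom, pos, alt = fields[0], fields[1], fields[4]
        if "," in alt:
            yield chrom, int(pos), alt


example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr2\t1948201\t.\tT\tTACAACACGTACGATCAGTAGAC,TCAACACACAACACGTACGATCAGTAGAC\t.\t.\t.\n",
    "chr2\t2000000\t.\tA\tAGGT\t.\t.\t.\n",
]
hits = list(find_multiallelic(example))
print(len(hits), "multi-allelic record(s) found")
```

For a compressed file, the same function can be fed the handle returned by `gzip.open(path, "rt")`.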
