Commit 64f7c78

Merge branch 'release/v1.0.1'
2 parents fdb8f62 + 48abb01 commit 64f7c78

13 files changed: +108 -251 lines changed

Cargo.lock

Lines changed: 9 additions & 134 deletions
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 [package]
 name = "kanpig"
-version = "1.0.0"
+version = "1.0.1"
 edition = "2021"

 [dependencies]
@@ -20,7 +20,7 @@ page_size = { version = "0.6.0" }
 petgraph = { version = "0.6.5" }
 pretty_env_logger = { version = "0.5.0" }
 rand = "0.8.5"
-rust-htslib = { version = "0.49.0" }
+rust-htslib = { version = "0.46.0" }
 rust-lapper = { version = "1.1.0" }
 serde = { version = "1.0", features = ["derive"] }
 serde_json = { version = "1.0" }

README.md

Lines changed: 22 additions & 16 deletions
@@ -85,6 +85,12 @@ When performing path-finding, this threshold limits the number of paths which ar
 speed up runtime but may come at a cost of recall. A higher `maxpaths` is slower and may come at a cost to
 specificity.

+### `--maxnodes`
+If a neighborhood has too many variants, its graph will become large in memory and slow to traverse This parameter
+will turn off path-finding in favor of `--one-to-one` haplotype to variant comparison (see Experimental Parameters
+below), reducing runtime and memory usage. This may reduce recall in regions with many SVs, but these regions are
+problematic anyway.
+
 ### `--hapsim`
 After performing kmeans clustering on reads to determine the two haplotypes, if the two haplotypes have a size similarity
 above `hapsim`, they are consolidated into a homozygous allele.
@@ -121,29 +127,29 @@ Details of `FT`
 # 🔌 Compute Resources

 Kanpig is highly parallelized and will fully utilize all threads it is given. However, hyperthreading doesn't seem to
-help and therefore the number of threads should probably be limited to the number of physical processors available.
+help and therefore the number of threads should probably be limited to the number of physical processors available. For
+memory, giving kanpig 2GB per-core is usually more than enough.
+
+The actual runtime and memory usage of kanpig run will depend on the read coverage and the number of SVs in the input
+VCF. As a example of kanpig's resource usage with 16 cores available, genotyping a 30x long-read bam against a 2,199
+sample VCF (4.3 million SVs) took 13 minutes with a maximum memory usage of 12GB. Converting the bam to a plup file took
+4 minutes (8GB of memory) and genotyping with this plup file took 3 minutes (12GB memory).

-For memory, a general rule is kanpig will need about 20x the size of the compressed `.vcf.gz`. The minimum required
-memory is also dependent on the number of threads running as each will need space for its processing. For example,
-a 1.6Gb vcf (~5 million SVs) using 16 cores needs at least 32Gb of RAM. That same vcf with 8 or 4 cores needs at least
-24Gb and 20Gb of RAM, respectively.
+While genotyping against a plup file is usually faster, bam to plup conversion is most useful for:
+* genotyping a large VCF or super-high (>50x) coverage bam.
+* a sample that will be genotyped multiple times (e.g. N+1 pipelines)
+* long-term access to reads (a plup file is up to ~2,000x smaller than a bam)

 # 🔬 Experimental Parameter Details

 These parameters have a varying effect on the results and are not guaranteed to be stable across releases.

-### `--try-exact`
-Before performing the path-finding algorithm that applies a haplotype to the variant graph, perform a 1-to-1 comparison
-of the haplotype to each node in the variant graph. If a single node matches above `sizesim` and `seqsim`, the
-path-finding is skipped and haplotype applied to the node.
-
-This parameter will boost the specificity and speed of kanpig at the cost of recall.
-
-### `--prune`
-Similar to `try-exact`, a 1-to-1 comparison is performed before path-finding. If any matches are found, all paths
-which do not traverse the matching nodes are pruned from the variant graph.
+### `--one-to-one`
+Instead of performing the path-finding algorithm that applies a haplotype to the variant graph, perform a 1-to-1
+comparison of the haplotype to each node in the variant graph. If a single node matches above `sizesim` and `seqsim`,
+the haplotype is applied to it.

-This parameter will boost the specificity and speed of kanpig at the cost of recall.
+This parameter will boost the specificity, increase speed, and lower memory usage of kanpig at the cost of recall.

 ### `--maxhom`


src/genotype_main.rs

Lines changed: 6 additions & 1 deletion
@@ -127,9 +127,14 @@ fn task_thread(

 let (haps, coverage) =
     m_reads.find_pileups(&m_graph.chrom, m_graph.start, m_graph.end);
-
 let haps = ploidy.cluster(haps, coverage, &m_args.kd);

+// Only need to build the full graph sometimes
+let should_build = !haps.is_empty()
+    && !m_args.kd.one_to_one
+    && (m_graph.node_indices.len() - 2) <= m_args.kd.maxnodes;
+m_graph.build(should_build);
+
 let mut paths: Vec<PathScore> = haps
     .iter()
     .map(|h| m_graph.apply_coverage(h, &m_args.kd))
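
The gating condition added in this hunk can be read in isolation as a small predicate. The following is a minimal sketch, not kanpig's actual code: `KdParams` and `should_build` are hypothetical names, and it assumes the `- 2` in the diff discounts two non-variant nodes (for example a source and a sink). The sketch uses `saturating_sub` so the subtraction cannot underflow a `usize`; whether the real graph can ever hold fewer than two nodes is not shown in the diff.

```rust
/// Hypothetical stand-in for the two parameter fields the hunk reads.
struct KdParams {
    /// Mirrors `--one-to-one`: skip path-finding entirely.
    one_to_one: bool,
    /// Mirrors `--maxnodes`: largest neighborhood that still gets a full graph.
    maxnodes: usize,
}

/// Returns true when the full variant graph is worth building.
/// `node_count` is assumed to include two non-variant nodes, hence the `- 2`.
fn should_build(num_haps: usize, node_count: usize, kd: &KdParams) -> bool {
    // saturating_sub yields 0 instead of underflowing when node_count < 2.
    let variant_nodes = node_count.saturating_sub(2);
    num_haps > 0 && !kd.one_to_one && variant_nodes <= kd.maxnodes
}
```

Under these assumptions, a neighborhood with no haplotypes, with `one_to_one` set, or with more variant nodes than `maxnodes` skips the full build and falls back to the cheaper 1-to-1 comparison described in the README changes.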

src/kplib/annotator.rs

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ pub struct GenotypeAnno {
 pub filt: FiltFlags,
 pub sq: i32,
 pub gq: i32,
-pub ps: Option<u16>,
+pub ps: Option<u32>,
 pub dp: i32,
 pub ad: IntG,
 pub ks: IntG,
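
The diff does not say why `ps` was widened from `u16` to `u32`. One plausible reading, stated here only as an assumption, is that the stored identifier can exceed 65,535, the largest value a `u16` holds. The hypothetical helper below (`fits_u16` is not part of kanpig) shows the checked conversion failing just past that limit.

```rust
/// Hypothetical helper: try to narrow a 32-bit identifier to 16 bits.
/// Returns None when the value does not fit in a u16.
fn fits_u16(ps: u32) -> Option<u16> {
    // u16::try_from is a checked conversion; it errors for values above u16::MAX.
    u16::try_from(ps).ok()
}
```

With the field declared as `Option<u32>`, values above `u16::MAX` can be stored directly without a narrowing step.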
