1414## Updates (pre-release v0.0.6)
1515
1616* Fix corrupted VCF output in v0.0.5
17- * Low memory usage (especially when mosaic variant calling enabled)
18- <!-- * Add [longdust](https://github.com/lh3/longdust) for long low-complexity regions -->
17+ * Fix missing MEI header in VCF output
18+ * Improve run time and memory usage (especially when mosaic variant calling is enabled)
19+ * Add `--input-is-list` and `-X` to support multiple input BAM/CRAM files of the same sample for variant calling
1920
2021
2122## Getting Started
@@ -48,10 +49,12 @@ man ./longcallD.1
4849 - [Build from source](#build-from-source)
4950- [Usage](#usage)
5051 - [Variant calling with PacBio HiFi/Nanopore long reads](#variant-calling-with-pacbio-hifinanopore-long-reads)
52+ - [Variant calling with multiple input BAM/CRAM files of the same sample](#variant-calling-with-multiple-input-bamcram-files-of-the-same-sample)
5153 - [Low allele-frequency mosaic variant calling](#low-allele-frequency-mosaic-variant-calling)
5254 - [Region-specific variant calling](#region-specific-variant-calling)
53- - [Variant calling and output phased long reads](#variant-calling-and-output-phased-long-reads)
55+ - [Variant calling and output phased (\& refined) long-read BAM/CRAM](#variant-calling-and-output-phased--refined-long-read-bamcram)
5456 - [Variant calling from remote files](#variant-calling-from-remote-files)
57+ - [Memory usage](#memory-usage)
5558- [Acknowledgements](#acknowledgements)
5659- [Contact](#contact)
5760
@@ -107,14 +110,27 @@ longcallD call -t16 ref.fa hifi.bam > hifi.vcf # default for PacBio HiFi
107110longcallD call -t16 ref.fa ont.bam --ont > ont.vcf # for ONT reads
108111```
109112
113+ ### Variant calling with multiple input BAM/CRAM files of the same sample
114+ To call variants from multiple BAM/CRAM files of the same sample, pass a list file with `--input-is-list` or add each extra file with `-X`:
115+ ```
116+ longcallD call -t16 --input-is-list ref.fa bam_list.txt > sample.vcf
117+ # where bam_list.txt contains:
118+ # sample_part1.bam
119+ # sample_part2.bam
120+ # sample_part3.bam
121+ ```
122+ or
123+ ```
124+ longcallD call -t16 ref.fa sample_part1.bam -X sample_part2.bam -X sample_part3.bam > sample.vcf
125+ ```
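
The list file shown above can be generated instead of written by hand. The following is a minimal sketch, not part of longcallD: `make_bam_list` is a hypothetical helper, and it assumes all parts of one sample sit in a single directory and that one path per line is the expected list format (as in the `bam_list.txt` example).

```shell
# Hypothetical helper (not shipped with longcallD): write one BAM/CRAM path
# per line, sorted so the list is reproducible between runs.
make_bam_list() {
    # $1: directory holding the parts of one sample
    # $2: output list file
    find "$1" -maxdepth 1 -type f \( -name '*.bam' -o -name '*.cram' \) | sort > "$2"
}
```

A possible use, assuming the parts live in the current directory: `make_bam_list . bam_list.txt`, followed by `longcallD call -t16 --input-is-list ref.fa bam_list.txt > sample.vcf`.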
126+
110127### Low allele-frequency mosaic variant calling
111- With `-s`, longcallD will detect both germline and somatic/mosaic variants.
128+ With `-s`, longcallD will detect both germline and low-frequency somatic/mosaic variants.
112129
113130For each somatic/mosaic variant, a `SOMATIC` tag will be added to the INFO field in the output VCF.
114131```
115132longcallD call -s -t16 ref.fa hifi.bam > hifi.vcf
116133longcallD call -s -t16 ref.fa hifi.bam -T AluY_L1_SVA_cons_noPA.fa > hifi.vcf # add MEI information in INFO field
117- longcallD call -s -t16 ref.fa ont.bam --ont > ont.vcf
118134```
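
Because somatic/mosaic calls are marked in the INFO field, they can be pulled out of the output VCF with standard text tools. The sketch below is an illustration only: `filter_somatic` is a hypothetical helper, and it assumes `SOMATIC` appears as a standalone flag among the semicolon-separated INFO entries (column 8) of an uncompressed VCF.

```shell
# Hypothetical helper (not part of longcallD): keep header lines and records
# whose INFO column contains SOMATIC as a whole entry.
filter_somatic() {
    # $1: uncompressed VCF file
    awk -F'\t' '
        /^#/ { print; next }              # pass header lines through unchanged
        $8 ~ /(^|;)SOMATIC(;|$)/          # print records flagged SOMATIC in INFO
    ' "$1"
}
```

For example, `filter_somatic hifi.vcf > hifi.somatic.vcf` would write only the somatic/mosaic records, with the original header.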
119135
120136### Region-specific variant calling
@@ -126,10 +142,10 @@ longcallD call -t16 ref.fa hifi.bam --region-file reg.bed > hifi_regs.vcf
126142longcallD call -t16 ref.fa hifi.bam --autosome > hifi_autosome.vcf
127143```
128144
129- ### Variant calling and output phased long reads
145+ ### Variant calling and output phased (& refined) long-read BAM/CRAM
130146```
131- longcallD call -t16 ref.fa hifi.bam --hifi -b hifi_phased.bam > hifi.vcf # output phased HiFi reads (BAM tag: HP & PS)
132- longcallD call -t16 ref.fa ont.bam --ont -b ont_phased.bam > ont.vcf # output phased ONT reads (BAM tag: HP & PS)
147+ longcallD call -t16 ref.fa hifi.bam --hifi -b hifi_phased.bam > hifi.vcf # output phased HiFi reads (BAM tag: HP & PS)
148+ longcallD call -t16 ref.fa ont.bam --ont --refine-aln -b ont_phased_refined.bam > ont.vcf # output phased & refined ONT reads (BAM tag: HP & PS)
133149```
134150### Variant calling from remote files
135151```
@@ -138,6 +154,14 @@ bam=https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/HG002_NA2438
138154longcallD call -t16 $ref $bam chr11:10,229,956-10,256,221 chr12:10,576,356-10,583,438 > hifi_regs.vcf
139155```
140156
157+ ## Memory usage
158+ Because longcallD performs memory-intensive multiple-sequence alignment and re-alignment, it usually uses more memory than other variant callers.
159+ The peak memory usage mainly depends on the number of threads (`-t/--threads`), the sequencing coverage, and the read length.
160+ For human genome sequencing data with ~40x coverage, longcallD typically uses around **1GB** (**HiFi**) or **2GB** (**ONT R10**) of memory per thread for germline variant calling.
161+ 
162+ If you encounter memory issues, you can use `--region-file` to limit the genomic regions being processed.
163+ Human genome region lists excluding centromeres are provided [here](https://github.com/yangao07/longcallD/blob/main/anno/).
164+
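
The per-thread figures above can be turned into a rough thread count for a given memory budget. This is a back-of-the-envelope sketch only: `max_threads` is a hypothetical helper, and it assumes the ~1GB (HiFi) and ~2GB (ONT R10) per-thread estimates for ~40x human germline calling hold for your data, which may not be the case at higher coverage or with mosaic calling enabled.

```shell
# Hypothetical helper (not part of longcallD): estimate how many threads fit
# in a memory budget, using the approximate per-thread figures quoted above.
max_threads() {
    # $1: available memory in GB (integer)
    # $2: platform, "hifi" or "ont"
    case "$2" in
        hifi) per_thread=1 ;;   # ~1GB per thread (HiFi, ~40x)
        ont)  per_thread=2 ;;   # ~2GB per thread (ONT R10, ~40x)
        *) echo "unknown platform: $2" >&2; return 1 ;;
    esac
    t=$(( $1 / per_thread ))
    # Always allow at least one thread
    if [ "$t" -lt 1 ]; then
        t=1
    fi
    echo "$t"
}
```

For instance, `longcallD call -t"$(max_threads 32 ont)" ref.fa ont.bam --ont > ont.vcf` would pick a thread count for a 32GB budget. Treat the result as a starting point rather than a guarantee.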
141165## Acknowledgements
142166LongcallD depends on the following libraries; we are grateful to all the developers/maintainers:
143167
@@ -146,7 +170,7 @@ LongcallD is dependent on the following libraries, we are grateful to all the de
146170* [WFA](https://github.com/smarco/WFA2-lib): pairwise alignment
147171* [edlib](https://github.com/Martinsos/edlib): fast sequence similarity calculation
148172* [cgranges](https://github.com/lh3/cgranges): interval operations
149- * [sdust](https://github.com/lh3/sdust) and [longdust](https://github.com/lh3/longdust): identify low-complexity regions
173+ * [sdust](https://github.com/lh3/sdust): identify low-complexity regions
150174
151175## Contact
152176