-
Notifications
You must be signed in to change notification settings - Fork 33
Open
Labels
Milestone
Description
Hello,
There is a problem when outputing seg and vcf file and the total CNV called is zero or one.
There is an example of seg file with one CNV:
rhelaers@ddus:~/qdnaseq$ cat CLP-1271-3.seg
x
/home/users/rhelaers/qdnaseq/CLP-1271-3.seg
21
14000001
25000000
21
0.17
And the vcf:
rhelaers@ddus:~/qdnaseq$ cat CLP-1271-3.vcf
##fileformat=VCFv4.2
##source=QDNAseq-1.30.0
##REF=<ID=DIP,Description="CNV call">
##ALT=<ID=DEL,Description="Deletion">
##ALT=<ID=DUP,Description="Duplication">
##FILTER=<ID=LOWQ,Description="Filtered due to call in low quality region">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of variant: DEL,DUP,INS">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Length of variant">
##INFO=<ID=BINS,Number=1,Type=Integer,Description="Number of bins in call">
##INFO=<ID=SCORE,Number=1,Type=Integer,Description="Score of calling algorithm">
##INFO=<ID=LOG2CNT,Number=1,Type=Float,Description="Log 2 count">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
x
21
14000001
.
<DIP>
<DUP>
1000
PASS
SVTYPE=DUP;END=25000000;SVLEN=11000000;BINS=21;SCORE=1;LOG2CNT=0.17
GT
0/1
The code I used to get that (I use a unique BAM file of shallow WGS, and the GRCh38 dataset generated by asntech):
library(QDNAseq.hg38)
bins <- getBinAnnotations(binSize=as.numeric(binsize), genome="hg38")
readCounts <- binReadCounts(bins, bamfiles=bam, chunkSize = 10000000)
readCountsFiltered <- applyFilters(readCounts, residual=TRUE, blacklist=TRUE)
readCountsFiltered <- estimateCorrection(readCountsFiltered)
copyNumbers <- correctBins(readCountsFiltered)
copyNumbersNormalized <- normalizeBins(copyNumbers)
copyNumbersSmooth <- smoothOutlierBins(copyNumbersNormalized)
copyNumbersSegmented <- segmentBins(copyNumbersSmooth, transformFun="sqrt")
copyNumbersSegmented <- normalizeSegmentedBins(copyNumbersSegmented)
copyNumbersCalled <- callBins(copyNumbersSegmented)
dir <- paste0(outputdir,"/QDNAseq/")
path <- paste0(dir,sample)
unlink(dir)
dir.create(dir)
setwd(dir)
exportBins(copyNumbersCalled, file=paste0(path,".seg"), format="seg")
exportBins(copyNumbersCalled, file=paste0(path,".vcf"), format="vcf")
My sessionInfo
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.7 LTS
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Biobase_2.54.0 BiocGenerics_0.40.0 QDNAseq.hg38_1.0.0
[4] QDNAseq_1.30.0
loaded via a namespace (and not attached):
[1] parallelly_1.30.0 DNAcopy_1.68.0 XVector_0.34.0
[4] GenomicRanges_1.46.1 zlibbioc_1.40.0 IRanges_2.28.0
[7] BiocParallel_1.28.3 impute_1.68.0 GenomeInfoDb_1.30.0
[10] globals_0.14.0 tools_4.1.0 CGHcall_2.56.0
[13] parallel_4.1.0 R.oo_1.24.0 marray_1.72.0
[16] matrixStats_0.61.0 digest_0.6.29 CGHbase_1.54.0
[19] crayon_1.4.2 GenomeInfoDbData_1.2.7 BiocManager_1.30.16
[22] codetools_0.2-18 R.utils_2.11.0 S4Vectors_0.32.3
[25] bitops_1.0-7 RCurl_1.98-1.5 limma_3.50.0
[28] future.apply_1.8.1 compiler_4.1.0 R.methodsS3_1.8.1
[31] Rsamtools_2.10.0 Biostrings_2.62.0 stats4_4.1.0
[34] future_1.23.0 listenv_0.8.0
And I think I found the problem in the code :-)
In exportBins.R, here in exportSEG <- function(obj, fnames=NULL) {...}
out <- cbind(fnames[i], chr, pos, end, bins, segVal)
colnames(out) <- c("SAMPLE_NAME", "CHROMOSOME", "START", "STOP", "DATAPOINTS", "LOG2_RATIO_MEAN")
## Drop copy-neutral segments
out <- out[dsel[posI, 4] != 0, ]
write.table(out, file = fnames[i], quote=FALSE, sep="\t", append=FALSE, col.names=TRUE, row.names=FALSE)
The problem is that out <- out[dsel[posI, 4] != 0, ] generates a vector instead of a dataframe if the result has a size of 1 or 0. Then write.table outputs the vector as lines and not as columns.
If I make some check on the size of out, then something like out <- data.frame(matrix(out[dsel[posI, 4] != 0, ], ncol = 6, nrow = 1)), it works.