Skip to content

exportBins to seg/vcf when one or zero CNV are called #102

@helaersr

Description

@helaersr

Hello,

There is a problem when outputing seg and vcf file and the total CNV called is zero or one.

There is an example of seg file with one CNV:

rhelaers@ddus:~/qdnaseq$ cat CLP-1271-3.seg
x
/home/users/rhelaers/qdnaseq/CLP-1271-3.seg
21
14000001
25000000
21
0.17

And the vcf:

rhelaers@ddus:~/qdnaseq$ cat CLP-1271-3.vcf
##fileformat=VCFv4.2
##source=QDNAseq-1.30.0
##REF=<ID=DIP,Description="CNV call">
##ALT=<ID=DEL,Description="Deletion">
##ALT=<ID=DUP,Description="Duplication">
##FILTER=<ID=LOWQ,Description="Filtered due to call in low quality region">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of variant: DEL,DUP,INS">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Length of variant">
##INFO=<ID=BINS,Number=1,Type=Integer,Description="Number of bins in call">
##INFO=<ID=SCORE,Number=1,Type=Integer,Description="Score of calling algorithm">
##INFO=<ID=LOG2CNT,Number=1,Type=Float,Description="Log 2 count">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
x
21
14000001
.
<DIP>
<DUP>
1000
PASS
SVTYPE=DUP;END=25000000;SVLEN=11000000;BINS=21;SCORE=1;LOG2CNT=0.17
GT
0/1

The code I used to get that (I use a unique BAM file of shallow WGS, and the GRCh38 dataset generated by asntech):

library(QDNAseq.hg38)
bins <- getBinAnnotations(binSize=as.numeric(binsize), genome="hg38")
readCounts              <- binReadCounts(bins, bamfiles=bam, chunkSize = 10000000)
readCountsFiltered      <- applyFilters(readCounts, residual=TRUE, blacklist=TRUE)
readCountsFiltered      <- estimateCorrection(readCountsFiltered)
copyNumbers             <- correctBins(readCountsFiltered)
copyNumbersNormalized   <- normalizeBins(copyNumbers)
copyNumbersSmooth       <- smoothOutlierBins(copyNumbersNormalized)
copyNumbersSegmented    <- segmentBins(copyNumbersSmooth, transformFun="sqrt")
copyNumbersSegmented    <- normalizeSegmentedBins(copyNumbersSegmented)
copyNumbersCalled       <- callBins(copyNumbersSegmented)
dir <- paste0(outputdir,"/QDNAseq/")
path <- paste0(dir,sample)
unlink(dir)
dir.create(dir)
setwd(dir)
exportBins(copyNumbersCalled, file=paste0(path,".seg"), format="seg")
exportBins(copyNumbersCalled, file=paste0(path,".vcf"), format="vcf")

My sessionInfo

R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.7 LTS

Matrix products: default
BLAS:   /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

Random number generation:
 RNG:     Mersenne-Twister
 Normal:  Inversion
 Sample:  Rounding

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Biobase_2.54.0      BiocGenerics_0.40.0 QDNAseq.hg38_1.0.0
[4] QDNAseq_1.30.0

loaded via a namespace (and not attached):
 [1] parallelly_1.30.0      DNAcopy_1.68.0         XVector_0.34.0
 [4] GenomicRanges_1.46.1   zlibbioc_1.40.0        IRanges_2.28.0
 [7] BiocParallel_1.28.3    impute_1.68.0          GenomeInfoDb_1.30.0
[10] globals_0.14.0         tools_4.1.0            CGHcall_2.56.0
[13] parallel_4.1.0         R.oo_1.24.0            marray_1.72.0
[16] matrixStats_0.61.0     digest_0.6.29          CGHbase_1.54.0
[19] crayon_1.4.2           GenomeInfoDbData_1.2.7 BiocManager_1.30.16
[22] codetools_0.2-18       R.utils_2.11.0         S4Vectors_0.32.3
[25] bitops_1.0-7           RCurl_1.98-1.5         limma_3.50.0
[28] future.apply_1.8.1     compiler_4.1.0         R.methodsS3_1.8.1
[31] Rsamtools_2.10.0       Biostrings_2.62.0      stats4_4.1.0
[34] future_1.23.0          listenv_0.8.0

And I think I found the problem in the code :-)
In exportBins.R, here in exportSEG <- function(obj, fnames=NULL) {...}

        out <- cbind(fnames[i], chr, pos, end, bins, segVal)
        colnames(out) <- c("SAMPLE_NAME", "CHROMOSOME", "START", "STOP", "DATAPOINTS", "LOG2_RATIO_MEAN")

        ## Drop copy-neutral segments
        out <- out[dsel[posI, 4] != 0, ]
        
        write.table(out, file = fnames[i], quote=FALSE, sep="\t", append=FALSE, col.names=TRUE, row.names=FALSE)

The problem is that out <- out[dsel[posI, 4] != 0, ] generates a vector instead of a dataframe if the result has a size of 1 or 0. Then write.table outputs the vector as lines and not as columns.

If I make some check on the size of out, then something like out <- data.frame(matrix(out[dsel[posI, 4] != 0, ], ncol = 6, nrow = 1)), it works.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions