FAQ
Joe Zhu edited this page May 13, 2017
Data filtering is an important step for deconvolution. Start by exploring the data:
utilities/dataExplore.r -vcf data/exampleData/PG0415-C.eg.vcf.gz \
-plaf data/exampleData/labStrains.eg.PLAF.txt \
-o PG0415-C
We observe a small number of heterozygous sites with high coverage (marked as crosses in the resulting plot), which can mislead the model into over-fitting the data with additional strains. Running the deconvolution without filtering:
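To make the idea of a "heterozygous site with high coverage" concrete, here is a minimal sketch that flags such sites from a table of allele counts. It is an illustration only, not the criterion used by utilities/dataExplore.r: the function name `flag_high_coverage_hets`, the tab-separated input layout (CHROM, POS, REF_COUNT, ALT_COUNT with one header line) and the fixed coverage threshold are all assumptions made for this example.

```shell
# Hypothetical helper, not part of DEploid.
# Assumed input: tab-separated CHROM, POS, REF_COUNT, ALT_COUNT with one header line.
# Prints CHROM and POS of sites where both alleles are observed
# and the total coverage exceeds the given threshold.
flag_high_coverage_hets() {
    counts_file="$1"
    threshold="$2"
    awk -F'\t' -v t="$threshold" '
        NR == 1 { next }                      # skip the assumed header line
        $3 > 0 && $4 > 0 && ($3 + $4) > t {   # heterozygous and above threshold
            print $1 "\t" $2
        }
    ' "$counts_file"
}
```

For example, `flag_high_coverage_hets counts.txt 150` would list the sites in a hypothetical `counts.txt` whose combined reference and alternative counts exceed 150. A fixed threshold is the simplest choice for a sketch; a quantile of the observed coverage would adapt better to each sample.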
./dEploid -vcf data/exampleData/PG0415-C.eg.vcf.gz \
-plaf data/exampleData/labStrains.eg.PLAF.txt \
-noPanel -o PG0415-CNopanel -seed 2
utilities/interpretDEploid.r -vcf data/exampleData/PG0415-C.eg.vcf.gz \
-plaf data/exampleData/labStrains.eg.PLAF.txt \
-dEprefix PG0415-CNopanel \
-o PG0415-CNopanel

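The data exploration utility utilities/dataExplore.r identifies a list of potential outliers (PG0415-CPotentialOutliers.txt in this example). To filter these sites out, pass the list to -exclude in both the deconvolution and the interpretation step: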
./dEploid -vcf data/exampleData/PG0415-C.eg.vcf.gz \
-plaf data/exampleData/labStrains.eg.PLAF.txt \
-noPanel -o PG0415-CNopanel.filtered -seed 2 \
-exclude PG0415-CPotentialOutliers.txt
utilities/interpretDEploid.r -vcf data/exampleData/PG0415-C.eg.vcf.gz \
-plaf data/exampleData/labStrains.eg.PLAF.txt \
-dEprefix PG0415-CNopanel.filtered \
-o PG0415-CNopanel.filtered \
-exclude PG0415-CPotentialOutliers.txt
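Before passing the outlier list to -exclude, it can help to confirm that the file exists and to see how many sites it would remove. The following is a minimal sketch; the helper name `count_outliers` is hypothetical, and it assumes the list starts with a single header line, which should be checked against the actual file.

```shell
# Hypothetical helper, not part of DEploid.
# Counts the data rows in an outlier list, assuming one header line.
count_outliers() {
    outlier_file="$1"
    if [ ! -s "$outlier_file" ]; then
        echo "missing or empty: $outlier_file" >&2
        return 1
    fi
    # Subtract the assumed header line from the total line count.
    awk 'END { print (NR > 0 ? NR - 1 : 0) }' "$outlier_file"
}
```

Usage: `count_outliers PG0415-CPotentialOutliers.txt` prints the number of sites that would be excluded, or returns a non-zero status if the file is missing or empty.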