Data is available without restrictions. For more information and citation details, see the [official Illumina site](https://www.illumina.com/platinumgenomes.html).
@@ -53,7 +55,7 @@ For any questions or feedback about the dataset, contact platinumgenomes@illumin
## Getting the Illumina Platinum Genomes from Azure Open Datasets and Doing Initial Analysis
-Use Jupyter notebooks, GATK, and Picard in analyses such as:
+Use Jupyter notebooks, GATK, and Picard to complete the following tasks:
1. Annotate genotypes using VariantFiltration
2. Select Specific Variants
@@ -75,7 +77,7 @@ This notebook requires the following libraries:
## Getting the Genomics data from Azure Open Datasets
-Several public genomics data has been uploaded as an Azure Open Dataset [here](https://azure.microsoft.com/services/open-datasets/catalog/). We create a blob service linked to this open dataset. You can find examples of data calling procedure from Azure Open Dataset for `Illumina Platinum Genomes` datasets as:
+Several public genomics data are available as an Azure Open Dataset [here](https://azure.microsoft.com/services/open-datasets/catalog/). We create a blob service linked to this open dataset. You can find examples of data calling procedure from Azure Open Dataset for `Illumina Platinum Genomes` datasets as follows:
### Downloading the specific 'Illumina Platinum Genomes'
@@ -108,7 +110,7 @@ There are many different options for selecting subsets of variants from a larger
Extract one or more samples from a call set based on either a complete sample name or a pattern match.
Specify criteria for inclusion that place thresholds on annotation values, **for example "DP > 1000" (depth of coverage greater than 1000x), "AF < 0.25" (sites with allele frequency less than 0.25)**. These criteria are written as "JEXL expressions", which are documented in the article about using JEXL expressions.
Provide concordance or discordance tracks in order to include or exclude variants that are also present in other given call sets.
-Select variants based on criteria like their type (for example, INDELs only), evidence of mendelian violation, filtering status, allelicity, etc.
+Select variants based on criteria like their type (for example, INDELs only), evidence of Mendelian violation, filtering status, allelicity, etc.
There are also several options for recording the original values of certain annotations, which are recalculated when one subsets the new call set, trims alleles, etc.
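The selection options above can be sketched as a small Python helper that assembles a SelectVariants command line, which is convenient inside a notebook. This is a minimal sketch, not the notebook's own code: the helper name `select_variants_cmd` and the file names are hypothetical, and the flag spellings (`-select`, `--select-type-to-include`, `--sample-name`) are assumed to match the GATK4 release in use, so check them against `gatk SelectVariants --help` before relying on them.

```python
import shlex


def select_variants_cmd(vcf_in, vcf_out, jexl=None, variant_type=None, sample=None):
    """Build (but do not run) a GATK SelectVariants command as an argument list.

    Hypothetical helper for illustration; flag names assume GATK4.
    """
    cmd = ["gatk", "SelectVariants", "-V", vcf_in, "-O", vcf_out]
    if sample:
        # Extract a single sample by its complete name.
        cmd += ["--sample-name", sample]
    if jexl:
        # Threshold on annotation values with a JEXL expression, e.g. "DP > 1000".
        cmd += ["-select", jexl]
    if variant_type:
        # Restrict the output to one variant type, e.g. "INDEL".
        cmd += ["--select-type-to-include", variant_type]
    return cmd


# Example: keep only INDELs with depth of coverage above 1000x.
cmd = select_variants_cmd(
    "input.vcf.gz",        # hypothetical input call set
    "deep_indels.vcf.gz",  # hypothetical output path
    jexl="DP > 1000",
    variant_type="INDEL",
)
print(shlex.join(cmd))  # shell-quoted form, ready to paste into a terminal
```

Passing the list to `subprocess.run(cmd, check=True)` avoids shell-quoting problems with JEXL expressions that contain spaces or comparison operators.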
Input: A variant call set in VCF format from which a subset can be selected.
Running SelectVariants with --set-filtered-gt-to-nocall will further transform the flagged genotypes with a null genotype call.
-This conversion is necessary because downstream tools do not parse the FORMAT-level filter field.
+This conversion is necessary because downstream tools don't parse the FORMAT-level filter field.
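To make the effect of `--set-filtered-gt-to-nocall` concrete, the toy function below mimics the idea on a single sample column of a VCF record: when the FORMAT-level filter field (`FT`) holds anything other than `PASS`, the genotype is replaced with a no-call. This is an illustrative sketch only, not GATK's implementation; the function name is hypothetical, and how GATK treats the remaining FORMAT fields is not modeled here.

```python
def set_filtered_gt_to_nocall(format_keys, sample_field):
    """Toy model: blank out GT when the genotype-level filter (FT) is not PASS.

    format_keys  -- the VCF FORMAT column, e.g. "GT:DP:FT"
    sample_field -- one sample column,      e.g. "0/1:8:lowDP"
    """
    keys = format_keys.split(":")
    values = sample_field.split(":")
    fields = dict(zip(keys, values))

    # A missing FT, "." or "PASS" all mean the genotype was not flagged.
    ft = fields.get("FT", "PASS")
    if ft not in ("PASS", ".") and "GT" in fields:
        # Keep the ploidy but replace every allele with the missing value.
        ploidy = len(fields["GT"].replace("|", "/").split("/"))
        fields["GT"] = "/".join(["."] * ploidy)

    return ":".join(fields[k] for k in keys if k in fields)


print(set_filtered_gt_to_nocall("GT:DP:FT", "0/1:8:lowDP"))
print(set_filtered_gt_to_nocall("GT:DP:FT", "0/1:35:PASS"))
```

The first call yields a `./.` genotype because the sample carries a non-PASS filter; the second is returned unchanged. Tools that only read `GT` therefore see the flagged genotype as missing instead of as a confident call.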
How can we filter the variants with **'No call'**
@@ -162,7 +164,7 @@ Extract fields from a VCF file to a tab-delimited table. This tool extracts spec
INFO/site-level fields:
-Use the `-F` argument to extract INFO fields; each field occupies a single column in the output file. The field can be any standard VCF column (for example, CHROM, ID, QUAL) or any annotation name in the INFO field (for example, AC, AF). The tool also supports the following fields:
+Use the `-F` argument to extract INFO fields; each field will occupy a single column in the output file. The field can be any standard VCF column (for example, CHROM, ID, QUAL) or any annotation name in the INFO field (for example, AC, AF). The tool also supports the following fields:
EVENTLENGTH (length of the event)
TRANSITION (1 for a bi-allelic transition (SNP), 0 for bi-allelic transversion (SNP), -1 for INDELs and multi-allelics)
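As a rough illustration of the `-F` usage and the TRANSITION encoding described above, the sketch below builds a VariantsToTable command with one `-F` per output column and re-implements the 1 / 0 / -1 rule in plain Python. Both helpers are hypothetical and written for this example; in particular, treating every record that is not a bi-allelic single-base substitution as -1 is an assumption of the sketch, not a statement about GATK's internals.

```python
# Purine <-> purine (A/G) and pyrimidine <-> pyrimidine (C/T) changes are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def transition_flag(ref, alts):
    """Return 1 for a bi-allelic transition, 0 for a bi-allelic transversion, else -1."""
    if len(alts) != 1:
        return -1  # multi-allelic site
    alt = alts[0]
    if len(ref) != 1 or len(alt) != 1:
        return -1  # INDEL or other non-SNP record (assumption of this sketch)
    return 1 if (ref.upper(), alt.upper()) in TRANSITIONS else 0


def variants_to_table_cmd(vcf_in, table_out, fields):
    """Build (but do not run) a VariantsToTable command; each -F becomes one column."""
    cmd = ["gatk", "VariantsToTable", "-V", vcf_in]
    for name in fields:
        cmd += ["-F", name]
    cmd += ["-O", table_out]
    return cmd


table_cmd = variants_to_table_cmd(
    "input.vcf.gz",  # hypothetical input call set
    "variants.tsv",  # hypothetical output table
    ["CHROM", "POS", "QUAL", "AC", "AF", "EVENTLENGTH", "TRANSITION"],
)
print(" ".join(table_cmd))
print(transition_flag("A", ["G"]), transition_flag("A", ["C"]), transition_flag("A", ["AT"]))
```

The resulting tab-delimited table can then be loaded in the notebook with a standard reader such as `pandas.read_csv(path, sep="\t")`.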