Commit d173a82

Update dataset-illumina-platinum-genomes.md
1 parent 5e10f50 commit d173a82

1 file changed (+7 -5 lines changed)

articles/open-datasets/dataset-illumina-platinum-genomes.md

Lines changed: 7 additions & 5 deletions
@@ -34,6 +34,8 @@ West US 2: 'https://datasetplatinumgenomes.blob.core.windows.net/dataset'
 
 West Central US: 'https://datasetplatinumgenomes-secondary.blob.core.windows.net/dataset'
 
+[SAS Token](/azure/storage/common/storage-sas-overview): sv=2019-02-02&se=2050-01-01T08%3A00%3A00Z&si=prod&sr=c&sig=FFfZ0QaDcnEPQmWsshtpoYOjbzd4jtwIWeK%2Fc4i9MqM%3D
+
 ## Use Terms
 
 Data is available without restrictions. For more information and citation details, see the [official Illumina site](https://www.illumina.com/platinumgenomes.html).
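
Note on the added SAS token line: a SAS token is used by appending it to the storage URL as a query string. The sketch below shows that composition with the West US 2 container URL and the token from this hunk. The helper name `with_sas` is hypothetical, and no network call is made; the code only builds and inspects the URL.

```python
from urllib.parse import parse_qs, urlsplit

# Values copied from the diff above.
CONTAINER_URL = "https://datasetplatinumgenomes.blob.core.windows.net/dataset"
SAS_TOKEN = (
    "sv=2019-02-02&se=2050-01-01T08%3A00%3A00Z&si=prod&sr=c"
    "&sig=FFfZ0QaDcnEPQmWsshtpoYOjbzd4jtwIWeK%2Fc4i9MqM%3D"
)


def with_sas(url: str, sas_token: str) -> str:
    """Append a SAS token to a storage URL as its query string (hypothetical helper)."""
    return f"{url}?{sas_token.lstrip('?')}"


signed_url = with_sas(CONTAINER_URL, SAS_TOKEN)

# Decode the query string to inspect the token fields.
sas_fields = {key: values[0] for key, values in parse_qs(urlsplit(signed_url).query).items()}
print(sas_fields["se"])  # expiry timestamp carried by the token
print(sas_fields["sr"])  # "c" marks a container-level resource
```

The token must be passed through unchanged, since it is already percent-encoded; re-encoding it would alter the `sig` value.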
@@ -53,7 +55,7 @@ For any questions or feedback about the dataset, contact platinumgenomes@illumin
 
 ## Getting the Illumina Platinum Genomes from Azure Open Datasets and Doing Initial Analysis
 
-Use Jupyter notebooks, GATK, and Picard in analyses such as:
+Use Jupyter notebooks, GATK, and Picard to complete the following tasks:
 
 1. Annotate genotypes using VariantFiltration
 2. Select Specific Variants
@@ -75,7 +77,7 @@ This notebook requires the following libraries:
 
 ## Getting the Genomics data from Azure Open Datasets
 
-Several public genomics data has been uploaded as an Azure Open Dataset [here](https://azure.microsoft.com/services/open-datasets/catalog/). We create a blob service linked to this open dataset. You can find examples of data calling procedure from Azure Open Dataset for `Illumina Platinum Genomes` datasets as:
+Several public genomics data are available as an Azure Open Dataset [here](https://azure.microsoft.com/services/open-datasets/catalog/). We create a blob service linked to this open dataset. You can find examples of data calling procedure from Azure Open Dataset for `Illumina Platinum Genomes` datasets as follows:
 
 ### Downloading the specific 'Illumina Platinum Genomes'
 
@@ -108,7 +110,7 @@ There are many different options for selecting subsets of variants from a larger
 Extract one or more samples from a call set based on either a complete sample name or a pattern match.
 Specify criteria for inclusion that place thresholds on annotation values, **for example "DP > 1000" (depth of coverage greater than 1000x), "AF < 0.25" (sites with allele frequency less than 0.25)**. These criteria are written as "JEXL expressions", which are documented in the article about using JEXL expressions.
 Provide concordance or discordance tracks in order to include or exclude variants that are also present in other given call sets.
-Select variants based on criteria like their type (for example, INDELs only), evidence of mendelian violation, filtering status, allelicity, etc.
+Select variants based on criteria like their type (for example, INDELs only), evidence of Mendelian violation, filtering status, allelicity, etc.
 There are also several options for recording the original values of certain annotations, which are recalculated when one subsets the new call set, trims alleles, etc.
 
 Input: A variant call set in VCF format from which a subset can be selected.
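
Note on the JEXL thresholds in this hunk: the sketch below assembles a `gatk SelectVariants` command line for one threshold expression without running it. The `-R` and `-V` arguments and their file names are taken from the command quoted elsewhere in this diff; `-select` and `-O` are the GATK4 short argument names for the JEXL expression and the output file. The output file name and the helper `build_select_command` are assumptions made for illustration.

```python
import shlex


def build_select_command(reference: str, variants: str, output: str, expression: str) -> list[str]:
    """Build (but do not run) a SelectVariants call with one JEXL expression (hypothetical helper)."""
    return [
        "gatk", "SelectVariants",
        "-R", reference,        # reference FASTA
        "-V", variants,         # input VCF call set
        "-select", expression,  # JEXL inclusion criterion
        "-O", output,           # output VCF (name is an assumption)
    ]


argv = build_select_command(
    reference="Homo_sapiens_assembly38.fasta",
    variants="outputannot.vcf",
    output="selected.vcf",      # hypothetical output name
    expression="DP > 1000",
)

# shlex.join quotes the expression so it survives a shell as a single argument.
print(shlex.join(argv))
```

Passing the command as an argument list (for example to `subprocess.run`) avoids shell quoting problems with the spaces and `>` or `<` characters in the expression.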
@@ -123,7 +125,7 @@ run gatk SelectVariants -R Homo_sapiens_assembly38.fasta -V outputannot.vcf --se
 
 Running SelectVariants with --set-filtered-gt-to-nocall will further transform the flagged genotypes with a null genotype call.
 
-This conversion is necessary because downstream tools do not parse the FORMAT-level filter field.
+This conversion is necessary because downstream tools don't parse the FORMAT-level filter field.
 
 How can we filter the variants with **'No call'**
 
@@ -162,7 +164,7 @@ Extract fields from a VCF file to a tab-delimited table. This tool extracts spec
 
 INFO/site-level fields:
 
-Use the `-F` argument to extract INFO fields; each field occupies a single column in the output file. The field can be any standard VCF column (for example, CHROM, ID, QUAL) or any annotation name in the INFO field (for example, AC, AF). The tool also supports the following fields:
+Use the `-F` argument to extract INFO fields; each field will occupy a single column in the output file. The field can be any standard VCF column (for example, CHROM, ID, QUAL) or any annotation name in the INFO field (for example, AC, AF). The tool also supports the following fields:
 
 EVENTLENGTH (length of the event)
 TRANSITION (1 for a bi-allelic transition (SNP), 0 for bi-allelic transversion (SNP), -1 for INDELs and multi-allelics)
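
Note on the TRANSITION field described in this hunk: the rule can be illustrated with a small function that classifies a variant from its REF and ALT alleles. This is a sketch of the rule as stated in the text, not the tool's actual implementation; the function name `transition_flag` is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def transition_flag(ref: str, alts: list[str]) -> int:
    """Return 1 for a bi-allelic transition, 0 for a bi-allelic transversion,
    and -1 for INDELs and multi-allelic sites (illustrative only)."""
    if len(alts) != 1:
        return -1  # multi-allelic site
    alt = alts[0].upper()
    ref = ref.upper()
    if len(ref) != 1 or len(alt) != 1:
        return -1  # insertion or deletion
    # A transition stays within purines (A<->G) or within pyrimidines (C<->T).
    if {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
        return 1
    return 0  # purine <-> pyrimidine substitution is a transversion


print(transition_flag("A", ["G"]))       # transition
print(transition_flag("A", ["C"]))       # transversion
print(transition_flag("A", ["AT"]))      # insertion
print(transition_flag("A", ["G", "T"]))  # multi-allelic
```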
