articles/open-datasets/dataset-illumina-platinum-genomes.md
Lines changed: 4 additions & 4 deletions
@@ -8,7 +8,7 @@ ms.date: 04/16/2021
# Illumina Platinum Genomes
-Whole-genome sequencing is enabling researchers worldwide to characterize the human genome more fully and accurately. This requires a comprehensive, genome-wide catalog of high-confidence variants called in a set of genomes as a benchmark. Illumina has generated deep, whole-genome sequence data of 17 individuals in a three-generation pedigree. Illumina has called variants in each genome using a range of currently available algorithms.
+Whole-genome sequencing is enabling researchers worldwide to characterize the human genome more fully and accurately. This effort requires a comprehensive, genome-wide catalog of high-confidence variants called in a set of genomes as a benchmark. Illumina generated deep, whole-genome sequence data of 17 individuals in a three-generation pedigree. Illumina called variants in each genome using a range of currently available algorithms.
For more information on the data, see the official [Illumina site](https://www.illumina.com/platinumgenomes.html).
@@ -51,7 +51,7 @@ For any questions or feedback about the dataset, contact platinumgenomes@illumin
## Getting the Illumina Platinum Genomes from Azure Open Datasets and Doing Initial Analysis
-Use Jupyter notebooks, GATK, and Picard to do the following:
+Use Jupyter notebooks, GATK, and Picard in analyses such as:
1. Annotate genotypes using VariantFiltration
2. Select Specific Variants
@@ -73,7 +73,7 @@ This notebook requires the following libraries:
## Getting the Genomics data from Azure Open Datasets
-Several public genomics data has been uploaded as an Azure Open Dataset [here](https://azure.microsoft.com/services/open-datasets/catalog/). We create a blob service linked to this open dataset. You can find examples of data calling procedure from Azure Open Dataset for `Illumina Platinum Genomes` datasets in below:
+Several public genomics datasets have been uploaded to Azure Open Datasets [here](https://azure.microsoft.com/services/open-datasets/catalog/). We create a blob service linked to this open dataset. The following examples show how to retrieve the `Illumina Platinum Genomes` data from Azure Open Datasets:
### Downloading the specific 'Illumina Platinum Genomes'
@@ -160,7 +160,7 @@ Extract fields from a VCF file to a tab-delimited table. This tool extracts spec
INFO/site-level fields:
-Use the `-F` argument to extract INFO fields; each field will occupy a single column in the output file. The field can be any standard VCF column (for example, CHROM, ID, QUAL) or any annotation name in the INFO field (for example, AC, AF). The tool also supports the following fields:
+Use the `-F` argument to extract INFO fields; each field occupies a single column in the output file. The field can be any standard VCF column (for example, CHROM, ID, QUAL) or any annotation name in the INFO field (for example, AC, AF). The tool also supports the following fields:
EVENTLENGTH (length of the event)
TRANSITION (1 for a bi-allelic transition (SNP), 0 for bi-allelic transversion (SNP), -1 for INDELs and multi-allelics)
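To make the `-F` extraction and the EVENTLENGTH and TRANSITION fields described above concrete, here is a minimal plain-Python sketch. It is not GATK and does not call `VariantsToTable`; it assumes well-formed, uncompressed, tab-delimited VCF data lines, and the helper names (`parse_info`, `event_length`, `transition`, `variants_to_table`) are hypothetical. TRANSITION follows the 1/0/-1 definition quoted in the text. The EVENTLENGTH rule used here (signed ALT-minus-REF length of the alternate allele with the largest absolute difference) is an assumption about how the tool reports the value, not a confirmed reimplementation.

```python
# Illustrative sketch only: a plain-Python approximation of VariantsToTable -F
# extraction. It is NOT GATK and reads uncompressed, tab-delimited VCF lines.

# Fixed VCF columns, in file order.
STANDARD_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER"]

BASES = {"A", "C", "G", "T"}
# Transitions swap purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def parse_info(info: str) -> dict:
    """Split an INFO column such as 'AC=2;AF=1.00;DB' into a dict.

    Flag entries without '=' (for example DB) are mapped to 'true'.
    """
    result = {}
    if info in ("", "."):
        return result
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        result[key] = value if sep else "true"
    return result


def event_length(ref: str, alts: list) -> int:
    """Signed ALT-minus-REF length of the alt allele with the largest |difference|.

    Assumption: 0 for SNPs, negative for deletions, positive for insertions.
    """
    longest = 0
    for alt in alts:
        diff = len(alt) - len(ref)
        if abs(diff) > abs(longest):
            longest = diff
    return longest


def transition(ref: str, alts: list) -> int:
    """1 = bi-allelic transition SNP, 0 = bi-allelic transversion SNP, -1 otherwise."""
    if len(alts) != 1:
        return -1  # multi-allelic site
    r, a = ref.upper(), alts[0].upper()
    if r not in BASES or a not in BASES or r == a:
        return -1  # INDEL, MNP, symbolic or missing allele
    return 1 if (r, a) in TRANSITIONS else 0


def variants_to_table(vcf_lines, fields):
    """Yield tab-delimited rows, header first, with one column per requested field."""
    yield "\t".join(fields)
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip meta-information and header lines
        cols = line.rstrip("\n").split("\t")
        record = dict(zip(STANDARD_COLUMNS, cols))
        info = parse_info(cols[7]) if len(cols) > 7 else {}
        alts = record["ALT"].split(",")
        computed = {
            "EVENTLENGTH": event_length(record["REF"], alts),
            "TRANSITION": transition(record["REF"], alts),
        }
        row = []
        for field in fields:
            if field in record:
                row.append(record[field])  # standard VCF column
            elif field in computed:
                row.append(str(computed[field]))  # derived field
            else:
                row.append(info.get(field, "NA"))  # INFO annotation; NA if absent
        yield "\t".join(row)


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t1000\t.\tA\tG\t50\tPASS\tAC=1;AF=0.5",
        "chr1\t2000\t.\tC\tA\t40\tPASS\tAC=2;AF=1.0",
        "chr1\t3000\t.\tAT\tA\t30\tPASS\tAC=1",
    ]
    for row in variants_to_table(example, ["CHROM", "POS", "AF", "EVENTLENGTH", "TRANSITION"]):
        print(row)
```

In the example, the A>G site is reported as a transition (1), the C>A site as a transversion (0), and the AT>A deletion gets EVENTLENGTH -1 and TRANSITION -1, with its missing AF written as NA. For real analyses, the GATK `VariantsToTable` tool itself should be used; this sketch only illustrates what the extracted columns mean.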