articles/open-datasets/dataset-1000-genomes.md
+12 -29 (12 additions & 29 deletions)
@@ -3,58 +3,41 @@ title: 1000 Genomes
 description: Learn how to use the 1000 Genomes dataset in Azure Open Datasets.
 ms.service: open-datasets
 ms.topic: sample
-ms.date: 04/16/2021
+ms.reviewer: fsolomon
+ms.date: 07/10/2024
 ---

 # 1000 Genomes

-The 1000 Genomes Project ran between 2008 and 2015, creating the largest public catalog of human variation and genotype data. The final data set contains data for 2,504 individuals from 26 populations and 84 million identified variants. For more information, see the 1000 Genome Project website and the following publications:
+The 1000 Genomes Project ran between 2008 and 2015, to create the largest public catalog of human variation and genotype data. The final data set contains data for 2,504 individuals from 26 populations and 84 million identified variants. For more information, visit the 1000 Genome Project [website](https://www.internationalgenome.org/) and these publications:

-Pilot Analysis: A map of human genome variation from population-scale sequencing Nature 467, 1061-1073 (28 October 2010)
+[Pilot Analysis: A map of human genome variation from population-scale sequencing Nature 467, 1061-1073 (28 October 2010)](https://www.nature.com/articles/nature09534)

-Phase 1 Analysis: An integrated map of genetic variation from 1,092 human genomes Nature 491, 56-65 (01 November 2012)
+[Phase 1 Analysis: An integrated map of genetic variation from 1,092 human genomes Nature 491, 56-65 (01 November 2012)](https://www.nature.com/articles/nature11632)

-Phase 3 Analysis: A global reference for human genetic variation Nature 526, 68-74 (01 October 2015) and An integrated map of structural variation in 2,504 human genomes Nature 526, 75-81 (01 October 2015)
+[Phase 3 Analysis: A global reference for human genetic variation Nature 526, 68-74 (01 October 2015) and An integrated map of structural variation in 2,504 human genomes Nature 526, 75-81](https://www.nature.com/articles/nature15394)

-For details on data formats refer to http://www.internationalgenome.org/formats
+Visit [this resource](http://www.internationalgenome.org/formats) for more information about the relevant data formats.

-**[NEW]** the dataset is also available in [parquet format](https://github.com/microsoft/genomicsnotebook/tree/main/vcf2parquet-conversion/1000genomes)
+**[NEW]**: The dataset is also available in [parquet format](https://github.com/microsoft/genomicsnotebook/tree/main/vcf2parquet-conversion/1000genomes).
-This dataset is a mirror of ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/
+This dataset is a mirror of [this](ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/) FTP resource.

 ## Data volumes and update frequency

-This dataset contains approximately 815 TB of data and is updated daily.
-
-## Storage location
-
-This dataset is stored in the West US 2 and West Central US Azure regions. Allocating compute resources in West US 2 or West Central US is recommended for affinity.
-
-## Data Access
-
-West US 2: 'https://dataset1000genomes.blob.core.windows.net/dataset'
-
-West Central US: 'https://dataset1000genomes-secondary.blob.core.windows.net/dataset'
-## Data Access: Curated 1000 genomes dataset in parquet format
-
-East US: `https://curated1000genomes.blob.core.windows.net/dataset`
-
-SAS Token: sv=2018-03-28&si=prod&sr=c&sig=BgIomQanB355O4FhxqBL9xUgKzwpcVlRZdBewO5%2FM4E%3D
+This dataset contains approximately 815 TB of data. It receives daily updates.

 ## Use Terms

-Following the final publications, data from the 1000 Genomes Project is publicly available without embargo to anyone for use under the terms provided by the dataset source ([http://www.internationalgenome.org/data](http://www.internationalgenome.org/data)). Use of the data should be cited per details available in the [FAQs]() from the 1000 Genome Project.
+Following the final publications, data from the 1000 Genomes Project is publicly available, without embargo, to anyone for use under the terms provided by the [dataset source](http://www.internationalgenome.org/data). Use of the data should be cited per details available in the 1000 Genome Project[FAQ resource](https://www.internationalgenome.org/faq).

 ## Contact

-https://www.internationalgenome.org/contact
+Scroll down at [this resource](https://www.internationalgenome.org/contact) for the contact information.