Commit 5632449

Freshness update for dataset-1000-genomes.md . . .
1 parent 9001cf1 commit 5632449

1 file changed

+17
-16
lines changed

articles/open-datasets/dataset-1000-genomes.md

Lines changed: 17 additions & 16 deletions
@@ -3,58 +3,59 @@ title: 1000 Genomes
 description: Learn how to use the 1000 Genomes dataset in Azure Open Datasets.
 ms.service: open-datasets
 ms.topic: sample
-ms.date: 04/16/2021
+ms.reviewer: fsolomon
+ms.date: 07/10/2024
 ---

 # 1000 Genomes

-The 1000 Genomes Project ran between 2008 and 2015, creating the largest public catalog of human variation and genotype data. The final data set contains data for 2,504 individuals from 26 populations and 84 million identified variants. For more information, see the 1000 Genome Project website and the following publications:
+The 1000 Genomes Project ran between 2008 and 2015, to create the largest public catalog of human variation and genotype data. The final data set contains data for 2,504 individuals from 26 populations and 84 million identified variants. For more information, visit the 1000 Genome Project [website](https://www.internationalgenome.org/) and these publications:

-Pilot Analysis: A map of human genome variation from population-scale sequencing Nature 467, 1061-1073 (28 October 2010)
+[Pilot Analysis: A map of human genome variation from population-scale sequencing Nature 467, 1061-1073 (28 October 2010)](https://www.nature.com/articles/nature09534)

-Phase 1 Analysis: An integrated map of genetic variation from 1,092 human genomes Nature 491, 56-65 (01 November 2012)
+[Phase 1 Analysis: An integrated map of genetic variation from 1,092 human genomes Nature 491, 56-65 (01 November 2012)](https://www.nature.com/articles/nature11632)

-Phase 3 Analysis: A global reference for human genetic variation Nature 526, 68-74 (01 October 2015) and An integrated map of structural variation in 2,504 human genomes Nature 526, 75-81 (01 October 2015)
+[Phase 3 Analysis: A global reference for human genetic variation Nature 526, 68-74 (01 October 2015) and An integrated map of structural variation in 2,504 human genomes Nature 526, 75-81](https://www.nature.com/articles/nature15394)

-For details on data formats refer to http://www.internationalgenome.org/formats
+Visit [this resource](http://www.internationalgenome.org/formats) for more information about the relevant data formats.

-**[NEW]** the dataset is also available in [parquet format](https://github.com/microsoft/genomicsnotebook/tree/main/vcf2parquet-conversion/1000genomes)
+**[NEW]**: The dataset is also available in [parquet format](https://github.com/microsoft/genomicsnotebook/tree/main/vcf2parquet-conversion/1000genomes).

 [!INCLUDE [Open Dataset usage notice](./includes/open-datasets-usage-note.md)]

 ## Data source

-This dataset is a mirror of ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/
+This dataset is a mirror of [this](ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/) FTP resource.

 ## Data volumes and update frequency

-This dataset contains approximately 815 TB of data and is updated daily.
+This dataset contains approximately 815 TB of data. It receives daily updates.

 ## Storage location

-This dataset is stored in the West US 2 and West Central US Azure regions. Allocating compute resources in West US 2 or West Central US is recommended for affinity.
+This dataset is stored in the **West US 2** and **West Central US** Azure regions. Allocation of compute resources in **West US 2** or **West Central US** is recommended for affinity.

 ## Data Access

-West US 2: 'https://dataset1000genomes.blob.core.windows.net/dataset'
+West US 2: ['https://dataset1000genomes.blob.core.windows.net/dataset']('https://dataset1000genomes.blob.core.windows.net/dataset')

-West Central US: 'https://dataset1000genomes-secondary.blob.core.windows.net/dataset'
+West Central US: ['https://dataset1000genomes-secondary.blob.core.windows.net/dataset']('https://dataset1000genomes-secondary.blob.core.windows.net/dataset')

 [SAS Token](../storage/common/storage-sas-overview.md): sv=2019-10-10&si=prod&sr=c&sig=9nzcxaQn0NprMPlSh4RhFQHcXedLQIcFgbERiooHEqM%3D

 ## Data Access: Curated 1000 genomes dataset in parquet format

-East US: `https://curated1000genomes.blob.core.windows.net/dataset`
+East US: ['https://curated1000genomes.blob.core.windows.net/dataset']('https://curated1000genomes.blob.core.windows.net/dataset')

-SAS Token: sv=2018-03-28&si=prod&sr=c&sig=BgIomQanB355O4FhxqBL9xUgKzwpcVlRZdBewO5%2FM4E%3D
+SAS Token: **sv=2018-03-28&si=prod&sr=c&sig=BgIomQanB355O4FhxqBL9xUgKzwpcVlRZdBewO5%2FM4E%3D**

 ## Use Terms

-Following the final publications, data from the 1000 Genomes Project is publicly available without embargo to anyone for use under the terms provided by the dataset source ([http://www.internationalgenome.org/data](http://www.internationalgenome.org/data)). Use of the data should be cited per details available in the [FAQs]() from the 1000 Genome Project.
+Following the final publications, data from the 1000 Genomes Project is publicly available, without embargo, to anyone for use under the terms provided by the [dataset source](http://www.internationalgenome.org/data). Use of the data should be cited per details available in the 1000 Genome Project [FAQ resource](https://www.internationalgenome.org/faq).

 ## Contact

-https://www.internationalgenome.org/contact
+Scroll down at [this resource](https://www.internationalgenome.org/contact) for the contact information.

 ## Next steps
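
The "Data Access" section in the diff above lists a container URL and a SAS token but does not show how the two combine. The sketch below is one possible way to do that, assuming the SAS token is appended as the URL query string, as is usual for Azure Blob Storage. The helper names `sas_url` and `list_blobs_url` are hypothetical and are not part of the article or of any Azure SDK. The URL and token are copied from the diff, and the code only builds strings; it makes no network request.

```python
# Values copied from the "Data Access" section of the diff above.
CONTAINER_URL = "https://dataset1000genomes.blob.core.windows.net/dataset"
SAS_TOKEN = "sv=2019-10-10&si=prod&sr=c&sig=9nzcxaQn0NprMPlSh4RhFQHcXedLQIcFgbERiooHEqM%3D"


def sas_url(container_url: str, sas_token: str, blob_path: str = "") -> str:
    """Hypothetical helper: join container URL, optional blob path, and SAS token."""
    base = container_url.rstrip("/")
    if blob_path:
        base = f"{base}/{blob_path.lstrip('/')}"
    # The SAS token is already URL-encoded, so it is appended unchanged.
    return f"{base}?{sas_token.lstrip('?')}"


def list_blobs_url(container_url: str, sas_token: str) -> str:
    """Hypothetical helper: URL for the Blob service "List Blobs" REST operation.

    Assumes the stored access policy behind the token permits listing.
    """
    base = container_url.rstrip("/")
    return f"{base}?restype=container&comp=list&{sas_token.lstrip('?')}"


if __name__ == "__main__":
    print(sas_url(CONTAINER_URL, SAS_TOKEN))
    print(list_blobs_url(CONTAINER_URL, SAS_TOKEN))
```

The same helpers should work for the West Central US and curated parquet endpoints by swapping in the corresponding URL and token from the diff. Whether the tokens are still valid is not something the commit states, so treat any request built this way as untested.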
