Commit 11c0404

Update dataset-open-targets.md
1 parent b2dc6ce commit 11c0404

1 file changed

+10 -26 lines changed

articles/open-datasets/dataset-open-targets.md

Lines changed: 10 additions & 26 deletions
@@ -8,54 +8,38 @@ ms.date: 04/16/2021
 
 # 1000 Genomes
 
-The 1000 Genomes Project ran between 2008 and 2015, creating the largest public catalog of human variation and genotype data. The final data set contains data for 2,504 individuals from 26 populations and 84 million identified variants. For more information, see the 1000 Genome Project website and the following publications:
-
-Pilot Analysis: A map of human genome variation from population-scale sequencing Nature 467, 1061-1073 (28 October 2010)
-
-Phase 1 Analysis: An integrated map of genetic variation from 1,092 human genomes Nature 491, 56-65 (01 November 2012)
-
-Phase 3 Analysis: A global reference for human genetic variation Nature 526, 68-74 (01 October 2015) and An integrated map of structural variation in 2,504 human genomes Nature 526, 75-81 (01 October 2015)
-
-For details on data formats refer to http://www.internationalgenome.org/formats
-
-**[NEW]** the dataset is also available in [parquet format](https://github.com/microsoft/genomicsnotebook/tree/main/vcf2parquet-conversion/1000genomes)
+The Open Targets Platform provides a data resource that supports systematic identification and prioritisation of potential therapeutic drug targets. By integrating publicly available datasets, including data generated by the Open Targets consortium, the Platform has built and scored target-disease associations to assist in drug target identification and prioritisation. It also integrates relevant annotation information about targets, diseases, phenotypes, and drugs, as well as their most relevant relationships.
 
+Open Targets Genetics highlights variant-centric statistical evidence to allow both prioritisation of candidate causal variants at trait-associated loci and identification of potential drug targets. It aggregates and integrates genetic associations curated from both literature and newly derived loci from UK Biobank and FinnGen, and also contains functional genomics data (e.g. chromatin conformation, chromatin interactions) and quantitative trait loci (eQTLs, pQTLs and sQTLs). Large-scale pipelines apply statistical fine-mapping across thousands of trait-associated loci to resolve association signals and link each variant to its proximal and distal target gene(s) using a Locus2Gene assessment. Integrated cross-trait colocalisation analyses and linking to detailed pharmaceutical compounds extend the capacity of Open Targets Genetics to explore drug repositioning opportunities and shared genetic architecture.
+To read further about the Open Targets Platform, visit [Open Targets Platform](https://platform.opentargets.org).
+To read further about Open Targets Genetics, visit [Open Targets Genetics](https://genetics.opentargets.org).
 [!INCLUDE [Open Dataset usage notice](../../includes/open-datasets-usage-note.md)]
 
 ## Data source
 
-This dataset is a mirror of ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/
+This dataset is a mirror of http://ftp.ebi.ac.uk/pub/databases/opentargets/platform/latest and http://ftp.ebi.ac.uk/pub/databases/opentargets/genetics/latest/
 
 ## Data volumes and update frequency
 
-This dataset contains approximately 815 TB of data and is updated daily.
+This dataset contains approximately 350 GB of data and is updated daily.
 
 ## Storage location
 
-This dataset is stored in the West US 2 and West Central US Azure regions. Allocating compute resources in West US 2 or West Central US is recommended for affinity.
+This dataset is stored in the West US 2 Azure region. Allocating compute resources in West US 2 is recommended for affinity.
 
 ## Data Access
 
-West US 2: 'https://dataset1000genomes.blob.core.windows.net/dataset'
-
-West Central US: 'https://dataset1000genomes-secondary.blob.core.windows.net/dataset'
+West US 2: 'https://datasetopentargets.blob.core.windows.net/dataset'
 
 [SAS Token](../storage/common/storage-sas-overview.md): sv=2019-10-10&si=prod&sr=c&sig=9nzcxaQn0NprMPlSh4RhFQHcXedLQIcFgbERiooHEqM%3D
 
-## Data Access: Curated 1000 genomes dataset in parquet format
-
-East US: `https://curated1000genomes.blob.core.windows.net/dataset`
-
-SAS Token: sv=2018-03-28&si=prod&sr=c&sig=BgIomQanB355O4FhxqBL9xUgKzwpcVlRZdBewO5%2FM4E%3D
 
 ## Use Terms
 
-Following the final publications, data from the 1000 Genomes Project is publicly available without embargo to anyone for use under the terms provided by the dataset source ([http://www.internationalgenome.org/data](http://www.internationalgenome.org/data)). Use of the data should be cited per details available in the [FAQs]() from the 1000 Genome Project.
-
+Please refer to the data use terms described in the [Open Targets licence](https://platform-docs.opentargets.org/licence).
 ## Contact
 
-https://www.internationalgenome.org/contact
+[https://community.opentargets.org](https://community.opentargets.org)
 
-## Next steps
 
 View the rest of the datasets in the [Open Datasets catalog](dataset-catalog.md).
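
The Data Access section in the diff gives a container URL and a SAS token but does not show how the two combine. The sketch below is one way to do it, assuming the token is appended verbatim as the query string (it is already URL-encoded) and that the container accepts the standard Blob Storage `restype=container&comp=list` listing query. The helper names `list_blobs_url` and `blob_url`, the `platform/` prefix, and the `platform/example.parquet` path are hypothetical illustrations, not names taken from the dataset. No network request is made.

```python
from urllib.parse import quote

# Values copied from the "Data Access" section of the diff.
CONTAINER_URL = "https://datasetopentargets.blob.core.windows.net/dataset"
SAS_TOKEN = "sv=2019-10-10&si=prod&sr=c&sig=9nzcxaQn0NprMPlSh4RhFQHcXedLQIcFgbERiooHEqM%3D"


def list_blobs_url(container_url: str, sas_token: str, prefix: str = "") -> str:
    """Build a List Blobs REST URL for the container (hypothetical helper)."""
    params = ["restype=container", "comp=list"]
    if prefix:
        # Encode the prefix fully so "/" cannot be mistaken for a path separator.
        params.append("prefix=" + quote(prefix, safe=""))
    # The SAS token is already URL-encoded, so it is appended unchanged.
    params.append(sas_token.lstrip("?"))
    return container_url.rstrip("/") + "?" + "&".join(params)


def blob_url(container_url: str, sas_token: str, blob_path: str) -> str:
    """Build a download URL for a single blob (hypothetical helper)."""
    path = quote(blob_path.lstrip("/"))  # keeps "/" between virtual folders
    return f"{container_url.rstrip('/')}/{path}?{sas_token.lstrip('?')}"


if __name__ == "__main__":
    # "platform/" and "platform/example.parquet" are assumed, illustrative paths.
    print(list_blobs_url(CONTAINER_URL, SAS_TOKEN, prefix="platform/"))
    print(blob_url(CONTAINER_URL, SAS_TOKEN, "platform/example.parquet"))
```

The resulting URLs can be passed to any HTTP client. The listing URL should return an XML document of blob names, which is a convenient way to discover the real folder layout before constructing download URLs.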
