articles/open-datasets/dataset-open-targets.md: 10 additions, 26 deletions
@@ -8,54 +8,38 @@ ms.date: 04/16/2021
 # 1000 Genomes

-The 1000 Genomes Project ran between 2008 and 2015, creating the largest public catalog of human variation and genotype data. The final data set contains data for 2,504 individuals from 26 populations and 84 million identified variants. For more information, see the 1000 Genome Project website and the following publications:
-
-Pilot Analysis: A map of human genome variation from population-scale sequencing Nature 467, 1061-1073 (28 October 2010)
-
-Phase 1 Analysis: An integrated map of genetic variation from 1,092 human genomes Nature 491, 56-65 (01 November 2012)
-
-Phase 3 Analysis: A global reference for human genetic variation Nature 526, 68-74 (01 October 2015) and An integrated map of structural variation in 2,504 human genomes Nature 526, 75-81 (01 October 2015)
-
-For details on data formats refer to http://www.internationalgenome.org/formats
-
-**[NEW]** the dataset is also available in [parquet format](https://github.com/microsoft/genomicsnotebook/tree/main/vcf2parquet-conversion/1000genomes)
+The Open Targets Platform has built a data resource that supports systematic identification and prioritisation of potential therapeutic drug targets. By integrating publicly available datasets, including data generated by the Open Targets consortium, the Platform has built and scored target-disease associations to assist in drug target identification and prioritisation. It also integrates annotation information about targets, diseases, phenotypes, and drugs, and the most relevant relationships among them.

+Open Targets Genetics highlights variant-centric statistical evidence to allow both prioritisation of candidate causal variants at trait-associated loci and identification of potential drug targets. It aggregates and integrates genetic associations curated from the literature with newly derived loci from UK Biobank and FinnGen. It also contains functional genomics data (for example, chromatin conformation and chromatin interactions) and quantitative trait loci (eQTLs, pQTLs, and sQTLs). Large-scale pipelines apply statistical fine-mapping across thousands of trait-associated loci to resolve association signals and link each variant to its proximal and distal target gene(s) using a Locus2Gene assessment. Integrated cross-trait colocalisation analyses and links to detailed pharmaceutical compound data extend the capacity of Open Targets Genetics to explore drug repositioning opportunities and shared genetic architecture.
+For more information, visit the [Open Targets Platform](https://platform.opentargets.org).
+For more information, visit [Open Targets Genetics](https://genetics.opentargets.org).
-This dataset is a mirror of ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/
+This dataset is a mirror of http://ftp.ebi.ac.uk/pub/databases/opentargets/platform/latest and http://ftp.ebi.ac.uk/pub/databases/opentargets/genetics/latest/

 ## Data volumes and update frequency

-This dataset contains approximately 815 TB of data and is updated daily.
+This dataset contains approximately 350 GB of data and is updated daily.

 ## Storage location

-This dataset is stored in the West US 2 and West Central US Azure regions. Allocating compute resources in West US 2 or West Central US is recommended for affinity.
+This dataset is stored in the West US 2 Azure region. Allocating compute resources in West US 2 is recommended for affinity.

 ## Data Access

-West US 2: 'https://dataset1000genomes.blob.core.windows.net/dataset'
-
-West Central US: 'https://dataset1000genomes-secondary.blob.core.windows.net/dataset'
+West US 2: 'https://datasetopentargets.blob.core.windows.net/dataset'
-## Data Access: Curated 1000 genomes dataset in parquet format
-
-East US: `https://curated1000genomes.blob.core.windows.net/dataset`
-
-SAS Token: sv=2018-03-28&si=prod&sr=c&sig=BgIomQanB355O4FhxqBL9xUgKzwpcVlRZdBewO5%2FM4E%3D

 ## Use Terms

-Following the final publications, data from the 1000 Genomes Project is publicly available without embargo to anyone for use under the terms provided by the dataset source ([http://www.internationalgenome.org/data](http://www.internationalgenome.org/data)). Use of the data should be cited per details available in the [FAQs]() from the 1000 Genome Project.
-
+For the data use terms, see the [Open Targets Platform licence page](https://platform-docs.opentargets.org/licence).
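After this change, the Data Access section names a single blob container URL. The Python below is a minimal sketch for enumerating that container with the Azure Blob Storage "List Blobs" REST operation, using only the standard library.

It assumes the container allows anonymous listing. If it does not, a SAS token would have to be passed in. The `x-ms-version` value, the `platform/` prefix in the commented example, and the helper names `build_list_url`, `parse_list_response`, and `iter_blobs` are illustrative assumptions, not anything the dataset page documents.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Iterator, List, Optional, Tuple

# Container URL from the "Data Access" section of this change.
CONTAINER_URL = "https://datasetopentargets.blob.core.windows.net/dataset"
# Assumption: any recent Blob service REST version works; this one is illustrative.
API_VERSION = "2020-10-02"


def build_list_url(container_url: str, prefix: str = "", marker: str = "",
                   sas_token: Optional[str] = None) -> str:
    """Build the URL for one page of the Blob service 'List Blobs' operation."""
    params = {"restype": "container", "comp": "list"}
    if prefix:
        params["prefix"] = prefix
    if marker:
        params["marker"] = marker  # continuation token from the previous page
    query = urllib.parse.urlencode(params)
    if sas_token:
        # A SAS token is already URL-encoded, so it is appended verbatim.
        query += "&" + sas_token.lstrip("?")
    return f"{container_url}?{query}"


def parse_list_response(xml_text: str) -> Tuple[List[Tuple[str, int]], str]:
    """Return ([(blob_name, size_in_bytes), ...], next_marker) for one page."""
    root = ET.fromstring(xml_text)
    blobs = []
    for blob in root.iter("Blob"):  # BlobPrefix entries are skipped
        name = blob.findtext("Name", default="")
        size = int(blob.findtext("Properties/Content-Length") or 0)
        blobs.append((name, size))
    # An empty or missing NextMarker means this was the last page.
    next_marker = root.findtext("NextMarker") or ""
    return blobs, next_marker


def iter_blobs(container_url: str = CONTAINER_URL, prefix: str = "",
               sas_token: Optional[str] = None) -> Iterator[Tuple[str, int]]:
    """Yield (name, size) for every blob under `prefix`, following NextMarker."""
    marker = ""
    while True:
        url = build_list_url(container_url, prefix, marker, sas_token)
        request = urllib.request.Request(url, headers={"x-ms-version": API_VERSION})
        with urllib.request.urlopen(request) as response:
            # "utf-8-sig" drops a leading byte-order mark if the service sends one.
            page, marker = parse_list_response(response.read().decode("utf-8-sig"))
        yield from page
        if not marker:
            break


# Example (performs network I/O, so it is left commented out):
# for name, size in iter_blobs(prefix="platform/"):
#     print(f"{size:>12}  {name}")
```

The loop follows `NextMarker` because the service returns at most 5,000 blobs per response. A container of this size is therefore likely to be listed over several requests.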