Commit b12ca0b

Author: Jill Grant
Merge pull request #711 from mamtagiri/patch-1
Update with sas token change notice
2 parents d470dc3 + b18afda, commit b12ca0b

11 files changed, +31 -1 lines changed

articles/open-datasets/dataset-1000-genomes.md

Lines changed: 2 additions & 0 deletions
@@ -9,6 +9,8 @@ ms.date: 07/10/2024
 
 # 1000 Genomes
 
+[!INCLUDE [Open Dataset access change notice](./includes/open-datasets-change-note.md)]
+
 The 1000 Genomes Project ran between 2008 and 2015 to create the largest public catalog of human variation and genotype data. The final data set contains data for 2,504 individuals from 26 populations and 84 million identified variants. For more information, visit the 1000 Genomes Project [website](https://www.internationalgenome.org/) and these publications:
 
 [Pilot Analysis: A map of human genome variation from population-scale sequencing Nature 467, 1061-1073 (28 October 2010)](https://www.nature.com/articles/nature09534)

articles/open-datasets/dataset-clinvar-annotations.md

Lines changed: 2 additions & 0 deletions
@@ -9,6 +9,8 @@ ms.date: 06/13/2024
 
 # ClinVar Annotations
 
+[!INCLUDE [Open Dataset access change notice](./includes/open-datasets-change-note.md)]
+
 The [ClinVar](https://www.ncbi.nlm.nih.gov/clinvar/) resource is a freely accessible, public archive of reports - with supporting evidence - about the relationships among human variations and phenotypes. It facilitates access to and communication about the claimed relationships between human variation and observed health status, and about the history of that interpretation. It provides access to a broader set of clinical interpretations that researchers can incorporate into genomics workflows and applications.
 
 Visit the [Data Dictionary](https://www.ncbi.nlm.nih.gov/projects/clinvar/ClinVarDataDictionary.pdf) and the [FAQ resource](https://www.ncbi.nlm.nih.gov/clinvar/docs/faq/) for more information about the data.

articles/open-datasets/dataset-encode.md

Lines changed: 2 additions & 0 deletions
@@ -8,6 +8,8 @@ ms.date: 04/16/2021
 
 # ENCODE: Encyclopedia of DNA Elements
 
+[!INCLUDE [Open Dataset access change notice](./includes/open-datasets-change-note.md)]
+
 The [Encyclopedia of DNA Elements (ENCODE) Consortium](https://www.encodeproject.org/help/project-overview/) is an ongoing international collaboration of research groups funded by the National Human Genome Research Institute (NHGRI). ENCODE's goal is to build a comprehensive parts list of functional elements in the human genome, including elements that act at the protein and RNA levels, and regulatory elements that control cells and circumstances in which a gene is active.
 
 ENCODE investigators employ various assays and methods to identify functional elements. The discovery and annotation of gene elements is accomplished primarily by sequencing a diverse range of RNA sources, comparative genomics, integrative bioinformatic methods, and human curation. Regulatory elements are typically investigated through DNA hypersensitivity assays, assays of DNA methylation, and immunoprecipitation (IP) of proteins that interact with DNA and RNA, that is, modified histones, transcription factors, chromatin regulators, and RNA-binding proteins, followed by sequencing.

articles/open-datasets/dataset-gatk-resource-bundle.md

Lines changed: 2 additions & 0 deletions
@@ -8,6 +8,8 @@ ms.date: 04/16/2021
 
 # GATK Resource Bundle
 
+[!INCLUDE [Open Dataset access change notice](./includes/open-datasets-change-note.md)]
+
 The [GATK resource bundle](https://gatk.broadinstitute.org/hc/articles/360035890811-Resource-bundle) is a collection of standard files for working with human resequencing data with the GATK.
 
 [!INCLUDE [Open Dataset usage notice](./includes/open-datasets-usage-note.md)]

articles/open-datasets/dataset-genomics-data-lake.md

Lines changed: 2 additions & 0 deletions
@@ -30,6 +30,8 @@ The Genomics Data Lake is hosted in the West US 2 and West Central US Azure regi
 | [GATK Resource Bundle](dataset-gatk-resource-bundle.md) | GATK Resource bundle |
 | [TCGA Open Data](dataset-the-cancer-genome-atlas.md) | TCGA Open Data |
 | [Pan UK-Biobank](dataset-panancestry-uk-bio-bank.md) | Pan UK-Biobank |
+| [ImmuneCODE database](dataset-immunecode.md) | ImmuneCODE database |
+| [Open Targets dataset](dataset-panancestry-uk-bio-bank.md) | Open Targets dataset |
 
 ## Next steps
 

articles/open-datasets/dataset-human-reference-genomes.md

Lines changed: 2 additions & 0 deletions
@@ -8,6 +8,8 @@ ms.date: 04/16/2021
 
 # Human Reference Genomes
 
+[!INCLUDE [Open Dataset access change notice](./includes/open-datasets-change-note.md)]
+
 This dataset includes two human-genome references assembled by the [Genome Reference Consortium](https://www.ncbi.nlm.nih.gov/grc): Hg19 and Hg38.
 
 For more information on Hg19 (GRCh37) data, see the [GRCh37 report at NCBI](https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.13/).

articles/open-datasets/dataset-illumina-platinum-genomes.md

Lines changed: 3 additions & 1 deletion
@@ -8,6 +8,8 @@ ms.date: 04/16/2021
 
 # Illumina Platinum Genomes
 
+[!INCLUDE [Open Dataset access change notice](./includes/open-datasets-change-note.md)]
+
 Whole-genome sequencing is enabling researchers worldwide to characterize the human genome more fully and accurately. This requires a comprehensive, genome-wide catalog of high-confidence variants called in a set of genomes as a benchmark. Illumina has generated deep, whole-genome sequence data of 17 individuals in a three-generation pedigree. Illumina has called variants in each genome using a range of currently available algorithms.
 
 For more information on the data, see the official [Illumina site](https://www.illumina.com/platinumgenomes.html).
@@ -206,4 +208,4 @@ run gatk VariantsToTable -V NA12877.vcf.gz -F CHROM -F POS -F TYPE -F AC -F AD -
 
 ## Next steps
 
-View the rest of the datasets in the [Open Datasets catalog](dataset-catalog.md).
+View the rest of the datasets in the [Open Datasets catalog](dataset-catalog.md).

articles/open-datasets/dataset-open-cravat.md

Lines changed: 2 additions & 0 deletions
@@ -8,6 +8,8 @@ ms.date: 04/16/2021
 
 # OpenCravat: Open Custom Ranked Analysis of Variants Toolkit
 
+[!INCLUDE [Open Dataset access change notice](./includes/open-datasets-change-note.md)]
+
 OpenCRAVAT is a Python package that performs genomic variant interpretation including variant impact, annotation, and scoring. OpenCRAVAT has a modular architecture with a wide variety of analysis modules and annotation resources that can be selected and installed/run based on the needs of a given study.
 
 For more information on the data, see the [OpenCRAVAT website](https://opencravat.org/).

articles/open-datasets/dataset-open-targets.md

Lines changed: 2 additions & 0 deletions
@@ -8,6 +8,8 @@ ms.date: 04/16/2021
 
 # Open Targets
 
+[!INCLUDE [Open Dataset access change notice](./includes/open-datasets-change-note.md)]
+
 The Open Targets Platform is a data resource to facilitate the systematic identification and prioritization of potential therapeutic drug targets. This resource integrates publicly available datasets, including those datasets that are generated by the Open Targets consortium, to build and score target-disease associations, aiding in the identification and prioritization of drug targets. Additionally, it incorporates pertinent annotation information about targets, diseases, phenotypes, drugs, and their key relationships.
 
 Open Targets Genetics highlights variant-centric statistical evidence to allow both prioritization of candidate causal variants at trait-associated loci and identification of potential drug targets. It collects and combines genetic associations gathered from published literature as well as newly derived data from sources like UK Biobank and FinnGen. Additionally, it includes functional genomics information such as chromatin conformation and interactions, along with quantitative trait loci (eQTLs, pQTLs, and sQTLs). Large-scale pipelines apply statistical fine-mapping across thousands of trait-associated loci to resolve association signals and link each variant to its proximal and distal target genes using a 'Locus2Gene' assessment. Integrated cross-trait colocalisation analyses and linking to detailed pharmaceutical compounds extend the capacity of Open Targets Genetics to explore drug repositioning opportunities and shared genetic architecture.

articles/open-datasets/dataset-the-cancer-genome-atlas.md

Lines changed: 2 additions & 0 deletions
@@ -12,6 +12,8 @@ ms.date: 09/22/2022
 
 # TCGA Open Data
 
+[!INCLUDE [Open Dataset access change notice](./includes/open-datasets-change-note.md)]
+
 The Cancer Genome Atlas (TCGA), a landmark cancer genomics program, molecularly characterized over 20,000 primary cancer and matched normal samples spanning 33 cancer types[[1]](https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga). The publicly available TCGA cancer data fall into two tiers: open or controlled access.
 
 - Open access [available on Azure]: This dataset contains deidentified clinical and biospecimen data or summarized data that doesn't contain any individually identifiable information. The data types included are gene expression, methylation beta values, and protein quantification. DNA-level data types include gene-level copy number and masked copy number segments.
