Commit 270a639

modified r03 markdown
1 parent 6b0ac2d commit 270a639

52 files changed, +110 -1556 lines changed

script/process_markdown.ipynb

Lines changed: 8 additions & 3 deletions
@@ -555,22 +555,27 @@
 },
 {
 "cell_type": "code",
-"execution_count": 30,
+"execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
 "grant_mapper = {}\n",
 "for i, row in r03.iterrows():\n",
 "\tyml = {}\n",
+"\tdescription = ''\n",
 "\tfor k,v in row.items():\n",
 "\t\tif not v == '':\n",
-"\t\t\tyml[k] = v\n",
+"\t\t\tif not k == 'description':\n",
+"\t\t\t\tyml[k] = v\n",
+"\t\t\telse:\n",
+"\t\t\t\tdescription = v\n",
 "\tfilename = yml['grant_num']\n",
 "\tgrant_mapper[i] = filename\n",
 "\twith open('../src/pages/r03/%s.md'%filename, 'w') as o:\n",
 "\t\to.write('---\\n')\n",
 "\t\to.write(yaml.dump(yml))\n",
-"\t\to.write('---')"
+"\t\to.write('---\\n')\n",
+"\t\to.write(description)"
 ]
 },
 {
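
The notebook change above moves `description` out of the YAML front matter and writes it as the Markdown body after the closing `---`. A minimal standalone sketch of that logic is given here. It is an illustration, not the notebook's code: `split_row` and `render_page` are hypothetical helper names, the row is a plain dict rather than a pandas row, and the front matter is rendered as simple `key: value` lines in sorted order instead of calling `yaml.dump`, so quoting and line folding are not reproduced.

```python
def split_row(row):
    """Separate the description from the remaining non-empty fields."""
    front_matter = {}
    description = ''
    for key, value in row.items():
        if value == '':
            continue  # empty cells are skipped, as in the notebook
        if key == 'description':
            description = value
        else:
            front_matter[key] = value
    return front_matter, description


def render_page(front_matter, description):
    """Build the page text: front matter between '---' fences, then the body."""
    # Assumption: plain "key: value" lines stand in for yaml.dump here.
    lines = ['---']
    for key in sorted(front_matter):
        lines.append(f'{key}: {front_matter[key]}')
    lines.append('---')
    return '\n'.join(lines) + '\n' + description


# Example row with a shortened, illustrative description.
row = {
    'grant_num': 'R03OD030596',
    'pi': 'BROWN, C TITUS',
    'website': '',
    'description': 'We will work with the iHMP data resource.',
}
front_matter, description = split_row(row)
print(render_page(front_matter, description))
```

Keeping the description outside the front matter means long abstracts are no longer folded or escaped by the YAML serializer, which is what the page diffs in this commit show.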

src/pages/r03/R03OD030596.md

Lines changed: 2 additions & 14 deletions
@@ -1,20 +1,8 @@
 ---
 affilliation: UNIVERSITY OF CALIFORNIA AT DAVIS
-description: We will work with the iHMP data resource to apply novel tools and data
-analysis methodologies to the challenge of disease association between large microbiome
-data sets, Inflammatory Bowel Disease, and the onset of diabetes. We will start
-with an annotation-free approach using k-mers to preprocess IBD and diabetes cohorts.
-We then will apply a novel scaling technology implemented in the sourmash software
-to reduce the data set size by a factor of 2000, rendering it tractable to machine
-learning approaches. We next will use random forests to determine a subset of predictive
-k-mers, and will measure their accuracy on validation data sets not used in the
-initial training. Finally, we will annotate the predictive k-mers using all available
-genome databases as well as a novel method to infer the metagenomic presence of
-accessory genomes of known genomes. Our outcomes will include a catalog of microbial
-genomes that correlate with IBD subtype and the onset of diabetes, as well as automated
-workflows to apply similar approaches to other data sets.
 end_date: '2022-04-30T12:00:00-04:00'
 grant_num: R03OD030596
 pi: BROWN, C TITUS
 title: Large-scale annotation-free disease correlation analysis of the iHMP
----
+---
+We will work with the iHMP data resource to apply novel tools and data analysis methodologies to the challenge of disease association between large microbiome data sets, Inflammatory Bowel Disease, and the onset of diabetes. We will start with an annotation-free approach using k-mers to preprocess IBD and diabetes cohorts. We then will apply a novel scaling technology implemented in the sourmash software to reduce the data set size by a factor of 2000, rendering it tractable to machine learning approaches. We next will use random forests to determine a subset of predictive k-mers, and will measure their accuracy on validation data sets not used in the initial training. Finally, we will annotate the predictive k-mers using all available genome databases as well as a novel method to infer the metagenomic presence of accessory genomes of known genomes. Our outcomes will include a catalog of microbial genomes that correlate with IBD subtype and the onset of diabetes, as well as automated workflows to apply similar approaches to other data sets.
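
With this layout, a consumer of the page can recover the description by splitting on the `---` fences rather than parsing it out of the YAML. The sketch below shows one way to do that under stated assumptions: `split_front_matter` is a hypothetical helper, it treats the first line and the next line consisting only of `---` as the fences, and it returns the front matter as raw text without parsing it.

```python
def split_front_matter(text):
    """Return (front_matter_text, body) for a page that opens with a '---' fence."""
    lines = text.split('\n')
    if lines[0].strip() != '---':
        return '', text  # no front matter: the whole text is the body
    for index in range(1, len(lines)):
        if lines[index].strip() == '---':
            front = '\n'.join(lines[1:index])
            body = '\n'.join(lines[index + 1:])
            return front, body
    return '', text  # opening fence without a closing one


# Shortened, illustrative page in the new layout.
page = (
    '---\n'
    'grant_num: R03OD030596\n'
    'pi: BROWN, C TITUS\n'
    '---\n'
    'We will work with the iHMP data resource.'
)
front, body = split_front_matter(page)
print(body)
```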

src/pages/r03/R03OD030597.md

Lines changed: 2 additions & 20 deletions
@@ -1,27 +1,9 @@
 ---
 affilliation: BAYLOR COLLEGE OF MEDICINE
-description: The Common Fund Knockout Mouse Phenotyping Program (KOMP2) is a valuable
-resource for functionally characterizing mammalian genes. We propose to increase
-the utility of KOMP2 by curating and annotating genomic information in the dataset
-by collecting and curating human clinical data to match human patients to KOMP2
-mice with severe phenotypes. The goal of this project is to assess pediatric patient
-cohorts with exome sequencing data and no molecular diagnosis for variants of uncertain
-significance in genes that correspond to a lethal phenotype in KOMP2 mouse mutant
-lines. Mouse lines categorized as cellular lethal, developmental lethal or subviable
-are targeted as relevant for early and severe pediatric phenotypes. For this reason,
-we will consider four human patient cohorts. The first cohort consists of patients
-who died within the first year of life. The second cohort consists of patients admitted
-to the pediatric intensive care units (ICUs) within the first 100 days of life.
-The third cohort is a recent sample of pediatric patients with trio exome data available.
-The fourth cohort is a pediatric cohort with likely Mendelian disease genes of unknown
-function. With each cohort we will identify variants of uncertain significance in
-human orthologues corresponding to mouse genes classified as cellular lethal, developmental
-lethal or sub-viable. Then, we will compare the mouse and human phenotypes using
-standardized phenotype terms to prioritize follow up of genes with variants in our
-human cohorts and with similar phenotypes in mice and humans.
 end_date: '2022-08-31T12:00:00-04:00'
 grant_num: R03OD030597
 pi: WORLEY, KIM C
 title: Expanding the List of Human Disease Genes Using the Knockout Mouse Phenotyping
 Program (KOMP2) Data to Reassess Human Clinical Data
----
+---
+The Common Fund Knockout Mouse Phenotyping Program (KOMP2) is a valuable resource for functionally characterizing mammalian genes. We propose to increase the utility of KOMP2 by curating and annotating genomic information in the dataset by collecting and curating human clinical data to match human patients to KOMP2 mice with severe phenotypes. The goal of this project is to assess pediatric patient cohorts with exome sequencing data and no molecular diagnosis for variants of uncertain significance in genes that correspond to a lethal phenotype in KOMP2 mouse mutant lines. Mouse lines categorized as cellular lethal, developmental lethal or subviable are targeted as relevant for early and severe pediatric phenotypes. For this reason, we will consider four human patient cohorts. The first cohort consists of patients who died within the first year of life. The second cohort consists of patients admitted to the pediatric intensive care units (ICUs) within the first 100 days of life. The third cohort is a recent sample of pediatric patients with trio exome data available. The fourth cohort is a pediatric cohort with likely Mendelian disease genes of unknown function. With each cohort we will identify variants of uncertain significance in human orthologues corresponding to mouse genes classified as cellular lethal, developmental lethal or sub-viable. Then, we will compare the mouse and human phenotypes using standardized phenotype terms to prioritize follow up of genes with variants in our human cohorts and with similar phenotypes in mice and humans.

src/pages/r03/R03OD030598.md

Lines changed: 2 additions & 24 deletions
@@ -1,30 +1,8 @@
 ---
 affilliation: BROWN UNIVERSITY
-description: Sex differences in human diseases are well-recognized, but the mechanisms
-are not well understood. This gap of knowledge delays the progress in risk assessment
-and therapeutic strategies for sex-aware precision healthcare. While studies have
-shown significant sex differences in the genetic architectures of complex diseases,
-most investigators opted to do sex- combined analyses in disease genetic studies
-to maximize statistical power. NIH recently began to reinforce the inclusion of
-sex as a biological variable in the design, analysis, and reporting of vertebrate
-animal and human studies. Insights into the functional genetic bases of sex as a
-biological variable are critical to develop therapeutic interventions that equally
-benefit each sex. We recently found that ~1% variants in the population have sex-biased
-allele frequency, including ~10% of disease variants in the Genome Aggregation Database
-(gnomAD). These variants preferentially occur in tissue-specific sex-differentially
-expressed genes. We propose a novel approach to study sex differences in disease
-genetic architectures by leveraging variants that are sex-biased either in allele
-frequency or phenotypic association. We believe this approach will increase the
-statistical power to identify sex-specific or sex interacting causal variants in
-sex- biased diseases. We will identify and characterize sex-biased variants in gnomAD,
-Genotype- Tissue Expression project (GTEx) and Trans-Omics for Precision Medicine
-for sleep disordered breathing phenotypes and venous thromboembolism case-control
-datasets. We will subsequently study the functional mechanisms of these sex-biased
-variants in ~50 GTEx tissues. The completion of this pilot study will advance future
-genetic studies of sex-divergent disorders and accelerate the realization of sex-aware
-genomic medicine.
 end_date: '2022-04-30T12:00:00-04:00'
 grant_num: R03OD030598
 pi: SOFER, TAMAR
 title: Using GTEx to assess the functionality of sex-biased variants
----
+---
+Sex differences in human diseases are well-recognized, but the mechanisms are not well understood. This gap of knowledge delays the progress in risk assessment and therapeutic strategies for sex-aware precision healthcare. While studies have shown significant sex differences in the genetic architectures of complex diseases, most investigators opted to do sex- combined analyses in disease genetic studies to maximize statistical power. NIH recently began to reinforce the inclusion of sex as a biological variable in the design, analysis, and reporting of vertebrate animal and human studies. Insights into the functional genetic bases of sex as a biological variable are critical to develop therapeutic interventions that equally benefit each sex. We recently found that ~1% variants in the population have sex-biased allele frequency, including ~10% of disease variants in the Genome Aggregation Database (gnomAD). These variants preferentially occur in tissue-specific sex-differentially expressed genes. We propose a novel approach to study sex differences in disease genetic architectures by leveraging variants that are sex-biased either in allele frequency or phenotypic association. We believe this approach will increase the statistical power to identify sex-specific or sex interacting causal variants in sex- biased diseases. We will identify and characterize sex-biased variants in gnomAD, Genotype- Tissue Expression project (GTEx) and Trans-Omics for Precision Medicine for sleep disordered breathing phenotypes and venous thromboembolism case-control datasets. We will subsequently study the functional mechanisms of these sex-biased variants in ~50 GTEx tissues. The completion of this pilot study will advance future genetic studies of sex-divergent disorders and accelerate the realization of sex-aware genomic medicine.

src/pages/r03/R03OD030599.md

Lines changed: 2 additions & 25 deletions
@@ -1,32 +1,9 @@
 ---
 affilliation: UNIVERSITY OF MICHIGAN AT ANN ARBOR
-description: "After the completion of the Human Genome Project, several landmarking\
-\ consortia have accumulated large amounts of genomic data towards understanding\
-\ the functions of human genome. The ENCODE project has annotated genome-wide regulatory\
-\ elements. The Roadmap Epigenomic project has characterized tissue-speci\uFB01\
-c variation in epigenetic state. The NIH Common Fund GTEx project has delineated\
-\ tissue-speci\uFB01c gene expression and transcription regulation. The NIH Common\
-\ Fund 4D Nucleome (4DN) project has revealed dynamic 3D chromatin organization\
-\ in many cell and tissue types. Each of the aforementioned consortia has generated\
-\ thousands or even tens of thousands of datasets, and provided different insights\
-\ regarding human genome at an unprecedent scale and depth. However, the datasets\
-\ generated from these consortia are isolated in terms of cell types and tissue\
-\ types covered, how the data are stored, and the resolution of the genomic data.\
-\ These gaps bring realistic data analysis challenges to biomedical researchers\
-\ when they use these public datasets jointly in their research \u2014 they need\
-\ to go through different data portals with heterogeneous processing pipelines,\
-\ different data formats, and unmatched resolutions. We aim to develop the most\
-\ cutting-edge deep learning approaches to impute high-resolution chromatin contact\
-\ maps, and integrate the high-resolution chromatin contact maps with transcriptional\
-\ data available from GTEx project and epigenomic data from ENCODE/Roadmap. We plan\
-\ to share the integrated data on a public web server with a multi-panel interactive\
-\ visualization genome browser. The integrated data will provide an important resource\
-\ for understanding of tissue-speci\uFB01c genetic variation in the light of the\
-\ spatial organization of these genomic and epigenomic elements and their functional\
-\ implications."
 end_date: '2022-08-31T12:00:00-04:00'
 grant_num: R03OD030599
 pi: LIU, JIE
 title: A database for high-resolution chromatin contact maps and human genetic variants
 website: https://github.com/liu-bioinfo-lab/caesar
----
+---
+After the completion of the Human Genome Project, several landmarking consortia have accumulated large amounts of genomic data towards understanding the functions of human genome. The ENCODE project has annotated genome-wide regulatory elements. The Roadmap Epigenomic project has characterized tissue-specific variation in epigenetic state. The NIH Common Fund GTEx project has delineated tissue-specific gene expression and transcription regulation. The NIH Common Fund 4D Nucleome (4DN) project has revealed dynamic 3D chromatin organization in many cell and tissue types. Each of the aforementioned consortia has generated thousands or even tens of thousands of datasets, and provided different insights regarding human genome at an unprecedent scale and depth. However, the datasets generated from these consortia are isolated in terms of cell types and tissue types covered, how the data are stored, and the resolution of the genomic data. These gaps bring realistic data analysis challenges to biomedical researchers when they use these public datasets jointly in their research — they need to go through different data portals with heterogeneous processing pipelines, different data formats, and unmatched resolutions. We aim to develop the most cutting-edge deep learning approaches to impute high-resolution chromatin contact maps, and integrate the high-resolution chromatin contact maps with transcriptional data available from GTEx project and epigenomic data from ENCODE/Roadmap. We plan to share the integrated data on a public web server with a multi-panel interactive visualization genome browser. The integrated data will provide an important resource for understanding of tissue-specific genetic variation in the light of the spatial organization of these genomic and epigenomic elements and their functional implications.
