
GEO analysis with Shiny

gdancik edited this page Mar 8, 2015 · 9 revisions

Summary: Developing a web tool using Shiny to analyze gene expression data from the Gene Expression Omnibus (GEO).

Description: The Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/) is a public repository of gene expression data. Although GEO has its own tool, called GEO2R, for simple data analysis, statistical and bioinformatics expertise is required for more comprehensive analyses. For example, it is not straightforward to determine whether a single gene is differentially expressed across two groups. This project will involve the development of a web tool, using the R web framework Shiny, to provide users with an interface for analyzing GEO datasets.
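To make the single-gene question concrete, the following is a minimal sketch of how one might test whether a gene is differentially expressed across two groups in base R. The data are simulated, and the group names and effect size are assumptions chosen for illustration only; a real analysis would use expression values and group labels taken from a GEO series.

```r
# Simulated data only: 20 samples in two hypothetical groups of 10.
set.seed(42)
group <- factor(rep(c("control", "tumor"), each = 10))

# Hypothetical log2 expression values for one gene;
# the second group is simulated with a higher mean.
expression <- c(rnorm(10, mean = 7, sd = 1), rnorm(10, mean = 9, sd = 1))

# Welch two-sample t-test (the default of t.test, i.e. var.equal = FALSE)
result <- t.test(expression ~ group)

# Difference in mean log2 expression between the groups, and the p-value
means <- tapply(expression, group, mean)
log2_diff <- means[["tumor"]] - means[["control"]]
p_value <- result$p.value

print(log2_diff)
print(p_value)
```

A t-test is only one possible choice here; a rank-based test such as `wilcox.test` could be swapped in when normality is doubtful.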

Related work: From within R, users can download GEO datasets and extract the information needed for analysis using the Bioconductor package 'GEOquery' (http://bioconductor.org/packages/release/bioc/html/GEOquery.html). The proposed project would implement this within a Shiny framework, providing users access to GEO without requiring knowledge of R.
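As a rough illustration of the GEOquery workflow described above, the sketch below wraps the download and extraction steps in one function. The function name `load_geo_series` and the decision to use only the first platform of a series are assumptions made for this example; `getGEO`, `exprs`, `pData` and `fData` are the existing GEOquery and Biobase functions.

```r
# Sketch only: calling this function requires the Bioconductor packages
# GEOquery and Biobase, plus network access to GEO.
load_geo_series <- function(accession) {
  # With GSEMatrix = TRUE, getGEO() returns a list of ExpressionSet objects,
  # one per platform in the series.
  series <- GEOquery::getGEO(accession, GSEMatrix = TRUE)
  eset <- series[[1]]  # assumption: use the first platform only

  list(
    expression = Biobase::exprs(eset),  # probes x samples matrix
    phenotype  = Biobase::pData(eset),  # one row of sample information per sample
    features   = Biobase::fData(eset)   # probe annotation, e.g. gene symbols
  )
}

# Usage (not run here; `accession` is a GSE identifier chosen by the user):
# geo <- load_geo_series(accession)
# dim(geo$expression)
# head(geo$phenotype)
```

Returning a plain list keeps the later steps independent of the ExpressionSet class, which may make them easier to call from a Shiny server function.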

Potential tasks (to be implemented within a web interface using Shiny):

  • Download a GEO series selected by the user, extract gene expression and phenotypic information
  • Pull out the gene expression values for a desired gene
  • Refactor pandoc.table to handle the variety of R objects (named vectors, tables, 2D tables, 3D crosstables and ftable objects) by first transforming them to a standard format, instead of relying on the current repeated checks and workarounds
  • Refactor Pandoc.brew (forked from brew)
  • Improve error handling and logging facilities
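The first two tasks in the list above (extracting the values for a desired gene and exposing this through a web interface) might be sketched as follows. Everything here is hypothetical scaffolding rather than a proposed design: the names `gene_values` and `make_app`, the `symbols` vector mapping each probe to a gene symbol, and the choice to average multiple probes per gene are all assumptions made for illustration.

```r
# Pull the expression values for one gene out of a probes x samples matrix.
# `symbols` is a hypothetical character vector giving the gene symbol of each row.
gene_values <- function(expression, symbols, gene) {
  rows <- which(symbols == gene)
  if (length(rows) == 0) {
    stop("Gene not found: ", gene)
  }
  # Several probes can map to the same gene; average them per sample (assumption)
  colMeans(expression[rows, , drop = FALSE], na.rm = TRUE)
}

# Build the app inside a function so that shiny is only needed when it is called.
# `group` is a hypothetical factor with one group label per sample.
make_app <- function(expression, symbols, group) {
  ui <- shiny::fluidPage(
    shiny::titlePanel("GEO gene expression (sketch)"),
    shiny::textInput("gene", "Gene symbol"),
    shiny::plotOutput("boxplot")
  )

  server <- function(input, output, session) {
    output$boxplot <- shiny::renderPlot({
      shiny::req(input$gene)  # wait until the user has typed something
      shiny::validate(shiny::need(input$gene %in% symbols, "Gene not found"))
      values <- gene_values(expression, symbols, input$gene)
      boxplot(values ~ group, xlab = "Group", ylab = input$gene)
    })
  }

  shiny::shinyApp(ui = ui, server = server)
}

# Toy data to exercise the helper: three probes, two samples
toy <- matrix(c(1, 3, 10, 2, 4, 20), nrow = 3,
              dimnames = list(c("p1", "p2", "p3"), c("s1", "s2")))
toy_symbols <- c("A", "A", "B")
toy_values <- gene_values(toy, toy_symbols, "A")
print(toy_values)
```

Keeping `gene_values` separate from the Shiny code means the extraction logic can be checked without starting a web session.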

Skills required: literate programming experience, so decent markdown and R experience is needed. In more detail:

  • Pandoc's markdown syntax and the pandoc command line,
  • previous experience with brew or pander packages,
  • at least a basic git knowledge (e.g. branching) and experience with GitHub.

Test: Fork the package on GitHub and create a pull request implementing a pander method for tables::tabular objects. A quick and dirty solution:

> library(pander)
> library(tables)
> pander(as.matrix(tabular(as.factor(am) ~ (mpg+hp+qsec) * (mean+median), data = mtcars)))

------------- ----- ------ ----- ------ ----- ------
               mpg          hp          qsec        

as.factor(am) mean  median mean  median mean  median

      0       17.15  17.3  160.3  175   18.18 17.82 

      1       24.39  22.8  126.8  109   17.36 17.02 
------------- ----- ------ ----- ------ ----- ------

Please note that pander.tabular should work with any number of variables, even with complex table layouts.
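As a starting point only, the quick and dirty call above could be wrapped in an S3 method along the following lines. This is a sketch that assumes the pander and tables packages are installed; it simply forwards the character matrix to pander and does not yet address complex layouts such as nested row headers.

```r
# Sketch of an S3 method for objects of class "tabular" (from the tables package).
pander.tabular <- function(x, ...) {
  # as.matrix() on a tabular object produces a character matrix that includes
  # the row and column labels, as in the quick and dirty solution.
  pander::pander(as.matrix(x), ...)
}

# Usage (not run here; requires pander and tables):
# library(tables)
# pander(tabular(as.factor(am) ~ (mpg + hp + qsec) * (mean + median), data = mtcars))
```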

Mentor: Gergely Daróczi (daroczig {at} rapporter {dot} net) and László Szakács (cocinerox {at} gmail {dot} com) as backup mentor

Disclaimer: this proposal was partially covered in 2013 and 2014. Looking forward to working with such talented students again this year.
