
rtracklayer improvements

Toby Dylan Hocking edited this page Aug 9, 2019 · 8 revisions

Background

R has many tools for genomic data analysis, but currently lacks support for (1) generating UCSC track hub meta-data files, and (2) writing bigBed files, which are binary indexed files for displaying genomic regions on track hubs.

Related work

  • PeakSegPipeline has code for generating track hubs, and currently relies on the bedToBigBed command line program for creating bigBed files.
  • rtracklayer can create bigWig but not bigBed files.
  • trackhub is a Python module for creating track hub meta-data files.

Details of your coding project

The interested student will work on implementing two major features for the rtracklayer package:

track hub creation

A track hub is a group of text files that describes a set of genomic data to display on the UCSC browser. These text files contain links to binary indexed files such as bigWig and bigBed. R needs a function for creating the text files.

The student should implement functions such as

trackHub(
  multiWig(
    bigWig("http://path/to/data.bigWig", "red"),
    bigWig("http://path/to/peaks.bigWig", "black")),
  bigBed("http://path/to/labels.bigBed"),
  trackDb="trackDb.txt",
  genomes="genomes.txt",
  db="hg19",
  hub="hub.txt")

This call would generate trackDb.txt, genomes.txt, and hub.txt, which could then be uploaded to a web server for display on the UCSC browser.
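As a rough illustration of what such a function would write, here is a minimal base R sketch of the three text files. The helper names track_stanza and write_track_hub, their arguments, and the email address are hypothetical and are not part of rtracklayer; the sketch covers only simple bigWig and bigBed tracks, not the multiWig container from the example above.

```r
# Hypothetical helper: format one track as "key value" lines for trackDb.txt.
track_stanza <- function(name, url, type, color = NULL) {
  fields <- c(
    track = name,
    bigDataUrl = url,
    shortLabel = name,
    longLabel = name,
    type = type)
  if (!is.null(color)) {
    # UCSC expects colors as comma-separated RGB values, e.g. 255,0,0.
    rgb_values <- grDevices::col2rgb(color)[, 1]
    fields <- c(fields, color = paste(rgb_values, collapse = ","))
  }
  paste(names(fields), fields)
}

# Hypothetical helper: write hub.txt, genomes.txt, and trackDb.txt into dir.
write_track_hub <- function(dir, hub_name, db, email, stanzas) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  writeLines(c(
    paste("hub", hub_name),
    paste("shortLabel", hub_name),
    paste("longLabel", hub_name),
    "genomesFile genomes.txt",
    paste("email", email)),
    file.path(dir, "hub.txt"))
  writeLines(c(
    paste("genome", db),
    "trackDb trackDb.txt"),
    file.path(dir, "genomes.txt"))
  # Stanzas in trackDb.txt are separated by blank lines.
  track_lines <- unlist(lapply(stanzas, function(s) c(s, "")))
  writeLines(track_lines, file.path(dir, "trackDb.txt"))
  invisible(file.path(dir, c("hub.txt", "genomes.txt", "trackDb.txt")))
}

hub_dir <- file.path(tempdir(), "example_hub")
hub_files <- write_track_hub(
  hub_dir, "exampleHub", "hg19", "user@example.com",
  list(
    track_stanza("data", "http://path/to/data.bigWig", "bigWig", "red"),
    track_stanza("labels", "http://path/to/labels.bigBed", "bigBed")))
```

The proposed trackHub function would presumably wrap this kind of file writing behind track objects such as bigWig(...) and bigBed(...), so that users never edit the stanzas by hand.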

bigBed creation

The bigBed file format is useful for displaying genomic regions on UCSC track hubs. The student should implement a BigBedFile class with methods such as import and export, similar to the existing BigWigFile class.
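For comparison, the workaround mentioned under Related work (calling the bedToBigBed command line program) can be sketched in base R as below. This is not the proposed BigBedFile class, only an illustration of the steps an export method would need to replace. The example regions and file names are made up, and the conversion step runs only if bedToBigBed is found on the PATH.

```r
# Made-up example regions, deliberately unsorted.
regions <- data.frame(
  chrom = c("chr2", "chr1", "chr1"),
  chromStart = c(500L, 300L, 100L),
  chromEnd = c(600L, 400L, 200L),
  stringsAsFactors = FALSE)

# bedToBigBed requires input sorted by chromosome, then start position.
# method = "radix" sorts strings in the C locale, independent of the session locale.
sorted <- regions[order(regions$chrom, regions$chromStart, method = "radix"), ]

# Write a plain three-column BED file: tab-separated, no header, no quotes.
bed_file <- tempfile(fileext = ".bed")
write.table(
  sorted, bed_file,
  sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)

# bedToBigBed also needs a two-column chromosome sizes file (hg19 sizes shown).
sizes_file <- tempfile(fileext = ".sizes")
writeLines(c("chr1\t249250621", "chr2\t243199373"), sizes_file)

# Convert only when the external program is installed.
bigbed_file <- tempfile(fileext = ".bigBed")
if (nzchar(Sys.which("bedToBigBed"))) {
  system2("bedToBigBed", c(bed_file, sizes_file, bigbed_file))
}
```

A native export method would remove the dependency on the external program and the intermediate BED and chromosome sizes files.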

Expected impact

This project will provide R with functionality for creating track hub meta-data files, along with bigBed files.

Mentors

Students, please contact mentors below after completing at least one of the tests below.

Tests

Students, please do one or more of the following tests before contacting the mentors above.

TODO: write several tests that potential students can do to demonstrate their capabilities for this particular project. Ask some hard questions that will give you insight about how the students write code to solve problems. You'll see that the harder the questions that you ask, the easier it will be for you to choose between the students that apply for your project! Please modify the suggestions below to make them specific for your project.

  • Easy: something that any useR should be able to do, e.g. download some existing package listed in the Related Work, and run it on some example data.
  • Medium: something a bit more complicated. You can encourage students to write a script or some functions that show their R coding abilities.
  • Hard: Can the student write a package with Rd files, tests, and vignettes? If your package interfaces with non-R code, can the student write in that other language?

Solutions of tests

Students, please post a link to your test results here.
