Toby Dylan Hocking edited this page Feb 25, 2019 · 16 revisions

Background

Berkeley DB is an embedded NoSQL database library that implements a key-value store. It is efficient and reliable, and it supports atomic transactions.

Related work

The RBerkeley package provided an R interface to Berkeley DB, but it was removed from CRAN in 2017.

Coding project: RBerkeley back on CRAN

It would be useful to have Berkeley DB available from a CRAN package in order to support applications, packages, and algorithms that require disk-based storage. For example, PeakSegDisk used to depend on Berkeley DB STL, which provides an easy-to-use API for on-disk STL containers. PeakSegDisk provides an on-disk implementation of an optimal changepoint detection algorithm for genomic data, which scales to huge data sets because it is not limited by memory. However, it was a real pain to get PeakSegDisk to compile on CRAN/win-builder, because they do not provide the Berkeley DB headers and libraries. It would have been great to be able to simply write LinkingTo: RBerkeley in the PeakSegDisk DESCRIPTION file and be done. Instead, it turned out to be easier to re-write the required functionality in standard C++. The moral of the story is that R needs a package that provides Berkeley DB.
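To make "disk-based storage in standard C++" concrete, here is a minimal sketch of an append-only array of doubles that lives in a file rather than in memory. It is only an illustration: the class name DiskVector and its methods are hypothetical, and this is not the code used in PeakSegDisk or the Berkeley DB STL API. It assumes a scratch file that can be deleted when the object is destroyed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical example class: a file-backed, append-only array of doubles.
// Only one element at a time is held in memory.
class DiskVector {
public:
  explicit DiskVector(const std::string& path) : path_(path), size_(0) {
    // Create (or truncate) the file, then reopen it for reading and writing.
    std::ofstream(path_, std::ios::binary | std::ios::trunc).close();
    file_.open(path_, std::ios::in | std::ios::out | std::ios::binary);
    if (!file_) throw std::runtime_error("cannot open " + path_);
  }

  // Assumption: the file is scratch space, so it is removed on destruction.
  ~DiskVector() {
    file_.close();
    std::remove(path_.c_str());
  }

  // Write the value at the end of the file.
  void push_back(double value) {
    file_.seekp(static_cast<std::streamoff>(size_ * sizeof(double)));
    file_.write(reinterpret_cast<const char*>(&value), sizeof(double));
    if (!file_) throw std::runtime_error("write failed");
    ++size_;
  }

  // Read element i back from the file.
  double at(std::size_t i) {
    if (i >= size_) throw std::out_of_range("index out of range");
    double value = 0.0;
    file_.seekg(static_cast<std::streamoff>(i * sizeof(double)));
    file_.read(reinterpret_cast<char*>(&value), sizeof(double));
    if (!file_) throw std::runtime_error("read failed");
    return value;
  }

  std::size_t size() const { return size_; }

private:
  std::string path_;
  std::fstream file_;
  std::size_t size_;
};
```

A sketch like this covers only fixed-size records and offers no transactions, caching, or crash recovery, which is part of what a library such as Berkeley DB would provide.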

  • address the issues that made CRAN remove RBerkeley.
  • set up a git repo with CI / code coverage.
  • either fork https://github.com/hrbrmstr/RBerkeley or continue development there if possible.
  • write more tests to increase code coverage.
  • you can use this old version https://github.com/tdhock/PeakSegDisk/commit/190ce1c5e7774f27c38304e43a74cb0d860686c5 of PeakSegDisk as an example of how RBerkeley should/could be used: make a new repo with this code, add LinkingTo: RBerkeley, and use it for testing.
  • add a vignette explaining how to use RBerkeley in R and in C++.

Expected impact

After this GSoC project, the RBerkeley package will be back on CRAN, and package developers will be able to build algorithms and functions that take advantage of this powerful library.

Mentors

Students, please contact mentors below after completing at least one of the tests below.

  • Toby Hocking <[email protected]> would be a user of RBerkeley if it were on CRAN.
  • STUDENTS: if you want to do this project, you need to find a second mentor! The best option would be to email the author of RBerkeley, Bob Rudis <[email protected]>, and ask them to mentor; otherwise, try posting on [email protected]

Tests

Students, please do one or more of the following tests before contacting the mentors above.

  • Easy: download the most recent version of RBerkeley and create an Rmd/html page showing R code and results of how to use RBerkeley.
  • Medium:
  • Hard:

Solutions of tests

Students, please post a link to your test results here.
