
Critical efficiency improvements of mcmcse

Akhil Jha edited this page Mar 26, 2021 · 10 revisions

Background

The mcmcse package is the leading R package for estimating Monte Carlo standard errors for Markov chain Monte Carlo (MCMC). Since 2012, it has expanded to include multivariate output analysis methods and the reliable calculation of effective sample size. Its functions are often called on massive matrices with millions of rows and thousands of columns, which creates efficiency bottlenecks. The primary goal of this project is to systematically identify and clear these bottlenecks through detailed benchmarking and testing.

Most of the computationally heavy code is written in C++ using Rcpp. A CRAN-hosted version of the package is here and a GitHub development version of the package is here.
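As a rough illustration of the kind of quantity the package estimates, the sketch below computes a batch means estimate of the Monte Carlo standard error of a sample mean for a single chain. It is a minimal standard-library C++ sketch, not the package's implementation: the function name `batch_means_se`, the choice to drop leftover draws from the start of the chain, and the use of a fixed user-supplied batch size are all assumptions made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of mcmcse): batch means estimate of the
// Monte Carlo standard error of the sample mean of a univariate chain.
// Splits the chain into a = floor(n / batch_size) non-overlapping batches;
// leftover draws at the start of the chain are ignored (a choice of this sketch).
double batch_means_se(const std::vector<double>& chain, std::size_t batch_size) {
    const std::size_t n = chain.size();
    // At least two full batches are needed to estimate a variance.
    assert(batch_size > 0 && n / batch_size >= 2);

    const std::size_t a = n / batch_size;      // number of batches
    const std::size_t used = a * batch_size;   // draws actually used
    const std::size_t offset = n - used;       // leftover draws skipped

    // Mean of each batch, and the mean of the batch means.
    std::vector<double> batch_mean(a, 0.0);
    double overall_mean = 0.0;
    for (std::size_t k = 0; k < a; ++k) {
        double sum = 0.0;
        for (std::size_t j = 0; j < batch_size; ++j) {
            sum += chain[offset + k * batch_size + j];
        }
        batch_mean[k] = sum / static_cast<double>(batch_size);
        overall_mean += batch_mean[k] / static_cast<double>(a);
    }

    // Sum of squared deviations of the batch means.
    double ss = 0.0;
    for (double m : batch_mean) {
        ss += (m - overall_mean) * (m - overall_mean);
    }

    // Estimate of the asymptotic variance, then the standard error of the mean.
    const double sigma2_hat =
        static_cast<double>(batch_size) * ss / static_cast<double>(a - 1);
    return std::sqrt(sigma2_hat / static_cast<double>(used));
}
```

For the chain 1, 1, 3, 3 with a batch size of 2, the batch means are 1 and 3, the variance estimate is 4, and the function returns 1. The inner loops touch every draw once, which hints at why such computations become expensive when applied column by column to matrices with millions of rows.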

Related work

There are a few other packages in R that do univariate effective sample size calculations, the most popular of which is coda. However, coda does not use consistent estimators of the variance, and its variance estimates are known to be liberal. In addition, we know of no other package that does multivariate effective sample size calculations.

Details of your coding project

The primary tasks for the student are the following:

  • Improve the batchSize() function: for large matrices, this function is a prime bottleneck, and most other functions depend on it. The goal is to identify where the time is spent and determine whether those parts can be moved to Rcpp. This may be challenging, since the function relies on the already fast ar implementations.
  • Integrate Rcpp sugar functions: Rcpp contains some useful sugar functions for large matrices that should be integrated into the current code. This may require changing from RcppArmadillo to Rcpp in certain situations.
  • Test numerical stability: calculating determinants and eigenvalues of numerically unstable matrices can produce numerical instabilities. A series of tests needs to be designed to verify whether the package is robust to these instabilities.
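To make the last task more concrete, the sketch below shows one common way a determinant calculation can fail numerically and one way to guard against it. It computes the log-determinant of a symmetric positive-definite matrix through a Cholesky factorization and returns NaN when a non-positive pivot shows the matrix is not numerically positive definite. This is a standard-library C++ sketch under stated assumptions, not the package's code: the names `Matrix`, `scaled_identity`, and `log_det_spd` are hypothetical, and whether mcmcse would use this approach is not claimed here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical dense matrix type for this sketch (row-major nested vectors).
using Matrix = std::vector<std::vector<double>>;

// Builds the p x p matrix c * I, a simple test input.
Matrix scaled_identity(std::size_t p, double c) {
    Matrix A(p, std::vector<double>(p, 0.0));
    for (std::size_t i = 0; i < p; ++i) {
        A[i][i] = c;
    }
    return A;
}

// Returns log(det(A)) for a symmetric positive-definite A using the
// Cholesky factorization A = L L^T, so log(det(A)) = 2 * sum(log(L[j][j])).
// Returns NaN if a pivot is not strictly positive, i.e. A is not
// numerically positive definite.
double log_det_spd(const Matrix& A) {
    const std::size_t p = A.size();
    for (const auto& row : A) {
        assert(row.size() == p);  // the matrix must be square
    }

    Matrix L(p, std::vector<double>(p, 0.0));
    double log_det = 0.0;
    for (std::size_t j = 0; j < p; ++j) {
        // Pivot: A[j][j] minus the squared entries already placed in row j of L.
        double diag = A[j][j];
        for (std::size_t k = 0; k < j; ++k) {
            diag -= L[j][k] * L[j][k];
        }
        if (!(diag > 0.0)) {
            return std::numeric_limits<double>::quiet_NaN();
        }
        L[j][j] = std::sqrt(diag);
        log_det += 2.0 * std::log(L[j][j]);

        // Fill column j of L below the diagonal.
        for (std::size_t i = j + 1; i < p; ++i) {
            double s = A[i][j];
            for (std::size_t k = 0; k < j; ++k) {
                s -= L[i][k] * L[j][k];
            }
            L[i][j] = s / L[j][j];
        }
    }
    return log_det;
}
```

A test in this spirit could call `log_det_spd(scaled_identity(400, 1e-3))`: the true determinant is 10^-1200, which underflows to zero in double precision, while the log-determinant, 400 * log(1e-3), is representable without difficulty. A second test could pass a symmetric matrix that is not positive definite, such as one with rows (1, 2) and (2, 1), and check that the function reports the failure instead of returning a misleading number.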

Expected impact

The mcmcse package has been downloaded over 48,000 times and has 106 citations on Google Scholar. The package has already proven useful to the general scientific community, and any improvements to it will continue to benefit this larger community. Additionally, the mcmcse package will soon be the foundation for a user-oriented package for Simulation Output Analysis.

Mentors

  • EVALUATING MENTOR: Dootika Vats [email protected] is the author and maintainer of the R package mcmcse and a contributor to the R package stableGR. She was a GSoC student participant in 2015 for this same package and is an expert in MCMC output analysis.
  • James Flegal [email protected] is the founding author of the package and an expert in MCMC output analysis.

Tests

Students, please do one or more of the following tests before contacting the mentors above.

MENTORS: write several tests that potential students can do to demonstrate their capabilities for this particular project. Ask some hard questions that will give you insight about how the students write code to solve problems. You'll see that the harder the questions that you ask, the easier it will be for you to choose between the students that apply for your project! Please modify the suggestions below to make them specific for your project.

  • Easy: (1) Download the mcmcse package from CRAN and use the function ess on a vector foo of length 1e4 randomly drawn from a standard normal distribution. (2) Make a random matrix of size 10 x 10 and produce only the eigenvalues of the matrix.
  • Medium: Implement a quick profile of the batchSize() function using profvis.
  • Hard: Write code for a random walk Metropolis-Hastings algorithm to sample from a 100-dimensional standard Gaussian distribution. Focus on an efficient implementation of this code.

Solutions of tests

Students, please post a link to your test results here.

| S No. | STUDENT NAME | GITHUB PROFILE | TEST RESULTS LINK |
|---|---|---|---|
| 01 | Akhil Kumar Jha | https://github.com/theakhiljha | https://github.com/theakhiljha/mcmcse_GSOC_2021 |
