
Stochastic Average Gradient


Background

Mark Schmidt proposed the Stochastic Average Gradient (SAG) algorithm as a fast solver for smooth convex optimization problems on finite data sets. His C/MATLAB code implements SAG for L2-regularized logistic regression, a convex optimization problem explained in detail in Chapter 4 of The Elements of Statistical Learning.
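SAG is fast on such problems because each iteration computes the gradient of only one randomly chosen training example while reusing stored gradients for all the others. For n training pairs (x_i, y_i) with labels y_i in {-1, +1} and regularization parameter lambda, the objective is the standard one below; the exact scaling of the regularization term in Mark's code should be checked against his documentation.

```latex
\min_{w \in \mathbb{R}^p} \;
\frac{1}{n} \sum_{i=1}^{n} \log\!\bigl(1 + \exp(-y_i\, w^\top x_i)\bigr)
\;+\; \frac{\lambda}{2} \, \lVert w \rVert_2^2
```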

Related work

Project ideas

Write the SAG R package:

  • Convert Mark's C code with "mex.h" headers to C code with "R.h" headers, for the three SAG methods (SAG, SAGlineSearch, SAG_LipshitzLS).
  • Convert Mark's example data sets (rcv1_train.binary.mat and covtype.libsvm.binary.mat) to .RData format (see the conversion sketch after this list).
  • Convert Mark's documentation comments in C code to .Rd files, possibly generated by inlinedocs, etc.
  • Examples/vignettes on these data sets that
    • show how these three solvers can be used, and
    • compare their results with those of glmnet/optimx.
  • Tests that make sure the R package
    • gets the right answer (a gradient with norm close to zero; see the gradient-norm sketch after this list), and
    • gets the same answer as glmnet/optimx.
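
A minimal sketch of the .mat-to-.RData conversion, assuming the R.matlab package and that the file has already been downloaded. The variable names inside the .mat file (X and y below) are assumptions that should be checked with str(); note that readMat does not support the HDF5-based -v7.3 MAT format.

```r
library(R.matlab)  # provides readMat() for parsing MATLAB .mat files

## Read the MATLAB file into a named list of R objects.
mat <- readMat("rcv1_train.binary.mat")
str(mat)  # inspect the actual variable names

## The element names X and y are assumptions -- adjust to match str(mat).
rcv1.X <- mat$X
rcv1.y <- as.vector(mat$y)

## Save in R's native format for the package's data/ directory.
save(rcv1.X, rcv1.y, file = "rcv1_train.binary.RData")
```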
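A sketch of the gradient-norm test, assuming labels coded as -1/+1; w.hat, X, y, and lambda stand in for a fitted coefficient vector and the problem data (illustrative names, not part of any API):

```r
## Gradient of the L2-regularized logistic regression objective at w:
##   -(1/n) * sum_i y_i * x_i / (1 + exp(y_i * w'x_i)) + lambda * w
logistic.gradient <- function(w, X, y, lambda) {
  p <- 1 / (1 + exp(y * as.vector(X %*% w)))
  as.vector(-t(X) %*% (y * p)) / nrow(X) + lambda * w
}

## At an optimum of this smooth convex problem the gradient vanishes,
## so its norm should be tiny for a correct solver.
g <- logistic.gradient(w.hat, X, y, lambda)
stopifnot(sqrt(sum(g^2)) < 1e-6)
```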

Skills required

R package and C code development.

Mentor

Please get in touch with John Nash [email protected] and Toby Dylan Hocking [email protected] as soon as possible.

Tests

After completing your tests, please post a link to your files below.

  • Easy: use glmnet to fit an L2-regularized logistic regression model. Use the system.time function to record how long the fit takes for several data set sizes, and make a plot that shows how execution time depends on data set size (see the timing sketch below).
  • Medium: create a simple R package with one function and one documentation file, and upload it to your GitHub account (see the skeleton sketch below).
  • Hard: write an R package that uses .C to interface with C code (see the .C sketch below).
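
A sketch of the Easy test on simulated data, assuming the glmnet package; with alpha = 0, glmnet fits the pure ridge (L2) penalty:

```r
library(glmnet)

## Time an L2-penalized logistic regression fit for growing data sets.
sizes <- c(1000, 2000, 4000, 8000)
seconds <- sapply(sizes, function(n) {
  X <- matrix(rnorm(n * 50), n, 50)
  y <- factor(rbinom(n, 1, 0.5))
  system.time(glmnet(X, y, family = "binomial", alpha = 0))[["elapsed"]]
})

## Plot execution time against data set size.
plot(sizes, seconds, type = "b",
     xlab = "number of observations", ylab = "elapsed seconds")
```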
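A sketch of the Medium test using the built-in package.skeleton utility, which generates the package directory along with .Rd documentation stubs to fill in (the package and function names are placeholders):

```r
## Define one exported function, then generate a package skeleton around it.
to.upper <- function(x) toupper(x)
package.skeleton(name = "sagdemo", list = "to.upper")
## This creates sagdemo/ with R/, man/, and DESCRIPTION; edit the generated
## man/to.upper.Rd stub, then push the directory to GitHub.
```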
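A self-contained sketch of the Hard test: write a toy C function that uses R.h headers, compile it with R CMD SHLIB, and call it through .C (the file and function names are illustrative). In a real package the compiled code would instead live in src/ and be loaded with useDynLib in the NAMESPACE.

```r
## A toy C routine following the .C calling convention:
## every argument is a pointer, and results are written in place.
writeLines('
#include <R.h>
void add_one(double *x, int *n) {
  for (int i = 0; i < *n; i++) x[i] += 1.0;
}
', "addone.c")

## Compile to a shared library and load it into this R session.
system("R CMD SHLIB addone.c")
dyn.load(paste0("addone", .Platform$dynlib.ext))

## .C copies its arguments to C, runs the routine, and returns them as a list.
result <- .C("add_one", x = as.double(1:5), n = as.integer(5))
result$x  # 2 3 4 5 6
```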