Stochastic Average Gradient
Mark Schmidt proposed the Stochastic Average Gradient (SAG) algorithm as a fast solver for smooth convex optimization problems on finite data sets. His C/MATLAB code implements SAG for L2-regularized logistic regression, a convex optimization problem that is explained in detail in Chapter 4 of The Elements of Statistical Learning. Two existing R approaches to this problem:
- `glmnet(family="binomial", alpha=0)` uses coordinate descent to solve L2-regularized logistic regression (a minimal usage sketch appears after this list).
- `optimx` is a general-purpose optimizer that could also be used to solve L2-regularized logistic regression.
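As a point of comparison, here is a minimal sketch of fitting L2-regularized (ridge) logistic regression with glmnet; it assumes the glmnet package is installed and uses simulated data, and the lambda value is an arbitrary choice.

```r
## Fit ridge-penalized logistic regression on simulated data.
library(glmnet)
set.seed(1)
X <- matrix(rnorm(100 * 5), 100, 5)
y <- rbinom(100, 1, plogis(X %*% c(1, -1, 0.5, 0, 0)))
fit <- glmnet(X, y, family = "binomial", alpha = 0)  # alpha = 0: pure L2 penalty
coef(fit, s = 0.1)  # coefficients at regularization parameter lambda = 0.1
```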
Fork https://github.com/tdhock/SAG and write the SAG R package:
- Convert Mark's C code with `mex.h` headers to C code with `R.h` headers, for the three SAG methods (SAG, SAGlineSearch, SAG_LipshitzLS).
- Convert Mark's documentation comments in the C code to .Rd files, possibly generated via inlinedocs or a similar tool.
- Examples/vignettes using Mark's rcv1_train and covtype.libsvm data sets that
  - show how these three solvers can be used,
  - compare with the results of glmnet/optimx.
- Tests that make sure the R package
  - gets the right answer (a gradient with norm close to zero; see the sketch after this list),
  - gets the same answer as glmnet/optimx.
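One way to implement the "gradient norm close to zero" check is sketched below. It assumes the objective is the average logistic loss plus an L2 penalty with labels coded in {-1, +1}; the names `w_hat`, `X`, `y`, and `lambda` are placeholders for the solver output and inputs, and the 1e-4 tolerance is an arbitrary choice.

```r
## Euclidean norm of the gradient of
##   f(w) = (1/n) * sum_i log(1 + exp(-y_i * x_i' w)) + (lambda/2) * ||w||^2
## at a candidate solution w, with labels y_i in {-1, +1}.
grad_norm <- function(w, X, y, lambda) {
  p <- as.vector(1 / (1 + exp(y * (X %*% w))))  # logistic loss derivative factors
  g <- -crossprod(X, y * p) / nrow(X) + lambda * w
  sqrt(sum(g^2))
}
## w_hat: coefficient vector returned by a SAG solver (hypothetical name).
## stopifnot(grad_norm(w_hat, X, y, lambda) < 1e-4)
```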
Skills required: R package and C code development.
Please get in touch with John Nash [email protected] and Toby Dylan Hocking [email protected] as soon as possible.
After completing your tests, please post a link to your files below.
- Easy: use `glmnet` to fit an L2-regularized logistic regression model. Use the `system.time` function to record how much time it takes for several data set sizes, and make a plot that shows how execution time depends on the data set size (a timing sketch appears after this list).
- Medium: create a simple R package with one function and one documentation file, and upload it to your GitHub account.
- Harder: write an R package which uses `.C` to interface C code (a minimal `.C` sketch also appears after this list).
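For the Easy test, a timing study could look like the following sketch; it assumes glmnet is installed, and the data set sizes and feature count are arbitrary choices.

```r
## Record glmnet fitting time for several simulated data set sizes
## and plot elapsed seconds against the number of observations.
library(glmnet)
sizes <- c(1000, 2000, 4000, 8000, 16000)
seconds <- sapply(sizes, function(n) {
  X <- matrix(rnorm(n * 20), n, 20)
  y <- rbinom(n, 1, 0.5)
  system.time(glmnet(X, y, family = "binomial", alpha = 0))[["elapsed"]]
})
plot(sizes, seconds, type = "b", log = "xy",
     xlab = "number of observations", ylab = "elapsed seconds")
```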
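For the Harder test, the sketch below shows the bare mechanics of calling C from R via `.C`, outside of a package; it assumes a working C toolchain (so that `R CMD SHLIB` succeeds), and the function name `add_one` is hypothetical. In an actual package, the compiled code would instead be loaded with a `useDynLib` directive in the NAMESPACE file.

```r
## Compile a trivial C function and call it from R via .C.
code <- '
#include <R.h>
void add_one(double *x, int *n) {
  for (int i = 0; i < *n; i++) x[i] += 1.0;
}
'
writeLines(code, "add_one.c")
system("R CMD SHLIB add_one.c")  # build the shared library
dyn.load(paste0("add_one", .Platform$dynlib.ext))
result <- .C("add_one", x = as.double(1:3), n = as.integer(3))
result$x  # 2 3 4
```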