Better solvers for the SLOPE package
The SLOPE package provides implementations for Sorted L-One Penalized Estimation (SLOPE): generalized linear models regularized with the sorted L1-norm (Bogdan et al. (2015) https://doi.org/10/gfgwzt).
SLOPE is an extension of the lasso that gracefully handles heavily correlated predictors, which makes it an attractive option for high-dimensional designs with correlated predictors. Because the penalty term in SLOPE is non-separable and requires an O(p log p) sorting operation, however, current implementations of SLOPE are typically considerably slower than those for the lasso.
- The owl package also implements SLOPE models; its functionality is soon to be merged into the SLOPE package.
- PNOPT is a MATLAB implementation of a proximal (quasi-)Newton solver.
- stanford.edu/~boyd/admm.html features several MATLAB and C implementations of ADMM.
- https://github.com/yixuan/ADMM features highly efficient ADMM implementations for the lasso and other objectives using Eigen.
- Bogdan, M., van den Berg, E., Sabatti, C., Su, W., & Candès, E. J. (2015). SLOPE -- adaptive variable selection via convex optimization. The Annals of Applied Statistics, 9(3), 1103–1140. https://doi.org/10/gfgwzt
- Zeng, X., & Figueiredo, M. A. T. (2014). The atomic norm formulation of OSCAR regularization with application to the Frank-Wolfe algorithm. 2014 22nd European Signal Processing Conference (EUSIPCO), 780–784. https://ieeexplore.ieee.org/document/6952255
- Lee, J. D., Sun, Y., & Saunders, M. A. (2014). Proximal Newton-type methods for minimizing composite functions. arXiv:1206.1623 [cs, math, stat]. http://arxiv.org/abs/1206.1623
- Boyd, S., Parikh, N., Chu, E., Peleato, B., & Eckstein, J. (2011). Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Foundations and Trends® in Machine Learning, 3(1), 1–122. https://doi.org/10.1561/2200000016
The primary goal of this project is to improve the performance of the SLOPE package by testing and implementing new solvers for logistic regression, Poisson regression, and multinomial logistic regression. At the time of writing, the SLOPE package uses FISTA to solve these problems, while an ADMM (alternating direction method of multipliers) solver has been implemented for ordinary least squares regression (in the development version). FISTA, however, converges slowly when predictors are correlated (which is exactly the problem that SLOPE is intended to solve!), so there is ample motivation for implementing more efficient solvers.
- set up a test suite for benchmarking numerical solvers for SLOPE as a stand-alone R package using RcppArmadillo
- implement solvers (in the test suite) for logistic regression, Poisson regression, and multinomial regression, at the minimum including FISTA, ADMM, and proximal Newton
- compare solvers using the test suite
- implement the solvers in the SLOPE package
- supplement documentation in the SLOPE package to include information regarding the chosen implementation
- prepare and submit an update to CRAN
- extend the functionality of the package to include multivariate Gaussian regression
- extend the multinomial logistic regression model to allow a group SLOPE penalty
SLOPE is a promising method for model selection in high-dimensional settings that is gaining interest in the statistical community. Since the method is relatively new, we expect much wider usage in the future. A major bottleneck for wider applicability of the method, however, is the availability of computationally efficient packages. The SLOPE package features the most efficient implementation of the method currently available, but it is slow when handling correlated predictors since several of the models use FISTA to solve the problem.
- EVALUATING MENTOR: Johan Larsson ([email protected]) is a PhD student at the Department of Statistics, Lund University, author of the R packages eulerr and qualpalr, and incoming maintainer of the SLOPE package. Johan was a mentor for the R GSoC project sgdnet in 2019 and a student on the same project in 2018.
- Jonas Wallin ([email protected]) is an assistant professor at the Department of Statistics, Lund University, with a PhD in mathematical statistics.
Students, please complete one or more of the tests below before contacting the mentors above.
- Easy: download the development version of the R package SLOPE (`devtools::install_github("jolars/SLOPE")`). Fit SLOPE and lasso models (hint: see the `lambda` argument in `SLOPE()`) using the SLOPE package to the `abalone` data set that comes with SLOPE. Plot the results. What are the similarities and differences?
- Medium: write a function using RcppArmadillo that computes the proximal operator for SLOPE using Algorithm 3 (FastProxSL1) from Bogdan et al. (2015) (SLOPE: adaptive variable selection via convex optimization). Compare the result with `SLOPE:::prox_sorted_L1()` (observe that this function uses a different algorithm than the one you are supposed to implement).
- Hard: write an R package using RcppArmadillo (as a backend) that uses FISTA or ADMM to solve ordinary least squares regression with the SLOPE penalty. Make use of the function to compute the proximal operator that you implemented in the previous test.
Students, please post a link to your test results here.
- Akarsh Goyal, GitHub Profile, Test Results