
Commit b1b505a

DOC improve logreg example with figure and timings (#228)
1 parent 700c280 commit b1b505a

File tree

.gitignore
README.rst
doc/conf.py
doc/index.rst
examples/plot_logreg_timings.py

5 files changed: +80 -23 lines changed

.gitignore

Lines changed: 5 additions & 1 deletion
@@ -14,12 +14,16 @@ celer.egg-info
 # build
 build
 
+# generated doc
+doc/_build/
+doc/auto_examples/
+doc/generated
 
 # cache
 .pytest_cache
 __pycache__
 
-doc/*
+
 coverage/*
 .coverage

README.rst

Lines changed: 4 additions & 3 deletions
@@ -3,16 +3,17 @@ celer
 
 |image0| |image1|
 
-Fast algorithm to solve Lasso-like problems with dual extrapolation. Currently, the package handles the following problems:
+Fast algorithm to solve Lasso-like problems with dual extrapolation, under a scikit-learn API.
+The solvers used allow for solving large scale problems with millions of features, up to 100 times faster than scikit-learn.
+Currently, the package handles the following problems:
 
 - Lasso
 - weighted Lasso
 - Sparse Logistic regression
-- Group Lasso
+- weighted Group Lasso
 - Multitask Lasso
 
 The estimators follow the scikit-learn API, come with automated parallel cross-validation, and support both sparse and dense data, with optionally feature centering, normalization, and unpenalized intercept fitting.
-The solvers used allow for solving large scale problems with millions of features, up to 100 times faster than scikit-learn.
 
 Documentation
 =============
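
As context for the README wording above, a minimal sketch of the scikit-learn-style usage it advertises (illustrative only: the toy data and the alpha value are made up, and only the Lasso estimator is shown):

    import numpy as np
    from celer import Lasso  # follows the scikit-learn estimator API

    # Toy data, purely illustrative.
    rng = np.random.RandomState(0)
    X = rng.randn(50, 100)
    y = X[:, :5] @ np.ones(5) + 0.01 * rng.randn(50)

    clf = Lasso(alpha=0.1)   # fit / predict / coef_, as in sklearn.linear_model.Lasso
    clf.fit(X, y)
    print(np.sum(clf.coef_ != 0), "nonzero coefficients")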

doc/conf.py

Lines changed: 10 additions & 3 deletions
@@ -192,14 +192,21 @@
      'Miscellaneous'),
 ]
 
+
+intersphinx_mapping = {
+    # 'numpy': ('https://docs.scipy.org/doc/numpy/', None),
+    # 'scipy': ('https://docs.scipy.org/doc/scipy/reference', None),
+    'matplotlib': ('https://matplotlib.org/', None),
+    'sklearn': ('http://scikit-learn.org/stable', None),
+}
+
 sphinx_gallery_conf = {
-    'doc_module': ('celer',),
+    'doc_module': ('celer', 'sklearn'),
     'reference_url': dict(celer=None),
     'examples_dirs': '../examples',
     'gallery_dirs': 'auto_examples',
     'reference_url': {
-        'numpy': 'http://docs.scipy.org/doc/numpy-1.9.1',
-        'scipy': 'http://docs.scipy.org/doc/scipy-0.17.0/reference',
+        'celer': None,
     }
 }
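
In rough terms, the intersphinx mapping added here lets cross-references to matplotlib and scikit-learn objects in the example docstrings resolve to their online documentation, while 'celer': None in reference_url keeps links to celer's own API local; adding 'sklearn' to doc_module additionally asks sphinx-gallery to generate backreferences for the scikit-learn estimators used in the examples.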

doc/index.rst

Lines changed: 6 additions & 4 deletions
@@ -6,17 +6,19 @@
 Celer
 ======
 
-This is a library to run the Constraint Elimination for the Lasso with Extrapolated Residuals (Celer) algorithm [1].
+Celer is a library exposing many scikit-learn like sparse models missing from scikit-learn.
+It estimates these models with the Constraint Elimination for the Lasso with Extrapolated Residuals (Celer) algorithm [1].
+The solvers used allow for solving large scale problems with millions of features, **up to 100 times faster than scikit-learn**.
+
 Currently, the package handles the following problems:
 
 - Lasso
 - weighted Lasso
-- Sparse Logistic regression
-- Group Lasso
+- sparse Logistic regression
+- weighted Group Lasso
 - Multitask Lasso.
 
 The estimators follow the scikit-learn API, come with automated parallel cross-validation, and support both sparse and dense data, with optionally feature centering, normalization, and unpenalized intercept fitting.
-The solvers used allow for solving large scale problems with millions of features, up to 100 times faster than scikit-learn.
 
 
 Install the released version

examples/plot_logreg_timings.py

Lines changed: 55 additions & 12 deletions
@@ -1,24 +1,67 @@
 """
-===============================================================
-Use LogisticRegression class with Celer and Prox-Newton solvers
-===============================================================
+==================================================================
+Compare LogisticRegression solver with sklearn's liblinear backend
+==================================================================
 """
 
+import time
+import warnings
 import numpy as np
 from numpy.linalg import norm
+import matplotlib.pyplot as plt
 from sklearn import linear_model
+from libsvmdata import fetch_libsvm
 
 from celer import LogisticRegression
-from celer.datasets import fetch_ml_uci
 
-dataset = "gisette_train"
-X, y = fetch_ml_uci(dataset)
+warnings.filterwarnings("ignore", message="Objective did not converge")
+warnings.filterwarnings("ignore", message="Liblinear failed to converge")
+
+X, y = fetch_libsvm("news20.binary")
 
 C_min = 2 / norm(X.T @ y, ord=np.inf)
-C = 5 * C_min
-clf = LogisticRegression(C=C, verbose=1, solver="celer-pn", tol=1e0).fit(X, y)
-w_celer = clf.coef_.ravel()
+C = 20 * C_min
+
+
+def pobj_logreg(w):
+    return np.sum(np.log(1 + np.exp(-y * (X @ w)))) + 1. / C * norm(w, ord=1)
+
+
+pobj_celer = []
+t_celer = []
+
+for n_iter in range(10):
+    t0 = time.time()
+    clf = LogisticRegression(
+        C=C, solver="celer-pn", max_iter=n_iter, tol=0).fit(X, y)
+    t_celer.append(time.time() - t0)
+    w_celer = clf.coef_.ravel()
+    pobj_celer.append(pobj_logreg(w_celer))
+
+pobj_celer = np.array(pobj_celer)
+
+
+pobj_libl = []
+t_libl = []
+
+for n_iter in np.arange(0, 50, 10):
+    t0 = time.time()
+    clf = linear_model.LogisticRegression(
+        C=C, solver="liblinear", penalty='l1', fit_intercept=False,
+        max_iter=n_iter, random_state=0, tol=1e-10).fit(X, y)
+    t_libl.append(time.time() - t0)
+    w_libl = clf.coef_.ravel()
+    pobj_libl.append(pobj_logreg(w_libl))
+
+pobj_libl = np.array(pobj_libl)
+
+p_star = min(pobj_celer.min(), pobj_libl.min())
 
-clf = linear_model.LogisticRegression(
-    C=C, solver="liblinear", penalty='l1', fit_intercept=False).fit(X, y)
-w_lib = clf.coef_.ravel()
+plt.close("all")
+fig = plt.figure(figsize=(4, 2), constrained_layout=True)
+plt.semilogy(t_celer, pobj_celer - p_star, label="Celer-PN")
+plt.semilogy(t_libl, pobj_libl - p_star, label="liblinear")
+plt.legend()
+plt.xlabel("Time (s)")
+plt.ylabel("objective suboptimality")
+plt.show(block=False)
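
A note on the C_min line above, plus a small self-contained check. For labels in {-1, 1}, the gradient of the unregularized logistic loss at w = 0 is -X.T @ y / 2, so w = 0 already minimizes the L1-penalized objective whenever norm(X.T @ y, inf) / 2 <= 1 / C, i.e. for any C <= 2 / norm(X.T @ y, inf) = C_min; taking C = 20 * C_min therefore lands above this threshold and gives a sparse but non-trivial solution. The sketch below is not part of the commit: it uses made-up synthetic data instead of news20.binary and assumes celer's LogisticRegression default solver and the usual scikit-learn coef_ convention.

    import numpy as np
    from numpy.linalg import norm

    from celer import LogisticRegression

    # Made-up toy problem, much smaller than news20.binary.
    rng = np.random.RandomState(0)
    X = rng.randn(30, 50)
    y = np.sign(rng.randn(30))  # labels in {-1, 1}

    C_min = 2 / norm(X.T @ y, ord=np.inf)

    # Below C_min the L1 penalty dominates and the solution is w = 0
    # (up to solver numerics); above it some coefficients become active.
    w_below = LogisticRegression(C=0.5 * C_min).fit(X, y).coef_.ravel()
    w_above = LogisticRegression(C=20 * C_min).fit(X, y).coef_.ravel()
    print((w_below != 0).sum(), "nonzeros below C_min,",
          (w_above != 0).sum(), "nonzeros above C_min")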
