Commit 686c3ea

Merge pull request #50 from simai-ml/cv_prefit

Cv prefit

2 parents 17271bb + 572935c

14 files changed: +513 -171 lines changed

.appveyor.yml

Lines changed: 1 addition & 1 deletion

@@ -20,7 +20,7 @@ install:
   - conda activate test-env
 
 test_script:
-  - mypy mapie examples --strict --config-file mypy.ini
+  - mypy mapie examples --strict
   - pytest -vs --doctest-modules --cov-branch --cov=mapie --pyargs mapie
 
 after_test:

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 1 addition & 1 deletion

@@ -25,7 +25,7 @@ Please describe the tests that you ran to verify your changes. Provide instructi
 - [ ] I have read the [contributing guidelines](https://github.com/simai-ml/MAPIE/blob/master/CONTRIBUTING.rst)
 - [ ] I have updated the [HISTORY.rst](https://github.com/simai-ml/MAPIE/blob/master/HISTORY.rst) and [AUTHORS.rst](https://github.com/simai-ml/MAPIE/blob/master/AUTHORS.rst) files
 - [ ] Linting passes successfully : `flake8 . --exclude=doc`
-- [ ] Typing passes successfully : `mypy mapie examples --strict --config-file mypy.ini`
+- [ ] Typing passes successfully : `mypy mapie examples --strict`
 - [ ] Unit tests pass successfully : `pytest -vs --doctest-modules mapie`
 - [ ] Coverage is 100% : `pytest -vs --doctest-modules --cov-branch --cov=mapie --pyargs mapie`
 - [ ] Documentation builds successfully : `cd doc; make clean; make html`

.travis.yml

Lines changed: 1 addition & 1 deletion

@@ -22,7 +22,7 @@ install:
   - conda activate test-env
 
 script:
-  - mypy mapie examples --strict --config-file mypy.ini
+  - mypy mapie examples --strict
   - pytest -vs --doctest-modules --cov-branch --cov=mapie --pyargs mapie
 
 after_success:

HISTORY.rst

Lines changed: 3 additions & 3 deletions

@@ -2,9 +2,10 @@
 History
 =======
 
-0.2.1 (2020-XX-XX)
+0.2.1 (2021-XX-XX)
 ------------------
 
+* Add `cv="prefit"` option
 * Add sample_weight argument in fit method
 
 0.2.0 (2021-05-21)
@@ -16,8 +17,7 @@ History
 * Remove the `n_splits`, `shuffle` and `random_state` parameters
 * Simplify the `method` parameter
 * Fix typos in documentation and add methods descriptions in sphinx
-* Accept alpha parameter as a list or np.ndarray
-* If alpha is an Iterable, `.predict()` returns a np.ndarray of shape (n_samples, 3, len(alpha))
+* Accept alpha parameter as a list or np.ndarray. If alpha is an Iterable, `.predict()` returns a np.ndarray of shape (n_samples, 3, len(alpha)).
 
 0.1.4 (2021-05-07)
 ------------------
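
As context for the entries above, here is a minimal sketch of the alpha behaviour described for 0.2.0, assuming the constructor signature used elsewhere in this commit (alpha passed directly to MapieRegressor):

import numpy as np
from sklearn.linear_model import LinearRegression
from mapie.estimators import MapieRegressor

# Toy data, for illustration only.
X = np.linspace(0, 1, 100).reshape(-1, 1)
y = 2*X.ravel() + np.random.normal(0, 0.1, 100)

# With an Iterable alpha, predict() returns (n_samples, 3, len(alpha)):
# index 0 holds predictions, indices 1 and 2 the lower and upper bounds.
alpha = [0.05, 0.1]
mapie = MapieRegressor(LinearRegression(), alpha=alpha)
mapie.fit(X, y)
y_preds = mapie.predict(X)
assert y_preds.shape == (100, 3, len(alpha))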

doc/images/quickstart_1.png

Binary file changed (0 Bytes)

doc/tutorial.rst

Lines changed: 8 additions & 3 deletions

@@ -69,7 +69,7 @@ over :math:`x`.
     x_sinx, min_x, max_x, n_samples, noise
 )
 
-Let"s visualize our noisy function.
+Let's visualize our noisy function.
 
 .. code:: python
 
@@ -494,11 +494,14 @@ uniform distribution.
 .. image:: images/tuto_7.png
     :align: center
 
-Let"s then define the models. The boosing model considers 100 shallow trees with a max depth of 2 while
+Let's then define the models. The boosing model considers 100 shallow trees with a max depth of 2 while
 the Multilayer Perceptron has two hidden dense layers with 20 neurons each followed by a relu activation.
 
 .. code:: python
 
+from tensorflow.keras import Sequential
+from tensorflow.keras.layers import Dense
+from tensorflow.keras.wrappers.scikit_learn import KerasRegressor
 def mlp():
     """
     Two-layer MLP model
@@ -519,6 +522,8 @@ the Multilayer Perceptron has two hidden dense layers with 20 neurons each follo
         ("linear", LinearRegression(fit_intercept=False))
     ]
 )
+
+from xgboost import XGBRegressor
 xgb_model = XGBRegressor(
     max_depth=2,
     n_estimators=100,
@@ -534,7 +539,7 @@ the Multilayer Perceptron has two hidden dense layers with 20 neurons each follo
     verbose=0
 )
 
-Let"s now use MAPIE to estimate the prediction intervals using the CV+ method
+Let's now use MAPIE to estimate the prediction intervals using the CV+ method
 and compare their prediction interval.
 
 .. code:: python
examples/plot_homoscedastic_1d_data.py

Lines changed: 14 additions & 18 deletions

@@ -22,12 +22,12 @@
 
 def f(x: np.ndarray) -> np.ndarray:
     """Polynomial function used to generate one-dimensional data"""
-    return np.stack(5*x + 5*x**4 - 9*x**2)
+    return np.array(5*x + 5*x**4 - 9*x**2)
 
 
 def get_homoscedastic_data(
-    n_samples: int = 200,
-    n_test: int = 1000,
+    n_train: int = 200,
+    n_true: int = 200,
     sigma: float = 0.1
 ) -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray, float]:
     """
@@ -38,9 +38,9 @@ def get_homoscedastic_data(
 
     Parameters
     ----------
-    n_samples : int, optional
+    n_train : int, optional
         Number of training samples, by default 200.
-    n_test : int, optional
+    n_true : int, optional
         Number of test samples, by default 1000.
     sigma : float, optional
         Standard deviation of noise, by default 0.1
@@ -57,9 +57,9 @@ def get_homoscedastic_data(
     """
     np.random.seed(59)
     q95 = scipy.stats.norm.ppf(0.95)
-    X_train = np.random.exponential(0.4, n_samples)
-    X_true = np.linspace(0.001, 1.2, n_test, endpoint=False)
-    y_train = f(X_train) + np.random.normal(0, sigma, n_samples)
+    X_train = np.linspace(0, 1, n_train)
+    X_true = np.linspace(0, 1, n_true)
+    y_train = f(X_train) + np.random.normal(0, sigma, n_train)
     y_true = f(X_true)
     y_true_sigma = q95*sigma
     return X_train, y_train, X_true, y_true, y_true_sigma
@@ -106,7 +106,7 @@ def plot_1d_data(
     """
     ax.set_xlabel("x")
     ax.set_ylabel("y")
-    ax.set_xlim([0, 1.1])
+    ax.set_xlim([0, 1])
     ax.set_ylim([0, 1])
     ax.scatter(X_train, y_train, color="red", alpha=0.3, label="training")
     ax.plot(X_test, y_test, color="gray", label="True confidence intervals")
@@ -118,16 +118,12 @@ def plot_1d_data(
     ax.legend()
 
 
-X_train, y_train, X_test, y_test, y_test_sigma = get_homoscedastic_data(
-    n_samples=200, n_test=200, sigma=0.1
-)
+X_train, y_train, X_test, y_test, y_test_sigma = get_homoscedastic_data()
 
-polyn_model = Pipeline(
-    [
-        ("poly", PolynomialFeatures(degree=4)),
-        ("linear", LinearRegression(fit_intercept=False))
-    ]
-)
+polyn_model = Pipeline([
+    ("poly", PolynomialFeatures(degree=4)),
+    ("linear", LinearRegression(fit_intercept=False))
+])
 
 Params = TypedDict("Params", {"method": str, "cv": int})
 STRATEGIES = {
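
A note on the np.stack → np.array fix at the top of this file: np.stack joins a sequence of arrays along a new axis, so passing it a single expression works only by accident for 1-D input and raises a TypeError for scalar input, whereas np.array wraps the result directly. A small sketch:

import numpy as np

def f_old(x):
    return np.stack(5*x + 5*x**4 - 9*x**2)  # pre-fix version

def f_new(x):
    return np.array(5*x + 5*x**4 - 9*x**2)  # fixed version

x = np.linspace(0, 1, 4)
assert np.array_equal(f_old(x), f_new(x))  # same values for 1-D input

f_new(0.5)  # fine: wraps the scalar in a 0-d array
try:
    f_old(0.5)  # np.stack tries to iterate over a scalar
except TypeError as err:
    print(err)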

examples/plot_nested-cv.py

Lines changed: 1 addition & 0 deletions

@@ -50,6 +50,7 @@
 from sklearn.model_selection import train_test_split
 from sklearn.model_selection import RandomizedSearchCV
 from sklearn.metrics import mean_squared_error
+
 from mapie.estimators import MapieRegressor
 from mapie.metrics import coverage_score
 
examples/plot_prefit_nn.py

Lines changed: 69 additions & 0 deletions

@@ -0,0 +1,69 @@
+"""
+========================================================
+Example use of the prefit parameter with neural networks
+========================================================
+
+:class:`mapie.estimators.MapieRegressor` is used to calibrate
+uncertainties for large models for which the cost of cross-validation
+is too high. Typically, neural networks rely on a single validation set.
+
+In this example, we first fit a neural network on the training set. We
+then compute residuals on a validation set with the `cv="prefit"` parameter.
+Finally, we evaluate the model with prediction intervals on a testing set.
+"""
+import scipy
+import numpy as np
+from sklearn.model_selection import train_test_split
+from sklearn.neural_network import MLPRegressor
+from matplotlib import pyplot as plt
+
+from mapie.estimators import MapieRegressor
+from mapie.metrics import coverage_score
+
+
+def f(x: np.ndarray) -> np.ndarray:
+    """Polynomial function used to generate one-dimensional data."""
+    return np.array(5*x + 5*x**4 - 9*x**2)
+
+
+# Generate data
+sigma = 0.1
+n_samples = 10000
+X = np.linspace(0, 1, n_samples)
+y = f(X) + np.random.normal(0, sigma, n_samples)
+
+# Train/validation/test split
+X_train_val, X_test, y_train_val, y_test = train_test_split(X, y, test_size=1/10)
+X_train, X_val, y_train, y_val = train_test_split(X_train_val, y_train_val, test_size=1/9)
+
+# Train model on training set
+model = MLPRegressor(activation="relu", random_state=1)
+model.fit(X_train.reshape(-1, 1), y_train)
+
+# Calibrate uncertainties on validation set
+alpha = 0.1
+mapie = MapieRegressor(model, alpha=alpha, cv="prefit")
+mapie.fit(X_val.reshape(-1, 1), y_val)
+
+# Evaluate prediction and coverage level on testing set
+y_pred, y_pred_low, y_pred_up = mapie.predict(X_test.reshape(-1, 1))[:, :, 0].T
+coverage = coverage_score(y_test, y_pred_low, y_pred_up)
+
+# Plot obtained prediction intervals on testing set
+theoretical_semi_width = scipy.stats.norm.ppf(1 - alpha)*sigma
+y_test_theoretical = f(X_test)
+order = np.argsort(X_test)
+
+plt.scatter(X_test, y_test, color="red", alpha=0.3, label="testing", s=2)
+plt.plot(X_test[order], y_test_theoretical[order], color="gray", label="True confidence intervals")
+plt.plot(X_test[order], y_test_theoretical[order] - theoretical_semi_width, color="gray", ls="--")
+plt.plot(X_test[order], y_test_theoretical[order] + theoretical_semi_width, color="gray", ls="--")
+plt.plot(X_test[order], y_pred[order], label="Prediction intervals")
+plt.fill_between(X_test[order], y_pred_low[order], y_pred_up[order], alpha=0.2)
+plt.title(
+    f"Target and effective coverages for alpha={alpha}: ({1 - alpha:.3f}, {coverage:.3f})"
+)
+plt.xlabel("x")
+plt.ylabel("y")
+plt.legend()
+plt.show()
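
One step in this new example is worth unpacking: per the HISTORY.rst entry above, predict() returns an array of shape (n_samples, 3, len(alpha)) even for a scalar alpha, and the [:, :, 0].T slice splits it into three 1-D arrays. A standalone sketch of just that indexing:

import numpy as np

n_samples = 5
# Dummy predictions with the documented (n_samples, 3, len(alpha)) layout.
preds = np.random.rand(n_samples, 3, 1)

# Select the single alpha, then transpose to (3, n_samples) so that tuple
# unpacking yields three arrays of shape (n_samples,) each.
y_pred, y_pred_low, y_pred_up = preds[:, :, 0].T
assert y_pred.shape == (n_samples,)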

examples/plot_toy_model.py

Lines changed: 2 additions & 1 deletion

@@ -10,6 +10,7 @@
 from matplotlib import pyplot as plt
 from sklearn.linear_model import LinearRegression
 from sklearn.datasets import make_regression
+
 from mapie.estimators import MapieRegressor
 from mapie.metrics import coverage_score
 
@@ -31,7 +32,7 @@
 plt.fill_between(X[order].ravel(), y_preds[:, 1, 0][order].ravel(), y_preds[:, 2, 0][order].ravel(), alpha=0.2)
 coverage_scores = [coverage_score(y, y_preds[:, 1, i], y_preds[:, 2, i]) for i, _ in enumerate(alpha)]
 plt.title(
-    f"Target and effective coverages for alpha={alpha[0]:.2f}: ({1-alpha[0]:.3f}, {coverage_scores[0]:.3f})\n" +
+    f"Target and effective coverages for alpha={alpha[0]:.2f}: ({1-alpha[0]:.3f}, {coverage_scores[0]:.3f})\n"
     f"Target and effective coverages for alpha={alpha[1]:.2f}: ({1-alpha[1]:.3f}, {coverage_scores[1]:.3f})"
 )
 plt.show()
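
The title change above works because Python concatenates adjacent string literals, including f-strings, at compile time, so the trailing + was redundant. A minimal demonstration:

alpha = [0.05, 0.1]
title = (
    f"first line for alpha={alpha[0]:.2f}\n"
    f"second line for alpha={alpha[1]:.2f}"
)
assert title == f"first line for alpha={alpha[0]:.2f}\nsecond line for alpha={alpha[1]:.2f}"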
