@@ -11,15 +11,12 @@ can be modified to combine supervised and unsupervised learning, in a formulatio
 denoted `PCov-CUR` and `PCov-FPS`.
 For further reading, refer to [Imbalzano2018]_ and [Cersonsky2021]_.

-
 These selectors can be used for both feature and sample selection, with similar
-instantiations. Currently, all sub-selection methods extend :py:class:`GreedySelector`,
-where at each iteration the model scores each
-feature or sample (without an estimator) and chooses that with the maximum score.
-This can be executed using:
+instantiations. This can be executed using:

 .. doctest::

+    >>> # feature selection
     >>> import numpy as np
     >>> from skmatter.feature_selection import CUR, FPS, PCovCUR, PCovFPS
     >>> selector = CUR(
@@ -36,10 +33,14 @@ This can be executed using:
     ...     # are exhausted
     ...     full=False,
     ... )
-    >>> X = np.array([[0.12, 0.21, 0.02], # 3 samples, 3 features
-    ...               [-0.09, 0.32, -0.10],
-    ...               [-0.03, -0.53, 0.08]])
-    >>> y = np.array([0., 0., 1.]) # classes of each sample
+    >>> X = np.array(
+    ...     [
+    ...         [0.12, 0.21, 0.02],  # 3 samples, 3 features
+    ...         [-0.09, 0.32, -0.10],
+    ...         [-0.03, -0.53, 0.08],
+    ...     ]
+    ... )
+    >>> y = np.array([0.0, 0.0, 1.0])  # classes of each sample
     >>> selector.fit(X)
     CUR(n_to_select=2, progress_bar=True, score_threshold=1e-12)
     >>> Xr = selector.transform(X)
@@ -51,6 +52,8 @@ This can be executed using:
     >>> Xr = selector.transform(X)
     >>> print(Xr.shape)
     (3, 2)
+    >>>
+    >>> # Now sample selection
     >>> from skmatter.sample_selection import CUR, FPS, PCovCUR, PCovFPS
     >>> selector = CUR(n_to_select=2)
     >>> selector.fit(X)
@@ -59,23 +62,11 @@ This can be executed using:
     >>> print(Xr.shape)
     (2, 3)

-where `Selector` is one of the classes below that overwrites the method
-:py:func:`score`.
-
-From :py:class:`GreedySelector`, selectors inherit these public methods:
-
-.. currentmodule:: skmatter._selection
-
-.. class:: GreedySelector
-
-    .. automethod:: fit
-    .. automethod:: transform
-    .. automethod:: get_support

 .. _CUR-api:

 CUR
-###
+---


 CUR decomposition begins by approximating a matrix :math:`{\mathbf{X}}` using a subset
@@ -100,88 +91,50 @@ features in a single iteration based upon the relative :math:`\pi` importance.
 The feature and sample selection versions of CUR differ only in the computation of
 :math:`\pi`. In sample selection :math:`\pi` is computed using the left singular
 vectors, versus in feature selection, :math:`\pi` is computed using the right singular
-vectors. In addition to :py:class:`GreedySelector`, both instances of CUR selection
-build off of :py:class:`skmatter._selection._cur._CUR`, and inherit
-
-.. currentmodule:: skmatter._selection
-
-.. automethod:: _CUR.score
-.. automethod:: _CUR._compute_pi
-
-They are instantiated using
-:py:class:`skmatter.feature_selection.CUR` and
-:py:class:`skmatter.sample_selection.CUR`, e.g.
-
-.. code-block:: python
+vectors.

-    from skmatter.feature_selection import CUR
+.. autoclass:: skmatter.feature_selection.CUR
+    :members:
+    :private-members: _compute_pi
+    :undoc-members:
+    :inherited-members:

-    selector = CUR(
-        n_to_select=4,
-        progress_bar=True,
-        score_threshold=1e-12,
-        full=False,
-        # int, number of eigenvectors to use in computing pi
-        k=1,
-        # int, number of steps after which to recompute pi
-        recompute_every=1,
-        # float, threshold below which scores will be considered 0, defaults to 1E-12
-        tolerance=1e-12,
-    )
-    selector.fit(X)
-
-    Xr = selector.transform(X)
+.. autoclass:: skmatter.sample_selection.CUR
+    :members:
+    :private-members: _compute_pi
+    :undoc-members:
+    :inherited-members:

 .. _PCov-CUR-api:

 PCov-CUR
-########
+--------

 PCov-CUR extends upon CUR by using augmented right or left singular vectors inspired by
 Principal Covariates Regression, as demonstrated in [Cersonsky2021]_. These methods
 employ the modified kernel and covariance matrices introduced in :ref:`PCovR-api` and
 available via the Utility Classes.

 Again, the feature and sample selection versions of PCov-CUR differ only in the
-computation of :math:`\pi`. So, in addition to :py:class:`GreedySelector`, both
-instances of PCov-CUR selection build off of
-:py:class:`skmatter._selection._cur._PCovCUR`, inheriting
-
-.. currentmodule:: skmatter._selection
-
-.. automethod:: _PCovCUR.score
-.. automethod:: _PCovCUR._compute_pi
-
-and are instantiated using
-:py:class:`skmatter.feature_selection.PCovCUR` and :py:class:`skmatter.sample_selection.PCovCUR`.
-
-.. code-block:: python
+computation of :math:`\pi`. S

-    from skmatter.feature_selection import PCovCUR
+.. autoclass:: skmatter.feature_selection.PCovCUR
+    :members:
+    :private-members: _compute_pi
+    :undoc-members:
+    :inherited-members:

-    selector = PCovCUR(
-        n_to_select=4,
-        progress_bar=True,
-        score_threshold=1e-12,
-        full=False,
-        # float, default=0.5
-        # The PCovR mixing parameter, as described in PCovR as alpha
-        mixing=0.5,
-        # int, number of eigenvectors to use in computing pi
-        k=1,
-        # int, number of steps after which to recompute pi
-        recompute_every=1,
-        # float, threshold below which scores will be considered 0, defaults to 1E-12
-        tolerance=1e-12,
-    )
-    selector.fit(X, y)
+.. autoclass:: skmatter.sample_selection.PCovCUR
+    :members:
+    :private-members: _compute_pi
+    :undoc-members:
+    :inherited-members:

-    Xr = selector.transform(X)

 .. _FPS-api:

 Farthest Point-Sampling (FPS)
-#############################
+-----------------------------

 Farthest Point Sampling is a common selection technique intended to exploit the
 diversity of the input space.
@@ -194,116 +147,53 @@ distance, however other distance metrics may be employed.
 Similar to CUR, the feature and selection versions of FPS differ only in the way
 distance is computed (feature selection does so column-wise, sample selection does so
 row-wise), and are built off of the same base class,
-:py:class:`skmatter._selection._fps._FPS`, in addition to GreedySelector, and inherit
-
-.. currentmodule:: skmatter._selection
-
-.. automethod:: _FPS.score
-.. automethod:: _FPS.get_distance
-.. automethod:: _FPS.get_select_distance

 These selectors can be instantiated using :py:class:`skmatter.feature_selection.FPS` and
 :py:class:`skmatter.sample_selection.FPS`.

-.. code-block:: python
-
-    from skmatter.feature_selection import FPS

-    selector = FPS(
-        n_to_select=4,
-        progress_bar=True,
-        score_threshold=1e-12,
-        full=False,
-        # int or 'random', default=0
-        # Index of the first selection.
-        # If ‘random’, picks a random value when fit starts.
-        initialize=0,
-    )
-    selector.fit(X)
+.. autoclass:: skmatter.feature_selection.FPS
+    :members:
+    :undoc-members:
+    :inherited-members:

-    Xr = selector.transform(X)
+.. autoclass:: skmatter.sample_selection.FPS
+    :members:
+    :undoc-members:
+    :inherited-members:

 .. _PCov-FPS-api:

 PCov-FPS
-########
+--------

 PCov-FPS extends upon FPS much like PCov-CUR does to CUR. Instead of using the Euclidean
 distance solely in the space of :math:`\mathbf{X}`, we use a combined distance in terms
 of :math:`\mathbf{X}` and :math:`\mathbf{y}`.

-Again, the feature and sample selection versions of PCov-FPS differ only in computing
-the distances. So, in addition to :py:class:`GreedySelector`, both instances of PCov-FPS
-selection build off of :py:class:`skmatter._selection._fps._PCovFPS`, and inherit
+.. autoclass:: skmatter.feature_selection.PCovFPS
+    :members:
+    :undoc-members:
+    :inherited-members:

-.. currentmodule:: skmatter._selection
-
-.. automethod:: _PCovFPS.score
-.. automethod:: _PCovFPS.get_distance
-.. automethod:: _PCovFPS.get_select_distance
-
-
-and can
-be instantiated using
-:py:class:`skmatter.feature_selection.PCovFPS` and :py:class:`skmatter.sample_selection.PCovFPS`.
-
-.. code-block:: python
-
-    from skmatter.feature_selection import PCovFPS
-
-    selector = PCovFPS(
-        n_to_select=4,
-        progress_bar=True,
-        score_threshold=1e-12,
-        full=False,
-        # float, default=0.5
-        # The PCovR mixing parameter, as described in PCovR as alpha
-        mixing=0.5,
-        # int or 'random', default=0
-        # Index of the first selection.
-        # If ‘random’, picks a random value when fit starts.
-        initialize=0,
-    )
-    selector.fit(X, y)
-
-    Xr = selector.transform(X)
+.. autoclass:: skmatter.sample_selection.PCovFPS
+    :members:
+    :undoc-members:
+    :inherited-members:

 .. _Voronoi-FPS-api:

 Voronoi FPS
-###########
-
-.. currentmodule:: skmatter.sample_selection._voronoi_fps
-
-.. autoclass:: VoronoiFPS
-
-These selectors can be instantiated using
-:py:class:`skmatter.sample_selection.VoronoiFPS`.
+-----------

-.. code-block:: python
+.. autoclass:: skmatter.sample_selection.VoronoiFPS
+    :members:
+    :undoc-members:
+    :inherited-members:

-    from skmatter.feature_selection import VoronoiFPS
-
-    selector = VoronoiFPS(
-        n_to_select=4,
-        progress_bar=True,
-        score_threshold=1e-12,
-        full=False,
-        # n_trial_calculation used for calculation of full_fraction,
-        # so you need to determine only one parameter
-        n_trial_calculation=4,
-        full_fraction=None,
-        # int or 'random', default=0
-        # Index of the first selection.
-        # If ‘random’, picks a random value when fit starts.
-        initialize=0,
-    )
-    selector.fit(X)
-
-    Xr = selector.transform(X)

 When *Not* to Use Voronoi FPS
------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 In many cases, this algorithm may not increase upon the efficiency. For example, for
 simple metrics (such as Euclidean distance), Voronoi FPS will likely not accelerate, and
@@ -315,27 +205,9 @@ bookkeeping significantly degrades the speed of work compared to FPS.
 .. _DCH-api:

 Directional Convex Hull (DCH)
-#############################
-.. currentmodule:: skmatter.sample_selection._base
-
-.. autoclass:: DirectionalConvexHull
-
-This selector can be instantiated using
-:class:`skmatter.sample_selection.DirectionalConvexHull`.
-
-.. code-block:: python
-
-    from skmatter.sample_selection import DirectionalConvexHull
-
-    selector = DirectionalConvexHull(
-        # Indices of columns of X to use for fitting
-        # the convex hull
-        low_dim_idx=[0, 1],
-    )
-    selector.fit(X, y)
+-----------------------------

-    # Get the distance to the convex hull for samples used to fit the
-    # convex hull. This can also be called using other samples (X_new)
-    # and corresponding properties (y_new) that were not used to fit
-    # the hull.
-    Xr = selector.score_samples(X, y)
+.. autoclass:: skmatter.sample_selection.DirectionalConvexHull
+    :members:
+    :undoc-members:
+    :inherited-members:
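
Note for readers of this diff: the paragraph removed in the first hunk describes greedy selection, where each candidate is scored at every iteration and the one with the maximum score is chosen. For farthest point sampling over samples (row-wise Euclidean distance), that loop might look roughly like the pure-Python sketch below. This is an illustration only, not skmatter's implementation: `fps_indices` is a hypothetical helper, and its `initialize` argument merely mirrors the "index of the first selection" parameter mentioned in the removed examples.

```python
import math


def fps_indices(points, n_to_select, initialize=0):
    """Greedy farthest point sampling over rows (hypothetical helper)."""
    selected = [initialize]
    # Distance from every point to its nearest already-selected point.
    min_dist = [math.dist(p, points[initialize]) for p in points]
    while len(selected) < n_to_select:
        # The score of a candidate is its distance to the selected set;
        # the greedy step picks the candidate with the maximum score.
        idx = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(idx)
        # Update the nearest-selected distances with the new selection.
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[idx]))
    return selected


# Same 3 samples x 3 features as in the doctest above.
X = [
    [0.12, 0.21, 0.02],
    [-0.09, 0.32, -0.10],
    [-0.03, -0.53, 0.08],
]
rows = fps_indices(X, n_to_select=2)
```

Already-selected points have distance zero to the selected set, so they are never picked again as long as the remaining points are distinct from them.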