@@ -11,15 +11,12 @@ can be modified to combine supervised and unsupervised learning, in a formulatio
 denoted `PCov-CUR` and `PCov-FPS`.
 For further reading, refer to [Imbalzano2018]_ and [Cersonsky2021]_.

-
 These selectors can be used for both feature and sample selection, with similar
-instantiations. Currently, all sub-selection methods extend :py:class:`GreedySelector`,
-where at each iteration the model scores each
-feature or sample (without an estimator) and chooses that with the maximum score.
-This can be executed using:
+instantiations. This can be executed using:

 .. doctest::

+    >>> # feature selection
     >>> import numpy as np
     >>> from skmatter.feature_selection import CUR, FPS, PCovCUR, PCovFPS
     >>> selector = CUR(
@@ -36,10 +33,14 @@ This can be executed using:
     ...     # are exhausted
     ...     full=False,
     ... )
-    >>> X = np.array([[0.12, 0.21, 0.02],  # 3 samples, 3 features
-    ...               [-0.09, 0.32, -0.10],
-    ...               [-0.03, -0.53, 0.08]])
-    >>> y = np.array([0., 0., 1.])  # classes of each sample
+    >>> X = np.array(
+    ...     [
+    ...         [0.12, 0.21, 0.02],  # 3 samples, 3 features
+    ...         [-0.09, 0.32, -0.10],
+    ...         [-0.03, -0.53, 0.08],
+    ...     ]
+    ... )
+    >>> y = np.array([0.0, 0.0, 1.0])  # classes of each sample
     >>> selector.fit(X)
     CUR(n_to_select=2, progress_bar=True, score_threshold=1e-12)
     >>> Xr = selector.transform(X)
@@ -51,6 +52,8 @@ This can be executed using:
     >>> Xr = selector.transform(X)
     >>> print(Xr.shape)
     (3, 2)
+    >>>
+    >>> # Now sample selection
     >>> from skmatter.sample_selection import CUR, FPS, PCovCUR, PCovFPS
     >>> selector = CUR(n_to_select=2)
     >>> selector.fit(X)
@@ -59,23 +62,11 @@ This can be executed using:
     >>> print(Xr.shape)
     (2, 3)

-where `Selector` is one of the classes below that overwrites the method
-:py:func:`score`.
-
-From :py:class:`GreedySelector`, selectors inherit these public methods:
-
-.. currentmodule:: skmatter._selection
-
-.. class:: GreedySelector
-
-    .. automethod:: fit
-    .. automethod:: transform
-    .. automethod:: get_support

 .. _CUR-api:

 CUR
-###
+---


 CUR decomposition begins by approximating a matrix :math:`{\mathbf{X}}` using a subset
@@ -100,88 +91,50 @@ features in a single iteration based upon the relative :math:`\pi` importance.
 The feature and sample selection versions of CUR differ only in the computation of
 :math:`\pi`. In sample selection :math:`\pi` is computed using the left singular
 vectors, versus in feature selection, :math:`\pi` is computed using the right singular
-vectors. In addition to :py:class:`GreedySelector`, both instances of CUR selection
-build off of :py:class:`skmatter._selection._cur._CUR`, and inherit
-
-.. currentmodule:: skmatter._selection
-
-.. automethod:: _CUR.score
-.. automethod:: _CUR._compute_pi
-
-They are instantiated using
-:py:class:`skmatter.feature_selection.CUR` and
-:py:class:`skmatter.sample_selection.CUR`, e.g.
-
-.. code-block:: python
+vectors.

-    from skmatter.feature_selection import CUR
+.. autoclass:: skmatter.feature_selection.CUR
+    :members:
+    :private-members: _compute_pi
+    :undoc-members:
+    :inherited-members:

-    selector = CUR(
-        n_to_select=4,
-        progress_bar=True,
-        score_threshold=1e-12,
-        full=False,
-        # int, number of eigenvectors to use in computing pi
-        k=1,
-        # int, number of steps after which to recompute pi
-        recompute_every=1,
-        # float, threshold below which scores will be considered 0, defaults to 1E-12
-        tolerance=1e-12,
-    )
-    selector.fit(X)
-
-    Xr = selector.transform(X)
+.. autoclass:: skmatter.sample_selection.CUR
+    :members:
+    :private-members: _compute_pi
+    :undoc-members:
+    :inherited-members:

 .. _PCov-CUR-api:

 PCov-CUR
-########
+--------

 PCov-CUR extends upon CUR by using augmented right or left singular vectors inspired by
 Principal Covariates Regression, as demonstrated in [Cersonsky2021]_. These methods
 employ the modified kernel and covariance matrices introduced in :ref:`PCovR-api` and
 available via the Utility Classes.

 Again, the feature and sample selection versions of PCov-CUR differ only in the
-computation of :math:`\pi`. So, in addition to :py:class:`GreedySelector`, both
-instances of PCov-CUR selection build off of
-:py:class:`skmatter._selection._cur._PCovCUR`, inheriting
-
-.. currentmodule:: skmatter._selection
-
-.. automethod:: _PCovCUR.score
-.. automethod:: _PCovCUR._compute_pi
-
-and are instantiated using
-:py:class:`skmatter.feature_selection.PCovCUR` and :py:class:`skmatter.sample_selection.PCovCUR`.
-
-.. code-block:: python
+computation of :math:`\pi`.

-    from skmatter.feature_selection import PCovCUR
+.. autoclass:: skmatter.feature_selection.PCovCUR
+    :members:
+    :private-members: _compute_pi
+    :undoc-members:
+    :inherited-members:

-    selector = PCovCUR(
-        n_to_select=4,
-        progress_bar=True,
-        score_threshold=1e-12,
-        full=False,
-        # float, default=0.5
-        # The PCovR mixing parameter, as described in PCovR as alpha
-        mixing=0.5,
-        # int, number of eigenvectors to use in computing pi
-        k=1,
-        # int, number of steps after which to recompute pi
-        recompute_every=1,
-        # float, threshold below which scores will be considered 0, defaults to 1E-12
-        tolerance=1e-12,
-    )
-    selector.fit(X, y)
+.. autoclass:: skmatter.sample_selection.PCovCUR
+    :members:
+    :private-members: _compute_pi
+    :undoc-members:
+    :inherited-members:

-    Xr = selector.transform(X)

 .. _FPS-api:

 Farthest Point-Sampling (FPS)
-#############################
+-----------------------------

 Farthest Point Sampling is a common selection technique intended to exploit the
 diversity of the input space.
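The CUR text in this diff states that :math:`\pi` comes from the left singular vectors for sample selection and from the right singular vectors for feature selection. The NumPy sketch below illustrates that idea under stated assumptions and is not skmatter's implementation: `leverage_scores` and `greedy_cur_features` are hypothetical helper names, :math:`\pi` is assumed to be the sum of squared entries of the top `k` singular vectors (the `k` argument mirrors the "number of eigenvectors to use in computing pi" parameter in the removed example), and after each pick the working matrix is assumed to be orthogonalized against the chosen column before :math:`\pi` is recomputed.

```python
import numpy as np


def leverage_scores(X, k=1, axis="feature"):
    """Hypothetical helper: importance scores pi from the top-k singular vectors.

    axis="feature" uses the right singular vectors (one score per column);
    axis="sample" uses the left singular vectors (one score per row).
    """
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    if axis == "feature":
        return np.sum(Vt[:k] ** 2, axis=0)
    return np.sum(U[:, :k] ** 2, axis=1)


def greedy_cur_features(X, n_to_select, k=1, tol=1e-12):
    """Hypothetical greedy CUR column selection (a sketch, not skmatter's code)."""
    Xc = np.array(X, dtype=float)  # working copy, orthogonalized as we go
    selected = []
    for _ in range(n_to_select):
        pi = leverage_scores(Xc, k=k, axis="feature")
        pi[selected] = -np.inf  # never pick the same column twice
        j = int(np.argmax(pi))
        norm = np.linalg.norm(Xc[:, j])
        if norm < tol:
            break  # remaining columns carry no new information
        selected.append(j)
        # Assumed update: remove the chosen column's direction from every
        # column so the next pi only reflects what is not yet captured.
        v = Xc[:, j] / norm
        Xc -= np.outer(v, v @ Xc)
    return selected


X = np.array(
    [
        [0.12, 0.21, 0.02],
        [-0.09, 0.32, -0.10],
        [-0.03, -0.53, 0.08],
    ]
)
print(leverage_scores(X, k=1, axis="feature"))
print(greedy_cur_features(X, n_to_select=2))
```

Because the rows of `Vt` and the columns of `U` are orthonormal, the scores for a given `k` sum to `k`, so each entry of :math:`\pi` can be read as a share of the importance captured by the top `k` singular vectors.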
@@ -194,116 +147,53 @@ distance, however other distance metrics may be employed.
 Similar to CUR, the feature and selection versions of FPS differ only in the way
 distance is computed (feature selection does so column-wise, sample selection does so
 row-wise), and are built off of the same base class,
-:py:class:`skmatter._selection._fps._FPS`, in addition to GreedySelector, and inherit
-
-.. currentmodule:: skmatter._selection
-
-.. automethod:: _FPS.score
-.. automethod:: _FPS.get_distance
-.. automethod:: _FPS.get_select_distance

 These selectors can be instantiated using :py:class:`skmatter.feature_selection.FPS` and
 :py:class:`skmatter.sample_selection.FPS`.

-.. code-block:: python
-
-    from skmatter.feature_selection import FPS

-    selector = FPS(
-        n_to_select=4,
-        progress_bar=True,
-        score_threshold=1e-12,
-        full=False,
-        # int or 'random', default=0
-        # Index of the first selection.
-        # If ‘random’, picks a random value when fit starts.
-        initialize=0,
-    )
-    selector.fit(X)
+.. autoclass:: skmatter.feature_selection.FPS
+    :members:
+    :undoc-members:
+    :inherited-members:

-    Xr = selector.transform(X)
+.. autoclass:: skmatter.sample_selection.FPS
+    :members:
+    :undoc-members:
+    :inherited-members:

 .. _PCov-FPS-api:

 PCov-FPS
-########
+--------

 PCov-FPS extends upon FPS much like PCov-CUR does to CUR. Instead of using the Euclidean
 distance solely in the space of :math:`\mathbf{X}`, we use a combined distance in terms
 of :math:`\mathbf{X}` and :math:`\mathbf{y}`.

-Again, the feature and sample selection versions of PCov-FPS differ only in computing
-the distances. So, in addition to :py:class:`GreedySelector`, both instances of PCov-FPS
-selection build off of :py:class:`skmatter._selection._fps._PCovFPS`, and inherit
+.. autoclass:: skmatter.feature_selection.PCovFPS
+    :members:
+    :undoc-members:
+    :inherited-members:

-.. currentmodule:: skmatter._selection
-
-.. automethod:: _PCovFPS.score
-.. automethod:: _PCovFPS.get_distance
-.. automethod:: _PCovFPS.get_select_distance
-
-
-and can
-be instantiated using
-:py:class:`skmatter.feature_selection.PCovFPS` and :py:class:`skmatter.sample_selection.PCovFPS`.
-
-.. code-block:: python
-
-    from skmatter.feature_selection import PCovFPS
-
-    selector = PCovFPS(
-        n_to_select=4,
-        progress_bar=True,
-        score_threshold=1e-12,
-        full=False,
-        # float, default=0.5
-        # The PCovR mixing parameter, as described in PCovR as alpha
-        mixing=0.5,
-        # int or 'random', default=0
-        # Index of the first selection.
-        # If ‘random’, picks a random value when fit starts.
-        initialize=0,
-    )
-    selector.fit(X, y)
-
-    Xr = selector.transform(X)
+.. autoclass:: skmatter.sample_selection.PCovFPS
+    :members:
+    :undoc-members:
+    :inherited-members:

 .. _Voronoi-FPS-api:

 Voronoi FPS
-###########
-
-.. currentmodule:: skmatter.sample_selection._voronoi_fps
-
-.. autoclass:: VoronoiFPS
-
-These selectors can be instantiated using
-:py:class:`skmatter.sample_selection.VoronoiFPS`.
+-----------

-.. code-block:: python
+.. autoclass:: skmatter.sample_selection.VoronoiFPS
+    :members:
+    :undoc-members:
+    :inherited-members:

-    from skmatter.feature_selection import VoronoiFPS
-
-    selector = VoronoiFPS(
-        n_to_select=4,
-        progress_bar=True,
-        score_threshold=1e-12,
-        full=False,
-        # n_trial_calculation used for calculation of full_fraction,
-        # so you need to determine only one parameter
-        n_trial_calculation=4,
-        full_fraction=None,
-        # int or 'random', default=0
-        # Index of the first selection.
-        # If ‘random’, picks a random value when fit starts.
-        initialize=0,
-    )
-    selector.fit(X)
-
-    Xr = selector.transform(X)

 When *Not* to Use Voronoi FPS
------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 In many cases, this algorithm may not increase upon the efficiency. For example, for
 simple metrics (such as Euclidean distance), Voronoi FPS will likely not accelerate, and
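The FPS context text in this diff says the feature and sample versions differ only in whether distances are taken column-wise or row-wise. A minimal NumPy sketch of row-wise (sample) FPS with Euclidean distance follows; `fps` is a hypothetical helper, not the skmatter class, and starting from a fixed index mirrors the `initialize=0` option in the removed examples. Passing `X.T` selects features instead.

```python
import numpy as np


def fps(X, n_to_select, initialize=0):
    """Hypothetical farthest point sampling over the rows of X.

    Row-wise selection picks samples; pass X.T to pick features instead.
    """
    X = np.asarray(X, dtype=float)
    selected = [initialize]
    # Squared Euclidean distance from every row to its nearest selected row.
    min_dist = np.sum((X - X[initialize]) ** 2, axis=1)
    for _ in range(n_to_select - 1):
        i = int(np.argmax(min_dist))  # row farthest from the current selection
        selected.append(i)
        # A newly selected row can only bring other rows closer to the selection.
        min_dist = np.minimum(min_dist, np.sum((X - X[i]) ** 2, axis=1))
    return selected


X = np.array(
    [
        [0.12, 0.21, 0.02],
        [-0.09, 0.32, -0.10],
        [-0.03, -0.53, 0.08],
    ]
)
print(fps(X, n_to_select=2))    # sample selection (rows) -> [0, 2]
print(fps(X.T, n_to_select=2))  # feature selection (columns) -> [0, 1]
```

The sketch ranks candidates by squared distance, which orders points the same way as Euclidean distance while skipping the square root; keeping a running minimum means each iteration costs one pass over the data rather than a full pairwise distance matrix.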
@@ -315,27 +205,9 @@ bookkeeping significantly degrades the speed of work compared to FPS.
 .. _DCH-api:

 Directional Convex Hull (DCH)
-#############################
-.. currentmodule:: skmatter.sample_selection._base
-
-.. autoclass:: DirectionalConvexHull
-
-This selector can be instantiated using
-:class:`skmatter.sample_selection.DirectionalConvexHull`.
-
-.. code-block:: python
-
-    from skmatter.sample_selection import DirectionalConvexHull
-
-    selector = DirectionalConvexHull(
-        # Indices of columns of X to use for fitting
-        # the convex hull
-        low_dim_idx=[0, 1],
-    )
-    selector.fit(X, y)
+-----------------------------

-    # Get the distance to the convex hull for samples used to fit the
-    # convex hull. This can also be called using other samples (X_new)
-    # and corresponding properties (y_new) that were not used to fit
-    # the hull.
-    Xr = selector.score_samples(X, y)
+.. autoclass:: skmatter.sample_selection.DirectionalConvexHull
+    :members:
+    :undoc-members:
+    :inherited-members: