
Commit d1aba83

add documentation
1 parent 1720f81 commit d1aba83

File tree

8 files changed, +242 -97 lines changed


Project.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 name = "FeatureSelection"
 uuid = "33837fe5-dbff-4c9e-8c2f-c5612fe2b8b6"
-authors = ["Anthony D. Blaom <[email protected]>"]
+authors = ["Anthony D. Blaom <[email protected]>", "Samuel Okon <[email protected]>"]
 version = "0.1.0"

 [deps]

README.md

Lines changed: 1 addition & 95 deletions
@@ -5,100 +5,6 @@
 | [![Build Status](https://github.com/JuliaAI/FeatureSelection.jl/workflows/CI/badge.svg)](https://github.com/JuliaAI/FeatureSelection.jl/actions) | [![Coverage](https://codecov.io/gh/JuliaAI/FeatureSelection.jl/branch/master/graph/badge.svg)](https://codecov.io/github/JuliaAI/FeatureSelection.jl?branch=dev) | [![Code Style: Blue](https://img.shields.io/badge/code%20style-blue-4495d1.svg)](https://github.com/invenia/BlueStyle) |

 Repository housing feature selection algorithms for use with the machine learning toolbox
-[MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/).
+[MLJ](https://juliaai.github.io/MLJ.jl/dev/).

 `FeatureSelector` model builds on contributions originally residing at [MLJModels.jl](https://github.com/JuliaAI/MLJModels.jl/blob/v0.16.15/src/builtins/Transformers.jl#L189-L266)
-
-# Installation
-On a running instance of Julia with at least version 1.6 run
-```julia
-import Pkg;
-Pkg.add("FeatureSelection")
-```
-
-# Example Usage
-Lets build a supervised recursive feature eliminator with `RandomForestRegressor`
-from DecisionTree.jl as our base model.
-But first we need a dataset to train on. We shall create a synthetic dataset popularly
-known in the R community as the friedman dataset#1. Notice how the target vector for this
-dataset depends on only the first five columns of feature table. So we expect that our
-recursive feature elimination should return the first columns as important features.
-```julia
-using MLJ, FeatureSelection
-using StableRNGs
-rng = StableRNG(10)
-A = rand(rng, 50, 10)
-X = MLJ.table(A) # features
-y = @views(
-    10 .* sin.(
-        pi .* A[:, 1] .* A[:, 2]
-    ) .+ 20 .* (A[:, 3] .- 0.5).^ 2 .+ 10 .* A[:, 4] .+ 5 * A[:, 5]
-) # target
-```
-Now we that we have our data we can create our recursive feature elimination model and
-train it on our dataset
-```julia
-RandomForestRegressor = @load RandomForestRegressor pkg=DecisionTree
-forest = RandomForestRegressor(rng=rng)
-rfe = RecursiveFeatureElimination(
-    model = forest, n_features=5, step=1
-) # see doctring for description of defaults
-mach = machine(rfe, X, y)
-fit!(mach)
-```
-We can inspect the feature importances in two ways:
-```julia
-# A variable with lower rank has more significance than a variable with higher rank.
-# A variable with Higher feature importance is better than a variable with lower
-# feature importance
-report(mach).ranking # returns [1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
-feature_importances(mach) # returns dict of feature => importance pairs
-```
-We can view the important features used by our model by inspecting the `fitted_params`
-object.
-```julia
-p = fitted_params(mach)
-p.features_left == [:x1, :x2, :x3, :x4, :x5]
-```
-We can also call the `predict` method on the fitted machine, to predict using a
-random forest regressor trained using only the important features, or call the `transform`
-method, to select just those features from some new table including all the original
-features. For more info, type `?RecursiveFeatureElimination` on a Julia REPL.
-
-Okay, let's say that we didn't know that our synthetic dataset depends on only five
-columns from our feature table. We could apply cross fold validation
-`StratifiedCV(nfolds=5)` with our recursive feature elimination model to select the
-optimal value of `n_features` for our model. In this case we will use a simple Grid
-search with root mean square as the measure.
-```julia
-rfe = RecursiveFeatureElimination(model = forest)
-tuning_rfe_model = TunedModel(
-    model = rfe,
-    measure = rms,
-    tuning = Grid(rng=rng),
-    resampling = StratifiedCV(nfolds = 5),
-    range = range(
-        rfe, :n_features, values = 1:10
-    )
-)
-self_tuning_rfe_mach = machine(tuning_rfe_model, X, y)
-fit!(self_tuning_rfe_mach)
-```
-As before we can inspect the important features by inspecting the object returned by
-`fitted_params` or `feature_importances` as shown below.
-```julia
-fitted_params(self_tuning_rfe_mach).best_fitted_params.features_left == [:x1, :x2, :x3, :x4, :x5]
-feature_importances(self_tuning_rfe_mach) # returns dict of feature => importance pairs
-```
-and call `predict` on the tuned model machine as shown below
-```julia
-Xnew = MLJ.table(rand(rng, 50, 10)) # create test data
-predict(self_tuning_rfe_mach, Xnew)
-```
-In this case, prediction is done using the best recursive feature elimination model gotten
-from the tuning process above.
-
-For resampling methods different from cross-validation, and for other
-`TunedModel` options, such as parallelization, see the
-[Tuning Models](https://alan-turing-institute.github.io/MLJ.jl/dev/tuning_models/) section of the MLJ manual.
-[MLJ Documentation](https://alan-turing-institute.github.io/MLJ.jl/dev/)

docs/.gitignore

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+Manifest.toml
+build/

docs/Project.toml

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+[deps]
+Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
+MLJ = "add582a8-e3ab-11e8-2d5e-e98b27df1bc7"
+FeatureSelection = "33837fe5-dbff-4c9e-8c2f-c5612fe2b8b6"
+StableRNGs = "860ef19b-820b-49d6-a774-d7a799459cd3"
+
+[compat]
+Documenter = "^1.4"
+MLJ = "^0.20"
+StableRNGs = "^1.0"
+julia = "^1.0"

docs/make.jl

Lines changed: 33 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,33 @@
+using Documenter, FeatureSelection
+
+makedocs(;
+    authors = """
+    Anthony D. Blaom <[email protected]>,
+    Sebastian Vollmer <[email protected]>,
+    Okon Samuel <[email protected]>
+    """,
+    #format = Documenter.HTML(;
+    #    prettyurls = get(ENV, "CI", "false") == "true"
+    #),
+    #modules = [FeatureSelection],
+    pages = [
+        "Home" => "index.md",
+        "API" => "api.md"
+    ],
+    doctest = false, # don't run doctests here; they are run separately in CI.
+    repo = "https://github.com/JuliaAI/FeatureSelection/blob/{commit}{path}#L{line}",
+    sitename = "FeatureSelection.jl",
+)
+
+# By default Documenter does not deploy docs just for a PR;
+# this causes issues with how we're doing things and ends
+# up choking the deployment of the docs, so here we
+# force the environment to ignore this so that Documenter
+# does indeed deploy the docs.
+#ENV["GITHUB_EVENT_NAME"] = "pull_request"
+
+deploydocs(;
+    deploy_config = Documenter.GitHubActions(),
+    repo = "github.com/JuliaAI/FeatureSelection.jl.git",
+    push_preview = true
+)
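The `docs/Project.toml` and `docs/make.jl` added above imply the usual Documenter workflow. As a rough local-build sketch (not part of this commit; it assumes the commands are run from the repository root with the environment defined above):

```julia
# Rough sketch of a local docs build (assumption: run from the repository root,
# using the docs/Project.toml environment added in this commit).
using Pkg
Pkg.activate("docs")
Pkg.develop(PackageSpec(path = "."))   # make the local FeatureSelection source available
Pkg.instantiate()
include(joinpath("docs", "make.jl"))   # runs makedocs; deploydocs is skipped outside CI
```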

docs/src/api.md

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+```@meta
+CurrentModule = FeatureSelection
+```
+# API
+# Models
+```@docs
+FeatureSelector
+RecursiveFeatureElimination
+```
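`RecursiveFeatureElimination` is walked through in `docs/src/index.md` below. For orientation only, here is a minimal sketch of the simpler `FeatureSelector` transformer also housed in this repository (not part of this diff; the `features` keyword semantics are assumed from the MLJModels code it builds on):

```julia
using MLJ, FeatureSelection

# A small column table; :z is the column we want to drop.
X = (x = [1.0, 2.0, 3.0], y = ["a", "b", "c"], z = [0.1, 0.2, 0.3])

selector = FeatureSelector(features = [:x, :y])  # keep only :x and :y (assumed keyword)
mach = machine(selector, X)
fit!(mach)
transform(mach, X)   # a table with just the :x and :y columns
```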

docs/src/index.md

Lines changed: 185 additions & 0 deletions
@@ -0,0 +1,185 @@
+# FeatureSelection
+
+FeatureSelection is a Julia package containing implementations of feature selection algorithms for use with the machine learning toolbox
+[MLJ](https://juliaai.github.io/MLJ.jl/dev/).
+
+# Installation
+On a running instance of Julia with at least version 1.6 run
+```julia
+import Pkg;
+Pkg.add("FeatureSelection")
+```
+
+# Example Usage
+Let's build a supervised recursive feature eliminator with `RandomForestRegressor`
+from [DecisionTree.jl](https://github.com/JuliaAI/DecisionTree.jl) as our base model.
+But first we need a dataset to train on. We shall create a synthetic dataset popularly
+known in the R community as the Friedman #1 dataset. Notice how the target vector for this
+dataset depends on only the first five columns of the feature table, so we expect our
+recursive feature elimination to return the first five columns as the important features.
+```@meta
+DocTestSetup = quote
+    using MLJ, FeatureSelection, StableRNGs
+    rng = StableRNG(10)
+    A = rand(rng, 50, 10)
+    X = MLJ.table(A) # features
+    y = @views(
+        10 .* sin.(
+            pi .* A[:, 1] .* A[:, 2]
+        ) .+ 20 .* (A[:, 3] .- 0.5).^ 2 .+ 10 .* A[:, 4] .+ 5 * A[:, 5]
+    ) # target
+    RandomForestRegressor = @load RandomForestRegressor pkg=DecisionTree
+    forest = RandomForestRegressor(rng=rng)
+    rfe = RecursiveFeatureElimination(
+        model = forest, n_features=5, step=1
+    ) # see the docstring for a description of the defaults
+    mach = machine(rfe, X, y)
+    fit!(mach)
+
+    rfe = RecursiveFeatureElimination(model = forest)
+    tuning_rfe_model = TunedModel(
+        model = rfe,
+        measure = rms,
+        tuning = Grid(rng=rng),
+        resampling = StratifiedCV(nfolds = 5),
+        range = range(
+            rfe, :n_features, values = 1:10
+        )
+    )
+    self_tuning_rfe_mach = machine(tuning_rfe_model, X, y)
+    fit!(self_tuning_rfe_mach)
+end
+```
+```@example example1
+using MLJ, FeatureSelection, StableRNGs
+rng = StableRNG(10)
+A = rand(rng, 50, 10)
+X = MLJ.table(A) # features
+y = @views(
+    10 .* sin.(
+        pi .* A[:, 1] .* A[:, 2]
+    ) .+ 20 .* (A[:, 3] .- 0.5).^ 2 .+ 10 .* A[:, 4] .+ 5 * A[:, 5]
+) # target
+```
+Now that we have our data we can create our recursive feature elimination model and
+train it on our dataset
+```@example example1
+RandomForestRegressor = @load RandomForestRegressor pkg=DecisionTree
+forest = RandomForestRegressor(rng=rng)
+rfe = RecursiveFeatureElimination(
+    model = forest, n_features=5, step=1
+) # see the docstring for a description of the defaults
+mach = machine(rfe, X, y)
+fit!(mach)
+```
+We can inspect the feature importances in two ways:
+```jldoctest
+julia> report(mach).ranking
+10-element Vector{Int64}:
+ 1
+ 1
+ 1
+ 1
+ 1
+ 2
+ 3
+ 4
+ 5
+ 6
+
+julia> feature_importances(mach)
+10-element Vector{Pair{Symbol, Int64}}:
+  :x1 => 6
+  :x2 => 5
+  :x3 => 4
+  :x4 => 3
+  :x5 => 2
+  :x6 => 1
+  :x7 => 1
+  :x8 => 1
+  :x9 => 1
+ :x10 => 1
+```
+Note that a variable with lower rank has more significance than a variable with higher rank, while a variable with higher feature importance is better than a variable with lower feature importance.
+
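For example, the retained features can be read straight off those importances. A throwaway snippet along these lines (illustrative only, not lines from this commit) makes the point:

```julia
# Illustrative only: the five features kept by the eliminator are exactly the
# five with the largest importances shown above.
imps = feature_importances(mach)                        # Vector of :feature => importance pairs
kept = first.(sort(imps, by = last, rev = true))[1:5]   # [:x1, :x2, :x3, :x4, :x5]
```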
+We can view the important features used by our model by inspecting the `fitted_params`
+object.
+```jldoctest
+julia> p = fitted_params(mach)
+(features_left = [:x1, :x2, :x3, :x4, :x5],
+ model_fitresult = (forest = Ensemble of Decision Trees
+Trees: 100
+Avg Leaves: 25.26
+Avg Depth: 8.36,),)
+
+julia> p.features_left
+5-element Vector{Symbol}:
+ :x1
+ :x2
+ :x3
+ :x4
+ :x5
+```
+We can also call the `predict` method on the fitted machine, to predict using a
+random forest regressor trained using only the important features, or call the `transform`
+method, to select just those features from some new table including all the original
+features. For more info, type `?RecursiveFeatureElimination` on a Julia REPL.
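Concretely, those two calls look roughly as follows on the machine fitted above (an illustrative sketch, not part of the committed text):

```julia
# Illustrative only: `mach` is the RecursiveFeatureElimination machine fitted above.
Xnew = MLJ.table(rand(rng, 50, 10))  # a new table with all ten original features

transform(mach, Xnew)  # the same table restricted to the five selected features
predict(mach, Xnew)    # predictions from the forest trained on those features
```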
+
+Okay, let's say that we didn't know that our synthetic dataset depends on only five
+columns from our feature table. We could apply cross-validation
+`StratifiedCV(nfolds=5)` with our recursive feature elimination model to select the
+optimal value of `n_features` for our model. In this case we will use a simple grid
+search with root mean squared error (`rms`) as the measure.
+```@example example1
+rfe = RecursiveFeatureElimination(model = forest)
+tuning_rfe_model = TunedModel(
+    model = rfe,
+    measure = rms,
+    tuning = Grid(rng=rng),
+    resampling = StratifiedCV(nfolds = 5),
+    range = range(
+        rfe, :n_features, values = 1:10
+    )
+)
+self_tuning_rfe_mach = machine(tuning_rfe_model, X, y)
+fit!(self_tuning_rfe_mach)
+```
+As before, we can inspect the important features via the object returned by
+`fitted_params` or by `feature_importances`, as shown below.
+```jldoctest
+julia> fitted_params(self_tuning_rfe_mach).best_fitted_params.features_left
+5-element Vector{Symbol}:
+ :x1
+ :x2
+ :x3
+ :x4
+ :x5
+
+julia> feature_importances(self_tuning_rfe_mach)
+10-element Vector{Pair{Symbol, Int64}}:
+  :x1 => 6
+  :x2 => 5
+  :x3 => 4
+  :x4 => 3
+  :x5 => 2
+  :x6 => 1
+  :x7 => 1
+  :x8 => 1
+  :x9 => 1
+ :x10 => 1
+```
+and call `predict` on the tuned model machine as shown below
+```@example example1
+Xnew = MLJ.table(rand(rng, 50, 10)) # create test data
+predict(self_tuning_rfe_mach, Xnew)
+```
+In this case, prediction is done using the best recursive feature elimination model obtained
+from the tuning process above.
+
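To see which value of `n_features` the tuning settled on, the winning model can be pulled out of the tuned machine's report. A hypothetical one-liner (not part of this diff; relies on the `best_model` field that MLJ's tuning report exposes):

```julia
# Illustrative only: inspect the winning RecursiveFeatureElimination model.
best_rfe = report(self_tuning_rfe_mach).best_model
best_rfe.n_features   # expected to be 5 for this synthetic dataset
```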
+For resampling methods different from cross-validation, and for other
+`TunedModel` options, such as parallelization, see the
+[Tuning Models](https://juliaai.github.io/MLJ.jl/dev/tuning_models/) section of the MLJ manual.
+See also the [MLJ Documentation](https://juliaai.github.io/MLJ.jl/dev/).
+```@meta
+DocTestSetup = nothing
+```

src/models/rfe.jl

Lines changed: 0 additions & 1 deletion
@@ -139,7 +139,6 @@ Xnew = MLJ.table(rand(rng, 50, 10));
 predict(mach, Xnew)

 ```
-
 """
 function RecursiveFeatureElimination(
     args...;
