
Commit 4906112

OkonSamuel and ablaom authored
Add documentation (#5)
* add draft for RFE model
* rename package
* Add FeatureSelector and some tests
* fix current tests
* complete RFE model and add tests
* Update model docstring
* fix code, update readme and add more tests
* Apply suggestions from code review (Co-authored-by: Anthony Blaom, PhD <[email protected]>)
* rename n_features_to_select to n_features
* update readme with
* Apply suggestions from code review (Co-authored-by: Anthony Blaom, PhD <[email protected]>)
* set max column limit to 92 in readme
* add Aqua.jl tests and refactor code
* update ci
* Apply suggestions from code review (Co-authored-by: Anthony Blaom, PhD <[email protected]>)
* fix bug, add support for serialization and add more tests
* Update ci.yml
* Update ci.yml
* Update ci.yml
* Update ci.yml
* Update ci.yml
* add documentation
* Disable Julia nightly tests

Co-authored-by: Anthony Blaom, PhD <[email protected]>
1 parent 12098d8 commit 4906112

File tree

9 files changed

+243
-150
lines changed


.github/workflows/ci_nightly.yml

Lines changed: 0 additions & 50 deletions
This file was deleted.

Project.toml

Lines changed: 1 addition & 1 deletion
The single change adds a second author:

```diff
@@ -1,6 +1,6 @@
 name = "FeatureSelection"
 uuid = "33837fe5-dbff-4c9e-8c2f-c5612fe2b8b6"
-authors = ["Anthony D. Blaom <[email protected]>"]
+authors = ["Anthony D. Blaom <[email protected]>", "Samuel Okon <[email protected]>"]
 version = "0.1.0"

 [deps]
```

README.md

Lines changed: 1 addition & 98 deletions
@@ -4,101 +4,4 @@

Context (unchanged): the badge table at the top of the file, linking Build Status, Coverage, and Code Style: Blue.

The 98 removed lines, the package's worked example, are reproduced below in cleaned-up form; they were replaced by the single line shown at the end of this hunk.
Repository housing feature selection algorithms for use with the machine learning toolbox [MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/).

The `FeatureSelector` model builds on contributions originally residing at [MLJModels.jl](https://github.com/JuliaAI/MLJModels.jl/blob/v0.16.15/src/builtins/Transformers.jl#L189-L266).

# Installation

On a running instance of Julia with at least version 1.6, run
```julia
import Pkg;
Pkg.add("FeatureSelection")
```

# Example Usage

Let's build a supervised recursive feature eliminator with `RandomForestRegressor` from DecisionTree.jl as our base model. But first we need a dataset to train on. We shall create a synthetic dataset popularly known in the R community as the Friedman #1 dataset. Notice how the target vector for this dataset depends on only the first five columns of the feature table, so we expect recursive feature elimination to return those first five columns as the important features.
```julia
using MLJ, FeatureSelection
using StableRNGs
rng = StableRNG(10)
A = rand(rng, 50, 10)
X = MLJ.table(A) # features
y = @views(
    10 .* sin.(
        pi .* A[:, 1] .* A[:, 2]
    ) .+ 20 .* (A[:, 3] .- 0.5).^2 .+ 10 .* A[:, 4] .+ 5 .* A[:, 5]
) # target
```
Now that we have our data, we can create our recursive feature elimination model and train it on our dataset:
```julia
RandomForestRegressor = @load RandomForestRegressor pkg=DecisionTree
forest = RandomForestRegressor(rng=rng)
rfe = RecursiveFeatureElimination(
    model = forest, n_features=5, step=1
) # see the docstring for a description of the defaults
mach = machine(rfe, X, y)
fit!(mach)
```
We can inspect the feature importances in two ways:
```julia
# A variable with a lower rank is more significant than a variable with a higher
# rank; a variable with a higher feature importance is more significant than one
# with a lower feature importance.
report(mach).ranking # returns [1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
feature_importances(mach) # returns dict of feature => importance pairs
```
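If a ranked list is more convenient than a collection of pairs, a minimal sketch along these lines should work (not part of the original README; it assumes `feature_importances` returns an iterable of `feature => importance` pairs, as noted above):
```julia
# Collect the feature => importance pairs and sort them, most important first.
sort(collect(feature_importances(mach)), by = last, rev = true)
```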
We can view the important features used by our model by inspecting the `fitted_params` object:
```julia
p = fitted_params(mach)
p.features_left == [:x1, :x2, :x3, :x4, :x5]
```
We can also call the `predict` method on the fitted machine, to predict using a random forest regressor trained on only the important features, or call the `transform` method, to select just those features from some new table that includes all the original features. For more information, type `?RecursiveFeatureElimination` at the Julia REPL.
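For instance, here is a minimal sketch of both calls (assuming `Xnew` is any table with the same ten columns as `X`):
```julia
Xnew = MLJ.table(rand(rng, 5, 10)) # hypothetical new data
predict(mach, Xnew)   # predict with the forest trained on the selected features
transform(mach, Xnew) # reduce Xnew to just the selected features
```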
Okay, let's say that we didn't know that our synthetic dataset depends on only five columns of the feature table. We could apply cross-validation, `StratifiedCV(nfolds=5)`, with our recursive feature elimination model to select the optimal value of `n_features` for our model. In this case we will use a simple grid search with root mean square error as the measure.
```julia
rfe = RecursiveFeatureElimination(model = forest)
tuning_rfe_model = TunedModel(
    model = rfe,
    measure = rms,
    tuning = Grid(rng=rng),
    resampling = StratifiedCV(nfolds = 5),
    range = range(
        rfe, :n_features, values = 1:10
    )
)
self_tuning_rfe_mach = machine(tuning_rfe_model, X, y)
fit!(self_tuning_rfe_mach)
```
As before, we can inspect the important features via the object returned by `fitted_params` or by `feature_importances`, as shown below:
```julia
fitted_params(self_tuning_rfe_mach).best_fitted_params.features_left == [:x1, :x2, :x3, :x4, :x5]
feature_importances(self_tuning_rfe_mach) # returns dict of feature => importance pairs
```
and call `predict` on the tuned model machine as shown below:
```julia
Xnew = MLJ.table(rand(rng, 50, 10)) # create test data
predict(self_tuning_rfe_mach, Xnew)
```
In this case, prediction is done using the best recursive feature elimination model obtained from the tuning process above.
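To check which value of `n_features` the search settled on, a sketch like the following should work, assuming the standard MLJ `TunedModel` accessors (`fitted_params` exposing the winning model as `best_model`):
```julia
best_rfe = fitted_params(self_tuning_rfe_mach).best_model
best_rfe.n_features # for this synthetic dataset we expect a value near 5
```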
For resampling methods different from cross-validation, and for other `TunedModel` options, such as parallelization, see the [Tuning Models](https://alan-turing-institute.github.io/MLJ.jl/dev/tuning_models/) section of the MLJ manual, or the [MLJ documentation](https://alan-turing-institute.github.io/MLJ.jl/dev/) generally.

All of the above was replaced by the single added line:

> Repository housing feature selection algorithms for use with the machine learning toolbox [MLJ](https://juliaai.github.io/MLJ.jl/dev/).

docs/.gitignore

Lines changed: 2 additions & 0 deletions
New file:
```
Manifest.toml
build/
```

docs/Project.toml

Lines changed: 11 additions & 0 deletions
New file:
```toml
[deps]
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
MLJ = "add582a8-e3ab-11e8-2d5e-e98b27df1bc7"
FeatureSelection = "33837fe5-dbff-4c9e-8c2f-c5612fe2b8b6"
StableRNGs = "860ef19b-820b-49d6-a774-d7a799459cd3"

[compat]
Documenter = "^1.4"
MLJ = "^0.20"
StableRNGs = "^1.0"
julia = "^1.0"
```

docs/make.jl

Lines changed: 34 additions & 0 deletions
New file:
```julia
using Documenter, FeatureSelection

makedocs(;
    authors = """
    Anthony D. Blaom <[email protected]>,
    Sebastian Vollmer <[email protected]>,
    Okon Samuel <[email protected]>
    """,
    format = Documenter.HTML(;
        prettyurls = get(ENV, "CI", "false") == "true",
        edit_link = "dev"
    ),
    modules = [FeatureSelection],
    pages = [
        "Home" => "index.md",
        "API" => "api.md"
    ],
    doctest = false, # don't run doctests here; they run separately in CI
    repo = Remotes.GitHub("JuliaAI", "FeatureSelection.jl"),
    sitename = "FeatureSelection.jl",
)

# By default Documenter does not deploy docs for PRs; this causes issues
# with how we're doing things and ends up choking the deployment of the
# docs, so here we force the environment to ignore this so that Documenter
# does indeed deploy the docs.
#ENV["GITHUB_EVENT_NAME"] = "pull_request"

deploydocs(;
    deploy_config = Documenter.GitHubActions(),
    repo = "github.com/JuliaAI/FeatureSelection.jl.git",
    push_preview = true
)
```
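To build the documentation locally, a sketch of the usual Documenter workflow (assumes the `docs` environment has been instantiated):
```julia
# Start Julia with the docs project active, e.g. `julia --project=docs`,
# then run the build script:
include("docs/make.jl")
```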

docs/src/api.md

Lines changed: 9 additions & 0 deletions
New file:
````markdown
```@meta
CurrentModule = FeatureSelection
```
# API
## Models
```@docs
FeatureSelector
RecursiveFeatureElimination
```
````
