| Linux | Coverage | Code Style |
| :------------ | :------- | :------------- |
| [Build Status](https://github.com/JuliaAI/FeatureSelection.jl/actions) | [codecov](https://codecov.io/github/JuliaAI/FeatureSelection.jl?branch=dev) | [BlueStyle](https://github.com/invenia/BlueStyle) |

Repository housing feature selection algorithms for use with the machine learning toolbox
[MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/).

The `FeatureSelector` model builds on contributions originally residing at
[MLJModels.jl](https://github.com/JuliaAI/MLJModels.jl/blob/v0.16.15/src/builtins/Transformers.jl#L189-L266).
# Installation
On a running instance of Julia with at least version 1.6, run
```julia
import Pkg;
Pkg.add("FeatureSelection")
```

# Example Usage
Let's build a supervised recursive feature eliminator with `RandomForestRegressor`
from DecisionTree.jl as our base model.
But first we need a dataset to train on. We shall create a synthetic dataset, popularly
known in the R community as the Friedman #1 dataset. Notice that the target for this
dataset depends on only the first five columns of the feature table, so we expect our
recursive feature elimination to return those first five columns as the important features.
```julia
using MLJ, FeatureSelection
using StableRNGs
rng = StableRNG(10)
A = rand(rng, 50, 10)
X = MLJ.table(A) # features
y = @views(
    10 .* sin.(
        pi .* A[:, 1] .* A[:, 2]
    ) .+ 20 .* (A[:, 3] .- 0.5).^2 .+ 10 .* A[:, 4] .+ 5 .* A[:, 5]
) # target
```
Now that we have our data, we can create our recursive feature elimination model and
train it on our dataset:
```julia
RandomForestRegressor = @load RandomForestRegressor pkg=DecisionTree
forest = RandomForestRegressor(rng=rng)
rfe = RecursiveFeatureElimination(
    model = forest, n_features=5, step=1
) # see the docstring for a description of the defaults
mach = machine(rfe, X, y)
fit!(mach)
```
We can inspect the feature importances in two ways:
```julia
# A feature with a lower ranking is more significant than one with a higher ranking,
# while a feature with a higher importance is more significant than one with a lower
# importance.
report(mach).ranking # returns [1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
feature_importances(mach) # returns a dict of feature => importance pairs
```
We can view the important features used by our model by inspecting the `fitted_params`
object:
```julia
p = fitted_params(mach)
p.features_left == [:x1, :x2, :x3, :x4, :x5]
```
We can also call the `predict` method on the fitted machine, to predict using a
random forest regressor trained on only the important features, or call the `transform`
method, to select just those features from some new table that includes all the original
features. For more information, type `?RecursiveFeatureElimination` in the Julia REPL.
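For instance, here is a minimal sketch (the table `Xfresh` and its row count are made up
for illustration; any table with the same ten column names as `X` will do):

```julia
Xfresh = MLJ.table(rand(rng, 5, 10)) # hypothetical new data with columns :x1, ..., :x10
transform(mach, Xfresh)              # keeps only the five selected features
predict(mach, Xfresh)                # predictions from the forest trained on those features
```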
Okay, let's say that we didn't know that our synthetic dataset depends on only five
columns of our feature table. We could apply 5-fold cross-validation, `CV(nfolds=5)`,
with our recursive feature elimination model to select the optimal value of `n_features`
for our model. In this case we will use a simple grid search with root mean squared
error as the measure.
```julia
rfe = RecursiveFeatureElimination(model = forest)
tuning_rfe_model = TunedModel(
    model = rfe,
    measure = rms,
    tuning = Grid(rng=rng),
    resampling = CV(nfolds = 5),
    range = range(
        rfe, :n_features, values = 1:10
    )
)
self_tuning_rfe_mach = machine(tuning_rfe_model, X, y)
fit!(self_tuning_rfe_mach)
```
As before, we can inspect the important features via the object returned by
`fitted_params`, or via `feature_importances`, as shown below.
```julia
fitted_params(self_tuning_rfe_mach).best_fitted_params.features_left == [:x1, :x2, :x3, :x4, :x5]
feature_importances(self_tuning_rfe_mach) # returns a dict of feature => importance pairs
```
We can also call `predict` on the tuned model machine, as shown below:
```julia
Xnew = MLJ.table(rand(rng, 50, 10)) # create test data
predict(self_tuning_rfe_mach, Xnew)
```
In this case, prediction is done using the best recursive feature elimination model
obtained from the tuning process above.
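To check which value of `n_features` the grid search settled on, we can inspect the
tuning report; the sketch below assumes the standard `best_model` entry that `TunedModel`
machines expose in their report:

```julia
best_rfe = report(self_tuning_rfe_mach).best_model # best rfe model found during tuning
best_rfe.n_features # for this synthetic data we expect a value close to 5
```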
For resampling methods different from cross-validation, and for other `TunedModel`
options, such as parallelization, see the
[Tuning Models](https://alan-turing-institute.github.io/MLJ.jl/dev/tuning_models/) section of the MLJ manual.
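For example, a rough sketch combining holdout resampling with multithreaded grid
evaluation (the particular options shown are illustrative, not prescriptive):

```julia
tuned_rfe_alt = TunedModel(
    model = rfe,
    measure = rms,
    tuning = Grid(rng = rng),
    resampling = Holdout(fraction_train = 0.7, rng = rng), # holdout instead of cross-validation
    range = range(rfe, :n_features, values = 1:10),
    acceleration = CPUThreads() # evaluate grid points on multiple threads
)
fit!(machine(tuned_rfe_alt, X, y))
```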
See also the [MLJ Documentation](https://alan-turing-institute.github.io/MLJ.jl/dev/).