From 00faa50e6d369e91645c93e72616262b709e0c3f Mon Sep 17 00:00:00 2001 From: Frances Lin <37535633+franceslinyc@users.noreply.github.com> Date: Tue, 12 Aug 2025 15:02:38 -0700 Subject: [PATCH 01/21] Initial commit --- learn/develop/filtro/index.html.md | 12 ++++++++++++ learn/develop/filtro/index.qmd | 12 ++++++++++++ learn/index-listing.json | 1 + 3 files changed, 25 insertions(+) create mode 100644 learn/develop/filtro/index.html.md create mode 100644 learn/develop/filtro/index.qmd diff --git a/learn/develop/filtro/index.html.md b/learn/develop/filtro/index.html.md new file mode 100644 index 00000000..b11df9c0 --- /dev/null +++ b/learn/develop/filtro/index.html.md @@ -0,0 +1,12 @@ +--- +title: "Create your own score class object" +categories: + - developer tools +type: learn-subsection +weight: 1 +description: | + Create a new score class object for feature selection. +toc: true +toc-depth: 3 +include-after-body: ../../../resources.html +--- diff --git a/learn/develop/filtro/index.qmd b/learn/develop/filtro/index.qmd new file mode 100644 index 00000000..b11df9c0 --- /dev/null +++ b/learn/develop/filtro/index.qmd @@ -0,0 +1,12 @@ +--- +title: "Create your own score class object" +categories: + - developer tools +type: learn-subsection +weight: 1 +description: | + Create a new score class object for feature selection. 
+toc: true +toc-depth: 3 +include-after-body: ../../../resources.html +--- diff --git a/learn/index-listing.json b/learn/index-listing.json index 6718339b..a04f6441 100644 --- a/learn/index-listing.json +++ b/learn/index-listing.json @@ -13,6 +13,7 @@ "/learn/statistics/tidy-analysis/index.html", "/learn/develop/broom/index.html", "/learn/develop/recipes/index.html", + "/learn/develop/filtro/index.html", "/learn/work/case-weights/index.html", "/learn/develop/metrics/index.html", "/learn/statistics/survival-metrics/index.html", From 8b92134817f0151765d749656beec607fdc818e2 Mon Sep 17 00:00:00 2001 From: Frances Lin <37535633+franceslinyc@users.noreply.github.com> Date: Mon, 18 Aug 2025 10:49:50 -0700 Subject: [PATCH 02/21] Half way to first draft --- .../filtro/index/execute-results/html.json | 15 +++ installs.R | 1 + learn/develop/filtro/index.html.md | 117 ++++++++++++++++++ learn/develop/filtro/index.qmd | 117 ++++++++++++++++++ 4 files changed, 250 insertions(+) create mode 100644 _freeze/learn/develop/filtro/index/execute-results/html.json diff --git a/_freeze/learn/develop/filtro/index/execute-results/html.json b/_freeze/learn/develop/filtro/index/execute-results/html.json new file mode 100644 index 00000000..a0bce117 --- /dev/null +++ b/_freeze/learn/develop/filtro/index/execute-results/html.json @@ -0,0 +1,15 @@ +{ + "hash": "df23ea2eb2945a788ec662eb6db21778", + "result": { + "engine": "knitr", + "markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new score class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro.\n\nYou can construct new scoring objects using `class_score()`. This article is a guide to creating new scoring objects. 
\n\n## Scoring object\n\nThe `class_score` is a parent class. There are a few properties to this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(filtro)\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), sorts = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\n- `outcome_type`: What types of outcome can the method handle?\n\n- `predictor_type`: What types of predictor can the method handle?\n\n- `case_weights`: Does the method accpet case weights? \n\n- `range`: Are there known ranges for the statistic?\n\n- `inclusive`: Are these ranges inclusive at the bounds?\n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated?\n\n- `score_type`: What is the column name that will be used for the statistic values?\n\n- (Not used) `sorts`: How should the values be sorted (from most- to least-important)?\n\n- `direction`: What direction of values indicates the most important values?\n\n- `deterministic`: Does the fitting process use random numbers?\n\n- `tuning`: Does the method have tuning parameters?\n\n- `calculating_fn`: What function is used to estimate the values from data?\n\n- `label`: What label to use when printing?\n\n- `packages`: What packages, if any, are required to train the method?\n\n- `results`: A slot for the results once the method is fitted.\n\n## Scoring object for specific to filter method\n\nThe `class_score_aov` is a subclass of `class_score`. 
This subclass allows additional properties to be introduced: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n # Represent the score as -log10(p_value)?\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\n`score_aov_pval` is an instance (i.e., object) of the `class_score_aov` subclass, created using its constructor function: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\n`score_aov_fstat` is another instance (i.e., object) of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n", + "supporting": [], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": {}, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/installs.R b/installs.R index b62e0eec..f2522a2c 100644 --- a/installs.R +++ b/installs.R @@ -18,6 +18,7 @@ packages <- c( "doParallel", "dotwhisker", "embed", + "filtro", "forecast", "fs", "furrr", diff --git a/learn/develop/filtro/index.html.md b/learn/develop/filtro/index.html.md index b11df9c0..850380f6 100644 --- 
a/learn/develop/filtro/index.html.md +++ b/learn/develop/filtro/index.html.md @@ -10,3 +10,120 @@ toc: true toc-depth: 3 include-after-body: ../../../resources.html --- + +## Introduction + +To use code in this article, you will need to install the following packages: filtro. + +You can construct new scoring objects using `class_score()`. This article is a guide to creating new scoring objects. + +## Scoring object + +The `class_score` is a parent class. There are a few properties to this object: + +::: {.cell layout-align="center"} + +```{.r .cell-code} +library(filtro) +args(class_score) +#> function (outcome_type = c("numeric", "factor"), predictor_type = c("numeric", +#> "factor"), case_weights = logical(0), range = integer(0), inclusive = logical(0), +#> fallback_value = integer(0), score_type = character(0), sorts = function() NULL, +#> direction = character(0), deterministic = logical(0), tuning = logical(0), +#> calculating_fn = function() NULL, label = character(0), packages = character(0), +#> results = data.frame()) +#> NULL +``` +::: + +- `outcome_type`: What types of outcome can the method handle? + +- `predictor_type`: What types of predictor can the method handle? + +- `case_weights`: Does the method accept case weights? + +- `range`: Are there known ranges for the statistic? + +- `inclusive`: Are these ranges inclusive at the bounds? + +- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? + +- `score_type`: What is the column name that will be used for the statistic values? + +- (Not used) `sorts`: How should the values be sorted (from most- to least-important)? + +- `direction`: What direction of values indicates the most important values? + +- `deterministic`: Does the fitting process use random numbers? + +- `tuning`: Does the method have tuning parameters? + +- `calculating_fn`: What function is used to estimate the values from data? + +- `label`: What label to use when printing?
+ +- `packages`: What packages, if any, are required to train the method? + +- `results`: A slot for the results once the method is fitted. + +## Scoring object for specific to filter method + +The `class_score_aov` is a subclass of `class_score`. This subclass allows additional properties to be introduced: + +::: {.cell layout-align="center"} + +```{.r .cell-code} +class_score_aov <- S7::new_class( + "class_score_aov", + parent = class_score, + properties = list( + # Represent the score as -log10(p_value)? + neg_log10 = S7::new_property(S7::class_logical, default = TRUE) + ) +) +``` +::: + +`score_aov_pval` is an instance (i.e., object) of the `class_score_aov` subclass, created using its constructor function: + +::: {.cell layout-align="center"} + +```{.r .cell-code} +score_aov_pval <- + class_score_aov( + outcome_type = c("numeric", "factor"), + predictor_type = c("numeric", "factor"), + case_weights = TRUE, + range = c(0, Inf), + inclusive = c(FALSE, FALSE), + fallback_value = Inf, + score_type = "aov_pval", + direction = "maximize", + deterministic = TRUE, + tuning = FALSE, + label = "ANOVA p-values" + ) +``` +::: + +`score_aov_fstat` is another instance (i.e., object) of the `class_score_aov` subclass: + +::: {.cell layout-align="center"} + +```{.r .cell-code} +score_aov_fstat <- + class_score_aov( + outcome_type = c("numeric", "factor"), + predictor_type = c("numeric", "factor"), + case_weights = TRUE, + range = c(0, Inf), + inclusive = c(FALSE, FALSE), + fallback_value = Inf, + score_type = "aov_fstat", + direction = "maximize", + deterministic = TRUE, + tuning = FALSE, + label = "ANOVA F-statistics" + ) +``` +::: diff --git a/learn/develop/filtro/index.qmd b/learn/develop/filtro/index.qmd index b11df9c0..d55984c3 100644 --- a/learn/develop/filtro/index.qmd +++ b/learn/develop/filtro/index.qmd @@ -10,3 +10,120 @@ toc: true toc-depth: 3 include-after-body: ../../../resources.html --- + +```{r} +#| label: "setup" +#| include: false +#| message: false +#| 
warning: false +source(here::here("common.R")) +``` + +```{r} +#| label: "load" +#| include: false +#| message: false +#| warning: false +library(filtro) + +pkgs <- c("filtro") +``` + +## Introduction + +`r article_req_pkgs(pkgs)` + +You can construct new scoring objects using `class_score()`. This article is a guide to creating new scoring objects. + +## Scoring object + +The `class_score` is a parent class. There are a few properties to this object: + +```{r} +#| label: "class_score" +library(filtro) +args(class_score) +``` + +- `outcome_type`: What types of outcome can the method handle? + +- `predictor_type`: What types of predictor can the method handle? + +- `case_weights`: Does the method accept case weights? + +- `range`: Are there known ranges for the statistic? + +- `inclusive`: Are these ranges inclusive at the bounds? + +- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? + +- `score_type`: What is the column name that will be used for the statistic values? + +- (Not used) `sorts`: How should the values be sorted (from most- to least-important)? + +- `direction`: What direction of values indicates the most important values? + +- `deterministic`: Does the fitting process use random numbers? + +- `tuning`: Does the method have tuning parameters? + +- `calculating_fn`: What function is used to estimate the values from data? + +- `label`: What label to use when printing? + +- `packages`: What packages, if any, are required to train the method? + +- `results`: A slot for the results once the method is fitted. + +## Scoring object for specific to filter method + +The `class_score_aov` is a subclass of `class_score`. This subclass allows additional properties to be introduced: + +```{r} +class_score_aov <- S7::new_class( + "class_score_aov", + parent = class_score, + properties = list( + # Represent the score as -log10(p_value)?
+ neg_log10 = S7::new_property(S7::class_logical, default = TRUE) + ) +) +``` + +`score_aov_pval` is an instance (i.e., object) of the `class_score_aov` subclass, created using its constructor function: + +```{r} +score_aov_pval <- + class_score_aov( + outcome_type = c("numeric", "factor"), + predictor_type = c("numeric", "factor"), + case_weights = TRUE, + range = c(0, Inf), + inclusive = c(FALSE, FALSE), + fallback_value = Inf, + score_type = "aov_pval", + direction = "maximize", + deterministic = TRUE, + tuning = FALSE, + label = "ANOVA p-values" + ) +``` + +`score_aov_fstat` is another instance (i.e., object) of the `class_score_aov` subclass: + +```{r} +score_aov_fstat <- + class_score_aov( + outcome_type = c("numeric", "factor"), + predictor_type = c("numeric", "factor"), + case_weights = TRUE, + range = c(0, Inf), + inclusive = c(FALSE, FALSE), + fallback_value = Inf, + score_type = "aov_fstat", + direction = "maximize", + deterministic = TRUE, + tuning = FALSE, + label = "ANOVA F-statistics" + ) +``` \ No newline at end of file From 99420d3b1b67ad1a3eb919ca5a00659936d05854 Mon Sep 17 00:00:00 2001 From: Frances Lin <37535633+franceslinyc@users.noreply.github.com> Date: Mon, 18 Aug 2025 11:13:57 -0700 Subject: [PATCH 03/21] Hitting an error; I think it is related to PR 162 in filtro --- .../filtro/index/execute-results/html.json | 4 +- learn/develop/filtro/index.html.md | 41 ++++++++++- learn/develop/filtro/index.qmd | 71 ++++++++++++++++++- 3 files changed, 109 insertions(+), 7 deletions(-) diff --git a/_freeze/learn/develop/filtro/index/execute-results/html.json b/_freeze/learn/develop/filtro/index/execute-results/html.json index a0bce117..8e7d1185 100644 --- a/_freeze/learn/develop/filtro/index/execute-results/html.json +++ b/_freeze/learn/develop/filtro/index/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "df23ea2eb2945a788ec662eb6db21778", + "hash": "14d3eb3143e20525b6e67bff52f5d14b", "result": { "engine": "knitr", - "markdown": "---\ntitle: 
\"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new score class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro.\n\nYou can construct new scoring objects using `class_score()`. This article is a guide to creating new scoring objects. \n\n## Scoring object\n\nThe `class_score` is a parent class. There are a few properties to this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(filtro)\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), sorts = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\n- `outcome_type`: What types of outcome can the method handle?\n\n- `predictor_type`: What types of predictor can the method handle?\n\n- `case_weights`: Does the method accpet case weights? 
\n\n- `range`: Are there known ranges for the statistic?\n\n- `inclusive`: Are these ranges inclusive at the bounds?\n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated?\n\n- `score_type`: What is the column name that will be used for the statistic values?\n\n- (Not used) `sorts`: How should the values be sorted (from most- to least-important)?\n\n- `direction`: What direction of values indicates the most important values?\n\n- `deterministic`: Does the fitting process use random numbers?\n\n- `tuning`: Does the method have tuning parameters?\n\n- `calculating_fn`: What function is used to estimate the values from data?\n\n- `label`: What label to use when printing?\n\n- `packages`: What packages, if any, are required to train the method?\n\n- `results`: A slot for the results once the method is fitted.\n\n## Scoring object for specific to filter method\n\nThe `class_score_aov` is a subclass of `class_score`. This subclass allows additional properties to be introduced: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n # Represent the score as -log10(p_value)?\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\n`score_aov_pval` is an instance (i.e., object) of the `class_score_aov` subclass, created using its constructor function: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\n`score_aov_fstat` is another instance (i.e., object) of the `class_score_aov` subclass: 
\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n", + "markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new score class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro and modeldata.\n\nYou can construct new scoring objects using `class_score()`. This article is a guide to creating new scoring objects. \n\n## Scoring object\n\nThe `class_score` is a parent class. There are a few properties to this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(filtro)\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), sorts = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\n- `outcome_type`: What types of outcome can the method handle?\n\n- `predictor_type`: What types of predictor can the method handle?\n\n- `case_weights`: Does the method accpet case weights? 
\n\n- `range`: Are there known ranges for the statistic?\n\n- `inclusive`: Are these ranges inclusive at the bounds?\n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated?\n\n- `score_type`: What is the column name that will be used for the statistic values?\n\n- (Not used) `sorts`: How should the values be sorted (from most- to least-important)?\n\n- `direction`: What direction of values indicates the most important values?\n\n- `deterministic`: Does the fitting process use random numbers?\n\n- `tuning`: Does the method have tuning parameters?\n\n- `calculating_fn`: What function is used to estimate the values from data?\n\n- `label`: What label to use when printing?\n\n- `packages`: What packages, if any, are required to train the method?\n\n- `results`: A slot for the results once the method is fitted.\n\n## Scoring object specific to filter method\n\nThe `class_score_aov` is a subclass of `class_score`. This subclass allows additional properties to be introduced: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n # Represent the score as -log10(p_value)?\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\n`score_aov_pval` is an instance (i.e., object) of the `class_score_aov` subclass, created using its constructor function: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\nscore_aov_pval\n#> \n#> @ outcome_type : chr [1:2] \"numeric\" \"factor\"\n#> @ predictor_type: chr [1:2] 
\"numeric\" \"factor\"\n#> @ case_weights : logi TRUE\n#> @ range : num [1:2] 0 Inf\n#> @ inclusive : logi [1:2] FALSE FALSE\n#> @ fallback_value: num Inf\n#> @ score_type : chr \"aov_pval\"\n#> @ sorts : function () \n#> @ direction : chr \"maximize\"\n#> @ deterministic : logi TRUE\n#> @ tuning : logi FALSE\n#> @ calculating_fn: function () \n#> @ label : chr \"ANOVA p-values\"\n#> @ packages : chr(0) \n#> @ results :'data.frame':\t0 obs. of 0 variables\n#> @ neg_log10 : logi TRUE\n```\n:::\n\n\nThe properties can be accessed via `object@`. For examples: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@range\n#> [1] 0 Inf\nscore_aov_pval@fallback_value\n#> [1] Inf\n```\n:::\n\n\n`score_aov_fstat` is another instance (i.e., object) of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Accessing Results After Fitting\n\nOnce the method is fitted via `fit()`, results can be accessed via `object@results`. For examples: \n\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/learn/develop/filtro/index.html.md b/learn/develop/filtro/index.html.md index 850380f6..8d44462d 100644 --- a/learn/develop/filtro/index.html.md +++ b/learn/develop/filtro/index.html.md @@ -13,7 +13,7 @@ include-after-body: ../../../resources.html ## Introduction -To use code in this article, you will need to install the following packages: filtro. +To use code in this article, you will need to install the following packages: filtro and modeldata. 
You can construct new scoring objects using `class_score()`. This article is a guide to creating new scoring objects. @@ -66,7 +66,7 @@ args(class_score) - `results`: A slot for the results once the method is fitted. -## Scoring object for specific to filter method +## Scoring object specific to filter method The `class_score_aov` is a subclass of `class_score`. This subclass allows additional properties to be introduced: @@ -103,6 +103,38 @@ score_aov_pval <- tuning = FALSE, label = "ANOVA p-values" ) +score_aov_pval +#> +#> @ outcome_type : chr [1:2] "numeric" "factor" +#> @ predictor_type: chr [1:2] "numeric" "factor" +#> @ case_weights : logi TRUE +#> @ range : num [1:2] 0 Inf +#> @ inclusive : logi [1:2] FALSE FALSE +#> @ fallback_value: num Inf +#> @ score_type : chr "aov_pval" +#> @ sorts : function () +#> @ direction : chr "maximize" +#> @ deterministic : logi TRUE +#> @ tuning : logi FALSE +#> @ calculating_fn: function () +#> @ label : chr "ANOVA p-values" +#> @ packages : chr(0) +#> @ results :'data.frame': 0 obs. of 0 variables +#> @ neg_log10 : logi TRUE +``` +::: + +The properties can be accessed via `object@`. For example: + +::: {.cell layout-align="center"} + +```{.r .cell-code} +score_aov_pval@case_weights +#> [1] TRUE +score_aov_pval@range +#> [1] 0 Inf +score_aov_pval@fallback_value +#> [1] Inf ``` ::: @@ -127,3 +159,8 @@ score_aov_fstat <- ) ``` ::: + +## Accessing Results After Fitting + +Once the method is fitted via `fit()`, results can be accessed via `object@results`.
For example: + diff --git a/learn/develop/filtro/index.qmd b/learn/develop/filtro/index.qmd index d55984c3..7305d9bb 100644 --- a/learn/develop/filtro/index.qmd +++ b/learn/develop/filtro/index.qmd @@ -25,8 +25,9 @@ source(here::here("common.R")) #| message: false #| warning: false library(filtro) +library(modeldata) -pkgs <- c("filtro") +pkgs <- c("filtro", "modeldata") ``` ## Introduction @@ -75,7 +76,7 @@ args(class_score) - `results`: A slot for the results once the method is fitted. -## Scoring object for specific to filter method +## Scoring object specific to filter method The `class_score_aov` is a subclass of `class_score`. This subclass allows additional properties to be introduced: @@ -107,6 +108,15 @@ score_aov_pval <- tuning = FALSE, label = "ANOVA p-values" ) +score_aov_pval +``` + +The properties can be accessed via `object@`. For example: + +```{r} +score_aov_pval@case_weights +score_aov_pval@range +score_aov_pval@fallback_value ``` `score_aov_fstat` is another instance (i.e., object) of the `class_score_aov` subclass: @@ -126,4 +136,59 @@ score_aov_fstat <- tuning = FALSE, label = "ANOVA F-statistics" ) -``` \ No newline at end of file +``` + +## Accessing Results After Fitting + +Once the method is fitted via `fit()`, results can be accessed via `object@results`.
For example: + +```{r} +library(modeldata) +ames_subset <- modeldata::ames |> + # Use a subset of data for demonstration + dplyr::select( + Sale_Price, + MS_SubClass, + MS_Zoning, + Lot_Frontage, + Lot_Area, + Street + ) +ames_subset <- ames_subset |> + dplyr::mutate(Sale_Price = log10(Sale_Price)) +``` + +```{r} +library(modeldata) +ames_subset <- modeldata::ames |> + # Use a subset of data for demonstration + dplyr::select( + Sale_Price, + MS_SubClass, + MS_Zoning, + Lot_Frontage, + Lot_Area, + Street + ) +ames_subset <- ames_subset |> + dplyr::mutate(Sale_Price = log10(Sale_Price)) +``` + +```{r} +library(filtro) +# ANOVA p-value +ames_aov_pval_res <- + score_aov_pval |> + fit(Sale_Price ~ ., data = ames_subset) +ames_aov_pval_res@results +``` + +```{r} +library(filtro) +# ANOVA F-statistic +ames_aov_fstat_res <- + score_aov_fstat |> + fit(Sale_Price ~ ., data = ames_subset) +ames_aov_fstat_res@results +``` + From 93e27c73c96f975c7f07bd66f6af5389b448edc4 Mon Sep 17 00:00:00 2001 From: Frances Lin <37535633+franceslinyc@users.noreply.github.com> Date: Mon, 18 Aug 2025 14:51:50 -0700 Subject: [PATCH 04/21] End of first draft --- .../filtro/index/execute-results/html.json | 4 +- learn/develop/filtro/index.html.md | 112 ++++++++++++++---- learn/develop/filtro/index.qmd | 62 +++++----- 3 files changed, 117 insertions(+), 61 deletions(-) diff --git a/_freeze/learn/develop/filtro/index/execute-results/html.json b/_freeze/learn/develop/filtro/index/execute-results/html.json index 8e7d1185..e45d9e1e 100644 --- a/_freeze/learn/develop/filtro/index/execute-results/html.json +++ b/_freeze/learn/develop/filtro/index/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "14d3eb3143e20525b6e67bff52f5d14b", + "hash": "64a6365ccfe8f07c31e371a6587e5dfc", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new score class object for feature
selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro and modeldata.\n\nYou can construct new scoring objects using `class_score()`. This article is a guide to creating new scoring objects. \n\n## Scoring object\n\nThe `class_score` is a parent class. There are a few properties to this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(filtro)\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), sorts = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\n- `outcome_type`: What types of outcome can the method handle?\n\n- `predictor_type`: What types of predictor can the method handle?\n\n- `case_weights`: Does the method accpet case weights? 
\n\n- `range`: Are there known ranges for the statistic?\n\n- `inclusive`: Are these ranges inclusive at the bounds?\n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated?\n\n- `score_type`: What is the column name that will be used for the statistic values?\n\n- (Not used) `sorts`: How should the values be sorted (from most- to least-important)?\n\n- `direction`: What direction of values indicates the most important values?\n\n- `deterministic`: Does the fitting process use random numbers?\n\n- `tuning`: Does the method have tuning parameters?\n\n- `calculating_fn`: What function is used to estimate the values from data?\n\n- `label`: What label to use when printing?\n\n- `packages`: What packages, if any, are required to train the method?\n\n- `results`: A slot for the results once the method is fitted.\n\n## Scoring object specific to filter method\n\nThe `class_score_aov` is a subclass of `class_score`. This subclass allows additional properties to be introduced: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n # Represent the score as -log10(p_value)?\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\n`score_aov_pval` is an instance (i.e., object) of the `class_score_aov` subclass, created using its constructor function: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\nscore_aov_pval\n#> \n#> @ outcome_type : chr [1:2] \"numeric\" \"factor\"\n#> @ predictor_type: chr [1:2] 
\"numeric\" \"factor\"\n#> @ case_weights : logi TRUE\n#> @ range : num [1:2] 0 Inf\n#> @ inclusive : logi [1:2] FALSE FALSE\n#> @ fallback_value: num Inf\n#> @ score_type : chr \"aov_pval\"\n#> @ sorts : function () \n#> @ direction : chr \"maximize\"\n#> @ deterministic : logi TRUE\n#> @ tuning : logi FALSE\n#> @ calculating_fn: function () \n#> @ label : chr \"ANOVA p-values\"\n#> @ packages : chr(0) \n#> @ results :'data.frame':\t0 obs. of 0 variables\n#> @ neg_log10 : logi TRUE\n```\n:::\n\n\nThe properties can be accessed via `object@`. For examples: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@range\n#> [1] 0 Inf\nscore_aov_pval@fallback_value\n#> [1] Inf\n```\n:::\n\n\n`score_aov_fstat` is another instance (i.e., object) of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Accessing Results After Fitting\n\nOnce the method is fitted via `fit()`, results can be accessed via `object@results`. For examples: \n\n", + "markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new score class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro and modeldata.\n\nYou can construct new scoring objects using `class_score()`. This article is a guide to creating new scoring objects. 
\n\n## Scoring object\n\nThe `class_score` is the parent class of all subclasses related to the scoring method. There are a few properties to this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(filtro)\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), sorts = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\n- `outcome_type`: What types of outcome can the method handle?\n\n- `predictor_type`: What types of predictor can the method handle?\n\n- `case_weights`: Does the method accpet case weights? \n\n- `range`: Are there known ranges for the statistic?\n\n- `inclusive`: Are these ranges inclusive at the bounds?\n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated?\n\n- `score_type`: What is the column name that will be used for the statistic values?\n\n- (Not used) `sorts`: How should the values be sorted (from most- to least-important)?\n\n- `direction`: What direction of values indicates the most important values?\n\n- `deterministic`: Does the fitting process use random numbers?\n\n- `tuning`: Does the method have tuning parameters?\n\n- `calculating_fn`: What function is used to estimate the values from data?\n\n- `label`: What label to use when printing?\n\n- `packages`: What packages, if any, are required to train the method?\n\n- `results`: A slot for the results once the method is fitted.\n\n## Scoring object specific to the scoring method\n\nAs an example, let’s consider the ANOVA F-test filter. \n\n`class_score_aov` is a subclass of `class_score`. 
Because it inherits from the `class_score` parent class, all of the parent's properties are also inherited. The subclass can also include additional properties specific to their implementation. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n # Represent the score as -log10(p_value)?\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\nFor this filter, users can use either p-value or the F-statistic. We will demonstrate how to create these instances (or objects).\n\n`score_aov_pval` is an instance (or object) of the `class_score_aov` subclass, created using its constructor function: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA p-value\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nIndividual properties can be accessed via `object@`. 
For examples: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@range\n#> [1] 0 Inf\nscore_aov_pval@fallback_value\n#> [1] Inf\n```\n:::\n\n\n`score_aov_fstat` is another instance (or object) of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA F-statistic\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Accessing Results After Fitting\n\nOnce the method is fitted via `fit()`, the data frame of results can be accessed via `object@results`. For examples: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(modeldata)\names_subset <- modeldata::ames |>\n # Use a subset of data for demonstration\n dplyr::select(\n Sale_Price,\n MS_SubClass,\n MS_Zoning,\n Lot_Frontage,\n Lot_Area,\n Street\n )\names_subset <- ames_subset |>\n dplyr::mutate(Sale_Price = log10(Sale_Price))\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA p-value and fit score\n# ames_aov_pval_res <-\n# score_aov_pval |>\n# fit(Sale_Price ~ ., data = ames_subset)\n# ames_aov_pval_res@results\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA F-statistic and fit score\n# ames_aov_fstat_res <-\n# score_aov_fstat |>\n# fit(Sale_Price ~ ., data = ames_subset)\n# ames_aov_fstat_res@results\n```\n:::\n\n\n## Session information {#session-info}\n\n\n::: {.cell layout-align=\"center\"}\n\n```\n#> \n#> Attaching package: 'dplyr'\n#> The following objects are masked from 'package:stats':\n#> \n#> filter, lag\n#> The following objects are masked from 
'package:base':\n#> \n#> intersect, setdiff, setequal, union\n#> ─ Session info ─────────────────────────────────────────────────────\n#> version R version 4.5.0 (2025-04-11)\n#> language (EN)\n#> date 2025-08-18\n#> pandoc 3.6.3\n#> quarto 1.7.32\n#> \n#> ─ Packages ─────────────────────────────────────────────────────────\n#> package version date (UTC) source\n#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0)\n#> filtro 0.1.0.9000 2025-08-15 Github (tidymodels/filtro@81c7d85)\n#> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0)\n#> purrr 1.0.4 2025-02-05 CRAN (R 4.5.0)\n#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0)\n#> tibble 3.2.1 2023-03-20 CRAN (R 4.5.0)\n#> \n#> ────────────────────────────────────────────────────────────────────\n```\n:::\n\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/learn/develop/filtro/index.html.md b/learn/develop/filtro/index.html.md index 8d44462d..d79560f8 100644 --- a/learn/develop/filtro/index.html.md +++ b/learn/develop/filtro/index.html.md @@ -19,7 +19,7 @@ You can construct new scoring objects using `class_score()`. This article is a g ## Scoring object -The `class_score` is a parent class. There are a few properties to this object: +The `class_score` is the parent class of all subclasses related to the scoring method. There are a few properties to this object: ::: {.cell layout-align="center"} @@ -66,9 +66,11 @@ args(class_score) - `results`: A slot for the results once the method is fitted. -## Scoring object specific to filter method +## Scoring object specific to the scoring method -The `class_score_aov` is a subclass of `class_score`. This subclass allows additional properties to be introduced: +As an example, let’s consider the ANOVA F-test filter. + +`class_score_aov` is a subclass of `class_score`. Because it inherits from the `class_score` parent class, all of the parent's properties are also inherited. The subclass can also include additional properties specific to their implementation. 
For example: ::: {.cell layout-align="center"} @@ -84,11 +86,14 @@ class_score_aov <- S7::new_class( ``` ::: -`score_aov_pval` is an instance (i.e., object) of the `class_score_aov` subclass, created using its constructor function: +For this filter, users can use either p-value or the F-statistic. We will demonstrate how to create these instances (or objects). + +`score_aov_pval` is an instance (or object) of the `class_score_aov` subclass, created using its constructor function: ::: {.cell layout-align="center"} ```{.r .cell-code} +# ANOVA p-value score_aov_pval <- class_score_aov( outcome_type = c("numeric", "factor"), @@ -103,28 +108,10 @@ score_aov_pval <- tuning = FALSE, label = "ANOVA p-values" ) -score_aov_pval -#> -#> @ outcome_type : chr [1:2] "numeric" "factor" -#> @ predictor_type: chr [1:2] "numeric" "factor" -#> @ case_weights : logi TRUE -#> @ range : num [1:2] 0 Inf -#> @ inclusive : logi [1:2] FALSE FALSE -#> @ fallback_value: num Inf -#> @ score_type : chr "aov_pval" -#> @ sorts : function () -#> @ direction : chr "maximize" -#> @ deterministic : logi TRUE -#> @ tuning : logi FALSE -#> @ calculating_fn: function () -#> @ label : chr "ANOVA p-values" -#> @ packages : chr(0) -#> @ results :'data.frame': 0 obs. of 0 variables -#> @ neg_log10 : logi TRUE ``` ::: -The properties can be accessed via `object@`. For examples: +Individual properties can be accessed via `object@`. 
For examples: ::: {.cell layout-align="center"} @@ -138,11 +125,12 @@ score_aov_pval@fallback_value ``` ::: -`score_aov_fstat` is another instance (i.e., object) of the `class_score_aov` subclass: +`score_aov_fstat` is another instance (or object) of the `class_score_aov` subclass: ::: {.cell layout-align="center"} ```{.r .cell-code} +# ANOVA F-statistic score_aov_fstat <- class_score_aov( outcome_type = c("numeric", "factor"), @@ -162,5 +150,79 @@ score_aov_fstat <- ## Accessing Results After Fitting -Once the method is fitted via `fit()`, results can be accessed via `object@results`. For examples: +Once the method is fitted via `fit()`, the data frame of results can be accessed via `object@results`. For examples: + +::: {.cell layout-align="center"} + +```{.r .cell-code} +library(modeldata) +ames_subset <- modeldata::ames |> + # Use a subset of data for demonstration + dplyr::select( + Sale_Price, + MS_SubClass, + MS_Zoning, + Lot_Frontage, + Lot_Area, + Street + ) +ames_subset <- ames_subset |> + dplyr::mutate(Sale_Price = log10(Sale_Price)) +``` +::: + +::: {.cell layout-align="center"} + +```{.r .cell-code} +# # Specify ANOVA p-value and fit score +# ames_aov_pval_res <- +# score_aov_pval |> +# fit(Sale_Price ~ ., data = ames_subset) +# ames_aov_pval_res@results +``` +::: + +::: {.cell layout-align="center"} + +```{.r .cell-code} +# # Specify ANOVA F-statistic and fit score +# ames_aov_fstat_res <- +# score_aov_fstat |> +# fit(Sale_Price ~ ., data = ames_subset) +# ames_aov_fstat_res@results +``` +::: + +## Session information {#session-info} + +::: {.cell layout-align="center"} + +``` +#> +#> Attaching package: 'dplyr' +#> The following objects are masked from 'package:stats': +#> +#> filter, lag +#> The following objects are masked from 'package:base': +#> +#> intersect, setdiff, setequal, union +#> ─ Session info ───────────────────────────────────────────────────── +#> version R version 4.5.0 (2025-04-11) +#> language (EN) +#> date 2025-08-18 +#> pandoc 
3.6.3 +#> quarto 1.7.32 +#> +#> ─ Packages ───────────────────────────────────────────────────────── +#> package version date (UTC) source +#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0) +#> filtro 0.1.0.9000 2025-08-15 Github (tidymodels/filtro@81c7d85) +#> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0) +#> purrr 1.0.4 2025-02-05 CRAN (R 4.5.0) +#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0) +#> tibble 3.2.1 2023-03-20 CRAN (R 4.5.0) +#> +#> ──────────────────────────────────────────────────────────────────── +``` +::: diff --git a/learn/develop/filtro/index.qmd b/learn/develop/filtro/index.qmd index 7305d9bb..5eb47b0e 100644 --- a/learn/develop/filtro/index.qmd +++ b/learn/develop/filtro/index.qmd @@ -38,7 +38,7 @@ You can construct new scoring objects using `class_score()`. This article is a g ## Scoring object -The `class_score` is a parent class. There are a few properties to this object: +The `class_score` is the parent class of all subclasses related to the scoring method. There are a few properties to this object: ```{r} #| label: "class_score" @@ -76,9 +76,11 @@ args(class_score) - `results`: A slot for the results once the method is fitted. -## Scoring object specific to filter method +## Scoring object specific to the scoring method -The `class_score_aov` is a subclass of `class_score`. This subclass allows additional properties to be introduced: +As an example, let’s consider the ANOVA F-test filter. + +`class_score_aov` is a subclass of `class_score`. Because it inherits from the `class_score` parent class, all of the parent's properties are also inherited. The subclass can also include additional properties specific to their implementation. For example: ```{r} class_score_aov <- S7::new_class( @@ -91,9 +93,12 @@ class_score_aov <- S7::new_class( ) ``` -`score_aov_pval` is an instance (i.e., object) of the `class_score_aov` subclass, created using its constructor function: +For this filter, users can use either p-value or the F-statistic. 
We will demonstrate how to create these instances (or objects). + +`score_aov_pval` is an instance (or object) of the `class_score_aov` subclass, created using its constructor function: ```{r} +# ANOVA p-value score_aov_pval <- class_score_aov( outcome_type = c("numeric", "factor"), @@ -108,10 +113,9 @@ score_aov_pval <- tuning = FALSE, label = "ANOVA p-values" ) -score_aov_pval ``` -The properties can be accessed via `object@`. For examples: +Individual properties can be accessed via `object@`. For examples: ```{r} score_aov_pval@case_weights @@ -119,9 +123,10 @@ score_aov_pval@range score_aov_pval@fallback_value ``` -`score_aov_fstat` is another instance (i.e., object) of the `class_score_aov` subclass: +`score_aov_fstat` is another instance (or object) of the `class_score_aov` subclass: ```{r} +# ANOVA F-statistic score_aov_fstat <- class_score_aov( outcome_type = c("numeric", "factor"), @@ -140,7 +145,7 @@ score_aov_fstat <- ## Accessing Results After Fitting -Once the method is fitted via `fit()`, results can be accessed via `object@results`. For examples: +Once the method is fitted via `fit()`, the data frame of results can be accessed via `object@results`. 
For examples: ```{r} library(modeldata) @@ -159,36 +164,25 @@ ames_subset <- ames_subset |> ``` ```{r} -library(modeldata) -ames_subset <- modeldata::ames |> - # Use a subset of data for demonstration - dplyr::select( - Sale_Price, - MS_SubClass, - MS_Zoning, - Lot_Frontage, - Lot_Area, - Street - ) -ames_subset <- ames_subset |> - dplyr::mutate(Sale_Price = log10(Sale_Price)) +# # Specify ANOVA p-value and fit score +# ames_aov_pval_res <- +# score_aov_pval |> +# fit(Sale_Price ~ ., data = ames_subset) +# ames_aov_pval_res@results ``` ```{r} -library(filtro) -# ANOVA p-value -ames_aov_pval_res <- - score_aov_pval |> - fit(Sale_Price ~ ., data = ames_subset) -ames_aov_pval_res@results +# # Specify ANOVA F-statistic and fit score +# ames_aov_fstat_res <- +# score_aov_fstat |> +# fit(Sale_Price ~ ., data = ames_subset) +# ames_aov_fstat_res@results ``` +## Session information {#session-info} + ```{r} -library(filtro) -# ANOVA F-statistic -ames_aov_fstat_res <- - score_aov_fstat |> - fit(Sale_Price ~ ., data = ames_subset) -ames_aov_fstat_res@results +#| label: "si" +#| echo: false +small_session(pkgs) ``` - From 8c42bae203c9a45d6c08ad455d0eb667f2ba3ab7 Mon Sep 17 00:00:00 2001 From: Frances Lin <37535633+franceslinyc@users.noreply.github.com> Date: Mon, 18 Aug 2025 16:06:17 -0700 Subject: [PATCH 05/21] Checking with Max to see if we'd include discussion about fit() --- .../filtro/index/execute-results/html.json | 4 +- learn/develop/filtro/index.html.md | 40 ++++++++++--------- learn/develop/filtro/index.qmd | 38 +++++++++--------- 3 files changed, 43 insertions(+), 39 deletions(-) diff --git a/_freeze/learn/develop/filtro/index/execute-results/html.json b/_freeze/learn/develop/filtro/index/execute-results/html.json index e45d9e1e..c2915163 100644 --- a/_freeze/learn/develop/filtro/index/execute-results/html.json +++ b/_freeze/learn/develop/filtro/index/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "64a6365ccfe8f07c31e371a6587e5dfc", + "hash": 
"d94e7d3f6479e62f73f6f270326b8da2", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new score class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro and modeldata.\n\nYou can construct new scoring objects using `class_score()`. This article is a guide to creating new scoring objects. \n\n## Scoring object\n\nThe `class_score` is the parent class of all subclasses related to the scoring method. There are a few properties to this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(filtro)\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), sorts = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\n- `outcome_type`: What types of outcome can the method handle?\n\n- `predictor_type`: What types of predictor can the method handle?\n\n- `case_weights`: Does the method accpet case weights? 
\n\n- `range`: Are there known ranges for the statistic?\n\n- `inclusive`: Are these ranges inclusive at the bounds?\n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated?\n\n- `score_type`: What is the column name that will be used for the statistic values?\n\n- (Not used) `sorts`: How should the values be sorted (from most- to least-important)?\n\n- `direction`: What direction of values indicates the most important values?\n\n- `deterministic`: Does the fitting process use random numbers?\n\n- `tuning`: Does the method have tuning parameters?\n\n- `calculating_fn`: What function is used to estimate the values from data?\n\n- `label`: What label to use when printing?\n\n- `packages`: What packages, if any, are required to train the method?\n\n- `results`: A slot for the results once the method is fitted.\n\n## Scoring object specific to the scoring method\n\nAs an example, let’s consider the ANOVA F-test filter. \n\n`class_score_aov` is a subclass of `class_score`. Because it inherits from the `class_score` parent class, all of the parent's properties are also inherited. The subclass can also include additional properties specific to their implementation. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n # Represent the score as -log10(p_value)?\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\nFor this filter, users can use either p-value or the F-statistic. 
We will demonstrate how to create these instances (or objects).\n\n`score_aov_pval` is an instance (or object) of the `class_score_aov` subclass, created using its constructor function: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA p-value\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nIndividual properties can be accessed via `object@`. For examples: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@range\n#> [1] 0 Inf\nscore_aov_pval@fallback_value\n#> [1] Inf\n```\n:::\n\n\n`score_aov_fstat` is another instance (or object) of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA F-statistic\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Accessing Results After Fitting\n\nOnce the method is fitted via `fit()`, the data frame of results can be accessed via `object@results`. 
For examples: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(modeldata)\names_subset <- modeldata::ames |>\n # Use a subset of data for demonstration\n dplyr::select(\n Sale_Price,\n MS_SubClass,\n MS_Zoning,\n Lot_Frontage,\n Lot_Area,\n Street\n )\names_subset <- ames_subset |>\n dplyr::mutate(Sale_Price = log10(Sale_Price))\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA p-value and fit score\n# ames_aov_pval_res <-\n# score_aov_pval |>\n# fit(Sale_Price ~ ., data = ames_subset)\n# ames_aov_pval_res@results\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA F-statistic and fit score\n# ames_aov_fstat_res <-\n# score_aov_fstat |>\n# fit(Sale_Price ~ ., data = ames_subset)\n# ames_aov_fstat_res@results\n```\n:::\n\n\n## Session information {#session-info}\n\n\n::: {.cell layout-align=\"center\"}\n\n```\n#> \n#> Attaching package: 'dplyr'\n#> The following objects are masked from 'package:stats':\n#> \n#> filter, lag\n#> The following objects are masked from 'package:base':\n#> \n#> intersect, setdiff, setequal, union\n#> ─ Session info ─────────────────────────────────────────────────────\n#> version R version 4.5.0 (2025-04-11)\n#> language (EN)\n#> date 2025-08-18\n#> pandoc 3.6.3\n#> quarto 1.7.32\n#> \n#> ─ Packages ─────────────────────────────────────────────────────────\n#> package version date (UTC) source\n#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0)\n#> filtro 0.1.0.9000 2025-08-15 Github (tidymodels/filtro@81c7d85)\n#> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0)\n#> purrr 1.0.4 2025-02-05 CRAN (R 4.5.0)\n#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0)\n#> tibble 3.2.1 2023-03-20 CRAN (R 4.5.0)\n#> \n#> ────────────────────────────────────────────────────────────────────\n```\n:::\n\n", + "markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new 
score class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro and modeldata.\n\nYou can construct new scoring objects using `class_score()`. This article is a guide to creating new scoring objects. \n\n## Scoring object\n\nThe `class_score` is the parent class of all subclasses related to the scoring method. There are a few properties for this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(filtro)\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), sorts = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\n- `outcome_type`: What types of outcome can the method handle? The options are `numeric`, `factor`, or both. \n\n- `predictor_type`: What types of predictor can the method handle? The options are `numeric`, `factor`, or both. \n\n- `case_weights`: Does the method accpet case weights? `TRUE` or `FALSE`.\n\n- `range`: Are there known ranges for the statistic? For example, `c(0, Inf)`. \n\n- `inclusive`: Are these ranges inclusive at the bounds? For example, `c(FALSE, FALSE)`. \n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`.\n\n- `score_type`: What is the column name that will be used for the statistic values? For example, `aov_pval`, `aov_fstat`. 
\n\n- (Not used) `sorts`: How should the values be sorted (from most- to least-important)?\n\n- `direction`: What direction of values indicates the most important values? For example, `maximum`, `minimize`.\n\n- `deterministic`: Does the fitting process use random numbers? `TRUE` or `FALSE`.\n\n- `tuning`: Does the method have tuning parameters? `TRUE` or `FALSE`.\n\n- `calculating_fn`: What function, if any, is used to estimate the values from data?\n\n- `label`: What label to use when printing? For example, `ANOVA p-values`, `ANOVA F-statistics`.\n\n- `packages`: What packages, if any, are required to train the method?\n\n- `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame.\n\n## Scoring object specific to the scoring method\n\nAs an example, let’s consider the ANOVA F-test filter. \n\n`class_score_aov` is a subclass of `class_score`. Because it inherits from the `class_score` parent class, all of the parent's properties are also inherited. The subclass can also include additional properties specific to its implementation. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a subclass named 'class_score_aov'\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\n- `neg_log10`: Represent the score as `-log10(p_value)`?\n\nFor this filter, users can use either p-value or the F-statistic. 
We demonstrate how to create these instances (or objects) next.\n\n`score_aov_pval` is an instance (or object) of the `class_score_aov` subclass, created using its constructor function: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA p-value\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nIndividual properties can be accessed via `object@`. For examples: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@fallback_value\n#> [1] Inf\nscore_aov_pval@direction\n#> [1] \"maximize\"\n```\n:::\n\n\n`score_aov_fstat` is another instance (or object) of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA F-statistic\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Accessing Results After Fitting\n\nOnce the method is fitted via `fit()`, the data frame of results can be accessed via `object@results`. 
For examples: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(modeldata)\names_subset <- modeldata::ames |>\n # Use a subset of data for demonstration\n dplyr::select(\n Sale_Price,\n MS_SubClass,\n MS_Zoning,\n Lot_Frontage,\n Lot_Area,\n Street\n )\names_subset <- ames_subset |>\n dplyr::mutate(Sale_Price = log10(Sale_Price))\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA p-value and fit score\n# ames_aov_pval_res <-\n# score_aov_pval |>\n# fit(Sale_Price ~ ., data = ames_subset)\n# ames_aov_pval_res@results\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA F-statistic and fit score\n# ames_aov_fstat_res <-\n# score_aov_fstat |>\n# fit(Sale_Price ~ ., data = ames_subset)\n# ames_aov_fstat_res@results\n```\n:::\n\n\n## Session information {#session-info}\n\n\n::: {.cell layout-align=\"center\"}\n\n```\n#> \n#> Attaching package: 'dplyr'\n#> The following objects are masked from 'package:stats':\n#> \n#> filter, lag\n#> The following objects are masked from 'package:base':\n#> \n#> intersect, setdiff, setequal, union\n#> ─ Session info ─────────────────────────────────────────────────────\n#> version R version 4.5.0 (2025-04-11)\n#> language (EN)\n#> date 2025-08-18\n#> pandoc 3.6.3\n#> quarto 1.7.32\n#> \n#> ─ Packages ─────────────────────────────────────────────────────────\n#> package version date (UTC) source\n#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0)\n#> filtro 0.1.0.9000 2025-08-15 Github (tidymodels/filtro@81c7d85)\n#> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0)\n#> purrr 1.0.4 2025-02-05 CRAN (R 4.5.0)\n#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0)\n#> tibble 3.2.1 2023-03-20 CRAN (R 4.5.0)\n#> \n#> ────────────────────────────────────────────────────────────────────\n```\n:::\n\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/learn/develop/filtro/index.html.md b/learn/develop/filtro/index.html.md index d79560f8..31c1de02 
100644 --- a/learn/develop/filtro/index.html.md +++ b/learn/develop/filtro/index.html.md @@ -19,7 +19,7 @@ You can construct new scoring objects using `class_score()`. This article is a g ## Scoring object -The `class_score` is the parent class of all subclasses related to the scoring method. There are a few properties to this object: +The `class_score` is the parent class of all subclasses related to the scoring method. There are a few properties for this object: ::: {.cell layout-align="center"} @@ -36,57 +36,59 @@ args(class_score) ``` ::: -- `outcome_type`: What types of outcome can the method handle? +- `outcome_type`: What types of outcome can the method handle? The options are `numeric`, `factor`, or both. -- `predictor_type`: What types of predictor can the method handle? +- `predictor_type`: What types of predictor can the method handle? The options are `numeric`, `factor`, or both. -- `case_weights`: Does the method accpet case weights? +- `case_weights`: Does the method accpet case weights? `TRUE` or `FALSE`. -- `range`: Are there known ranges for the statistic? +- `range`: Are there known ranges for the statistic? For example, `c(0, Inf)`. -- `inclusive`: Are these ranges inclusive at the bounds? +- `inclusive`: Are these ranges inclusive at the bounds? For example, `c(FALSE, FALSE)`. -- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? +- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`. -- `score_type`: What is the column name that will be used for the statistic values? +- `score_type`: What is the column name that will be used for the statistic values? For example, `aov_pval`, `aov_fstat`. - (Not used) `sorts`: How should the values be sorted (from most- to least-important)? -- `direction`: What direction of values indicates the most important values? 
+- `direction`: What direction of values indicates the most important values? For example, `maximum`, `minimize`. -- `deterministic`: Does the fitting process use random numbers? +- `deterministic`: Does the fitting process use random numbers? `TRUE` or `FALSE`. -- `tuning`: Does the method have tuning parameters? +- `tuning`: Does the method have tuning parameters? `TRUE` or `FALSE`. -- `calculating_fn`: What function is used to estimate the values from data? +- `calculating_fn`: What function, if any, is used to estimate the values from data? -- `label`: What label to use when printing? +- `label`: What label to use when printing? For example, `ANOVA p-values`, `ANOVA F-statistics`. - `packages`: What packages, if any, are required to train the method? -- `results`: A slot for the results once the method is fitted. +- `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame. ## Scoring object specific to the scoring method As an example, let’s consider the ANOVA F-test filter. -`class_score_aov` is a subclass of `class_score`. Because it inherits from the `class_score` parent class, all of the parent's properties are also inherited. The subclass can also include additional properties specific to their implementation. For example: +`class_score_aov` is a subclass of `class_score`. Because it inherits from the `class_score` parent class, all of the parent's properties are also inherited. The subclass can also include additional properties specific to its implementation. For example: ::: {.cell layout-align="center"} ```{.r .cell-code} +# Create a subclass named 'class_score_aov' class_score_aov <- S7::new_class( "class_score_aov", parent = class_score, properties = list( - # Represent the score as -log10(p_value)? neg_log10 = S7::new_property(S7::class_logical, default = TRUE) ) ) ``` ::: -For this filter, users can use either p-value or the F-statistic. We will demonstrate how to create these instances (or objects). 
+- `neg_log10`: Represent the score as `-log10(p_value)`? + +For this filter, users can use either p-value or the F-statistic. We demonstrate how to create these instances (or objects) next. `score_aov_pval` is an instance (or object) of the `class_score_aov` subclass, created using its constructor function: @@ -118,10 +120,10 @@ Individual properties can be accessed via `object@`. For examples: ```{.r .cell-code} score_aov_pval@case_weights #> [1] TRUE -score_aov_pval@range -#> [1] 0 Inf score_aov_pval@fallback_value #> [1] Inf +score_aov_pval@direction +#> [1] "maximize" ``` ::: diff --git a/learn/develop/filtro/index.qmd b/learn/develop/filtro/index.qmd index 5eb47b0e..4eb30122 100644 --- a/learn/develop/filtro/index.qmd +++ b/learn/develop/filtro/index.qmd @@ -38,7 +38,7 @@ You can construct new scoring objects using `class_score()`. This article is a g ## Scoring object -The `class_score` is the parent class of all subclasses related to the scoring method. There are a few properties to this object: +The `class_score` is the parent class of all subclasses related to the scoring method. There are a few properties for this object: ```{r} #| label: "class_score" @@ -46,54 +46,56 @@ library(filtro) args(class_score) ``` -- `outcome_type`: What types of outcome can the method handle? +- `outcome_type`: What types of outcome can the method handle? The options are `numeric`, `factor`, or both. -- `predictor_type`: What types of predictor can the method handle? +- `predictor_type`: What types of predictor can the method handle? The options are `numeric`, `factor`, or both. -- `case_weights`: Does the method accpet case weights? +- `case_weights`: Does the method accpet case weights? `TRUE` or `FALSE`. -- `range`: Are there known ranges for the statistic? +- `range`: Are there known ranges for the statistic? For example, `c(0, Inf)`. -- `inclusive`: Are these ranges inclusive at the bounds? +- `inclusive`: Are these ranges inclusive at the bounds? 
For example, `c(FALSE, FALSE)`. -- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? +- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`. -- `score_type`: What is the column name that will be used for the statistic values? +- `score_type`: What is the column name that will be used for the statistic values? For example, `aov_pval`, `aov_fstat`. - (Not used) `sorts`: How should the values be sorted (from most- to least-important)? -- `direction`: What direction of values indicates the most important values? +- `direction`: What direction of values indicates the most important values? For example, `maximum`, `minimize`. -- `deterministic`: Does the fitting process use random numbers? +- `deterministic`: Does the fitting process use random numbers? `TRUE` or `FALSE`. -- `tuning`: Does the method have tuning parameters? +- `tuning`: Does the method have tuning parameters? `TRUE` or `FALSE`. -- `calculating_fn`: What function is used to estimate the values from data? +- `calculating_fn`: What function, if any, is used to estimate the values from data? -- `label`: What label to use when printing? +- `label`: What label to use when printing? For example, `ANOVA p-values`, `ANOVA F-statistics`. - `packages`: What packages, if any, are required to train the method? -- `results`: A slot for the results once the method is fitted. +- `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame. ## Scoring object specific to the scoring method As an example, let’s consider the ANOVA F-test filter. -`class_score_aov` is a subclass of `class_score`. Because it inherits from the `class_score` parent class, all of the parent's properties are also inherited. The subclass can also include additional properties specific to their implementation. For example: +`class_score_aov` is a subclass of `class_score`. 
Because it inherits from the `class_score` parent class, all of the parent's properties are also inherited. The subclass can also include additional properties specific to its implementation. For example: ```{r} +# Create a subclass named 'class_score_aov' class_score_aov <- S7::new_class( "class_score_aov", parent = class_score, properties = list( - # Represent the score as -log10(p_value)? neg_log10 = S7::new_property(S7::class_logical, default = TRUE) ) ) ``` -For this filter, users can use either p-value or the F-statistic. We will demonstrate how to create these instances (or objects). +- `neg_log10`: Represent the score as `-log10(p_value)`? + +For this filter, users can use either p-value or the F-statistic. We demonstrate how to create these instances (or objects) next. `score_aov_pval` is an instance (or object) of the `class_score_aov` subclass, created using its constructor function: @@ -119,8 +121,8 @@ Individual properties can be accessed via `object@`. For examples: ```{r} score_aov_pval@case_weights -score_aov_pval@range score_aov_pval@fallback_value +score_aov_pval@direction ``` `score_aov_fstat` is another instance (or object) of the `class_score_aov` subclass: From e304645afb597db5adf7463a7ce6786c951a4e44 Mon Sep 17 00:00:00 2001 From: Frances Lin <37535633+franceslinyc@users.noreply.github.com> Date: Mon, 18 Aug 2025 16:25:30 -0700 Subject: [PATCH 06/21] Minor editing --- .../develop/filtro/index/execute-results/html.json | 4 ++-- learn/develop/filtro/index.html.md | 12 ++++++++++-- learn/develop/filtro/index.qmd | 12 ++++++++++-- 3 files changed, 22 insertions(+), 6 deletions(-) diff --git a/_freeze/learn/develop/filtro/index/execute-results/html.json b/_freeze/learn/develop/filtro/index/execute-results/html.json index c2915163..0df62691 100644 --- a/_freeze/learn/develop/filtro/index/execute-results/html.json +++ b/_freeze/learn/develop/filtro/index/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "d94e7d3f6479e62f73f6f270326b8da2", + 
"hash": "68fd36c905f9f395b96e618ef3257d1e", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new score class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro and modeldata.\n\nYou can construct new scoring objects using `class_score()`. This article is a guide to creating new scoring objects. \n\n## Scoring object\n\nThe `class_score` is the parent class of all subclasses related to the scoring method. There are a few properties for this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(filtro)\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), sorts = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\n- `outcome_type`: What types of outcome can the method handle? The options are `numeric`, `factor`, or both. \n\n- `predictor_type`: What types of predictor can the method handle? The options are `numeric`, `factor`, or both. \n\n- `case_weights`: Does the method accpet case weights? `TRUE` or `FALSE`.\n\n- `range`: Are there known ranges for the statistic? For example, `c(0, Inf)`. \n\n- `inclusive`: Are these ranges inclusive at the bounds? For example, `c(FALSE, FALSE)`. \n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? 
For example, `0`, `Inf`.\n\n- `score_type`: What is the column name that will be used for the statistic values? For example, `aov_pval`, `aov_fstat`. \n\n- (Not used) `sorts`: How should the values be sorted (from most- to least-important)?\n\n- `direction`: What direction of values indicates the most important values? For example, `maximum`, `minimize`.\n\n- `deterministic`: Does the fitting process use random numbers? `TRUE` or `FALSE`.\n\n- `tuning`: Does the method have tuning parameters? `TRUE` or `FALSE`.\n\n- `calculating_fn`: What function, if any, is used to estimate the values from data?\n\n- `label`: What label to use when printing? For example, `ANOVA p-values`, `ANOVA F-statistics`.\n\n- `packages`: What packages, if any, are required to train the method?\n\n- `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame.\n\n## Scoring object specific to the scoring method\n\nAs an example, let’s consider the ANOVA F-test filter. \n\n`class_score_aov` is a subclass of `class_score`. Because it inherits from the `class_score` parent class, all of the parent's properties are also inherited. The subclass can also include additional properties specific to its implementation. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a subclass named 'class_score_aov'\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\n- `neg_log10`: Represent the score as `-log10(p_value)`?\n\nFor this filter, users can use either p-value or the F-statistic. 
We demonstrate how to create these instances (or objects) next.\n\n`score_aov_pval` is an instance (or object) of the `class_score_aov` subclass, created using its constructor function: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA p-value\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nIndividual properties can be accessed via `object@`. For examples: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@fallback_value\n#> [1] Inf\nscore_aov_pval@direction\n#> [1] \"maximize\"\n```\n:::\n\n\n`score_aov_fstat` is another instance (or object) of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA F-statistic\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Accessing Results After Fitting\n\nOnce the method is fitted via `fit()`, the data frame of results can be accessed via `object@results`. 
For examples: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(modeldata)\names_subset <- modeldata::ames |>\n # Use a subset of data for demonstration\n dplyr::select(\n Sale_Price,\n MS_SubClass,\n MS_Zoning,\n Lot_Frontage,\n Lot_Area,\n Street\n )\names_subset <- ames_subset |>\n dplyr::mutate(Sale_Price = log10(Sale_Price))\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA p-value and fit score\n# ames_aov_pval_res <-\n# score_aov_pval |>\n# fit(Sale_Price ~ ., data = ames_subset)\n# ames_aov_pval_res@results\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA F-statistic and fit score\n# ames_aov_fstat_res <-\n# score_aov_fstat |>\n# fit(Sale_Price ~ ., data = ames_subset)\n# ames_aov_fstat_res@results\n```\n:::\n\n\n## Session information {#session-info}\n\n\n::: {.cell layout-align=\"center\"}\n\n```\n#> \n#> Attaching package: 'dplyr'\n#> The following objects are masked from 'package:stats':\n#> \n#> filter, lag\n#> The following objects are masked from 'package:base':\n#> \n#> intersect, setdiff, setequal, union\n#> ─ Session info ─────────────────────────────────────────────────────\n#> version R version 4.5.0 (2025-04-11)\n#> language (EN)\n#> date 2025-08-18\n#> pandoc 3.6.3\n#> quarto 1.7.32\n#> \n#> ─ Packages ─────────────────────────────────────────────────────────\n#> package version date (UTC) source\n#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0)\n#> filtro 0.1.0.9000 2025-08-15 Github (tidymodels/filtro@81c7d85)\n#> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0)\n#> purrr 1.0.4 2025-02-05 CRAN (R 4.5.0)\n#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0)\n#> tibble 3.2.1 2023-03-20 CRAN (R 4.5.0)\n#> \n#> ────────────────────────────────────────────────────────────────────\n```\n:::\n\n", + "markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new 
score class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro and modeldata.\n\nYou can construct new scoring objects using `class_score()`. This article is a guide to creating new scoring objects. \n\n## Scoring object\n\nThe `class_score` is the parent class of all subclasses related to the scoring method. There are a few properties for this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(filtro)\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), sorts = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\n- `outcome_type`: What types of outcome can the method handle? The options are `numeric`, `factor`, or both. \n\n- `predictor_type`: What types of predictor can the method handle? The options are `numeric`, `factor`, or both. \n\n- `case_weights`: Does the method accpet case weights? `TRUE` or `FALSE`.\n\n- `range`: Are there known ranges for the statistic? For example, `c(0, Inf)`. \n\n- `inclusive`: Are these ranges inclusive at the bounds? For example, `c(FALSE, FALSE)`. \n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`.\n\n- `score_type`: What is the column name that will be used for the statistic values? For example, `aov_pval`, `aov_fstat`. 
\n\n- (Not used) `sorts`: How should the values be sorted (from most- to least-important)?\n\n- `direction`: What direction of values indicates the most important values? For example, `maximum`, `minimize`.\n\n- `deterministic`: Does the fitting process use random numbers? `TRUE` or `FALSE`.\n\n- `tuning`: Does the method have tuning parameters? `TRUE` or `FALSE`.\n\n- `calculating_fn`: What function, if any, is used to estimate the values from data?\n\n- `label`: What label to use when printing? For example, `ANOVA p-values`, `ANOVA F-statistics`.\n\n- `packages`: What packages, if any, are required to train the method?\n\n- `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame.\n\n## Scoring object specific to the scoring method\n\nAs an example, let’s consider the ANOVA F-test filter. \n\n`class_score_aov` is a subclass of `class_score`. Because it inherits from the `class_score` parent class, all of the parent's properties are also inherited. The subclass can also include additional properties specific to its implementation. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a subclass named 'class_score_aov'\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\nIn addition to the properties inherited from the parent, `class_score_aov` also includes:\n\n- `neg_log10`: Represent the score as `-log10(p_value)`? `TRUE` or `FALSE`.\n\nFor this filter, users can use either \n\n- p-value or \n\n- F-statistic. \n\nNext, we demonstrate how to create these instances (or objects). 
\n\n`score_aov_pval` is an instance (or object) of the `class_score_aov` subclass, created using its constructor function: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA p-value\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nIndividual properties can be accessed via `object@`. For examples: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@fallback_value\n#> [1] Inf\nscore_aov_pval@direction\n#> [1] \"maximize\"\n```\n:::\n\n\n`score_aov_fstat` is another instance (or object) of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA F-statistic\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Accessing Results After Fitting\n\nOnce the method is fitted via `fit()`, the data frame of results can be accessed via `object@results`. 
For examples: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(modeldata)\names_subset <- modeldata::ames |>\n # Use a subset of data for demonstration\n dplyr::select(\n Sale_Price,\n MS_SubClass,\n MS_Zoning,\n Lot_Frontage,\n Lot_Area,\n Street\n )\names_subset <- ames_subset |>\n dplyr::mutate(Sale_Price = log10(Sale_Price))\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA p-value and fit score\n# ames_aov_pval_res <-\n# score_aov_pval |>\n# fit(Sale_Price ~ ., data = ames_subset)\n# ames_aov_pval_res@results\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA F-statistic and fit score\n# ames_aov_fstat_res <-\n# score_aov_fstat |>\n# fit(Sale_Price ~ ., data = ames_subset)\n# ames_aov_fstat_res@results\n```\n:::\n\n\n## Session information {#session-info}\n\n\n::: {.cell layout-align=\"center\"}\n\n```\n#> \n#> Attaching package: 'dplyr'\n#> The following objects are masked from 'package:stats':\n#> \n#> filter, lag\n#> The following objects are masked from 'package:base':\n#> \n#> intersect, setdiff, setequal, union\n#> ─ Session info ─────────────────────────────────────────────────────\n#> version R version 4.5.0 (2025-04-11)\n#> language (EN)\n#> date 2025-08-18\n#> pandoc 3.6.3\n#> quarto 1.7.32\n#> \n#> ─ Packages ─────────────────────────────────────────────────────────\n#> package version date (UTC) source\n#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0)\n#> filtro 0.1.0.9000 2025-08-15 Github (tidymodels/filtro@81c7d85)\n#> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0)\n#> purrr 1.0.4 2025-02-05 CRAN (R 4.5.0)\n#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0)\n#> tibble 3.2.1 2023-03-20 CRAN (R 4.5.0)\n#> \n#> ────────────────────────────────────────────────────────────────────\n```\n:::\n\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/learn/develop/filtro/index.html.md b/learn/develop/filtro/index.html.md index 31c1de02..a16d454e 
100644 --- a/learn/develop/filtro/index.html.md +++ b/learn/develop/filtro/index.html.md @@ -86,9 +86,17 @@ class_score_aov <- S7::new_class( ``` ::: -- `neg_log10`: Represent the score as `-log10(p_value)`? +In addition to the properties inherited from the parent, `class_score_aov` also includes: -For this filter, users can use either p-value or the F-statistic. We demonstrate how to create these instances (or objects) next. +- `neg_log10`: Represent the score as `-log10(p_value)`? `TRUE` or `FALSE`. + +For this filter, users can use either + +- p-value or + +- F-statistic. + +Next, we demonstrate how to create these instances (or objects). `score_aov_pval` is an instance (or object) of the `class_score_aov` subclass, created using its constructor function: diff --git a/learn/develop/filtro/index.qmd b/learn/develop/filtro/index.qmd index 4eb30122..4e101609 100644 --- a/learn/develop/filtro/index.qmd +++ b/learn/develop/filtro/index.qmd @@ -93,9 +93,17 @@ class_score_aov <- S7::new_class( ) ``` -- `neg_log10`: Represent the score as `-log10(p_value)`? +In addition to the properties inherited from the parent, `class_score_aov` also includes: -For this filter, users can use either p-value or the F-statistic. We demonstrate how to create these instances (or objects) next. +- `neg_log10`: Represent the score as `-log10(p_value)`? `TRUE` or `FALSE`. + +For this filter, users can use either + +- p-value or + +- F-statistic. + +Next, we demonstrate how to create these instances (or objects). 
`score_aov_pval` is an instance (or object) of the `class_score_aov` subclass, created using its constructor function: From 5cc29e5764eef51b53b6d62dcf389e1e2ea8e65e Mon Sep 17 00:00:00 2001 From: Frances Lin <37535633+franceslinyc@users.noreply.github.com> Date: Mon, 18 Aug 2025 16:43:11 -0700 Subject: [PATCH 07/21] Add a subsection to discuss fit(); WIP --- .../filtro/index/execute-results/html.json | 4 ++-- learn/develop/filtro/index.html.md | 18 ++++++++++++++++-- learn/develop/filtro/index.qmd | 16 ++++++++++++++-- 3 files changed, 32 insertions(+), 6 deletions(-) diff --git a/_freeze/learn/develop/filtro/index/execute-results/html.json b/_freeze/learn/develop/filtro/index/execute-results/html.json index 0df62691..11bfadc9 100644 --- a/_freeze/learn/develop/filtro/index/execute-results/html.json +++ b/_freeze/learn/develop/filtro/index/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "68fd36c905f9f395b96e618ef3257d1e", + "hash": "5954fe7979f7c219f4c3873a4732f10a", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new score class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro and modeldata.\n\nYou can construct new scoring objects using `class_score()`. This article is a guide to creating new scoring objects. \n\n## Scoring object\n\nThe `class_score` is the parent class of all subclasses related to the scoring method. 
There are a few properties for this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(filtro)\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), sorts = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\n- `outcome_type`: What types of outcome can the method handle? The options are `numeric`, `factor`, or both. \n\n- `predictor_type`: What types of predictor can the method handle? The options are `numeric`, `factor`, or both. \n\n- `case_weights`: Does the method accpet case weights? `TRUE` or `FALSE`.\n\n- `range`: Are there known ranges for the statistic? For example, `c(0, Inf)`. \n\n- `inclusive`: Are these ranges inclusive at the bounds? For example, `c(FALSE, FALSE)`. \n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`.\n\n- `score_type`: What is the column name that will be used for the statistic values? For example, `aov_pval`, `aov_fstat`. \n\n- (Not used) `sorts`: How should the values be sorted (from most- to least-important)?\n\n- `direction`: What direction of values indicates the most important values? For example, `maximum`, `minimize`.\n\n- `deterministic`: Does the fitting process use random numbers? `TRUE` or `FALSE`.\n\n- `tuning`: Does the method have tuning parameters? `TRUE` or `FALSE`.\n\n- `calculating_fn`: What function, if any, is used to estimate the values from data?\n\n- `label`: What label to use when printing? 
For example, `ANOVA p-values`, `ANOVA F-statistics`.\n\n- `packages`: What packages, if any, are required to train the method?\n\n- `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame.\n\n## Scoring object specific to the scoring method\n\nAs an example, let’s consider the ANOVA F-test filter. \n\n`class_score_aov` is a subclass of `class_score`. Because it inherits from the `class_score` parent class, all of the parent's properties are also inherited. The subclass can also include additional properties specific to its implementation. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a subclass named 'class_score_aov'\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\nIn addition to the properties inherited from the parent, `class_score_aov` also includes:\n\n- `neg_log10`: Represent the score as `-log10(p_value)`? `TRUE` or `FALSE`.\n\nFor this filter, users can use either \n\n- p-value or \n\n- F-statistic. \n\nNext, we demonstrate how to create these instances (or objects). \n\n`score_aov_pval` is an instance (or object) of the `class_score_aov` subclass, created using its constructor function: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA p-value\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nIndividual properties can be accessed via `object@`. 
For examples: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@fallback_value\n#> [1] Inf\nscore_aov_pval@direction\n#> [1] \"maximize\"\n```\n:::\n\n\n`score_aov_fstat` is another instance (or object) of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA F-statistic\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Accessing Results After Fitting\n\nOnce the method is fitted via `fit()`, the data frame of results can be accessed via `object@results`. For examples: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(modeldata)\names_subset <- modeldata::ames |>\n # Use a subset of data for demonstration\n dplyr::select(\n Sale_Price,\n MS_SubClass,\n MS_Zoning,\n Lot_Frontage,\n Lot_Area,\n Street\n )\names_subset <- ames_subset |>\n dplyr::mutate(Sale_Price = log10(Sale_Price))\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA p-value and fit score\n# ames_aov_pval_res <-\n# score_aov_pval |>\n# fit(Sale_Price ~ ., data = ames_subset)\n# ames_aov_pval_res@results\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA F-statistic and fit score\n# ames_aov_fstat_res <-\n# score_aov_fstat |>\n# fit(Sale_Price ~ ., data = ames_subset)\n# ames_aov_fstat_res@results\n```\n:::\n\n\n## Session information {#session-info}\n\n\n::: {.cell layout-align=\"center\"}\n\n```\n#> \n#> Attaching package: 'dplyr'\n#> The following objects are masked from 'package:stats':\n#> \n#> filter, lag\n#> The following objects are 
masked from 'package:base':\n#> \n#> intersect, setdiff, setequal, union\n#> ─ Session info ─────────────────────────────────────────────────────\n#> version R version 4.5.0 (2025-04-11)\n#> language (EN)\n#> date 2025-08-18\n#> pandoc 3.6.3\n#> quarto 1.7.32\n#> \n#> ─ Packages ─────────────────────────────────────────────────────────\n#> package version date (UTC) source\n#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0)\n#> filtro 0.1.0.9000 2025-08-15 Github (tidymodels/filtro@81c7d85)\n#> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0)\n#> purrr 1.0.4 2025-02-05 CRAN (R 4.5.0)\n#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0)\n#> tibble 3.2.1 2023-03-20 CRAN (R 4.5.0)\n#> \n#> ────────────────────────────────────────────────────────────────────\n```\n:::\n\n", + "markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new score class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro and modeldata.\n\nYou can construct new scoring objects using `class_score()`. This article is a guide to creating new scoring objects. \n\n## Scoring object\n\n`class_score` is the parent class of all subclasses related to the scoring method. 
There are a few properties for this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(filtro)\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), sorts = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\n- `outcome_type`: What types of outcome can the method handle? The options are `numeric`, `factor`, or both. \n\n- `predictor_type`: What types of predictor can the method handle? The options are `numeric`, `factor`, or both. \n\n- `case_weights`: Does the method accpet case weights? `TRUE` or `FALSE`.\n\n- `range`: Are there known ranges for the statistic? For example, `c(0, Inf)`. \n\n- `inclusive`: Are these ranges inclusive at the bounds? For example, `c(FALSE, FALSE)`. \n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`.\n\n- `score_type`: What is the column name that will be used for the statistic values? For example, `aov_pval`, `aov_fstat`. \n\n- (Not used) `sorts`: How should the values be sorted (from most- to least-important)?\n\n- `direction`: What direction of values indicates the most important values? For example, `maximum`, `minimize`.\n\n- `deterministic`: Does the fitting process use random numbers? `TRUE` or `FALSE`.\n\n- `tuning`: Does the method have tuning parameters? `TRUE` or `FALSE`.\n\n- `calculating_fn`: What function, if any, is used to estimate the values from data?\n\n- `label`: What label to use when printing? 
For example, `ANOVA p-values`, `ANOVA F-statistics`.\n\n- `packages`: What packages, if any, are required to train the method?\n\n- `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame.\n\n## Scoring object specific to the scoring method\n\nAs an example, let’s consider the ANOVA F-test filter. \n\n`class_score_aov` is a subclass of `class_score`. Because it inherits from the `class_score` parent class, all of the parent's properties are also inherited. The subclass can also include additional properties specific to its implementation. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a subclass named 'class_score_aov'\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\nIn addition to the properties inherited from the parent, `class_score_aov` also includes:\n\n- `neg_log10`: Represent the score as `-log10(p_value)`? `TRUE` or `FALSE`.\n\nFor this filter, users can represent the score using either\n\n- p-value or \n\n- F-statistic. \n\nNext, we demonstrate how to create these instances (or objects). \n\n`score_aov_pval` is an instance (or object) of the `class_score_aov` subclass, created using its constructor function: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA p-value\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nIndividual properties can be accessed via `object@`. 
For examples: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@fallback_value\n#> [1] Inf\nscore_aov_pval@direction\n#> [1] \"maximize\"\n```\n:::\n\n\n`score_aov_fstat` is another instance (or object) of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA F-statistic\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Fitting or estimating score\n\nSo far, we have discussed how to create a subclass and construct instances (objects) for the ANOVA F-test filter. \n\nThe `fit()` function is a generic used to fit or estimate score.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\n## Accessing Results After Fitting\n\nOnce the method is fitted via `fit()`, the data frame of results can be accessed via `object@results`. 
For examples: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(modeldata)\names_subset <- modeldata::ames |>\n # Use a subset of data for demonstration\n dplyr::select(\n Sale_Price,\n MS_SubClass,\n MS_Zoning,\n Lot_Frontage,\n Lot_Area,\n Street\n )\names_subset <- ames_subset |>\n dplyr::mutate(Sale_Price = log10(Sale_Price))\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA p-value and fit score\n# ames_aov_pval_res <-\n# score_aov_pval |>\n# fit(Sale_Price ~ ., data = ames_subset)\n# ames_aov_pval_res@results\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA F-statistic and fit score\n# ames_aov_fstat_res <-\n# score_aov_fstat |>\n# fit(Sale_Price ~ ., data = ames_subset)\n# ames_aov_fstat_res@results\n```\n:::\n\n\n## Session information {#session-info}\n\n\n::: {.cell layout-align=\"center\"}\n\n```\n#> \n#> Attaching package: 'dplyr'\n#> The following objects are masked from 'package:stats':\n#> \n#> filter, lag\n#> The following objects are masked from 'package:base':\n#> \n#> intersect, setdiff, setequal, union\n#> ─ Session info ─────────────────────────────────────────────────────\n#> version R version 4.5.0 (2025-04-11)\n#> language (EN)\n#> date 2025-08-18\n#> pandoc 3.6.3\n#> quarto 1.7.32\n#> \n#> ─ Packages ─────────────────────────────────────────────────────────\n#> package version date (UTC) source\n#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0)\n#> filtro 0.1.0.9000 2025-08-15 Github (tidymodels/filtro@81c7d85)\n#> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0)\n#> purrr 1.0.4 2025-02-05 CRAN (R 4.5.0)\n#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0)\n#> tibble 3.2.1 2023-03-20 CRAN (R 4.5.0)\n#> \n#> ────────────────────────────────────────────────────────────────────\n```\n:::\n\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/learn/develop/filtro/index.html.md b/learn/develop/filtro/index.html.md index a16d454e..ff9dd57c 
100644 --- a/learn/develop/filtro/index.html.md +++ b/learn/develop/filtro/index.html.md @@ -19,7 +19,7 @@ You can construct new scoring objects using `class_score()`. This article is a g ## Scoring object -The `class_score` is the parent class of all subclasses related to the scoring method. There are a few properties for this object: +`class_score` is the parent class of all subclasses related to the scoring method. There are a few properties for this object: ::: {.cell layout-align="center"} @@ -90,7 +90,7 @@ In addition to the properties inherited from the parent, `class_score_aov` also - `neg_log10`: Represent the score as `-log10(p_value)`? `TRUE` or `FALSE`. -For this filter, users can use either +For this filter, users can represent the score using either - p-value or @@ -158,6 +158,20 @@ score_aov_fstat <- ``` ::: +## Fitting or estimating score + +So far, we have discussed how to create a subclass and construct instances (objects) for the ANOVA F-test filter. + +The `fit()` function is a generic used to fit or estimate score. + +::: {.cell layout-align="center"} + +```{.r .cell-code} +score_aov_pval |> + fit(Sale_Price ~ ., data = ames) +``` +::: + ## Accessing Results After Fitting Once the method is fitted via `fit()`, the data frame of results can be accessed via `object@results`. For examples: diff --git a/learn/develop/filtro/index.qmd b/learn/develop/filtro/index.qmd index 4e101609..169e4de8 100644 --- a/learn/develop/filtro/index.qmd +++ b/learn/develop/filtro/index.qmd @@ -38,7 +38,7 @@ You can construct new scoring objects using `class_score()`. This article is a g ## Scoring object -The `class_score` is the parent class of all subclasses related to the scoring method. There are a few properties for this object: +`class_score` is the parent class of all subclasses related to the scoring method. 
There are a few properties for this object: ```{r} #| label: "class_score" @@ -97,7 +97,7 @@ In addition to the properties inherited from the parent, `class_score_aov` also - `neg_log10`: Represent the score as `-log10(p_value)`? `TRUE` or `FALSE`. -For this filter, users can use either +For this filter, users can represent the score using either - p-value or @@ -153,6 +153,18 @@ score_aov_fstat <- ) ``` +## Fitting or estimating score + +So far, we have discussed how to create a subclass and construct instances (objects) for the ANOVA F-test filter. + +The `fit()` function is a generic used to fit or estimate score. + +```{r} +#| eval: false +score_aov_pval |> + fit(Sale_Price ~ ., data = ames) +``` + ## Accessing Results After Fitting Once the method is fitted via `fit()`, the data frame of results can be accessed via `object@results`. For examples: From dc992f226a39d5e559347d66986222110a0426f8 Mon Sep 17 00:00:00 2001 From: Frances Lin <37535633+franceslinyc@users.noreply.github.com> Date: Mon, 18 Aug 2025 16:51:22 -0700 Subject: [PATCH 08/21] WIP --- .../develop/filtro/index/execute-results/html.json | 4 ++-- learn/develop/filtro/index.html.md | 12 ++++++------ learn/develop/filtro/index.qmd | 12 ++++++------ 3 files changed, 14 insertions(+), 14 deletions(-) diff --git a/_freeze/learn/develop/filtro/index/execute-results/html.json b/_freeze/learn/develop/filtro/index/execute-results/html.json index 11bfadc9..7fb1c8b6 100644 --- a/_freeze/learn/develop/filtro/index/execute-results/html.json +++ b/_freeze/learn/develop/filtro/index/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "5954fe7979f7c219f4c3873a4732f10a", + "hash": "bb97a17ab13845ee0399349bd021a39d", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new score class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: 
../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro and modeldata.\n\nYou can construct new scoring objects using `class_score()`. This article is a guide to creating new scoring objects. \n\n## Scoring object\n\n`class_score` is the parent class of all subclasses related to the scoring method. There are a few properties for this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(filtro)\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), sorts = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\n- `outcome_type`: What types of outcome can the method handle? The options are `numeric`, `factor`, or both. \n\n- `predictor_type`: What types of predictor can the method handle? The options are `numeric`, `factor`, or both. \n\n- `case_weights`: Does the method accpet case weights? `TRUE` or `FALSE`.\n\n- `range`: Are there known ranges for the statistic? For example, `c(0, Inf)`. \n\n- `inclusive`: Are these ranges inclusive at the bounds? For example, `c(FALSE, FALSE)`. \n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`.\n\n- `score_type`: What is the column name that will be used for the statistic values? For example, `aov_pval`, `aov_fstat`. \n\n- (Not used) `sorts`: How should the values be sorted (from most- to least-important)?\n\n- `direction`: What direction of values indicates the most important values? 
For example, `maximum`, `minimize`.\n\n- `deterministic`: Does the fitting process use random numbers? `TRUE` or `FALSE`.\n\n- `tuning`: Does the method have tuning parameters? `TRUE` or `FALSE`.\n\n- `calculating_fn`: What function, if any, is used to estimate the values from data?\n\n- `label`: What label to use when printing? For example, `ANOVA p-values`, `ANOVA F-statistics`.\n\n- `packages`: What packages, if any, are required to train the method?\n\n- `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame.\n\n## Scoring object specific to the scoring method\n\nAs an example, let’s consider the ANOVA F-test filter. \n\n`class_score_aov` is a subclass of `class_score`. Because it inherits from the `class_score` parent class, all of the parent's properties are also inherited. The subclass can also include additional properties specific to its implementation. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a subclass named 'class_score_aov'\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\nIn addition to the properties inherited from the parent, `class_score_aov` also includes:\n\n- `neg_log10`: Represent the score as `-log10(p_value)`? `TRUE` or `FALSE`.\n\nFor this filter, users can represent the score using either\n\n- p-value or \n\n- F-statistic. \n\nNext, we demonstrate how to create these instances (or objects). 
\n\n`score_aov_pval` is an instance (or object) of the `class_score_aov` subclass, created using its constructor function: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA p-value\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nIndividual properties can be accessed via `object@`. For examples: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@fallback_value\n#> [1] Inf\nscore_aov_pval@direction\n#> [1] \"maximize\"\n```\n:::\n\n\n`score_aov_fstat` is another instance (or object) of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA F-statistic\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Fitting or estimating score\n\nSo far, we have discussed how to create a subclass and construct instances (objects) for the ANOVA F-test filter. \n\nThe `fit()` function is a generic used to fit or estimate score.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\n## Accessing Results After Fitting\n\nOnce the method is fitted via `fit()`, the data frame of results can be accessed via `object@results`. 
For examples: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(modeldata)\names_subset <- modeldata::ames |>\n # Use a subset of data for demonstration\n dplyr::select(\n Sale_Price,\n MS_SubClass,\n MS_Zoning,\n Lot_Frontage,\n Lot_Area,\n Street\n )\names_subset <- ames_subset |>\n dplyr::mutate(Sale_Price = log10(Sale_Price))\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA p-value and fit score\n# ames_aov_pval_res <-\n# score_aov_pval |>\n# fit(Sale_Price ~ ., data = ames_subset)\n# ames_aov_pval_res@results\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA F-statistic and fit score\n# ames_aov_fstat_res <-\n# score_aov_fstat |>\n# fit(Sale_Price ~ ., data = ames_subset)\n# ames_aov_fstat_res@results\n```\n:::\n\n\n## Session information {#session-info}\n\n\n::: {.cell layout-align=\"center\"}\n\n```\n#> \n#> Attaching package: 'dplyr'\n#> The following objects are masked from 'package:stats':\n#> \n#> filter, lag\n#> The following objects are masked from 'package:base':\n#> \n#> intersect, setdiff, setequal, union\n#> ─ Session info ─────────────────────────────────────────────────────\n#> version R version 4.5.0 (2025-04-11)\n#> language (EN)\n#> date 2025-08-18\n#> pandoc 3.6.3\n#> quarto 1.7.32\n#> \n#> ─ Packages ─────────────────────────────────────────────────────────\n#> package version date (UTC) source\n#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0)\n#> filtro 0.1.0.9000 2025-08-15 Github (tidymodels/filtro@81c7d85)\n#> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0)\n#> purrr 1.0.4 2025-02-05 CRAN (R 4.5.0)\n#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0)\n#> tibble 3.2.1 2023-03-20 CRAN (R 4.5.0)\n#> \n#> ────────────────────────────────────────────────────────────────────\n```\n:::\n\n", + "markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new 
score class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro and modeldata.\n\nYou can construct new scoring objects using `class_score()`. This article is a guide to creating new scoring objects. \n\n## Scoring object\n\n`class_score` is the parent class of all subclasses related to the scoring method. There are a few properties for this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(filtro)\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), sorts = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\n- `outcome_type`: What types of outcome can the method handle? The options are `numeric`, `factor`, or both. \n\n- `predictor_type`: What types of predictor can the method handle? The options are `numeric`, `factor`, or both. \n\n- `case_weights`: Does the method accpet case weights? `TRUE` or `FALSE`.\n\n- `range`: Are there known ranges for the statistic? For example, `c(0, Inf)`. \n\n- `inclusive`: Are these ranges inclusive at the bounds? For example, `c(FALSE, FALSE)`. \n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`.\n\n- `score_type`: What is the column name that will be used for the statistic values? For example, `aov_pval`, `aov_fstat`. 
\n\n- (Not used) `sorts`: How should the values be sorted (from most- to least-important)?\n\n- `direction`: What direction of values indicates the most important values? For example, `maximum`, `minimize`.\n\n- `deterministic`: Does the fitting process use random numbers? `TRUE` or `FALSE`.\n\n- `tuning`: Does the method have tuning parameters? `TRUE` or `FALSE`.\n\n- `calculating_fn`: What function, if any, is used to estimate the values from data?\n\n- `label`: What label to use when printing? For example, `ANOVA p-values`, `ANOVA F-statistics`.\n\n- `packages`: What packages, if any, are required to train the method?\n\n- `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame.\n\n## Scoring object specific to the scoring method\n\nAs an example, let’s consider the ANOVA F-test filter. \n\n`class_score_aov` is a subclass of `class_score`. The subclass can also include additional properties specific to its implementation. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a subclass named 'class_score_aov'\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\nIn addition to the properties inherited from the parent class, `class_score_aov` also includes:\n\n- `neg_log10`: Represent the score as `-log10(p_value)`? `TRUE` or `FALSE`.\n\nFor this filter, users can represent the score using either\n\n- p-value or \n\n- F-statistic. \n\nWe demonstrate how to create these instances (or objects) accordingly. 
\n\n`score_aov_pval` is an instance (or object) of the `class_score_aov` subclass, created using its constructor function: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA p-value\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nIndividual properties can be accessed via `object@`. For examples: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@fallback_value\n#> [1] Inf\nscore_aov_pval@direction\n#> [1] \"maximize\"\n```\n:::\n\n\n`score_aov_fstat` is another instance (or object) of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA F-statistic\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Fitting (or estimating) score\n\nSo far, we have discussed how to create a subclass and construct instances (objects) for the ANOVA F-test filter. \n\nThe `fit()` function is a generic used to fit (or estimate) score.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\n## Accessing results after fitting\n\nOnce the method is fitted via `fit()`, the data frame of results can be accessed via `object@results`. 
For examples: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(modeldata)\names_subset <- modeldata::ames |>\n # Use a subset of data for demonstration\n dplyr::select(\n Sale_Price,\n MS_SubClass,\n MS_Zoning,\n Lot_Frontage,\n Lot_Area,\n Street\n )\names_subset <- ames_subset |>\n dplyr::mutate(Sale_Price = log10(Sale_Price))\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA p-value and fit score\n# ames_aov_pval_res <-\n# score_aov_pval |>\n# fit(Sale_Price ~ ., data = ames_subset)\n# ames_aov_pval_res@results\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA F-statistic and fit score\n# ames_aov_fstat_res <-\n# score_aov_fstat |>\n# fit(Sale_Price ~ ., data = ames_subset)\n# ames_aov_fstat_res@results\n```\n:::\n\n\n## Session information {#session-info}\n\n\n::: {.cell layout-align=\"center\"}\n\n```\n#> \n#> Attaching package: 'dplyr'\n#> The following objects are masked from 'package:stats':\n#> \n#> filter, lag\n#> The following objects are masked from 'package:base':\n#> \n#> intersect, setdiff, setequal, union\n#> ─ Session info ─────────────────────────────────────────────────────\n#> version R version 4.5.0 (2025-04-11)\n#> language (EN)\n#> date 2025-08-18\n#> pandoc 3.6.3\n#> quarto 1.7.32\n#> \n#> ─ Packages ─────────────────────────────────────────────────────────\n#> package version date (UTC) source\n#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0)\n#> filtro 0.1.0.9000 2025-08-15 Github (tidymodels/filtro@81c7d85)\n#> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0)\n#> purrr 1.0.4 2025-02-05 CRAN (R 4.5.0)\n#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0)\n#> tibble 3.2.1 2023-03-20 CRAN (R 4.5.0)\n#> \n#> ────────────────────────────────────────────────────────────────────\n```\n:::\n\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/learn/develop/filtro/index.html.md b/learn/develop/filtro/index.html.md index ff9dd57c..0beec564 
100644 --- a/learn/develop/filtro/index.html.md +++ b/learn/develop/filtro/index.html.md @@ -70,7 +70,7 @@ args(class_score) As an example, let’s consider the ANOVA F-test filter. -`class_score_aov` is a subclass of `class_score`. Because it inherits from the `class_score` parent class, all of the parent's properties are also inherited. The subclass can also include additional properties specific to its implementation. For example: +`class_score_aov` is a subclass of `class_score`. The subclass can also include additional properties specific to its implementation. For example: ::: {.cell layout-align="center"} @@ -86,7 +86,7 @@ class_score_aov <- S7::new_class( ``` ::: -In addition to the properties inherited from the parent, `class_score_aov` also includes: +In addition to the properties inherited from the parent class, `class_score_aov` also includes: - `neg_log10`: Represent the score as `-log10(p_value)`? `TRUE` or `FALSE`. @@ -96,7 +96,7 @@ For this filter, users can represent the score using either - F-statistic. -Next, we demonstrate how to create these instances (or objects). +We demonstrate how to create these instances (or objects) accordingly. `score_aov_pval` is an instance (or object) of the `class_score_aov` subclass, created using its constructor function: @@ -158,11 +158,11 @@ score_aov_fstat <- ``` ::: -## Fitting or estimating score +## Fitting (or estimating) score So far, we have discussed how to create a subclass and construct instances (objects) for the ANOVA F-test filter. -The `fit()` function is a generic used to fit or estimate score. +The `fit()` function is a generic used to fit (or estimate) score. ::: {.cell layout-align="center"} @@ -172,7 +172,7 @@ score_aov_pval |> ``` ::: -## Accessing Results After Fitting +## Accessing results after fitting Once the method is fitted via `fit()`, the data frame of results can be accessed via `object@results`. 
For examples: diff --git a/learn/develop/filtro/index.qmd b/learn/develop/filtro/index.qmd index 169e4de8..83d0910b 100644 --- a/learn/develop/filtro/index.qmd +++ b/learn/develop/filtro/index.qmd @@ -80,7 +80,7 @@ args(class_score) As an example, let’s consider the ANOVA F-test filter. -`class_score_aov` is a subclass of `class_score`. Because it inherits from the `class_score` parent class, all of the parent's properties are also inherited. The subclass can also include additional properties specific to its implementation. For example: +`class_score_aov` is a subclass of `class_score`. The subclass can also include additional properties specific to its implementation. For example: ```{r} # Create a subclass named 'class_score_aov' @@ -93,7 +93,7 @@ class_score_aov <- S7::new_class( ) ``` -In addition to the properties inherited from the parent, `class_score_aov` also includes: +In addition to the properties inherited from the parent class, `class_score_aov` also includes: - `neg_log10`: Represent the score as `-log10(p_value)`? `TRUE` or `FALSE`. @@ -103,7 +103,7 @@ For this filter, users can represent the score using either - F-statistic. -Next, we demonstrate how to create these instances (or objects). +We demonstrate how to create these instances (or objects) accordingly. `score_aov_pval` is an instance (or object) of the `class_score_aov` subclass, created using its constructor function: @@ -153,11 +153,11 @@ score_aov_fstat <- ) ``` -## Fitting or estimating score +## Fitting (or estimating) score So far, we have discussed how to create a subclass and construct instances (objects) for the ANOVA F-test filter. -The `fit()` function is a generic used to fit or estimate score. +The `fit()` function is a generic used to fit (or estimate) score. 
```{r} #| eval: false @@ -165,7 +165,7 @@ score_aov_pval |> fit(Sale_Price ~ ., data = ames) ``` -## Accessing Results After Fitting +## Accessing results after fitting Once the method is fitted via `fit()`, the data frame of results can be accessed via `object@results`. For examples: From f875e38ad46b37f0b335c9d50d27c3782946a34c Mon Sep 17 00:00:00 2001 From: Frances Lin <37535633+franceslinyc@users.noreply.github.com> Date: Tue, 19 Aug 2025 09:09:57 -0700 Subject: [PATCH 09/21] Finalize a subsection to discuss fit() --- .../filtro/index/execute-results/html.json | 4 +- learn/develop/filtro/index.html.md | 82 +++++++++++++++---- learn/develop/filtro/index.qmd | 68 +++++++++++---- 3 files changed, 117 insertions(+), 37 deletions(-) diff --git a/_freeze/learn/develop/filtro/index/execute-results/html.json b/_freeze/learn/develop/filtro/index/execute-results/html.json index 7fb1c8b6..be5f9b7d 100644 --- a/_freeze/learn/develop/filtro/index/execute-results/html.json +++ b/_freeze/learn/develop/filtro/index/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "bb97a17ab13845ee0399349bd021a39d", + "hash": "c30015f96bc239b4644a9f9974066870", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new score class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro and modeldata.\n\nYou can construct new scoring objects using `class_score()`. This article is a guide to creating new scoring objects. \n\n## Scoring object\n\n`class_score` is the parent class of all subclasses related to the scoring method. 
There are a few properties for this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(filtro)\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), sorts = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\n- `outcome_type`: What types of outcome can the method handle? The options are `numeric`, `factor`, or both. \n\n- `predictor_type`: What types of predictor can the method handle? The options are `numeric`, `factor`, or both. \n\n- `case_weights`: Does the method accpet case weights? `TRUE` or `FALSE`.\n\n- `range`: Are there known ranges for the statistic? For example, `c(0, Inf)`. \n\n- `inclusive`: Are these ranges inclusive at the bounds? For example, `c(FALSE, FALSE)`. \n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`.\n\n- `score_type`: What is the column name that will be used for the statistic values? For example, `aov_pval`, `aov_fstat`. \n\n- (Not used) `sorts`: How should the values be sorted (from most- to least-important)?\n\n- `direction`: What direction of values indicates the most important values? For example, `maximum`, `minimize`.\n\n- `deterministic`: Does the fitting process use random numbers? `TRUE` or `FALSE`.\n\n- `tuning`: Does the method have tuning parameters? `TRUE` or `FALSE`.\n\n- `calculating_fn`: What function, if any, is used to estimate the values from data?\n\n- `label`: What label to use when printing? 
For example, `ANOVA p-values`, `ANOVA F-statistics`.\n\n- `packages`: What packages, if any, are required to train the method?\n\n- `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame.\n\n## Scoring object specific to the scoring method\n\nAs an example, let’s consider the ANOVA F-test filter. \n\n`class_score_aov` is a subclass of `class_score`. The subclass can also include additional properties specific to its implementation. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a subclass named 'class_score_aov'\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\nIn addition to the properties inherited from the parent class, `class_score_aov` also includes:\n\n- `neg_log10`: Represent the score as `-log10(p_value)`? `TRUE` or `FALSE`.\n\nFor this filter, users can represent the score using either\n\n- p-value or \n\n- F-statistic. \n\nWe demonstrate how to create these instances (or objects) accordingly. \n\n`score_aov_pval` is an instance (or object) of the `class_score_aov` subclass, created using its constructor function: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA p-value\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nIndividual properties can be accessed via `object@`. 
For examples: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@fallback_value\n#> [1] Inf\nscore_aov_pval@direction\n#> [1] \"maximize\"\n```\n:::\n\n\n`score_aov_fstat` is another instance (or object) of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA F-statistic\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Fitting (or estimating) score\n\nSo far, we have discussed how to create a subclass and construct instances (objects) for the ANOVA F-test filter. \n\nThe `fit()` function is a generic used to fit (or estimate) score.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\n## Accessing results after fitting\n\nOnce the method is fitted via `fit()`, the data frame of results can be accessed via `object@results`. 
For examples: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(modeldata)\names_subset <- modeldata::ames |>\n # Use a subset of data for demonstration\n dplyr::select(\n Sale_Price,\n MS_SubClass,\n MS_Zoning,\n Lot_Frontage,\n Lot_Area,\n Street\n )\names_subset <- ames_subset |>\n dplyr::mutate(Sale_Price = log10(Sale_Price))\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA p-value and fit score\n# ames_aov_pval_res <-\n# score_aov_pval |>\n# fit(Sale_Price ~ ., data = ames_subset)\n# ames_aov_pval_res@results\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA F-statistic and fit score\n# ames_aov_fstat_res <-\n# score_aov_fstat |>\n# fit(Sale_Price ~ ., data = ames_subset)\n# ames_aov_fstat_res@results\n```\n:::\n\n\n## Session information {#session-info}\n\n\n::: {.cell layout-align=\"center\"}\n\n```\n#> \n#> Attaching package: 'dplyr'\n#> The following objects are masked from 'package:stats':\n#> \n#> filter, lag\n#> The following objects are masked from 'package:base':\n#> \n#> intersect, setdiff, setequal, union\n#> ─ Session info ─────────────────────────────────────────────────────\n#> version R version 4.5.0 (2025-04-11)\n#> language (EN)\n#> date 2025-08-18\n#> pandoc 3.6.3\n#> quarto 1.7.32\n#> \n#> ─ Packages ─────────────────────────────────────────────────────────\n#> package version date (UTC) source\n#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0)\n#> filtro 0.1.0.9000 2025-08-15 Github (tidymodels/filtro@81c7d85)\n#> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0)\n#> purrr 1.0.4 2025-02-05 CRAN (R 4.5.0)\n#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0)\n#> tibble 3.2.1 2023-03-20 CRAN (R 4.5.0)\n#> \n#> ────────────────────────────────────────────────────────────────────\n```\n:::\n\n", + "markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new 
score class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro and modeldata.\n\nYou can construct new scoring objects using `class_score()`. This article is a guide to creating new scoring objects. \n\n## Scoring object\n\nAll subclasses specific to the scoring method have a parent class named `class_score`. There are a few properties (attributes) for this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(filtro)\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), sorts = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\n- `outcome_type`: What types of outcome can the method handle? The options are `numeric`, `factor`, or both. \n\n- `predictor_type`: What types of predictor can the method handle? The options are `numeric`, `factor`, or both. \n\n- `case_weights`: Does the method accpet case weights? It is `TRUE` or `FALSE`.\n\n- `range`: Are there known ranges for the statistic? For examples, `c(0, Inf)`, `c(0, 1)`. \n\n- `inclusive`: Are these ranges inclusive at the bounds? For example, `c(FALSE, FALSE)`, `c(TRUE, TRUE)`. \n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For examples, `0`, `Inf`.\n\n- `score_type`: What is the column name that will be used for the statistic values? For examples, `aov_pval`, `aov_fstat`. 
\n\n- (Not used) `sorts`: How should the values be sorted (from most- to least-important)?\n\n- `direction`: What direction of values indicates the most important values? For examples, `maximum`, `minimize`.\n\n- `deterministic`: Does the fitting process use random numbers? It is `TRUE` or `FALSE`.\n\n- `tuning`: Does the method have tuning parameters? It is `TRUE` or `FALSE`.\n\n- `calculating_fn`: What function, if any, is used to estimate the values from data?\n\n- `label`: What label to use when printing? For examples, `ANOVA p-values`, `ANOVA F-statistics`.\n\n- `packages`: What packages, if any, are required to train the method?\n\n- `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame.\n\n## Scoring object specific to the scoring method\n\nAs an example, let’s consider the ANOVA F-test filter. \n\n`class_score_aov` is a subclass of `class_score`. Any additional properties specific to the implementation can be added in the subclass. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a subclass named 'class_score_aov'\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\nIn addition to the properties inherited from the parent class, `class_score_aov` also includes:\n\n- `neg_log10`: Represent the score as `-log10(p_value)`? It is `TRUE` or `FALSE`.\n\nFor the ANOVA F-test filter, users can represent the score using either\n\n- p-value or \n\n- F-statistic. \n\nWe demonstrate how to create these instances (objects) accordingly. 
\n\nWe create `score_aov_pval` as an instance of the `class_score_aov` subclass by calling its constructor and specifying its properties:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA p-value\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nIndividual properties can be accessed via `object@`. For examples: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@fallback_value\n#> [1] Inf\nscore_aov_pval@direction\n#> [1] \"maximize\"\n```\n:::\n\n\n`score_aov_fstat` is another instance of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA F-statistic\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Fitting (or estimating) score\n\nSo far, we have discussed how to create a subclass and construct instances (objects) for the ANOVA F-test filter. \n\n`fit()` is a generic but also a method used to fit (or estimate) score.\n\nWhen `fit()` is a generic, it dispatches to the appropriate method based on the class of the object that is passed. When `fit()` is a method, it performs he actual fitting or score estimation for that specific class of object. 
\n\nThe ANOVA F-test filters, for example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Check the class of the object\nclass(score_aov_pval)\n#> [1] \"class_score_aov\" \"filtro::class_score\" \"S7_object\"\nclass(score_aov_fstat)\n#> [1] \"class_score_aov\" \"filtro::class_score\" \"S7_object\"\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Method dispatch for objects of class `class_score_aov`\nscore_aov_pval |>\n fit(Sale_Price ~ ., data = ames)\nscore_aov_fstat |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\nThe correlation filters, for another example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Check the class of the object\nclass(score_cor_pearson)\n#> [1] \"filtro::class_score_cor\" \"filtro::class_score\" \n#> [3] \"S7_object\"\nclass(score_cor_spearman)\n#> [1] \"filtro::class_score_cor\" \"filtro::class_score\" \n#> [3] \"S7_object\"\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Method dispatch for objects of class `class_score_aov`\nscore_cor_pearson |>\n fit(Sale_Price ~ ., data = ames)\nscore_cor_spearman |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\n## Documenting S7 generics and methods \n\n\n\n## Accessing results after fitting\n\nOnce the method is fitted via `fit()`, the data frame of results can be accessed via `object@results`. 
For examples: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(modeldata)\names_subset <- modeldata::ames |>\n # Use a subset of data for demonstration\n dplyr::select(\n Sale_Price,\n MS_SubClass,\n MS_Zoning,\n Lot_Frontage,\n Lot_Area,\n Street\n )\names_subset <- ames_subset |>\n dplyr::mutate(Sale_Price = log10(Sale_Price))\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA p-value and fit score\n# ames_aov_pval_res <-\n# score_aov_pval |>\n# fit(Sale_Price ~ ., data = ames_subset)\n# ames_aov_pval_res@results\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA F-statistic and fit score\n# ames_aov_fstat_res <-\n# score_aov_fstat |>\n# fit(Sale_Price ~ ., data = ames_subset)\n# ames_aov_fstat_res@results\n```\n:::\n\n\n## Session information {#session-info}\n\n\n::: {.cell layout-align=\"center\"}\n\n```\n#> \n#> Attaching package: 'dplyr'\n#> The following objects are masked from 'package:stats':\n#> \n#> filter, lag\n#> The following objects are masked from 'package:base':\n#> \n#> intersect, setdiff, setequal, union\n#> ─ Session info ─────────────────────────────────────────────────────\n#> version R version 4.5.0 (2025-04-11)\n#> language (EN)\n#> date 2025-08-19\n#> pandoc 3.6.3\n#> quarto 1.7.32\n#> \n#> ─ Packages ─────────────────────────────────────────────────────────\n#> package version date (UTC) source\n#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0)\n#> filtro 0.1.0.9000 2025-08-15 Github (tidymodels/filtro@81c7d85)\n#> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0)\n#> purrr 1.0.4 2025-02-05 CRAN (R 4.5.0)\n#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0)\n#> tibble 3.2.1 2023-03-20 CRAN (R 4.5.0)\n#> \n#> ────────────────────────────────────────────────────────────────────\n```\n:::\n\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/learn/develop/filtro/index.html.md b/learn/develop/filtro/index.html.md index 0beec564..3edd9535 
100644 --- a/learn/develop/filtro/index.html.md +++ b/learn/develop/filtro/index.html.md @@ -19,7 +19,7 @@ You can construct new scoring objects using `class_score()`. This article is a g ## Scoring object -`class_score` is the parent class of all subclasses related to the scoring method. There are a few properties for this object: +All subclasses specific to the scoring method have a parent class named `class_score`. There are a few properties (attributes) for this object: ::: {.cell layout-align="center"} @@ -40,27 +40,27 @@ args(class_score) - `predictor_type`: What types of predictor can the method handle? The options are `numeric`, `factor`, or both. -- `case_weights`: Does the method accpet case weights? `TRUE` or `FALSE`. +- `case_weights`: Does the method accept case weights? It is `TRUE` or `FALSE`. -- `range`: Are there known ranges for the statistic? For example, `c(0, Inf)`. +- `range`: Are there known ranges for the statistic? For examples, `c(0, Inf)`, `c(0, 1)`. -- `inclusive`: Are these ranges inclusive at the bounds? For example, `c(FALSE, FALSE)`. +- `inclusive`: Are these ranges inclusive at the bounds? For example, `c(FALSE, FALSE)`, `c(TRUE, TRUE)`. -- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`. +- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For examples, `0`, `Inf`. -- `score_type`: What is the column name that will be used for the statistic values? For example, `aov_pval`, `aov_fstat`. +- `score_type`: What is the column name that will be used for the statistic values? For examples, `aov_pval`, `aov_fstat`. - (Not used) `sorts`: How should the values be sorted (from most- to least-important)? -- `direction`: What direction of values indicates the most important values? For example, `maximum`, `minimize`. +- `direction`: What direction of values indicates the most important values? 
For examples, `maximize`, `minimize`. -- `deterministic`: Does the fitting process use random numbers? `TRUE` or `FALSE`. +- `deterministic`: Does the fitting process use random numbers? It is `TRUE` or `FALSE`. -- `tuning`: Does the method have tuning parameters? `TRUE` or `FALSE`. +- `tuning`: Does the method have tuning parameters? It is `TRUE` or `FALSE`. - `calculating_fn`: What function, if any, is used to estimate the values from data? -- `label`: What label to use when printing? For example, `ANOVA p-values`, `ANOVA F-statistics`. +- `label`: What label to use when printing? For examples, `ANOVA p-values`, `ANOVA F-statistics`. - `packages`: What packages, if any, are required to train the method? @@ -70,7 +70,7 @@ args(class_score) As an example, let’s consider the ANOVA F-test filter. -`class_score_aov` is a subclass of `class_score`. The subclass can also include additional properties specific to its implementation. For example: +`class_score_aov` is a subclass of `class_score`. Any additional properties specific to the implementation can be added in the subclass. For example: ::: {.cell layout-align="center"} @@ -88,17 +88,17 @@ class_score_aov <- S7::new_class( In addition to the properties inherited from the parent class, `class_score_aov` also includes: -- `neg_log10`: Represent the score as `-log10(p_value)`? `TRUE` or `FALSE`. +- `neg_log10`: Represent the score as `-log10(p_value)`? It is `TRUE` or `FALSE`. -For this filter, users can represent the score using either +For the ANOVA F-test filter, users can represent the score using either - p-value or - F-statistic. -We demonstrate how to create these instances (or objects) accordingly. +We demonstrate how to create these instances (objects) accordingly. 
-`score_aov_pval` is an instance (or object) of the `class_score_aov` subclass, created using its constructor function: +We create `score_aov_pval` as an instance of the `class_score_aov` subclass by calling its constructor and specifying its properties: ::: {.cell layout-align="center"} @@ -135,7 +135,7 @@ score_aov_pval@direction ``` ::: -`score_aov_fstat` is another instance (or object) of the `class_score_aov` subclass: +`score_aov_fstat` is another instance of the `class_score_aov` subclass: ::: {.cell layout-align="center"} @@ -162,16 +162,62 @@ score_aov_fstat <- So far, we have discussed how to create a subclass and construct instances (objects) for the ANOVA F-test filter. -The `fit()` function is a generic used to fit (or estimate) score. +`fit()` is a generic but also a method used to fit (or estimate) score. + +When `fit()` is a generic, it dispatches to the appropriate method based on the class of the object that is passed. When `fit()` is a method, it performs the actual fitting or score estimation for that specific class of object. 
+ +The ANOVA F-test filters, for example: ::: {.cell layout-align="center"} ```{.r .cell-code} +# Check the class of the object +class(score_aov_pval) +#> [1] "class_score_aov" "filtro::class_score" "S7_object" +class(score_aov_fstat) +#> [1] "class_score_aov" "filtro::class_score" "S7_object" +``` +::: + +::: {.cell layout-align="center"} + +```{.r .cell-code} +# Method dispatch for objects of class `class_score_aov` score_aov_pval |> fit(Sale_Price ~ ., data = ames) +score_aov_fstat |> + fit(Sale_Price ~ ., data = ames) ``` ::: +The correlation filters, for another example: + +::: {.cell layout-align="center"} + +```{.r .cell-code} +# Check the class of the object +class(score_cor_pearson) +#> [1] "filtro::class_score_cor" "filtro::class_score" +#> [3] "S7_object" +class(score_cor_spearman) +#> [1] "filtro::class_score_cor" "filtro::class_score" +#> [3] "S7_object" +``` +::: + +::: {.cell layout-align="center"} + +```{.r .cell-code} +# Method dispatch for objects of class `class_score_cor` +score_cor_pearson |> + fit(Sale_Price ~ ., data = ames) +score_cor_spearman |> + fit(Sale_Price ~ ., data = ames) +``` +::: + +## Documenting S7 generics and methods + ## Accessing results after fitting Once the method is fitted via `fit()`, the data frame of results can be accessed via `object@results`. For examples: @@ -233,7 +279,7 @@ ames_subset <- ames_subset |> #> ─ Session info ───────────────────────────────────────────────────── #> version R version 4.5.0 (2025-04-11) #> language (EN) -#> date 2025-08-18 +#> date 2025-08-19 #> pandoc 3.6.3 #> quarto 1.7.32 #> diff --git a/learn/develop/filtro/index.qmd b/learn/develop/filtro/index.qmd index 83d0910b..94e554b0 100644 --- a/learn/develop/filtro/index.qmd +++ b/learn/develop/filtro/index.qmd @@ -38,7 +38,7 @@ You can construct new scoring objects using `class_score()`. This article is a g ## Scoring object -`class_score` is the parent class of all subclasses related to the scoring method. 
There are a few properties for this object: +All subclasses specific to the scoring method have a parent class named `class_score`. There are a few properties (attributes) for this object: ```{r} #| label: "class_score" @@ -50,27 +50,27 @@ args(class_score) - `predictor_type`: What types of predictor can the method handle? The options are `numeric`, `factor`, or both. -- `case_weights`: Does the method accpet case weights? `TRUE` or `FALSE`. +- `case_weights`: Does the method accept case weights? It is `TRUE` or `FALSE`. -- `range`: Are there known ranges for the statistic? For example, `c(0, Inf)`. +- `range`: Are there known ranges for the statistic? For examples, `c(0, Inf)`, `c(0, 1)`. -- `inclusive`: Are these ranges inclusive at the bounds? For example, `c(FALSE, FALSE)`. +- `inclusive`: Are these ranges inclusive at the bounds? For example, `c(FALSE, FALSE)`, `c(TRUE, TRUE)`. -- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`. +- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For examples, `0`, `Inf`. -- `score_type`: What is the column name that will be used for the statistic values? For example, `aov_pval`, `aov_fstat`. +- `score_type`: What is the column name that will be used for the statistic values? For examples, `aov_pval`, `aov_fstat`. - (Not used) `sorts`: How should the values be sorted (from most- to least-important)? -- `direction`: What direction of values indicates the most important values? For example, `maximum`, `minimize`. +- `direction`: What direction of values indicates the most important values? For examples, `maximize`, `minimize`. -- `deterministic`: Does the fitting process use random numbers? `TRUE` or `FALSE`. +- `deterministic`: Does the fitting process use random numbers? It is `TRUE` or `FALSE`. -- `tuning`: Does the method have tuning parameters? `TRUE` or `FALSE`. 
+- `tuning`: Does the method have tuning parameters? It is `TRUE` or `FALSE`. - `calculating_fn`: What function, if any, is used to estimate the values from data? -- `label`: What label to use when printing? For example, `ANOVA p-values`, `ANOVA F-statistics`. +- `label`: What label to use when printing? For examples, `ANOVA p-values`, `ANOVA F-statistics`. - `packages`: What packages, if any, are required to train the method? @@ -80,7 +80,7 @@ args(class_score) As an example, let’s consider the ANOVA F-test filter. -`class_score_aov` is a subclass of `class_score`. The subclass can also include additional properties specific to its implementation. For example: +`class_score_aov` is a subclass of `class_score`. Any additional properties specific to the implementation can be added in the subclass. For example: ```{r} # Create a subclass named 'class_score_aov' @@ -95,17 +95,17 @@ class_score_aov <- S7::new_class( In addition to the properties inherited from the parent class, `class_score_aov` also includes: -- `neg_log10`: Represent the score as `-log10(p_value)`? `TRUE` or `FALSE`. +- `neg_log10`: Represent the score as `-log10(p_value)`? It is `TRUE` or `FALSE`. -For this filter, users can represent the score using either +For the ANOVA F-test filter, users can represent the score using either - p-value or - F-statistic. -We demonstrate how to create these instances (or objects) accordingly. +We demonstrate how to create these instances (objects) accordingly. 
-`score_aov_pval` is an instance (or object) of the `class_score_aov` subclass, created using its constructor function: +We create `score_aov_pval` as an instance of the `class_score_aov` subclass by calling its constructor and specifying its properties: ```{r} # ANOVA p-value @@ -133,7 +133,7 @@ score_aov_pval@fallback_value score_aov_pval@direction ``` -`score_aov_fstat` is another instance (or object) of the `class_score_aov` subclass: +`score_aov_fstat` is another instance of the `class_score_aov` subclass: ```{r} # ANOVA F-statistic @@ -157,14 +157,48 @@ score_aov_fstat <- So far, we have discussed how to create a subclass and construct instances (objects) for the ANOVA F-test filter. -The `fit()` function is a generic used to fit (or estimate) score. +`fit()` is a generic but also a method used to fit (or estimate) score. + +When `fit()` is a generic, it dispatches to the appropriate method based on the class of the object that is passed. When `fit()` is a method, it performs the actual fitting or score estimation for that specific class of object. + +The ANOVA F-test filters, for example: + +```{r} +# Check the class of the object +class(score_aov_pval) +class(score_aov_fstat) +``` ```{r} #| eval: false +# Method dispatch for objects of class `class_score_aov` score_aov_pval |> fit(Sale_Price ~ ., data = ames) +score_aov_fstat |> + fit(Sale_Price ~ ., data = ames) +``` + +The correlation filters, for another example: + +```{r} +# Check the class of the object +class(score_cor_pearson) +class(score_cor_spearman) ``` +```{r} +#| eval: false +# Method dispatch for objects of class `class_score_cor` +score_cor_pearson |> + fit(Sale_Price ~ ., data = ames) +score_cor_spearman |> + fit(Sale_Price ~ ., data = ames) +``` + +## Documenting S7 generics and methods + + + ## Accessing results after fitting Once the method is fitted via `fit()`, the data frame of results can be accessed via `object@results`. 
For examples: From f405fbcf6136ef13378ae6ebc30545a8e32d8377 Mon Sep 17 00:00:00 2001 From: Frances Lin <37535633+franceslinyc@users.noreply.github.com> Date: Tue, 19 Aug 2025 09:57:56 -0700 Subject: [PATCH 10/21] Finalize; Switch to filtro PR 162 --- .../filtro/index/execute-results/html.json | 4 +- learn/develop/filtro/index.html.md | 84 +++++++++++++++---- learn/develop/filtro/index.qmd | 70 ++++++++++++---- 3 files changed, 128 insertions(+), 30 deletions(-) diff --git a/_freeze/learn/develop/filtro/index/execute-results/html.json b/_freeze/learn/develop/filtro/index/execute-results/html.json index be5f9b7d..9628805e 100644 --- a/_freeze/learn/develop/filtro/index/execute-results/html.json +++ b/_freeze/learn/develop/filtro/index/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "c30015f96bc239b4644a9f9974066870", + "hash": "2e72cbd78837166e7f8d3722f7ae8bf4", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new score class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro and modeldata.\n\nYou can construct new scoring objects using `class_score()`. This article is a guide to creating new scoring objects. \n\n## Scoring object\n\nAll subclasses specific to the scoring method have a parent class named `class_score`. 
There are a few properties (attributes) for this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(filtro)\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), sorts = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\n- `outcome_type`: What types of outcome can the method handle? The options are `numeric`, `factor`, or both. \n\n- `predictor_type`: What types of predictor can the method handle? The options are `numeric`, `factor`, or both. \n\n- `case_weights`: Does the method accpet case weights? It is `TRUE` or `FALSE`.\n\n- `range`: Are there known ranges for the statistic? For examples, `c(0, Inf)`, `c(0, 1)`. \n\n- `inclusive`: Are these ranges inclusive at the bounds? For example, `c(FALSE, FALSE)`, `c(TRUE, TRUE)`. \n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For examples, `0`, `Inf`.\n\n- `score_type`: What is the column name that will be used for the statistic values? For examples, `aov_pval`, `aov_fstat`. \n\n- (Not used) `sorts`: How should the values be sorted (from most- to least-important)?\n\n- `direction`: What direction of values indicates the most important values? For examples, `maximum`, `minimize`.\n\n- `deterministic`: Does the fitting process use random numbers? It is `TRUE` or `FALSE`.\n\n- `tuning`: Does the method have tuning parameters? It is `TRUE` or `FALSE`.\n\n- `calculating_fn`: What function, if any, is used to estimate the values from data?\n\n- `label`: What label to use when printing? 
For examples, `ANOVA p-values`, `ANOVA F-statistics`.\n\n- `packages`: What packages, if any, are required to train the method?\n\n- `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame.\n\n## Scoring object specific to the scoring method\n\nAs an example, let’s consider the ANOVA F-test filter. \n\n`class_score_aov` is a subclass of `class_score`. Any additional properties specific to the implementation can be added in the subclass. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a subclass named 'class_score_aov'\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\nIn addition to the properties inherited from the parent class, `class_score_aov` also includes:\n\n- `neg_log10`: Represent the score as `-log10(p_value)`? It is `TRUE` or `FALSE`.\n\nFor the ANOVA F-test filter, users can represent the score using either\n\n- p-value or \n\n- F-statistic. \n\nWe demonstrate how to create these instances (objects) accordingly. \n\nWe create `score_aov_pval` as an instance of the `class_score_aov` subclass by calling its constructor and specifying its properties:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA p-value\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nIndividual properties can be accessed via `object@`. 
For examples: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@fallback_value\n#> [1] Inf\nscore_aov_pval@direction\n#> [1] \"maximize\"\n```\n:::\n\n\n`score_aov_fstat` is another instance of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA F-statistic\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Fitting (or estimating) score\n\nSo far, we have discussed how to create a subclass and construct instances (objects) for the ANOVA F-test filter. \n\n`fit()` is a generic but also a method used to fit (or estimate) score.\n\nWhen `fit()` is a generic, it dispatches to the appropriate method based on the class of the object that is passed. When `fit()` is a method, it performs he actual fitting or score estimation for that specific class of object. 
\n\nThe ANOVA F-test filters, for example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Check the class of the object\nclass(score_aov_pval)\n#> [1] \"class_score_aov\" \"filtro::class_score\" \"S7_object\"\nclass(score_aov_fstat)\n#> [1] \"class_score_aov\" \"filtro::class_score\" \"S7_object\"\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Method dispatch for objects of class `class_score_aov`\nscore_aov_pval |>\n fit(Sale_Price ~ ., data = ames)\nscore_aov_fstat |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\nThe correlation filters, for another example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Check the class of the object\nclass(score_cor_pearson)\n#> [1] \"filtro::class_score_cor\" \"filtro::class_score\" \n#> [3] \"S7_object\"\nclass(score_cor_spearman)\n#> [1] \"filtro::class_score_cor\" \"filtro::class_score\" \n#> [3] \"S7_object\"\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Method dispatch for objects of class `class_score_aov`\nscore_cor_pearson |>\n fit(Sale_Price ~ ., data = ames)\nscore_cor_spearman |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\n## Documenting S7 generics and methods \n\n\n\n## Accessing results after fitting\n\nOnce the method is fitted via `fit()`, the data frame of results can be accessed via `object@results`. 
For examples: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(modeldata)\names_subset <- modeldata::ames |>\n # Use a subset of data for demonstration\n dplyr::select(\n Sale_Price,\n MS_SubClass,\n MS_Zoning,\n Lot_Frontage,\n Lot_Area,\n Street\n )\names_subset <- ames_subset |>\n dplyr::mutate(Sale_Price = log10(Sale_Price))\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA p-value and fit score\n# ames_aov_pval_res <-\n# score_aov_pval |>\n# fit(Sale_Price ~ ., data = ames_subset)\n# ames_aov_pval_res@results\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA F-statistic and fit score\n# ames_aov_fstat_res <-\n# score_aov_fstat |>\n# fit(Sale_Price ~ ., data = ames_subset)\n# ames_aov_fstat_res@results\n```\n:::\n\n\n## Session information {#session-info}\n\n\n::: {.cell layout-align=\"center\"}\n\n```\n#> \n#> Attaching package: 'dplyr'\n#> The following objects are masked from 'package:stats':\n#> \n#> filter, lag\n#> The following objects are masked from 'package:base':\n#> \n#> intersect, setdiff, setequal, union\n#> ─ Session info ─────────────────────────────────────────────────────\n#> version R version 4.5.0 (2025-04-11)\n#> language (EN)\n#> date 2025-08-19\n#> pandoc 3.6.3\n#> quarto 1.7.32\n#> \n#> ─ Packages ─────────────────────────────────────────────────────────\n#> package version date (UTC) source\n#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0)\n#> filtro 0.1.0.9000 2025-08-15 Github (tidymodels/filtro@81c7d85)\n#> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0)\n#> purrr 1.0.4 2025-02-05 CRAN (R 4.5.0)\n#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0)\n#> tibble 3.2.1 2023-03-20 CRAN (R 4.5.0)\n#> \n#> ────────────────────────────────────────────────────────────────────\n```\n:::\n\n", + "markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new 
score class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro and modeldata.\n\nYou can construct new scoring objects using `class_score()`. This article is a guide to creating new scoring objects. \n\n## Scoring object\n\nAll subclasses specific to the scoring method have a parent class named `class_score`. There are a few properties (attributes) for this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(filtro)\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), sorts = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\n- `outcome_type`: What types of outcome can the method handle? The options are `numeric`, `factor`, or both. \n\n- `predictor_type`: What types of predictor can the method handle? The options are `numeric`, `factor`, or both. \n\n- `case_weights`: Does the method accpet case weights? It is `TRUE` or `FALSE`.\n\n- `range`: Are there known ranges for the statistic? For example, `c(0, Inf)`, `c(0, 1)`. \n\n- `inclusive`: Are these ranges inclusive at the bounds? For example, `c(FALSE, FALSE)`, `c(TRUE, TRUE)`. \n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`.\n\n- `score_type`: What is the column name that will be used for the statistic values? For example, `aov_pval`, `aov_fstat`. 
\n\n- (Not used) `sorts`: How should the values be sorted (from most- to least-important)?\n\n- `direction`: What direction of values indicates the most important values? For example, `maximum`, `minimize`.\n\n- `deterministic`: Does the fitting process use random numbers? It is `TRUE` or `FALSE`.\n\n- `tuning`: Does the method have tuning parameters? It is `TRUE` or `FALSE`.\n\n- `calculating_fn`: What function, if any, is used to estimate the values from data?\n\n- `label`: What label to use when printing? For example, `ANOVA p-values`, `ANOVA F-statistics`.\n\n- `packages`: What packages, if any, are required to train the method?\n\n- `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame.\n\n## Scoring object specific to the scoring method\n\nAs an example, let’s consider the ANOVA F-test filter. \n\n`class_score_aov` is a subclass of `class_score`. Any additional properties specific to the implementation can be added in the subclass. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a subclass named 'class_score_aov'\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\nIn addition to the properties inherited from the parent class, `class_score_aov` also includes:\n\n- `neg_log10`: Represent the score as `-log10(p_value)`? It is `TRUE` or `FALSE`.\n\nFor the ANOVA F-test filter, users can represent the score using either\n\n- p-value or \n\n- F-statistic. \n\nWe demonstrate how to create these instances (objects) accordingly. 
\n\nWe create `score_aov_pval` as an instance of the `class_score_aov` subclass by calling its constructor and specifying its properties:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA p-value\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nIndividual properties can be accessed via `object@`. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@fallback_value\n#> [1] Inf\nscore_aov_pval@direction\n#> [1] \"maximize\"\n```\n:::\n\n\n`score_aov_fstat` is another instance of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA F-statistic\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Fitting (or estimating) score\n\nSo far, we have discussed how to create a subclass and construct instances (objects) for the ANOVA F-test filter. \n\n`fit()` serves both as a generic and as method(s) used to fit (or estimate) score.\n\nWe define a generic named `fit()` that dispatches to the appropriate method based on the class of the object that is passed. We also define multiple methods named `fit()`. Each of these methods performs the actual fitting or score estimation for a specific class of object. 
\n\nThe ANOVA F-test filter, for example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Check the class of the object\nclass(score_aov_pval)\n#> [1] \"class_score_aov\" \"filtro::class_score\" \"S7_object\"\nclass(score_aov_fstat)\n#> [1] \"class_score_aov\" \"filtro::class_score\" \"S7_object\"\n```\n:::\n\n\nBoth objects belong to the (sub)class `class_score_aov`. Therefore, when `fit()` is called, the method for `class_score_aov` is dispatched, performing the actual fitting using the ANOVA F-test filter:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Method dispatch for objects of class `class_score_aov`\nscore_aov_pval |>\n fit(Sale_Price ~ ., data = ames)\nscore_aov_fstat |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\nThe correlation filter, for another example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Check the class of the object\nclass(score_cor_pearson)\n#> [1] \"filtro::class_score_cor\" \"filtro::class_score\" \n#> [3] \"S7_object\"\nclass(score_cor_spearman)\n#> [1] \"filtro::class_score_cor\" \"filtro::class_score\" \n#> [3] \"S7_object\"\n```\n:::\n\n\nBoth objects belong to the (sub)class `class_score_cor`. Therefore, when `fit()` is called, the method for `class_score_cor` is dispatched, performing the actual fitting using the correlation filter:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Method dispatch for objects of class `class_score_cor`\nscore_cor_pearson |>\n fit(Sale_Price ~ ., data = ames)\nscore_cor_spearman |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\n## Documenting S7 methods \n\nDocumentation for S7 methods is still a work in progress, but here’s how we approached it. \n\nInstead of documenting each individual `fit()` method, we provide the details in the \"Details\" section and the \"Estimating the scores\" subsection of the documentation for the corresponding object. 
\n\nThe code below opens the help page for the `fit()` generic: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Help page for `fit()` generic\n?fit\n```\n:::\n\n\nThe code below opens the help pages for specific `fit()` methods: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Help page for `fit()` method along with the documentation for the specific object\n?score_aov_pval\n?score_aov_fstat\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Help page for `fit()` method along with the documentation for the specific object\n?score_cor_pearson\n?score_cor_spearman\n```\n:::\n\n\n## Accessing results after fitting\n\nOnce the method has been fitted via `fit()`, the data frame of results can be accessed via `object@results`. \n\nWe use a subset of the Ames data set from the {modeldata} package for demonstration. The data set is used to predict housing sale price, and `Sale_Price` is the outcome and is numeric. \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(modeldata)\names_subset <- modeldata::ames |>\n # Use a subset of data for demonstration\n dplyr::select(\n Sale_Price,\n MS_SubClass,\n MS_Zoning,\n Lot_Frontage,\n Lot_Area,\n Street\n )\names_subset <- ames_subset |>\n dplyr::mutate(Sale_Price = log10(Sale_Price))\n```\n:::\n\n\nNext, we fit the score as we discuss before: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA p-value and fit score\n# ames_aov_pval_res <-\n# score_aov_pval |>\n# fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA F-statistic and fit score\n# ames_aov_fstat_res <-\n# score_aov_fstat |>\n# fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\nRecall that individual properties of an object can be accessed using `object@`. 
Once the method has been fitted, the resulting data frame can be accessed via `object@results`:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ames_aov_pval_res@results\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ames_aov_fstat_res@results\n```\n:::\n\n\n## Session information {#session-info}\n\n\n::: {.cell layout-align=\"center\"}\n\n```\n#> \n#> Attaching package: 'dplyr'\n#> The following objects are masked from 'package:stats':\n#> \n#> filter, lag\n#> The following objects are masked from 'package:base':\n#> \n#> intersect, setdiff, setequal, union\n#> ─ Session info ─────────────────────────────────────────────────────\n#> version R version 4.5.0 (2025-04-11)\n#> language (EN)\n#> date 2025-08-19\n#> pandoc 3.6.3\n#> quarto 1.7.32\n#> \n#> ─ Packages ─────────────────────────────────────────────────────────\n#> package version date (UTC) source\n#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0)\n#> filtro 0.1.0.9000 2025-08-15 Github (tidymodels/filtro@81c7d85)\n#> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0)\n#> purrr 1.0.4 2025-02-05 CRAN (R 4.5.0)\n#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0)\n#> tibble 3.2.1 2023-03-20 CRAN (R 4.5.0)\n#> \n#> ────────────────────────────────────────────────────────────────────\n```\n:::\n\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/learn/develop/filtro/index.html.md b/learn/develop/filtro/index.html.md index 3edd9535..06bcbabe 100644 --- a/learn/develop/filtro/index.html.md +++ b/learn/develop/filtro/index.html.md @@ -42,17 +42,17 @@ args(class_score) - `case_weights`: Does the method accpet case weights? It is `TRUE` or `FALSE`. -- `range`: Are there known ranges for the statistic? For examples, `c(0, Inf)`, `c(0, 1)`. +- `range`: Are there known ranges for the statistic? For example, `c(0, Inf)`, `c(0, 1)`. - `inclusive`: Are these ranges inclusive at the bounds? For example, `c(FALSE, FALSE)`, `c(TRUE, TRUE)`. 
-- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For examples, `0`, `Inf`. +- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`. -- `score_type`: What is the column name that will be used for the statistic values? For examples, `aov_pval`, `aov_fstat`. +- `score_type`: What is the column name that will be used for the statistic values? For example, `aov_pval`, `aov_fstat`. - (Not used) `sorts`: How should the values be sorted (from most- to least-important)? -- `direction`: What direction of values indicates the most important values? For examples, `maximum`, `minimize`. +- `direction`: What direction of values indicates the most important values? For example, `maximum`, `minimize`. - `deterministic`: Does the fitting process use random numbers? It is `TRUE` or `FALSE`. @@ -60,7 +60,7 @@ args(class_score) - `calculating_fn`: What function, if any, is used to estimate the values from data? -- `label`: What label to use when printing? For examples, `ANOVA p-values`, `ANOVA F-statistics`. +- `label`: What label to use when printing? For example, `ANOVA p-values`, `ANOVA F-statistics`. - `packages`: What packages, if any, are required to train the method? @@ -121,7 +121,7 @@ score_aov_pval <- ``` ::: -Individual properties can be accessed via `object@`. For examples: +Individual properties can be accessed via `object@`. For example: ::: {.cell layout-align="center"} @@ -162,11 +162,11 @@ score_aov_fstat <- So far, we have discussed how to create a subclass and construct instances (objects) for the ANOVA F-test filter. -`fit()` is a generic but also a method used to fit (or estimate) score. +`fit()` serves both as a generic and as method(s) used to fit (or estimate) score. -When `fit()` is a generic, it dispatches to the appropriate method based on the class of the object that is passed. 
When `fit()` is a method, it performs he actual fitting or score estimation for that specific class of object. +We define a generic named `fit()` that dispatches to the appropriate method based on the class of the object that is passed. We also define multiple methods named `fit()`. Each of these methods performs the actual fitting or score estimation for a specific class of object. -The ANOVA F-test filters, for example: +The ANOVA F-test filter, for example: ::: {.cell layout-align="center"} @@ -179,6 +179,8 @@ class(score_aov_fstat) ``` ::: +Both objects belong to the (sub)class `class_score_aov`. Therefore, when `fit()` is called, the method for `class_score_aov` is dispatched, performing the actual fitting using the ANOVA F-test filter: + ::: {.cell layout-align="center"} ```{.r .cell-code} @@ -190,7 +192,7 @@ score_aov_fstat |> ``` ::: -The correlation filters, for another example: +The correlation filter, for another example: ::: {.cell layout-align="center"} @@ -205,10 +207,12 @@ class(score_cor_spearman) ``` ::: +Both objects belong to the (sub)class `class_score_cor`. Therefore, when `fit()` is called, the method for `class_score_cor` is dispatched, performing the actual fitting using the correlation filter: + ::: {.cell layout-align="center"} ```{.r .cell-code} -# Method dispatch for objects of class `class_score_aov` +# Method dispatch for objects of class `class_score_cor` score_cor_pearson |> fit(Sale_Price ~ ., data = ames) score_cor_spearman |> @@ -216,11 +220,47 @@ score_cor_spearman |> ``` ::: -## Documenting S7 generics and methods +## Documenting S7 methods + +Documentation for S7 methods is still a work in progress, but here’s how we approached it. + +Instead of documenting each individual `fit()` method, we provide the details in the "Details" section and the "Estimating the scores" subsection of the documentation for the corresponding object. 
+ +The code below opens the help page for the `fit()` generic: + +::: {.cell layout-align="center"} + +```{.r .cell-code} +# Help page for `fit()` generic +?fit +``` +::: + +The code below opens the help pages for specific `fit()` methods: + +::: {.cell layout-align="center"} + +```{.r .cell-code} +# Help page for `fit()` method along with the documentation for the specific object +?score_aov_pval +?score_aov_fstat +``` +::: + +::: {.cell layout-align="center"} + +```{.r .cell-code} +# Help page for `fit()` method along with the documentation for the specific object +?score_cor_pearson +?score_cor_spearman +``` +::: ## Accessing results after fitting -Once the method is fitted via `fit()`, the data frame of results can be accessed via `object@results`. For examples: +Once the method has been fitted via `fit()`, the data frame of results can be accessed via `object@results`. + +We use a subset of the Ames data set from the {modeldata} package for demonstration. The data set is used to predict housing sale price, and `Sale_Price` is the outcome and is numeric. ::: {.cell layout-align="center"} @@ -241,6 +281,8 @@ ames_subset <- ames_subset |> ``` ::: +Next, we fit the score as we discuss before: + ::: {.cell layout-align="center"} ```{.r .cell-code} @@ -248,7 +290,6 @@ ames_subset <- ames_subset |> # ames_aov_pval_res <- # score_aov_pval |> # fit(Sale_Price ~ ., data = ames_subset) -# ames_aov_pval_res@results ``` ::: @@ -259,6 +300,21 @@ ames_subset <- ames_subset |> # ames_aov_fstat_res <- # score_aov_fstat |> # fit(Sale_Price ~ ., data = ames_subset) +``` +::: + +Recall that individual properties of an object can be accessed using `object@`. 
Once the method has been fitted, the resulting data frame can be accessed via `object@results`: + +::: {.cell layout-align="center"} + +```{.r .cell-code} +# ames_aov_pval_res@results +``` +::: + +::: {.cell layout-align="center"} + +```{.r .cell-code} # ames_aov_fstat_res@results ``` ::: diff --git a/learn/develop/filtro/index.qmd b/learn/develop/filtro/index.qmd index 94e554b0..abc4edc9 100644 --- a/learn/develop/filtro/index.qmd +++ b/learn/develop/filtro/index.qmd @@ -52,17 +52,17 @@ args(class_score) - `case_weights`: Does the method accpet case weights? It is `TRUE` or `FALSE`. -- `range`: Are there known ranges for the statistic? For examples, `c(0, Inf)`, `c(0, 1)`. +- `range`: Are there known ranges for the statistic? For example, `c(0, Inf)`, `c(0, 1)`. - `inclusive`: Are these ranges inclusive at the bounds? For example, `c(FALSE, FALSE)`, `c(TRUE, TRUE)`. -- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For examples, `0`, `Inf`. +- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`. -- `score_type`: What is the column name that will be used for the statistic values? For examples, `aov_pval`, `aov_fstat`. +- `score_type`: What is the column name that will be used for the statistic values? For example, `aov_pval`, `aov_fstat`. - (Not used) `sorts`: How should the values be sorted (from most- to least-important)? -- `direction`: What direction of values indicates the most important values? For examples, `maximum`, `minimize`. +- `direction`: What direction of values indicates the most important values? For example, `maximum`, `minimize`. - `deterministic`: Does the fitting process use random numbers? It is `TRUE` or `FALSE`. @@ -70,7 +70,7 @@ args(class_score) - `calculating_fn`: What function, if any, is used to estimate the values from data? -- `label`: What label to use when printing? 
For examples, `ANOVA p-values`, `ANOVA F-statistics`. +- `label`: What label to use when printing? For example, `ANOVA p-values`, `ANOVA F-statistics`. - `packages`: What packages, if any, are required to train the method? @@ -125,7 +125,7 @@ score_aov_pval <- ) ``` -Individual properties can be accessed via `object@`. For examples: +Individual properties can be accessed via `object@`. For example: ```{r} score_aov_pval@case_weights @@ -157,11 +157,11 @@ score_aov_fstat <- So far, we have discussed how to create a subclass and construct instances (objects) for the ANOVA F-test filter. -`fit()` is a generic but also a method used to fit (or estimate) score. +`fit()` serves both as a generic and as method(s) used to fit (or estimate) score. -When `fit()` is a generic, it dispatches to the appropriate method based on the class of the object that is passed. When `fit()` is a method, it performs he actual fitting or score estimation for that specific class of object. +We define a generic named `fit()` that dispatches to the appropriate method based on the class of the object that is passed. We also define multiple methods named `fit()`. Each of these methods performs the actual fitting or score estimation for a specific class of object. -The ANOVA F-test filters, for example: +The ANOVA F-test filter, for example: ```{r} # Check the class of the object @@ -169,6 +169,8 @@ class(score_aov_pval) class(score_aov_fstat) ``` +Both objects belong to the (sub)class `class_score_aov`. 
Therefore, when `fit()` is called, the method for `class_score_aov` is dispatched, performing the actual fitting using the ANOVA F-test filter: + ```{r} #| eval: false # Method dispatch for objects of class `class_score_aov` @@ -178,7 +180,7 @@ score_aov_fstat |> fit(Sale_Price ~ ., data = ames) ``` -The correlation filters, for another example: +The correlation filter, for another example: ```{r} # Check the class of the object @@ -186,22 +188,52 @@ class(score_cor_pearson) class(score_cor_spearman) ``` +Both objects belong to the (sub)class `class_score_cor`. Therefore, when `fit()` is called, the method for `class_score_cor` is dispatched, performing the actual fitting using the correlation filter: + ```{r} #| eval: false -# Method dispatch for objects of class `class_score_aov` +# Method dispatch for objects of class `class_score_cor` score_cor_pearson |> fit(Sale_Price ~ ., data = ames) score_cor_spearman |> fit(Sale_Price ~ ., data = ames) ``` -## Documenting S7 generics and methods +## Documenting S7 methods + +Documentation for S7 methods is still a work in progress, but here’s how we approached it. + +Instead of documenting each individual `fit()` method, we provide the details in the "Details" section and the "Estimating the scores" subsection of the documentation for the corresponding object. + +The code below opens the help page for the `fit()` generic: + +```{r} +#| eval: false +# Help page for `fit()` generic +?fit +``` + +The code below opens the help pages for specific `fit()` methods: +```{r} +#| eval: false +# Help page for `fit()` method along with the documentation for the specific object +?score_aov_pval +?score_aov_fstat +``` +```{r} +#| eval: false +# Help page for `fit()` method along with the documentation for the specific object +?score_cor_pearson +?score_cor_spearman +``` ## Accessing results after fitting -Once the method is fitted via `fit()`, the data frame of results can be accessed via `object@results`. 
For examples: +Once the method has been fitted via `fit()`, the data frame of results can be accessed via `object@results`. + +We use a subset of the Ames data set from the {modeldata} package for demonstration. The data set is used to predict housing sale price, and `Sale_Price` is the outcome and is numeric. ```{r} library(modeldata) @@ -219,12 +251,13 @@ ames_subset <- ames_subset |> dplyr::mutate(Sale_Price = log10(Sale_Price)) ``` +Next, we fit the score as we discuss before: + ```{r} # # Specify ANOVA p-value and fit score # ames_aov_pval_res <- # score_aov_pval |> # fit(Sale_Price ~ ., data = ames_subset) -# ames_aov_pval_res@results ``` ```{r} @@ -232,6 +265,15 @@ ames_subset <- ames_subset |> # ames_aov_fstat_res <- # score_aov_fstat |> # fit(Sale_Price ~ ., data = ames_subset) +``` + +Recall that individual properties of an object can be accessed using `object@`. Once the method has been fitted, the resulting data frame can be accessed via `object@results`: + +```{r} +# ames_aov_pval_res@results +``` + +```{r} # ames_aov_fstat_res@results ``` From 5756248befa4cbd7f86dd29537f733dd2d1a94d4 Mon Sep 17 00:00:00 2001 From: Frances Lin <37535633+franceslinyc@users.noreply.github.com> Date: Tue, 19 Aug 2025 10:30:47 -0700 Subject: [PATCH 11/21] Editing --- .../develop/filtro/index/execute-results/html.json | 4 ++-- learn/develop/filtro/index.html.md | 14 +++++++------- learn/develop/filtro/index.qmd | 14 +++++++------- 3 files changed, 16 insertions(+), 16 deletions(-) diff --git a/_freeze/learn/develop/filtro/index/execute-results/html.json b/_freeze/learn/develop/filtro/index/execute-results/html.json index 9628805e..f376cadd 100644 --- a/_freeze/learn/develop/filtro/index/execute-results/html.json +++ b/_freeze/learn/develop/filtro/index/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "2e72cbd78837166e7f8d3722f7ae8bf4", + "hash": "71c861b990d864bfdd4e2b795ba6ffd2", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Create your own score class 
object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new score class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro and modeldata.\n\nYou can construct new scoring objects using `class_score()`. This article is a guide to creating new scoring objects. \n\n## Scoring object\n\nAll subclasses specific to the scoring method have a parent class named `class_score`. There are a few properties (attributes) for this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(filtro)\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), sorts = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\n- `outcome_type`: What types of outcome can the method handle? The options are `numeric`, `factor`, or both. \n\n- `predictor_type`: What types of predictor can the method handle? The options are `numeric`, `factor`, or both. \n\n- `case_weights`: Does the method accpet case weights? It is `TRUE` or `FALSE`.\n\n- `range`: Are there known ranges for the statistic? For example, `c(0, Inf)`, `c(0, 1)`. \n\n- `inclusive`: Are these ranges inclusive at the bounds? For example, `c(FALSE, FALSE)`, `c(TRUE, TRUE)`. \n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`.\n\n- `score_type`: What is the column name that will be used for the statistic values? 
For example, `aov_pval`, `aov_fstat`. \n\n- (Not used) `sorts`: How should the values be sorted (from most- to least-important)?\n\n- `direction`: What direction of values indicates the most important values? For example, `maximum`, `minimize`.\n\n- `deterministic`: Does the fitting process use random numbers? It is `TRUE` or `FALSE`.\n\n- `tuning`: Does the method have tuning parameters? It is `TRUE` or `FALSE`.\n\n- `calculating_fn`: What function, if any, is used to estimate the values from data?\n\n- `label`: What label to use when printing? For example, `ANOVA p-values`, `ANOVA F-statistics`.\n\n- `packages`: What packages, if any, are required to train the method?\n\n- `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame.\n\n## Scoring object specific to the scoring method\n\nAs an example, let’s consider the ANOVA F-test filter. \n\n`class_score_aov` is a subclass of `class_score`. Any additional properties specific to the implementation can be added in the subclass. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a subclass named 'class_score_aov'\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\nIn addition to the properties inherited from the parent class, `class_score_aov` also includes:\n\n- `neg_log10`: Represent the score as `-log10(p_value)`? It is `TRUE` or `FALSE`.\n\nFor the ANOVA F-test filter, users can represent the score using either\n\n- p-value or \n\n- F-statistic. \n\nWe demonstrate how to create these instances (objects) accordingly. 
\n\nWe create `score_aov_pval` as an instance of the `class_score_aov` subclass by calling its constructor and specifying its properties:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA p-value\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nIndividual properties can be accessed via `object@`. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@fallback_value\n#> [1] Inf\nscore_aov_pval@direction\n#> [1] \"maximize\"\n```\n:::\n\n\n`score_aov_fstat` is another instance of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA F-statistic\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Fitting (or estimating) score\n\nSo far, we have discussed how to create a subclass and construct instances (objects) for the ANOVA F-test filter. \n\n`fit()` serves both as a generic and as method(s) used to fit (or estimate) score.\n\nWe define a generic named `fit()` that dispatches to the appropriate method based on the class of the object that is passed. We also define multiple methods named `fit()`. Each of these methods performs the actual fitting or score estimation for a specific class of object. 
\n\nThe ANOVA F-test filter, for example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Check the class of the object\nclass(score_aov_pval)\n#> [1] \"class_score_aov\" \"filtro::class_score\" \"S7_object\"\nclass(score_aov_fstat)\n#> [1] \"class_score_aov\" \"filtro::class_score\" \"S7_object\"\n```\n:::\n\n\nBoth objects belong to the (sub)class `class_score_aov`. Therefore, when `fit()` is called, the method for `class_score_aov` is dispatched, performing the actual fitting using the ANOVA F-test filter:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Method dispatch for objects of class `class_score_aov`\nscore_aov_pval |>\n fit(Sale_Price ~ ., data = ames)\nscore_aov_fstat |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\nThe correlation filter, for another example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Check the class of the object\nclass(score_cor_pearson)\n#> [1] \"filtro::class_score_cor\" \"filtro::class_score\" \n#> [3] \"S7_object\"\nclass(score_cor_spearman)\n#> [1] \"filtro::class_score_cor\" \"filtro::class_score\" \n#> [3] \"S7_object\"\n```\n:::\n\n\nBoth objects belong to the (sub)class `class_score_cor`. Therefore, when `fit()` is called, the method for `class_score_cor` is dispatched, performing the actual fitting using the correlation filter:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Method dispatch for objects of class `class_score_cor`\nscore_cor_pearson |>\n fit(Sale_Price ~ ., data = ames)\nscore_cor_spearman |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\n## Documenting S7 methods \n\nDocumentation for S7 methods is still a work in progress, but here’s how we approached it. \n\nInstead of documenting each individual `fit()` method, we provide the details in the \"Details\" section and the \"Estimating the scores\" subsection of the documentation for the corresponding object. 
\n\nThe code below opens the help page for the `fit()` generic: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Help page for `fit()` generic\n?fit\n```\n:::\n\n\nThe code below opens the help pages for specific `fit()` methods: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Help page for `fit()` method along with the documentation for the specific object\n?score_aov_pval\n?score_aov_fstat\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Help page for `fit()` method along with the documentation for the specific object\n?score_cor_pearson\n?score_cor_spearman\n```\n:::\n\n\n## Accessing results after fitting\n\nOnce the method has been fitted via `fit()`, the data frame of results can be accessed via `object@results`. \n\nWe use a subset of the Ames data set from the {modeldata} package for demonstration. The data set is used to predict housing sale price, and `Sale_Price` is the outcome and is numeric. \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(modeldata)\names_subset <- modeldata::ames |>\n # Use a subset of data for demonstration\n dplyr::select(\n Sale_Price,\n MS_SubClass,\n MS_Zoning,\n Lot_Frontage,\n Lot_Area,\n Street\n )\names_subset <- ames_subset |>\n dplyr::mutate(Sale_Price = log10(Sale_Price))\n```\n:::\n\n\nNext, we fit the score as we discuss before: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA p-value and fit score\n# ames_aov_pval_res <-\n# score_aov_pval |>\n# fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA F-statistic and fit score\n# ames_aov_fstat_res <-\n# score_aov_fstat |>\n# fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\nRecall that individual properties of an object can be accessed using `object@`. 
Once the method has been fitted, the resulting data frame can be accessed via `object@results`:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ames_aov_pval_res@results\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ames_aov_fstat_res@results\n```\n:::\n\n\n## Session information {#session-info}\n\n\n::: {.cell layout-align=\"center\"}\n\n```\n#> \n#> Attaching package: 'dplyr'\n#> The following objects are masked from 'package:stats':\n#> \n#> filter, lag\n#> The following objects are masked from 'package:base':\n#> \n#> intersect, setdiff, setequal, union\n#> ─ Session info ─────────────────────────────────────────────────────\n#> version R version 4.5.0 (2025-04-11)\n#> language (EN)\n#> date 2025-08-19\n#> pandoc 3.6.3\n#> quarto 1.7.32\n#> \n#> ─ Packages ─────────────────────────────────────────────────────────\n#> package version date (UTC) source\n#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0)\n#> filtro 0.1.0.9000 2025-08-15 Github (tidymodels/filtro@81c7d85)\n#> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0)\n#> purrr 1.0.4 2025-02-05 CRAN (R 4.5.0)\n#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0)\n#> tibble 3.2.1 2023-03-20 CRAN (R 4.5.0)\n#> \n#> ────────────────────────────────────────────────────────────────────\n```\n:::\n\n", + "markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new score class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro and modeldata.\n\nYou can construct new scoring objects using `class_score()`. This article is a guide to creating new scoring objects. \n\n## Scoring object\n\nAll subclasses specific to the scoring method have a parent class named `class_score`. 
There are a few properties (attributes) for this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(filtro)\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), sorts = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\n- `outcome_type`: What types of outcome can the method handle? The options are `numeric`, `factor`, or both. \n\n- `predictor_type`: What types of predictor can the method handle? The options are `numeric`, `factor`, or both. \n\n- `case_weights`: Does the method accpet case weights? It is `TRUE` or `FALSE`.\n\n- `range`: Are there known ranges for the statistic? For example, `c(0, Inf)`, `c(0, 1)`. \n\n- `inclusive`: Are these ranges inclusive at the bounds? For example, `c(FALSE, FALSE)`, `c(TRUE, TRUE)`. \n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`.\n\n- `score_type`: What is the column name that will be used for the statistic values? For example, `aov_pval`, `aov_fstat`. \n\n- (Not used) `sorts`: How should the values be sorted (from most- to least-important)?\n\n- `direction`: What direction of values indicates the most important values? For example, `maximum`, `minimize`.\n\n- `deterministic`: Does the fitting process use random numbers? It is `TRUE` or `FALSE`.\n\n- `tuning`: Does the method have tuning parameters? It is `TRUE` or `FALSE`.\n\n- `calculating_fn`: What function, if any, is used to estimate the values from data?\n\n- `label`: What label to use when printing? 
For example, `ANOVA p-values`, `ANOVA F-statistics`.\n\n- `packages`: What packages, if any, are required to train the method?\n\n- `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame.\n\n## Scoring object specific to the scoring method\n\nAs an example, let’s consider the ANOVA F-test filter. \n\n`class_score_aov` is a subclass of `class_score`. Any additional properties specific to the implementation can be added in the subclass. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a subclass named 'class_score_aov'\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\nIn addition to the properties inherited from the parent class, `class_score_aov` also includes:\n\n- `neg_log10`: Represent the score as `-log10(p_value)`? It is `TRUE` or `FALSE`.\n\nFor the ANOVA F-test filter, users can represent the score using either\n\n- p-value or \n\n- F-statistic. \n\nWe demonstrate how to create these instances (objects) accordingly. \n\nWe create `score_aov_pval` as an instance of the `class_score_aov` subclass by calling its constructor and specifying its properties:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA p-value\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nIndividual properties can be accessed via `object@`. 
For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@fallback_value\n#> [1] Inf\nscore_aov_pval@direction\n#> [1] \"maximize\"\n```\n:::\n\n\n`score_aov_fstat` is another instance of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA F-statistic\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Fitting (or estimating) score\n\nSo far, we have discussed how to create a subclass and construct instances (objects) for the ANOVA F-test filter. \n\n`fit()` serves both as a generic and as methods used to fit (or estimate) score.\n\nThe generic named `fit()` is re-exported and dispatches to the appropriate method based on the class of the object passed. We define multiple methods named `fit()`. Each of these methods performs the actual fitting or score estimation for a specific class of object. \n\nThe ANOVA F-test filter, for example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Check the class of the object\nclass(score_aov_pval)\n#> [1] \"class_score_aov\" \"filtro::class_score\" \"S7_object\"\nclass(score_aov_fstat)\n#> [1] \"class_score_aov\" \"filtro::class_score\" \"S7_object\"\n```\n:::\n\n\nBoth objects belong to the class `class_score_aov`. 
Therefore, when `fit()` is called, the method for `class_score_aov` is dispatched, performing the actual fitting using the ANOVA F-test filter:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Method dispatch for objects of class `class_score_aov`\nscore_aov_pval |>\n fit(Sale_Price ~ ., data = ames)\nscore_aov_fstat |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\nThe correlation filter, for another example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Check the class of the object\nclass(score_cor_pearson)\n#> [1] \"filtro::class_score_cor\" \"filtro::class_score\" \n#> [3] \"S7_object\"\nclass(score_cor_spearman)\n#> [1] \"filtro::class_score_cor\" \"filtro::class_score\" \n#> [3] \"S7_object\"\n```\n:::\n\n\nBoth objects belong to the class `class_score_cor`. Therefore, when `fit()` is called, the method for `class_score_cor` is dispatched, performing the actual fitting using the correlation filter:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Method dispatch for objects of class `class_score_cor`\nscore_cor_pearson |>\n fit(Sale_Price ~ ., data = ames)\nscore_cor_spearman |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\n## Documenting S7 methods \n\nDocumentation for S7 methods is still a work in progress, and it seems no one currently knows the correct approach. Here’s how we tackle it: \n\nWe re-export the `fit()` generic from another package. Instead of documenting each individual `fit()` method, we provide the details in the \"Details\" section and the \"Estimating the scores\" subsection of the documentation for the corresponding object. 
\n\nThe code below opens the help page for the `fit()` generic: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Help page for `fit()` generic\n?fit\n```\n:::\n\n\nThe code below opens the help pages for specific `fit()` method: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Help page for `fit()` method along with the documentation for the specific object\n?score_aov_pval\n?score_aov_fstat\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Help page for `fit()` method along with the documentation for the specific object\n?score_cor_pearson\n?score_cor_spearman\n```\n:::\n\n\n## Accessing results after fitting\n\nOnce the method has been fitted via `fit()`, the data frame of results can be accessed via `object@results`. \n\nWe use a subset of the Ames data set from the {modeldata} package for demonstration. The data set is used to predict housing sale price, and `Sale_Price` is the outcome and is numeric. \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(modeldata)\names_subset <- modeldata::ames |>\n # Use a subset of data for demonstration\n dplyr::select(\n Sale_Price,\n MS_SubClass,\n MS_Zoning,\n Lot_Frontage,\n Lot_Area,\n Street\n )\names_subset <- ames_subset |>\n dplyr::mutate(Sale_Price = log10(Sale_Price))\n```\n:::\n\n\nNext, we fit the score as we discuss before: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA p-value and fit score\n# ames_aov_pval_res <-\n# score_aov_pval |>\n# fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA F-statistic and fit score\n# ames_aov_fstat_res <-\n# score_aov_fstat |>\n# fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\nRecall that individual properties of an object can be accessed using `object@`. 
Once the method has been fitted, the resulting data frame can be accessed via `object@results`:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ames_aov_pval_res@results\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ames_aov_fstat_res@results\n```\n:::\n\n\n## Session information {#session-info}\n\n\n::: {.cell layout-align=\"center\"}\n\n```\n#> \n#> Attaching package: 'dplyr'\n#> The following objects are masked from 'package:stats':\n#> \n#> filter, lag\n#> The following objects are masked from 'package:base':\n#> \n#> intersect, setdiff, setequal, union\n#> ─ Session info ─────────────────────────────────────────────────────\n#> version R version 4.5.0 (2025-04-11)\n#> language (EN)\n#> date 2025-08-19\n#> pandoc 3.6.3\n#> quarto 1.7.32\n#> \n#> ─ Packages ─────────────────────────────────────────────────────────\n#> package version date (UTC) source\n#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0)\n#> filtro 0.1.0.9000 2025-08-15 Github (tidymodels/filtro@81c7d85)\n#> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0)\n#> purrr 1.0.4 2025-02-05 CRAN (R 4.5.0)\n#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0)\n#> tibble 3.2.1 2023-03-20 CRAN (R 4.5.0)\n#> \n#> ────────────────────────────────────────────────────────────────────\n```\n:::\n\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/learn/develop/filtro/index.html.md b/learn/develop/filtro/index.html.md index 06bcbabe..e49eb2a5 100644 --- a/learn/develop/filtro/index.html.md +++ b/learn/develop/filtro/index.html.md @@ -162,9 +162,9 @@ score_aov_fstat <- So far, we have discussed how to create a subclass and construct instances (objects) for the ANOVA F-test filter. -`fit()` serves both as a generic and as method(s) used to fit (or estimate) score. +`fit()` serves both as a generic and as methods used to fit (or estimate) score. -We define a generic named `fit()` that dispatches to the appropriate method based on the class of the object that is passed. 
We also define multiple methods named `fit()`. Each of these methods performs the actual fitting or score estimation for a specific class of object. +The generic named `fit()` is re-exported and dispatches to the appropriate method based on the class of the object passed. We define multiple methods named `fit()`. Each of these methods performs the actual fitting or score estimation for a specific class of object. The ANOVA F-test filter, for example: @@ -179,7 +179,7 @@ class(score_aov_fstat) ``` ::: -Both objects belong to the (sub)class `class_score_aov`. Therefore, when `fit()` is called, the method for `class_score_aov` is dispatched, performing the actual fitting using the ANOVA F-test filter: +Both objects belong to the class `class_score_aov`. Therefore, when `fit()` is called, the method for `class_score_aov` is dispatched, performing the actual fitting using the ANOVA F-test filter: ::: {.cell layout-align="center"} @@ -207,7 +207,7 @@ class(score_cor_spearman) ``` ::: -Both objects belong to the (sub)class `class_score_cor`. Therefore, when `fit()` is called, the method for `class_score_cor` is dispatched, performing the actual fitting using the correlation filter: +Both objects belong to the class `class_score_cor`. Therefore, when `fit()` is called, the method for `class_score_cor` is dispatched, performing the actual fitting using the correlation filter: ::: {.cell layout-align="center"} @@ -222,9 +222,9 @@ score_cor_spearman |> ## Documenting S7 methods -Documentation for S7 methods is still a work in progress, but here’s how we approached it. +Documentation for S7 methods is still a work in progress, and it seems no one currently knows the correct approach. Here’s how we tackle it: -Instead of documenting each individual `fit()` method, we provide the details in the "Details" section and the "Estimating the scores" subsection of the documentation for the corresponding object. +We re-export the `fit()` generic from another package. 
Instead of documenting each individual `fit()` method, we provide the details in the "Details" section and the "Estimating the scores" subsection of the documentation for the corresponding object. The code below opens the help page for the `fit()` generic: @@ -236,7 +236,7 @@ The code below opens the help page for the `fit()` generic: ``` ::: -The code below opens the help pages for specific `fit()` methods: +The code below opens the help pages for specific `fit()` method: ::: {.cell layout-align="center"} diff --git a/learn/develop/filtro/index.qmd b/learn/develop/filtro/index.qmd index abc4edc9..ea1cb21d 100644 --- a/learn/develop/filtro/index.qmd +++ b/learn/develop/filtro/index.qmd @@ -157,9 +157,9 @@ score_aov_fstat <- So far, we have discussed how to create a subclass and construct instances (objects) for the ANOVA F-test filter. -`fit()` serves both as a generic and as method(s) used to fit (or estimate) score. +`fit()` serves both as a generic and as methods used to fit (or estimate) score. -We define a generic named `fit()` that dispatches to the appropriate method based on the class of the object that is passed. We also define multiple methods named `fit()`. Each of these methods performs the actual fitting or score estimation for a specific class of object. +The generic named `fit()` is re-exported and dispatches to the appropriate method based on the class of the object passed. We define multiple methods named `fit()`. Each of these methods performs the actual fitting or score estimation for a specific class of object. The ANOVA F-test filter, for example: @@ -169,7 +169,7 @@ class(score_aov_pval) class(score_aov_fstat) ``` -Both objects belong to the (sub)class `class_score_aov`. Therefore, when `fit()` is called, the method for `class_score_aov` is dispatched, performing the actual fitting using the ANOVA F-test filter: +Both objects belong to the class `class_score_aov`. 
Therefore, when `fit()` is called, the method for `class_score_aov` is dispatched, performing the actual fitting using the ANOVA F-test filter: ```{r} #| eval: false @@ -188,7 +188,7 @@ class(score_cor_pearson) class(score_cor_spearman) ``` -Both objects belong to the (sub)class `class_score_cor`. Therefore, when `fit()` is called, the method for `class_score_cor` is dispatched, performing the actual fitting using the correlation filter: +Both objects belong to the class `class_score_cor`. Therefore, when `fit()` is called, the method for `class_score_cor` is dispatched, performing the actual fitting using the correlation filter: ```{r} #| eval: false @@ -201,9 +201,9 @@ score_cor_spearman |> ## Documenting S7 methods -Documentation for S7 methods is still a work in progress, but here’s how we approached it. +Documentation for S7 methods is still a work in progress, and it seems no one currently knows the correct approach. Here’s how we tackle it: -Instead of documenting each individual `fit()` method, we provide the details in the "Details" section and the "Estimating the scores" subsection of the documentation for the corresponding object. +We re-export the `fit()` generic from another package. Instead of documenting each individual `fit()` method, we provide the details in the "Details" section and the "Estimating the scores" subsection of the documentation for the corresponding object. 
The code below opens the help page for the `fit()` generic: @@ -213,7 +213,7 @@ The code below opens the help page for the `fit()` generic: ?fit ``` -The code below opens the help pages for specific `fit()` methods: +The code below opens the help pages for specific `fit()` method: ```{r} #| eval: false From 4c63d9ff888665b6b7fa0ded461afdb3b13d4661 Mon Sep 17 00:00:00 2001 From: Frances Lin <37535633+franceslinyc@users.noreply.github.com> Date: Tue, 19 Aug 2025 10:33:13 -0700 Subject: [PATCH 12/21] Still can't run fit() : / --- _freeze/learn/develop/filtro/index/execute-results/html.json | 2 +- learn/develop/filtro/index.html.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/_freeze/learn/develop/filtro/index/execute-results/html.json b/_freeze/learn/develop/filtro/index/execute-results/html.json index f376cadd..ab4c41c6 100644 --- a/_freeze/learn/develop/filtro/index/execute-results/html.json +++ b/_freeze/learn/develop/filtro/index/execute-results/html.json @@ -2,7 +2,7 @@ "hash": "71c861b990d864bfdd4e2b795ba6ffd2", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new score class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro and modeldata.\n\nYou can construct new scoring objects using `class_score()`. This article is a guide to creating new scoring objects. \n\n## Scoring object\n\nAll subclasses specific to the scoring method have a parent class named `class_score`. 
There are a few properties (attributes) for this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(filtro)\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), sorts = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\n- `outcome_type`: What types of outcome can the method handle? The options are `numeric`, `factor`, or both. \n\n- `predictor_type`: What types of predictor can the method handle? The options are `numeric`, `factor`, or both. \n\n- `case_weights`: Does the method accpet case weights? It is `TRUE` or `FALSE`.\n\n- `range`: Are there known ranges for the statistic? For example, `c(0, Inf)`, `c(0, 1)`. \n\n- `inclusive`: Are these ranges inclusive at the bounds? For example, `c(FALSE, FALSE)`, `c(TRUE, TRUE)`. \n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`.\n\n- `score_type`: What is the column name that will be used for the statistic values? For example, `aov_pval`, `aov_fstat`. \n\n- (Not used) `sorts`: How should the values be sorted (from most- to least-important)?\n\n- `direction`: What direction of values indicates the most important values? For example, `maximum`, `minimize`.\n\n- `deterministic`: Does the fitting process use random numbers? It is `TRUE` or `FALSE`.\n\n- `tuning`: Does the method have tuning parameters? It is `TRUE` or `FALSE`.\n\n- `calculating_fn`: What function, if any, is used to estimate the values from data?\n\n- `label`: What label to use when printing? 
For example, `ANOVA p-values`, `ANOVA F-statistics`.\n\n- `packages`: What packages, if any, are required to train the method?\n\n- `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame.\n\n## Scoring object specific to the scoring method\n\nAs an example, let’s consider the ANOVA F-test filter. \n\n`class_score_aov` is a subclass of `class_score`. Any additional properties specific to the implementation can be added in the subclass. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a subclass named 'class_score_aov'\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\nIn addition to the properties inherited from the parent class, `class_score_aov` also includes:\n\n- `neg_log10`: Represent the score as `-log10(p_value)`? It is `TRUE` or `FALSE`.\n\nFor the ANOVA F-test filter, users can represent the score using either\n\n- p-value or \n\n- F-statistic. \n\nWe demonstrate how to create these instances (objects) accordingly. \n\nWe create `score_aov_pval` as an instance of the `class_score_aov` subclass by calling its constructor and specifying its properties:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA p-value\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nIndividual properties can be accessed via `object@`. 
For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@fallback_value\n#> [1] Inf\nscore_aov_pval@direction\n#> [1] \"maximize\"\n```\n:::\n\n\n`score_aov_fstat` is another instance of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA F-statistic\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Fitting (or estimating) score\n\nSo far, we have discussed how to create a subclass and construct instances (objects) for the ANOVA F-test filter. \n\n`fit()` serves both as a generic and as methods used to fit (or estimate) score.\n\nThe generic named `fit()` is re-exported and dispatches to the appropriate method based on the class of the object passed. We define multiple methods named `fit()`. Each of these methods performs the actual fitting or score estimation for a specific class of object. \n\nThe ANOVA F-test filter, for example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Check the class of the object\nclass(score_aov_pval)\n#> [1] \"class_score_aov\" \"filtro::class_score\" \"S7_object\"\nclass(score_aov_fstat)\n#> [1] \"class_score_aov\" \"filtro::class_score\" \"S7_object\"\n```\n:::\n\n\nBoth objects belong to the class `class_score_aov`. 
Therefore, when `fit()` is called, the method for `class_score_aov` is dispatched, performing the actual fitting using the ANOVA F-test filter:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Method dispatch for objects of class `class_score_aov`\nscore_aov_pval |>\n fit(Sale_Price ~ ., data = ames)\nscore_aov_fstat |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\nThe correlation filter, for another example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Check the class of the object\nclass(score_cor_pearson)\n#> [1] \"filtro::class_score_cor\" \"filtro::class_score\" \n#> [3] \"S7_object\"\nclass(score_cor_spearman)\n#> [1] \"filtro::class_score_cor\" \"filtro::class_score\" \n#> [3] \"S7_object\"\n```\n:::\n\n\nBoth objects belong to the class `class_score_cor`. Therefore, when `fit()` is called, the method for `class_score_cor` is dispatched, performing the actual fitting using the correlation filter:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Method dispatch for objects of class `class_score_cor`\nscore_cor_pearson |>\n fit(Sale_Price ~ ., data = ames)\nscore_cor_spearman |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\n## Documenting S7 methods \n\nDocumentation for S7 methods is still a work in progress, and it seems no one currently knows the correct approach. Here’s how we tackle it: \n\nWe re-export the `fit()` generic from another package. Instead of documenting each individual `fit()` method, we provide the details in the \"Details\" section and the \"Estimating the scores\" subsection of the documentation for the corresponding object. 
\n\nThe code below opens the help page for the `fit()` generic: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Help page for `fit()` generic\n?fit\n```\n:::\n\n\nThe code below opens the help pages for specific `fit()` method: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Help page for `fit()` method along with the documentation for the specific object\n?score_aov_pval\n?score_aov_fstat\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Help page for `fit()` method along with the documentation for the specific object\n?score_cor_pearson\n?score_cor_spearman\n```\n:::\n\n\n## Accessing results after fitting\n\nOnce the method has been fitted via `fit()`, the data frame of results can be accessed via `object@results`. \n\nWe use a subset of the Ames data set from the {modeldata} package for demonstration. The data set is used to predict housing sale price, and `Sale_Price` is the outcome and is numeric. \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(modeldata)\names_subset <- modeldata::ames |>\n # Use a subset of data for demonstration\n dplyr::select(\n Sale_Price,\n MS_SubClass,\n MS_Zoning,\n Lot_Frontage,\n Lot_Area,\n Street\n )\names_subset <- ames_subset |>\n dplyr::mutate(Sale_Price = log10(Sale_Price))\n```\n:::\n\n\nNext, we fit the score as we discuss before: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA p-value and fit score\n# ames_aov_pval_res <-\n# score_aov_pval |>\n# fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA F-statistic and fit score\n# ames_aov_fstat_res <-\n# score_aov_fstat |>\n# fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\nRecall that individual properties of an object can be accessed using `object@`. 
Once the method has been fitted, the resulting data frame can be accessed via `object@results`:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ames_aov_pval_res@results\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ames_aov_fstat_res@results\n```\n:::\n\n\n## Session information {#session-info}\n\n\n::: {.cell layout-align=\"center\"}\n\n```\n#> \n#> Attaching package: 'dplyr'\n#> The following objects are masked from 'package:stats':\n#> \n#> filter, lag\n#> The following objects are masked from 'package:base':\n#> \n#> intersect, setdiff, setequal, union\n#> ─ Session info ─────────────────────────────────────────────────────\n#> version R version 4.5.0 (2025-04-11)\n#> language (EN)\n#> date 2025-08-19\n#> pandoc 3.6.3\n#> quarto 1.7.32\n#> \n#> ─ Packages ─────────────────────────────────────────────────────────\n#> package version date (UTC) source\n#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0)\n#> filtro 0.1.0.9000 2025-08-15 Github (tidymodels/filtro@81c7d85)\n#> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0)\n#> purrr 1.0.4 2025-02-05 CRAN (R 4.5.0)\n#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0)\n#> tibble 3.2.1 2023-03-20 CRAN (R 4.5.0)\n#> \n#> ────────────────────────────────────────────────────────────────────\n```\n:::\n\n", + "markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new score class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro and modeldata.\n\nYou can construct new scoring objects using `class_score()`. This article is a guide to creating new scoring objects. \n\n## Scoring object\n\nAll subclasses specific to the scoring method have a parent class named `class_score`. 
There are a few properties (attributes) for this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(filtro)\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), sorts = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\n- `outcome_type`: What types of outcome can the method handle? The options are `numeric`, `factor`, or both. \n\n- `predictor_type`: What types of predictor can the method handle? The options are `numeric`, `factor`, or both. \n\n- `case_weights`: Does the method accept case weights? It is `TRUE` or `FALSE`.\n\n- `range`: Are there known ranges for the statistic? For example, `c(0, Inf)`, `c(0, 1)`. \n\n- `inclusive`: Are these ranges inclusive at the bounds? For example, `c(FALSE, FALSE)`, `c(TRUE, TRUE)`. \n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`.\n\n- `score_type`: What is the column name that will be used for the statistic values? For example, `aov_pval`, `aov_fstat`. \n\n- (Not used) `sorts`: How should the values be sorted (from most- to least-important)?\n\n- `direction`: What direction of values indicates the most important values? For example, `maximize`, `minimize`.\n\n- `deterministic`: Does the fitting process use random numbers? It is `TRUE` or `FALSE`.\n\n- `tuning`: Does the method have tuning parameters? It is `TRUE` or `FALSE`.\n\n- `calculating_fn`: What function, if any, is used to estimate the values from data?\n\n- `label`: What label to use when printing?
For example, `ANOVA p-values`, `ANOVA F-statistics`.\n\n- `packages`: What packages, if any, are required to train the method?\n\n- `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame.\n\n## Scoring object specific to the scoring method\n\nAs an example, let’s consider the ANOVA F-test filter. \n\n`class_score_aov` is a subclass of `class_score`. Any additional properties specific to the implementation can be added in the subclass. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a subclass named 'class_score_aov'\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\nIn addition to the properties inherited from the parent class, `class_score_aov` also includes:\n\n- `neg_log10`: Represent the score as `-log10(p_value)`? It is `TRUE` or `FALSE`.\n\nFor the ANOVA F-test filter, users can represent the score using either\n\n- p-value or \n\n- F-statistic. \n\nWe demonstrate how to create these instances (objects) accordingly. \n\nWe create `score_aov_pval` as an instance of the `class_score_aov` subclass by calling its constructor and specifying its properties:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA p-value\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nIndividual properties can be accessed via `object@`. 
For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@fallback_value\n#> [1] Inf\nscore_aov_pval@direction\n#> [1] \"maximize\"\n```\n:::\n\n\n`score_aov_fstat` is another instance of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA F-statistic\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Fitting (or estimating) score\n\nSo far, we have discussed how to create a subclass and construct instances (objects) for the ANOVA F-test filter. \n\n`fit()` serves both as a generic and as methods used to fit (or estimate) score.\n\nThe generic named `fit()` is re-exported and dispatches to the appropriate method based on the class of the object passed. We define multiple methods named `fit()`. Each of these methods performs the actual fitting or score estimation for a specific class of object. \n\nThe ANOVA F-test filter, for example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Check the class of the object\nclass(score_aov_pval)\n#> [1] \"class_score_aov\" \"filtro::class_score\" \"S7_object\"\nclass(score_aov_fstat)\n#> [1] \"class_score_aov\" \"filtro::class_score\" \"S7_object\"\n```\n:::\n\n\nBoth objects belong to the class `class_score_aov`. 
Therefore, when `fit()` is called, the method for `class_score_aov` is dispatched, performing the actual fitting using the ANOVA F-test filter:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Method dispatch for objects of class `class_score_aov`\nscore_aov_pval |>\n fit(Sale_Price ~ ., data = ames)\nscore_aov_fstat |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\nThe correlation filter, for another example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Check the class of the object\nclass(score_cor_pearson)\n#> [1] \"filtro::class_score_cor\" \"filtro::class_score\" \n#> [3] \"S7_object\"\nclass(score_cor_spearman)\n#> [1] \"filtro::class_score_cor\" \"filtro::class_score\" \n#> [3] \"S7_object\"\n```\n:::\n\n\nBoth objects belong to the class `class_score_cor`. Therefore, when `fit()` is called, the method for `class_score_cor` is dispatched, performing the actual fitting using the correlation filter:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Method dispatch for objects of class `class_score_cor`\nscore_cor_pearson |>\n fit(Sale_Price ~ ., data = ames)\nscore_cor_spearman |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\n## Documenting S7 methods \n\nDocumentation for S7 methods is still a work in progress, and it seems no one currently knows the correct approach. Here’s how we tackle it: \n\nWe re-export the `fit()` generic from another package. Instead of documenting each individual `fit()` method, we provide the details in the \"Details\" section and the \"Estimating the scores\" subsection of the documentation for the corresponding object. 
\n\nThe code below opens the help page for the `fit()` generic: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Help page for `fit()` generic\n?fit\n```\n:::\n\n\nThe code below opens the help pages for the specific `fit()` methods: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Help page for `fit()` method along with the documentation for the specific object\n?score_aov_pval\n?score_aov_fstat\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Help page for `fit()` method along with the documentation for the specific object\n?score_cor_pearson\n?score_cor_spearman\n```\n:::\n\n\n## Accessing results after fitting\n\nOnce the method has been fitted via `fit()`, the data frame of results can be accessed via `object@results`. \n\nWe use a subset of the Ames data set from the {modeldata} package for demonstration. The data set is used to predict housing sale price; the outcome, `Sale_Price`, is numeric. \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(modeldata)\names_subset <- modeldata::ames |>\n # Use a subset of data for demonstration\n dplyr::select(\n Sale_Price,\n MS_SubClass,\n MS_Zoning,\n Lot_Frontage,\n Lot_Area,\n Street\n )\names_subset <- ames_subset |>\n dplyr::mutate(Sale_Price = log10(Sale_Price))\n```\n:::\n\n\nNext, we fit the score as discussed before: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA p-value and fit score\n# ames_aov_pval_res <-\n# score_aov_pval |>\n# fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA F-statistic and fit score\n# ames_aov_fstat_res <-\n# score_aov_fstat |>\n# fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\nRecall that individual properties of an object can be accessed using `object@`.
Once the method has been fitted, the resulting data frame can be accessed via `object@results`:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ames_aov_pval_res@results\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ames_aov_fstat_res@results\n```\n:::\n\n\n## Session information {#session-info}\n\n\n::: {.cell layout-align=\"center\"}\n\n```\n#> \n#> Attaching package: 'dplyr'\n#> The following objects are masked from 'package:stats':\n#> \n#> filter, lag\n#> The following objects are masked from 'package:base':\n#> \n#> intersect, setdiff, setequal, union\n#> ─ Session info ─────────────────────────────────────────────────────\n#> version R version 4.5.0 (2025-04-11)\n#> language (EN)\n#> date 2025-08-19\n#> pandoc 3.6.3\n#> quarto 1.7.32\n#> \n#> ─ Packages ─────────────────────────────────────────────────────────\n#> package version date (UTC) source\n#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0)\n#> filtro 0.1.0.9000 2025-08-19 Github (tidymodels/filtro@441d235)\n#> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0)\n#> purrr 1.0.4 2025-02-05 CRAN (R 4.5.0)\n#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0)\n#> tibble 3.2.1 2023-03-20 CRAN (R 4.5.0)\n#> \n#> ────────────────────────────────────────────────────────────────────\n```\n:::\n\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/learn/develop/filtro/index.html.md b/learn/develop/filtro/index.html.md index e49eb2a5..3e329ec1 100644 --- a/learn/develop/filtro/index.html.md +++ b/learn/develop/filtro/index.html.md @@ -342,7 +342,7 @@ Recall that individual properties of an object can be accessed using `object@`. 
#> ─ Packages ───────────────────────────────────────────────────────── #> package version date (UTC) source #> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0) -#> filtro 0.1.0.9000 2025-08-15 Github (tidymodels/filtro@81c7d85) +#> filtro 0.1.0.9000 2025-08-19 Github (tidymodels/filtro@441d235) #> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0) #> purrr 1.0.4 2025-02-05 CRAN (R 4.5.0) #> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0) From 8ff9ae64c922e6464bfcc8a826872dfa262475b5 Mon Sep 17 00:00:00 2001 From: Frances Lin <37535633+franceslinyc@users.noreply.github.com> Date: Tue, 19 Aug 2025 16:59:49 -0700 Subject: [PATCH 13/21] Remove sorts --- _freeze/learn/develop/filtro/index/execute-results/html.json | 4 ++-- learn/develop/filtro/index.html.md | 4 +--- learn/develop/filtro/index.qmd | 4 +--- 3 files changed, 4 insertions(+), 8 deletions(-) diff --git a/_freeze/learn/develop/filtro/index/execute-results/html.json b/_freeze/learn/develop/filtro/index/execute-results/html.json index ab4c41c6..99e4b4be 100644 --- a/_freeze/learn/develop/filtro/index/execute-results/html.json +++ b/_freeze/learn/develop/filtro/index/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "71c861b990d864bfdd4e2b795ba6ffd2", + "hash": "e90758204d5f614e68eb310fa9bf7866", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new score class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro and modeldata.\n\nYou can construct new scoring objects using `class_score()`. This article is a guide to creating new scoring objects. \n\n## Scoring object\n\nAll subclasses specific to the scoring method have a parent class named `class_score`. 
There are a few properties (attributes) for this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(filtro)\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), sorts = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\n- `outcome_type`: What types of outcome can the method handle? The options are `numeric`, `factor`, or both. \n\n- `predictor_type`: What types of predictor can the method handle? The options are `numeric`, `factor`, or both. \n\n- `case_weights`: Does the method accept case weights? It is `TRUE` or `FALSE`.\n\n- `range`: Are there known ranges for the statistic? For example, `c(0, Inf)`, `c(0, 1)`. \n\n- `inclusive`: Are these ranges inclusive at the bounds? For example, `c(FALSE, FALSE)`, `c(TRUE, TRUE)`. \n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`.\n\n- `score_type`: What is the column name that will be used for the statistic values? For example, `aov_pval`, `aov_fstat`. \n\n- (Not used) `sorts`: How should the values be sorted (from most- to least-important)?\n\n- `direction`: What direction of values indicates the most important values? For example, `maximize`, `minimize`.\n\n- `deterministic`: Does the fitting process use random numbers? It is `TRUE` or `FALSE`.\n\n- `tuning`: Does the method have tuning parameters? It is `TRUE` or `FALSE`.\n\n- `calculating_fn`: What function, if any, is used to estimate the values from data?\n\n- `label`: What label to use when printing?
For example, `ANOVA p-values`, `ANOVA F-statistics`.\n\n- `packages`: What packages, if any, are required to train the method?\n\n- `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame.\n\n## Scoring object specific to the scoring method\n\nAs an example, let’s consider the ANOVA F-test filter. \n\n`class_score_aov` is a subclass of `class_score`. Any additional properties specific to the implementation can be added in the subclass. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a subclass named 'class_score_aov'\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\nIn addition to the properties inherited from the parent class, `class_score_aov` also includes:\n\n- `neg_log10`: Represent the score as `-log10(p_value)`? It is `TRUE` or `FALSE`.\n\nFor the ANOVA F-test filter, users can represent the score using either\n\n- p-value or \n\n- F-statistic. \n\nWe demonstrate how to create these instances (objects) accordingly. \n\nWe create `score_aov_pval` as an instance of the `class_score_aov` subclass by calling its constructor and specifying its properties:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA p-value\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nIndividual properties can be accessed via `object@`. 
For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@fallback_value\n#> [1] Inf\nscore_aov_pval@direction\n#> [1] \"maximize\"\n```\n:::\n\n\n`score_aov_fstat` is another instance of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA F-statistic\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Fitting (or estimating) score\n\nSo far, we have discussed how to create a subclass and construct instances (objects) for the ANOVA F-test filter. \n\n`fit()` serves both as a generic and as methods used to fit (or estimate) score.\n\nThe generic named `fit()` is re-exported and dispatches to the appropriate method based on the class of the object passed. We define multiple methods named `fit()`. Each of these methods performs the actual fitting or score estimation for a specific class of object. \n\nThe ANOVA F-test filter, for example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Check the class of the object\nclass(score_aov_pval)\n#> [1] \"class_score_aov\" \"filtro::class_score\" \"S7_object\"\nclass(score_aov_fstat)\n#> [1] \"class_score_aov\" \"filtro::class_score\" \"S7_object\"\n```\n:::\n\n\nBoth objects belong to the class `class_score_aov`. 
Therefore, when `fit()` is called, the method for `class_score_aov` is dispatched, performing the actual fitting using the ANOVA F-test filter:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Method dispatch for objects of class `class_score_aov`\nscore_aov_pval |>\n fit(Sale_Price ~ ., data = ames)\nscore_aov_fstat |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\nThe correlation filter, for another example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Check the class of the object\nclass(score_cor_pearson)\n#> [1] \"filtro::class_score_cor\" \"filtro::class_score\" \n#> [3] \"S7_object\"\nclass(score_cor_spearman)\n#> [1] \"filtro::class_score_cor\" \"filtro::class_score\" \n#> [3] \"S7_object\"\n```\n:::\n\n\nBoth objects belong to the class `class_score_cor`. Therefore, when `fit()` is called, the method for `class_score_cor` is dispatched, performing the actual fitting using the correlation filter:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Method dispatch for objects of class `class_score_cor`\nscore_cor_pearson |>\n fit(Sale_Price ~ ., data = ames)\nscore_cor_spearman |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\n## Documenting S7 methods \n\nDocumentation for S7 methods is still a work in progress, and it seems no one currently knows the correct approach. Here’s how we tackle it: \n\nWe re-export the `fit()` generic from another package. Instead of documenting each individual `fit()` method, we provide the details in the \"Details\" section and the \"Estimating the scores\" subsection of the documentation for the corresponding object. 
\n\nThe code below opens the help page for the `fit()` generic: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Help page for `fit()` generic\n?fit\n```\n:::\n\n\nThe code below opens the help pages for the specific `fit()` methods: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Help page for `fit()` method along with the documentation for the specific object\n?score_aov_pval\n?score_aov_fstat\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Help page for `fit()` method along with the documentation for the specific object\n?score_cor_pearson\n?score_cor_spearman\n```\n:::\n\n\n## Accessing results after fitting\n\nOnce the method has been fitted via `fit()`, the data frame of results can be accessed via `object@results`. \n\nWe use a subset of the Ames data set from the {modeldata} package for demonstration. The data set is used to predict housing sale price; the outcome, `Sale_Price`, is numeric. \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(modeldata)\names_subset <- modeldata::ames |>\n # Use a subset of data for demonstration\n dplyr::select(\n Sale_Price,\n MS_SubClass,\n MS_Zoning,\n Lot_Frontage,\n Lot_Area,\n Street\n )\names_subset <- ames_subset |>\n dplyr::mutate(Sale_Price = log10(Sale_Price))\n```\n:::\n\n\nNext, we fit the score as discussed before: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA p-value and fit score\n# ames_aov_pval_res <-\n# score_aov_pval |>\n# fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA F-statistic and fit score\n# ames_aov_fstat_res <-\n# score_aov_fstat |>\n# fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\nRecall that individual properties of an object can be accessed using `object@`.
Once the method has been fitted, the resulting data frame can be accessed via `object@results`:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ames_aov_pval_res@results\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ames_aov_fstat_res@results\n```\n:::\n\n\n## Session information {#session-info}\n\n\n::: {.cell layout-align=\"center\"}\n\n```\n#> \n#> Attaching package: 'dplyr'\n#> The following objects are masked from 'package:stats':\n#> \n#> filter, lag\n#> The following objects are masked from 'package:base':\n#> \n#> intersect, setdiff, setequal, union\n#> ─ Session info ─────────────────────────────────────────────────────\n#> version R version 4.5.0 (2025-04-11)\n#> language (EN)\n#> date 2025-08-19\n#> pandoc 3.6.3\n#> quarto 1.7.32\n#> \n#> ─ Packages ─────────────────────────────────────────────────────────\n#> package version date (UTC) source\n#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0)\n#> filtro 0.1.0.9000 2025-08-19 Github (tidymodels/filtro@441d235)\n#> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0)\n#> purrr 1.0.4 2025-02-05 CRAN (R 4.5.0)\n#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0)\n#> tibble 3.2.1 2023-03-20 CRAN (R 4.5.0)\n#> \n#> ────────────────────────────────────────────────────────────────────\n```\n:::\n\n", + "markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new score class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro and modeldata.\n\nYou can construct new scoring objects using `class_score()`. This article is a guide to creating new scoring objects. \n\n## Scoring object\n\nAll subclasses specific to the scoring method have a parent class named `class_score`. 
There are a few properties (attributes) for this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(filtro)\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), sorts = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\n- `outcome_type`: What types of outcome can the method handle? The options are `numeric`, `factor`, or both. \n\n- `predictor_type`: What types of predictor can the method handle? The options are `numeric`, `factor`, or both. \n\n- `case_weights`: Does the method accept case weights? It is `TRUE` or `FALSE`.\n\n- `range`: Are there known ranges for the statistic? For example, `c(0, Inf)`, `c(0, 1)`. \n\n- `inclusive`: Are these ranges inclusive at the bounds? For example, `c(FALSE, FALSE)`, `c(TRUE, TRUE)`. \n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`.\n\n- `score_type`: What is the column name that will be used for the statistic values? For example, `aov_pval`, `aov_fstat`. \n\n- `direction`: What direction of values indicates the most important values? For example, `maximize`, `minimize`.\n\n- `deterministic`: Does the fitting process use random numbers? It is `TRUE` or `FALSE`.\n\n- `tuning`: Does the method have tuning parameters? It is `TRUE` or `FALSE`.\n\n- `calculating_fn`: What function, if any, is used to estimate the values from data?\n\n- `label`: What label to use when printing?
For example, `ANOVA p-values`, `ANOVA F-statistics`.\n\n- `packages`: What packages, if any, are required to train the method?\n\n- `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame.\n\n## Scoring object specific to the scoring method\n\nAs an example, let’s consider the ANOVA F-test filter. \n\n`class_score_aov` is a subclass of `class_score`. Any additional properties specific to the implementation can be added in the subclass. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a subclass named 'class_score_aov'\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\nIn addition to the properties inherited from the parent class, `class_score_aov` also includes:\n\n- `neg_log10`: Represent the score as `-log10(p_value)`? It is `TRUE` or `FALSE`.\n\nFor the ANOVA F-test filter, users can represent the score using either\n\n- p-value or \n\n- F-statistic. \n\nWe demonstrate how to create these instances (objects) accordingly. \n\nWe create `score_aov_pval` as an instance of the `class_score_aov` subclass by calling its constructor and specifying its properties:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA p-value\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nIndividual properties can be accessed via `object@`. 
For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@fallback_value\n#> [1] Inf\nscore_aov_pval@direction\n#> [1] \"maximize\"\n```\n:::\n\n\n`score_aov_fstat` is another instance of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA F-statistic\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Fitting (or estimating) score\n\nSo far, we have discussed how to create a subclass and construct instances (objects) for the ANOVA F-test filter. \n\n`fit()` serves both as a generic and as methods used to fit (or estimate) score.\n\nThe generic named `fit()` is re-exported and dispatches to the appropriate method based on the class of the object passed. We define multiple methods named `fit()`. Each of these methods performs the actual fitting or score estimation for a specific class of object. \n\nThe ANOVA F-test filter, for example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Check the class of the object\nclass(score_aov_pval)\n#> [1] \"class_score_aov\" \"filtro::class_score\" \"S7_object\"\nclass(score_aov_fstat)\n#> [1] \"class_score_aov\" \"filtro::class_score\" \"S7_object\"\n```\n:::\n\n\nBoth objects belong to the class `class_score_aov`. 
Therefore, when `fit()` is called, the method for `class_score_aov` is dispatched, performing the actual fitting using the ANOVA F-test filter:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Method dispatch for objects of class `class_score_aov`\nscore_aov_pval |>\n fit(Sale_Price ~ ., data = ames)\nscore_aov_fstat |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\nThe correlation filter, for another example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Check the class of the object\nclass(score_cor_pearson)\n#> [1] \"filtro::class_score_cor\" \"filtro::class_score\" \n#> [3] \"S7_object\"\nclass(score_cor_spearman)\n#> [1] \"filtro::class_score_cor\" \"filtro::class_score\" \n#> [3] \"S7_object\"\n```\n:::\n\n\nBoth objects belong to the class `class_score_cor`. Therefore, when `fit()` is called, the method for `class_score_cor` is dispatched, performing the actual fitting using the correlation filter:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Method dispatch for objects of class `class_score_cor`\nscore_cor_pearson |>\n fit(Sale_Price ~ ., data = ames)\nscore_cor_spearman |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\n## Documenting S7 methods \n\nDocumentation for S7 methods is still a work in progress, and it seems no one currently knows the correct approach. Here’s how we tackle it: \n\nWe re-export the `fit()` generic from another package. Instead of documenting each individual `fit()` method, we provide the details in the \"Details\" section and the \"Estimating the scores\" subsection of the documentation for the corresponding object. 
\n\nThe code below opens the help page for the `fit()` generic: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Help page for `fit()` generic\n?fit\n```\n:::\n\n\nThe code below opens the help page for specific `fit()` method: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Help page for `fit()` method along with the documentation for the specific object\n?score_aov_pval\n?score_aov_fstat\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Help page for `fit()` method along with the documentation for the specific object\n?score_cor_pearson\n?score_cor_spearman\n```\n:::\n\n\n## Accessing results after fitting\n\nOnce the method has been fitted via `fit()`, the data frame of results can be accessed via `object@results`. \n\nWe use a subset of the Ames data set from the {modeldata} package for demonstration. The data set is used to predict housing sale price, and `Sale_Price` is the outcome and is numeric. \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(modeldata)\names_subset <- modeldata::ames |>\n # Use a subset of data for demonstration\n dplyr::select(\n Sale_Price,\n MS_SubClass,\n MS_Zoning,\n Lot_Frontage,\n Lot_Area,\n Street\n )\names_subset <- ames_subset |>\n dplyr::mutate(Sale_Price = log10(Sale_Price))\n```\n:::\n\n\nNext, we fit the score as we discuss before: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA p-value and fit score\n# ames_aov_pval_res <-\n# score_aov_pval |>\n# fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA F-statistic and fit score\n# ames_aov_fstat_res <-\n# score_aov_fstat |>\n# fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\nRecall that individual properties of an object can be accessed using `object@`. 
Once the method has been fitted, the resulting data frame can be accessed via `object@results`:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ames_aov_pval_res@results\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ames_aov_fstat_res@results\n```\n:::\n\n\n## Session information {#session-info}\n\n\n::: {.cell layout-align=\"center\"}\n\n```\n#> \n#> Attaching package: 'dplyr'\n#> The following objects are masked from 'package:stats':\n#> \n#> filter, lag\n#> The following objects are masked from 'package:base':\n#> \n#> intersect, setdiff, setequal, union\n#> ─ Session info ─────────────────────────────────────────────────────\n#> version R version 4.5.0 (2025-04-11)\n#> language (EN)\n#> date 2025-08-19\n#> pandoc 3.6.3\n#> quarto 1.7.32\n#> \n#> ─ Packages ─────────────────────────────────────────────────────────\n#> package version date (UTC) source\n#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0)\n#> filtro 0.1.0.9000 2025-08-19 Github (tidymodels/filtro@441d235)\n#> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0)\n#> purrr 1.0.4 2025-02-05 CRAN (R 4.5.0)\n#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0)\n#> tibble 3.2.1 2023-03-20 CRAN (R 4.5.0)\n#> \n#> ────────────────────────────────────────────────────────────────────\n```\n:::\n\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/learn/develop/filtro/index.html.md b/learn/develop/filtro/index.html.md index 3e329ec1..5a8e0794 100644 --- a/learn/develop/filtro/index.html.md +++ b/learn/develop/filtro/index.html.md @@ -50,8 +50,6 @@ args(class_score) - `score_type`: What is the column name that will be used for the statistic values? For example, `aov_pval`, `aov_fstat`. -- (Not used) `sorts`: How should the values be sorted (from most- to least-important)? - - `direction`: What direction of values indicates the most important values? For example, `maximum`, `minimize`. - `deterministic`: Does the fitting process use random numbers? It is `TRUE` or `FALSE`. 
@@ -236,7 +234,7 @@ The code below opens the help page for the `fit()` generic:
 ```
 :::
 
-The code below opens the help pages for specific `fit()` method:
+The code below opens the help page for specific `fit()` method:
 
 ::: {.cell layout-align="center"}
 
diff --git a/learn/develop/filtro/index.qmd b/learn/develop/filtro/index.qmd
index ea1cb21d..75ccdd6b 100644
--- a/learn/develop/filtro/index.qmd
+++ b/learn/develop/filtro/index.qmd
@@ -60,8 +60,6 @@ args(class_score)
 
 - `score_type`: What is the column name that will be used for the statistic values? For example, `aov_pval`, `aov_fstat`.
 
-- (Not used) `sorts`: How should the values be sorted (from most- to least-important)?
-
 - `direction`: What direction of values indicates the most important values? For example, `maximum`, `minimize`.
 
 - `deterministic`: Does the fitting process use random numbers? It is `TRUE` or `FALSE`.
@@ -213,7 +211,7 @@ The code below opens the help page for the `fit()` generic:
 ?fit
 ```
 
-The code below opens the help pages for specific `fit()` method:
+The code below opens the help page for specific `fit()` method:
 
 ```{r}
 #| eval: false
From 98090849425e32a526f83394c3311e45d7044957 Mon Sep 17 00:00:00 2001
From: Frances Lin <37535633+franceslinyc@users.noreply.github.com>
Date: Wed, 20 Aug 2025 09:36:37 -0700
Subject: [PATCH 14/21] Finalizing

---
 .../filtro/index/execute-results/html.json | 4 +-
 learn/develop/filtro/index.html.md | 52 +++++++++++++------
 learn/develop/filtro/index.qmd | 31 ++++++-----
 3 files changed, 56 insertions(+), 31 deletions(-)

diff --git a/_freeze/learn/develop/filtro/index/execute-results/html.json b/_freeze/learn/develop/filtro/index/execute-results/html.json
index 99e4b4be..019b6389 100644
--- a/_freeze/learn/develop/filtro/index/execute-results/html.json
+++ b/_freeze/learn/develop/filtro/index/execute-results/html.json
@@ -1,8 +1,8 @@
 {
-  "hash": "e90758204d5f614e68eb310fa9bf7866",
+  "hash": "030eb929ef7d88e3ab50f83817a88697",
   "result": {
     "engine": "knitr",
-    
"markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new score class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro and modeldata.\n\nYou can construct new scoring objects using `class_score()`. This article is a guide to creating new scoring objects. \n\n## Scoring object\n\nAll subclasses specific to the scoring method have a parent class named `class_score`. There are a few properties (attributes) for this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(filtro)\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), sorts = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\n- `outcome_type`: What types of outcome can the method handle? The options are `numeric`, `factor`, or both. \n\n- `predictor_type`: What types of predictor can the method handle? The options are `numeric`, `factor`, or both. \n\n- `case_weights`: Does the method accpet case weights? It is `TRUE` or `FALSE`.\n\n- `range`: Are there known ranges for the statistic? For example, `c(0, Inf)`, `c(0, 1)`. \n\n- `inclusive`: Are these ranges inclusive at the bounds? For example, `c(FALSE, FALSE)`, `c(TRUE, TRUE)`. \n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? 
For example, `0`, `Inf`.\n\n- `score_type`: What is the column name that will be used for the statistic values? For example, `aov_pval`, `aov_fstat`. \n\n- `direction`: What direction of values indicates the most important values? For example, `maximum`, `minimize`.\n\n- `deterministic`: Does the fitting process use random numbers? It is `TRUE` or `FALSE`.\n\n- `tuning`: Does the method have tuning parameters? It is `TRUE` or `FALSE`.\n\n- `calculating_fn`: What function, if any, is used to estimate the values from data?\n\n- `label`: What label to use when printing? For example, `ANOVA p-values`, `ANOVA F-statistics`.\n\n- `packages`: What packages, if any, are required to train the method?\n\n- `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame.\n\n## Scoring object specific to the scoring method\n\nAs an example, let’s consider the ANOVA F-test filter. \n\n`class_score_aov` is a subclass of `class_score`. Any additional properties specific to the implementation can be added in the subclass. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a subclass named 'class_score_aov'\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\nIn addition to the properties inherited from the parent class, `class_score_aov` also includes:\n\n- `neg_log10`: Represent the score as `-log10(p_value)`? It is `TRUE` or `FALSE`.\n\nFor the ANOVA F-test filter, users can represent the score using either\n\n- p-value or \n\n- F-statistic. \n\nWe demonstrate how to create these instances (objects) accordingly. 
\n\nWe create `score_aov_pval` as an instance of the `class_score_aov` subclass by calling its constructor and specifying its properties:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA p-value\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nIndividual properties can be accessed via `object@`. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@fallback_value\n#> [1] Inf\nscore_aov_pval@direction\n#> [1] \"maximize\"\n```\n:::\n\n\n`score_aov_fstat` is another instance of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA F-statistic\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Fitting (or estimating) score\n\nSo far, we have discussed how to create a subclass and construct instances (objects) for the ANOVA F-test filter. \n\n`fit()` serves both as a generic and as methods used to fit (or estimate) score.\n\nThe generic named `fit()` is re-exported and dispatches to the appropriate method based on the class of the object passed. We define multiple methods named `fit()`. Each of these methods performs the actual fitting or score estimation for a specific class of object. 
\n\nThe ANOVA F-test filter, for example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Check the class of the object\nclass(score_aov_pval)\n#> [1] \"class_score_aov\" \"filtro::class_score\" \"S7_object\"\nclass(score_aov_fstat)\n#> [1] \"class_score_aov\" \"filtro::class_score\" \"S7_object\"\n```\n:::\n\n\nBoth objects belong to the class `class_score_aov`. Therefore, when `fit()` is called, the method for `class_score_aov` is dispatched, performing the actual fitting using the ANOVA F-test filter:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Method dispatch for objects of class `class_score_aov`\nscore_aov_pval |>\n fit(Sale_Price ~ ., data = ames)\nscore_aov_fstat |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\nThe correlation filter, for another example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Check the class of the object\nclass(score_cor_pearson)\n#> [1] \"filtro::class_score_cor\" \"filtro::class_score\" \n#> [3] \"S7_object\"\nclass(score_cor_spearman)\n#> [1] \"filtro::class_score_cor\" \"filtro::class_score\" \n#> [3] \"S7_object\"\n```\n:::\n\n\nBoth objects belong to the class `class_score_cor`. Therefore, when `fit()` is called, the method for `class_score_cor` is dispatched, performing the actual fitting using the correlation filter:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Method dispatch for objects of class `class_score_cor`\nscore_cor_pearson |>\n fit(Sale_Price ~ ., data = ames)\nscore_cor_spearman |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\n## Documenting S7 methods \n\nDocumentation for S7 methods is still a work in progress, and it seems no one currently knows the correct approach. Here’s how we tackle it: \n\nWe re-export the `fit()` generic from another package. 
Instead of documenting each individual `fit()` method, we provide the details in the \"Details\" section and the \"Estimating the scores\" subsection of the documentation for the corresponding object. \n\nThe code below opens the help page for the `fit()` generic: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Help page for `fit()` generic\n?fit\n```\n:::\n\n\nThe code below opens the help page for specific `fit()` method: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Help page for `fit()` method along with the documentation for the specific object\n?score_aov_pval\n?score_aov_fstat\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Help page for `fit()` method along with the documentation for the specific object\n?score_cor_pearson\n?score_cor_spearman\n```\n:::\n\n\n## Accessing results after fitting\n\nOnce the method has been fitted via `fit()`, the data frame of results can be accessed via `object@results`. \n\nWe use a subset of the Ames data set from the {modeldata} package for demonstration. The data set is used to predict housing sale price, and `Sale_Price` is the outcome and is numeric. 
\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(modeldata)\names_subset <- modeldata::ames |>\n # Use a subset of data for demonstration\n dplyr::select(\n Sale_Price,\n MS_SubClass,\n MS_Zoning,\n Lot_Frontage,\n Lot_Area,\n Street\n )\names_subset <- ames_subset |>\n dplyr::mutate(Sale_Price = log10(Sale_Price))\n```\n:::\n\n\nNext, we fit the score as we discuss before: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA p-value and fit score\n# ames_aov_pval_res <-\n# score_aov_pval |>\n# fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# # Specify ANOVA F-statistic and fit score\n# ames_aov_fstat_res <-\n# score_aov_fstat |>\n# fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\nRecall that individual properties of an object can be accessed using `object@`. Once the method has been fitted, the resulting data frame can be accessed via `object@results`:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ames_aov_pval_res@results\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ames_aov_fstat_res@results\n```\n:::\n\n\n## Session information {#session-info}\n\n\n::: {.cell layout-align=\"center\"}\n\n```\n#> \n#> Attaching package: 'dplyr'\n#> The following objects are masked from 'package:stats':\n#> \n#> filter, lag\n#> The following objects are masked from 'package:base':\n#> \n#> intersect, setdiff, setequal, union\n#> ─ Session info ─────────────────────────────────────────────────────\n#> version R version 4.5.0 (2025-04-11)\n#> language (EN)\n#> date 2025-08-19\n#> pandoc 3.6.3\n#> quarto 1.7.32\n#> \n#> ─ Packages ─────────────────────────────────────────────────────────\n#> package version date (UTC) source\n#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0)\n#> filtro 0.1.0.9000 2025-08-19 Github (tidymodels/filtro@441d235)\n#> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0)\n#> purrr 1.0.4 2025-02-05 
CRAN (R 4.5.0)\n#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0)\n#> tibble 3.2.1 2023-03-20 CRAN (R 4.5.0)\n#> \n#> ────────────────────────────────────────────────────────────────────\n```\n:::\n\n", + "markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new score class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro and modeldata.\n\nYou can construct new scoring objects using `class_score()`. This article is a guide to creating new scoring objects. \n\n## Scoring object\n\nAll subclasses specific to the scoring method have a parent class named `class_score`. There are a few properties (attributes) for this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(filtro)\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), sorts = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\n- `outcome_type`: What types of outcome can the method handle? The options are `numeric`, `factor`, or both. \n\n- `predictor_type`: What types of predictor can the method handle? The options are `numeric`, `factor`, or both. \n\n- `case_weights`: Does the method accpet case weights? It is `TRUE` or `FALSE`.\n\n- `range`: Are there known ranges for the statistic? For example, `c(0, Inf)`, `c(0, 1)`. \n\n- `inclusive`: Are these ranges inclusive at the bounds? For example, `c(FALSE, FALSE)`, `c(TRUE, TRUE)`. 
\n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`.\n\n- `score_type`: What is the column name that will be used for the statistic values? For example, `aov_pval`, `aov_fstat`. \n\n- `direction`: What direction of values indicates the most important values? For example, `maximum`, `minimize`.\n\n- `deterministic`: Does the fitting process use random numbers? It is `TRUE` or `FALSE`.\n\n- `tuning`: Does the method have tuning parameters? It is `TRUE` or `FALSE`.\n\n- `calculating_fn`: What function, if any, is used to estimate the values from data?\n\n- `label`: What label to use when printing? For example, `ANOVA p-values`, `ANOVA F-statistics`.\n\n- `packages`: What packages, if any, are required to train the method?\n\n- `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame.\n\n## Scoring object specific to the scoring method\n\nWe demonstrate how to create a custom scoring object specific to a given feature selection method.\n\nAs an example, let’s consider the ANOVA F-test filter. \n\n`class_score_aov` is a subclass of `class_score`. Any additional properties specific to the implementation can be added in the subclass. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a subclass named 'class_score_aov'\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\nIn addition to the properties inherited from the parent class, `class_score_aov` also includes:\n\n- `neg_log10`: Represent the score as `-log10(p_value)`? It is `TRUE` or `FALSE`.\n\nFor the ANOVA F-test filter, users can represent the score using either\n\n- p-value or \n\n- F-statistic. \n\nWe demonstrate how to create these instances (objects) accordingly. 
\n\nWe create `score_aov_pval` as an instance of the `class_score_aov` subclass by calling its constructor and specifying its properties:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA p-value\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nIndividual properties can be accessed via `object@`. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@fallback_value\n#> [1] Inf\nscore_aov_pval@direction\n#> [1] \"maximize\"\n```\n:::\n\n\n`score_aov_fstat` is another instance of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA F-statistic\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Fitting (or estimating) score\n\nSo far, we have discussed how to create a subclass and construct instances (objects) for the ANOVA F-test filter. \n\nWe now discuss the dual role of `fit()`, both as a generic and as methods used to fit (or estimate) score.\n\nThe generic named `fit()` is re-exported and dispatches to the appropriate method based on the class of the object passed. We define multiple methods named `fit()`. Each of these methods performs the actual fitting or score estimation for a specific class of object. 
\n\nThe ANOVA F-test filter, for example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Check the class of the object\nclass(score_aov_pval)\n#> [1] \"filtro::class_score_aov\" \"filtro::class_score\" \n#> [3] \"S7_object\"\nclass(score_aov_fstat)\n#> [1] \"filtro::class_score_aov\" \"filtro::class_score\" \n#> [3] \"S7_object\"\n```\n:::\n\n\nBoth objects belong to the class `class_score_aov`. Therefore, when `fit()` is called, the method for `class_score_aov` is dispatched, performing the actual fitting using the ANOVA F-test filter:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Method dispatch for objects of class `class_score_aov`\nscore_aov_pval |>\n fit(Sale_Price ~ ., data = ames)\nscore_aov_fstat |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\nThe correlation filter, for another example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Check the class of the object\nclass(score_cor_pearson)\n#> [1] \"filtro::class_score_cor\" \"filtro::class_score\" \n#> [3] \"S7_object\"\nclass(score_cor_spearman)\n#> [1] \"filtro::class_score_cor\" \"filtro::class_score\" \n#> [3] \"S7_object\"\n```\n:::\n\n\nBoth objects belong to the class `class_score_cor`. Therefore, when `fit()` is called, the method for `class_score_cor` is dispatched, performing the actual fitting using the correlation filter:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Method dispatch for objects of class `class_score_cor`\nscore_cor_pearson |>\n fit(Sale_Price ~ ., data = ames)\nscore_cor_spearman |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\n## Documenting S7 methods \n\nDocumentation for S7 methods is still a work in progress, and it seems no one currently knows the right approach. Here’s how we tackle it: \n\nWe re-export the `fit()` generic from another package. 
Instead of documenting each individual `fit()` method, we provide the details in the \"Details\" section and the \"Estimating the scores\" subsection of the documentation for the corresponding object. \n\nThe code below opens the help page for the `fit()` generic: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Help page for `fit()` generic\n?fit\n```\n:::\n\n\nThe code below opens the help page for specific `fit()` method: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Help page for `fit()` method along with the documentation for the specific object\n?score_aov_pval\n?score_aov_fstat\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Help page for `fit()` method along with the documentation for the specific object\n?score_cor_pearson\n?score_cor_spearman\n```\n:::\n\n\n## Accessing results after fitting\n\nOnce the method has been fitted via `fit()`, the data frame of results can be accessed via `object@results`. \n\nWe use a subset of the Ames data set from the {modeldata} package for demonstration. The goal is to predict housing sale price. `Sale_Price` is the outcome and is numeric. 
\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(modeldata)\names_subset <- modeldata::ames |>\n # Use a subset of data for demonstration\n dplyr::select(\n Sale_Price,\n MS_SubClass,\n MS_Zoning,\n Lot_Frontage,\n Lot_Area,\n Street\n )\names_subset <- ames_subset |>\n dplyr::mutate(Sale_Price = log10(Sale_Price))\n```\n:::\n\n\nNext, we fit the score as we discuss before: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Specify ANOVA p-value and fit score\names_aov_pval_res <-\n score_aov_pval |>\n fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Specify ANOVA F-statistic and fit score\names_aov_fstat_res <-\n score_aov_fstat |>\n fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\nRecall that individual properties of an object can be accessed using `object@`. Once the method has been fitted, the resulting data frame can be accessed via `object@results`:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\names_aov_pval_res@results\n#> # A tibble: 5 × 4\n#> name score outcome predictor \n#> \n#> 1 aov_pval 237. Sale_Price MS_SubClass \n#> 2 aov_pval 130. Sale_Price MS_Zoning \n#> 3 aov_pval NA Sale_Price Lot_Frontage\n#> 4 aov_pval NA Sale_Price Lot_Area \n#> 5 aov_pval 5.75 Sale_Price Street\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\names_aov_fstat_res@results\n#> # A tibble: 5 × 4\n#> name score outcome predictor \n#> \n#> 1 aov_fstat 94.5 Sale_Price MS_SubClass \n#> 2 aov_fstat 115. 
Sale_Price MS_Zoning \n#> 3 aov_fstat NA Sale_Price Lot_Frontage\n#> 4 aov_fstat NA Sale_Price Lot_Area \n#> 5 aov_fstat 22.9 Sale_Price Street\n```\n:::\n\n\n## Session information {#session-info}\n\n\n::: {.cell layout-align=\"center\"}\n\n```\n#> \n#> Attaching package: 'dplyr'\n#> The following objects are masked from 'package:stats':\n#> \n#> filter, lag\n#> The following objects are masked from 'package:base':\n#> \n#> intersect, setdiff, setequal, union\n#> ─ Session info ─────────────────────────────────────────────────────\n#> version R version 4.5.0 (2025-04-11)\n#> language (EN)\n#> date 2025-08-20\n#> pandoc 3.6.3\n#> quarto 1.7.32\n#> \n#> ─ Packages ─────────────────────────────────────────────────────────\n#> package version date (UTC) source\n#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0)\n#> filtro 0.1.0.9000 2025-08-19 Github (tidymodels/filtro@441d235)\n#> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0)\n#> purrr 1.0.4 2025-02-05 CRAN (R 4.5.0)\n#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0)\n#> tibble 3.2.1 2023-03-20 CRAN (R 4.5.0)\n#> \n#> ────────────────────────────────────────────────────────────────────\n```\n:::\n\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/learn/develop/filtro/index.html.md b/learn/develop/filtro/index.html.md index 5a8e0794..3b8aef06 100644 --- a/learn/develop/filtro/index.html.md +++ b/learn/develop/filtro/index.html.md @@ -66,6 +66,8 @@ args(class_score) ## Scoring object specific to the scoring method +We demonstrate how to create a custom scoring object specific to a given feature selection method. + As an example, let’s consider the ANOVA F-test filter. `class_score_aov` is a subclass of `class_score`. Any additional properties specific to the implementation can be added in the subclass. For example: @@ -160,7 +162,7 @@ score_aov_fstat <- So far, we have discussed how to create a subclass and construct instances (objects) for the ANOVA F-test filter. 
-`fit()` serves both as a generic and as methods used to fit (or estimate) score.
+We now discuss the dual role of `fit()`, both as a generic and as methods used to fit (or estimate) score.
 
 The generic named `fit()` is re-exported and dispatches to the appropriate method based on the class of the object passed. We define multiple methods named `fit()`. Each of these methods performs the actual fitting or score estimation for a specific class of object.
 
@@ -171,9 +173,11 @@ The ANOVA F-test filter, for example:
 ```{.r .cell-code}
 # Check the class of the object
 class(score_aov_pval)
-#> [1] "class_score_aov" "filtro::class_score" "S7_object"
+#> [1] "filtro::class_score_aov" "filtro::class_score"
+#> [3] "S7_object"
 class(score_aov_fstat)
-#> [1] "class_score_aov" "filtro::class_score" "S7_object"
+#> [1] "filtro::class_score_aov" "filtro::class_score"
+#> [3] "S7_object"
 ```
 :::
 
@@ -220,7 +224,7 @@ score_cor_spearman |>
 
 ## Documenting S7 methods
 
-Documentation for S7 methods is still a work in progress, and it seems no one currently knows the correct approach. Here’s how we tackle it:
+Documentation for S7 methods is still a work in progress, and it seems no one currently knows the right approach. Here’s how we tackle it:
 
 We re-export the `fit()` generic from another package. Instead of documenting each individual `fit()` method, we provide the details in the "Details" section and the "Estimating the scores" subsection of the documentation for the corresponding object.
 
@@ -258,7 +262,7 @@ The code below opens the help page for specific `fit()` method:
 
 Once the method has been fitted via `fit()`, the data frame of results can be accessed via `object@results`.
 
-We use a subset of the Ames data set from the {modeldata} package for demonstration. The data set is used to predict housing sale price, and `Sale_Price` is the outcome and is numeric.
+We use a subset of the Ames data set from the {modeldata} package for demonstration. 
The goal is to predict housing sale price. `Sale_Price` is the outcome and is numeric.
 
 ::: {.cell layout-align="center"}
 
@@ -284,20 +288,20 @@ Next, we fit the score as we discuss before:
 ::: {.cell layout-align="center"}
 
 ```{.r .cell-code}
-# # Specify ANOVA p-value and fit score
-# ames_aov_pval_res <-
-#   score_aov_pval |>
-#     fit(Sale_Price ~ ., data = ames_subset)
+# Specify ANOVA p-value and fit score
+ames_aov_pval_res <-
+  score_aov_pval |>
+    fit(Sale_Price ~ ., data = ames_subset)
 ```
 :::
 
 ::: {.cell layout-align="center"}
 
 ```{.r .cell-code}
-# # Specify ANOVA F-statistic and fit score
-# ames_aov_fstat_res <-
-#   score_aov_fstat |>
-#     fit(Sale_Price ~ ., data = ames_subset)
+# Specify ANOVA F-statistic and fit score
+ames_aov_fstat_res <-
+  score_aov_fstat |>
+    fit(Sale_Price ~ ., data = ames_subset)
 ```
 :::
 
@@ -306,14 +310,30 @@ Recall that individual properties of an object can be accessed using `object@`.
 ::: {.cell layout-align="center"}
 
 ```{.r .cell-code}
-# ames_aov_pval_res@results
+ames_aov_pval_res@results
+#> # A tibble: 5 × 4
+#> name score outcome predictor
+#>
+#> 1 aov_pval 237. Sale_Price MS_SubClass
+#> 2 aov_pval 130. Sale_Price MS_Zoning
+#> 3 aov_pval NA Sale_Price Lot_Frontage
+#> 4 aov_pval NA Sale_Price Lot_Area
+#> 5 aov_pval 5.75 Sale_Price Street
 ```
 :::
 
 ::: {.cell layout-align="center"}
 
 ```{.r .cell-code}
-# ames_aov_fstat_res@results
+ames_aov_fstat_res@results
+#> # A tibble: 5 × 4
+#> name score outcome predictor
+#>
+#> 1 aov_fstat 94.5 Sale_Price MS_SubClass
+#> 2 aov_fstat 115. Sale_Price MS_Zoning
+#> 3 aov_fstat NA Sale_Price Lot_Frontage
+#> 4 aov_fstat NA Sale_Price Lot_Area
+#> 5 aov_fstat 22.9 Sale_Price Street
 ```
 :::
 
@@ -333,7 +353,7 @@ Recall that individual properties of an object can be accessed using `object@`.
#> ─ Session info ───────────────────────────────────────────────────── #> version R version 4.5.0 (2025-04-11) #> language (EN) -#> date 2025-08-19 +#> date 2025-08-20 #> pandoc 3.6.3 #> quarto 1.7.32 #> diff --git a/learn/develop/filtro/index.qmd b/learn/develop/filtro/index.qmd index 75ccdd6b..887b550f 100644 --- a/learn/develop/filtro/index.qmd +++ b/learn/develop/filtro/index.qmd @@ -76,11 +76,14 @@ args(class_score) ## Scoring object specific to the scoring method +We demonstrate how to create a custom scoring object specific to a given feature selection method. + As an example, let’s consider the ANOVA F-test filter. `class_score_aov` is a subclass of `class_score`. Any additional properties specific to the implementation can be added in the subclass. For example: ```{r} +#| eval: false # Create a subclass named 'class_score_aov' class_score_aov <- S7::new_class( "class_score_aov", @@ -106,6 +109,7 @@ We demonstrate how to create these instances (objects) accordingly. We create `score_aov_pval` as an instance of the `class_score_aov` subclass by calling its constructor and specifying its properties: ```{r} +#| eval: false # ANOVA p-value score_aov_pval <- class_score_aov( @@ -134,6 +138,7 @@ score_aov_pval@direction `score_aov_fstat` is another instance of the `class_score_aov` subclass: ```{r} +#| eval: false # ANOVA F-statistic score_aov_fstat <- class_score_aov( @@ -155,7 +160,7 @@ score_aov_fstat <- So far, we have discussed how to create a subclass and construct instances (objects) for the ANOVA F-test filter. -`fit()` serves both as a generic and as methods used to fit (or estimate) score. +We now discuss the dual role of `fit()`, both as a generic and as methods used to fit (or estimate) score. The generic named `fit()` is re-exported and dispatches to the appropriate method based on the class of the object passed. We define multiple methods named `fit()`. 
Each of these methods performs the actual fitting or score estimation for a specific class of object. @@ -199,7 +204,7 @@ score_cor_spearman |> ## Documenting S7 methods -Documentation for S7 methods is still a work in progress, and it seems no one currently knows the correct approach. Here’s how we tackle it: +Documentation for S7 methods is still a work in progress, and it seems no one currently knows the right approach. Here’s how we tackle it: We re-export the `fit()` generic from another package. Instead of documenting each individual `fit()` method, we provide the details in the "Details" section and the "Estimating the scores" subsection of the documentation for the corresponding object. @@ -231,7 +236,7 @@ The code below opens the help page for specific `fit()` method: Once the method has been fitted via `fit()`, the data frame of results can be accessed via `object@results`. -We use a subset of the Ames data set from the {modeldata} package for demonstration. The data set is used to predict housing sale price, and `Sale_Price` is the outcome and is numeric. +We use a subset of the Ames data set from the {modeldata} package for demonstration. The goal is to predict housing sale price. `Sale_Price` is the outcome and is numeric. ```{r} library(modeldata) @@ -252,27 +257,27 @@ ames_subset <- ames_subset |> Next, we fit the score as we discuss before: ```{r} -# # Specify ANOVA p-value and fit score -# ames_aov_pval_res <- -# score_aov_pval |> -# fit(Sale_Price ~ ., data = ames_subset) +# Specify ANOVA p-value and fit score +ames_aov_pval_res <- + score_aov_pval |> + fit(Sale_Price ~ ., data = ames_subset) ``` ```{r} -# # Specify ANOVA F-statistic and fit score -# ames_aov_fstat_res <- -# score_aov_fstat |> -# fit(Sale_Price ~ ., data = ames_subset) +# Specify ANOVA F-statistic and fit score +ames_aov_fstat_res <- + score_aov_fstat |> + fit(Sale_Price ~ ., data = ames_subset) ``` Recall that individual properties of an object can be accessed using `object@`. 
Once the method has been fitted, the resulting data frame can be accessed via `object@results`: ```{r} -# ames_aov_pval_res@results +ames_aov_pval_res@results ``` ```{r} -# ames_aov_fstat_res@results +ames_aov_fstat_res@results ``` ## Session information {#session-info} From 5b48d781503d8e967681b281511a970211b2303f Mon Sep 17 00:00:00 2001 From: Frances Lin <37535633+franceslinyc@users.noreply.github.com> Date: Wed, 20 Aug 2025 09:51:27 -0700 Subject: [PATCH 15/21] Minor editing --- .../filtro/index/execute-results/html.json | 4 +- learn/develop/filtro/index.html.md | 359 ------------------ learn/develop/filtro/index.qmd | 4 +- 3 files changed, 4 insertions(+), 363 deletions(-) diff --git a/_freeze/learn/develop/filtro/index/execute-results/html.json b/_freeze/learn/develop/filtro/index/execute-results/html.json index 019b6389..16fd2bec 100644 --- a/_freeze/learn/develop/filtro/index/execute-results/html.json +++ b/_freeze/learn/develop/filtro/index/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "030eb929ef7d88e3ab50f83817a88697", + "hash": "220fde2de89ece42c0a6700dfa1a6754", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new score class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro and modeldata.\n\nYou can construct new scoring objects using `class_score()`. This article is a guide to creating new scoring objects. \n\n## Scoring object\n\nAll subclasses specific to the scoring method have a parent class named `class_score`. 
There are a few properties (attributes) for this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(filtro)\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), sorts = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\n- `outcome_type`: What types of outcome can the method handle? The options are `numeric`, `factor`, or both. \n\n- `predictor_type`: What types of predictor can the method handle? The options are `numeric`, `factor`, or both. \n\n- `case_weights`: Does the method accpet case weights? It is `TRUE` or `FALSE`.\n\n- `range`: Are there known ranges for the statistic? For example, `c(0, Inf)`, `c(0, 1)`. \n\n- `inclusive`: Are these ranges inclusive at the bounds? For example, `c(FALSE, FALSE)`, `c(TRUE, TRUE)`. \n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`.\n\n- `score_type`: What is the column name that will be used for the statistic values? For example, `aov_pval`, `aov_fstat`. \n\n- `direction`: What direction of values indicates the most important values? For example, `maximum`, `minimize`.\n\n- `deterministic`: Does the fitting process use random numbers? It is `TRUE` or `FALSE`.\n\n- `tuning`: Does the method have tuning parameters? It is `TRUE` or `FALSE`.\n\n- `calculating_fn`: What function, if any, is used to estimate the values from data?\n\n- `label`: What label to use when printing? 
For example, `ANOVA p-values`, `ANOVA F-statistics`.\n\n- `packages`: What packages, if any, are required to train the method?\n\n- `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame.\n\n## Scoring object specific to the scoring method\n\nWe demonstrate how to create a custom scoring object specific to a given feature selection method.\n\nAs an example, let’s consider the ANOVA F-test filter. \n\n`class_score_aov` is a subclass of `class_score`. Any additional properties specific to the implementation can be added in the subclass. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a subclass named 'class_score_aov'\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\nIn addition to the properties inherited from the parent class, `class_score_aov` also includes:\n\n- `neg_log10`: Represent the score as `-log10(p_value)`? It is `TRUE` or `FALSE`.\n\nFor the ANOVA F-test filter, users can represent the score using either\n\n- p-value or \n\n- F-statistic. \n\nWe demonstrate how to create these instances (objects) accordingly. \n\nWe create `score_aov_pval` as an instance of the `class_score_aov` subclass by calling its constructor and specifying its properties:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA p-value\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nIndividual properties can be accessed via `object@`. 
For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@fallback_value\n#> [1] Inf\nscore_aov_pval@direction\n#> [1] \"maximize\"\n```\n:::\n\n\n`score_aov_fstat` is another instance of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA F-statistic\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Fitting (or estimating) score\n\nSo far, we have discussed how to create a subclass and construct instances (objects) for the ANOVA F-test filter. \n\nWe now discuss the dual role of `fit()`, both as a generic and as methods used to fit (or estimate) score.\n\nThe generic named `fit()` is re-exported and dispatches to the appropriate method based on the class of the object passed. We define multiple methods named `fit()`. Each of these methods performs the actual fitting or score estimation for a specific class of object. \n\nThe ANOVA F-test filter, for example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Check the class of the object\nclass(score_aov_pval)\n#> [1] \"filtro::class_score_aov\" \"filtro::class_score\" \n#> [3] \"S7_object\"\nclass(score_aov_fstat)\n#> [1] \"filtro::class_score_aov\" \"filtro::class_score\" \n#> [3] \"S7_object\"\n```\n:::\n\n\nBoth objects belong to the class `class_score_aov`. 
Therefore, when `fit()` is called, the method for `class_score_aov` is dispatched, performing the actual fitting using the ANOVA F-test filter:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Method dispatch for objects of class `class_score_aov`\nscore_aov_pval |>\n fit(Sale_Price ~ ., data = ames)\nscore_aov_fstat |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\nThe correlation filter, for another example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Check the class of the object\nclass(score_cor_pearson)\n#> [1] \"filtro::class_score_cor\" \"filtro::class_score\" \n#> [3] \"S7_object\"\nclass(score_cor_spearman)\n#> [1] \"filtro::class_score_cor\" \"filtro::class_score\" \n#> [3] \"S7_object\"\n```\n:::\n\n\nBoth objects belong to the class `class_score_cor`. Therefore, when `fit()` is called, the method for `class_score_cor` is dispatched, performing the actual fitting using the correlation filter:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Method dispatch for objects of class `class_score_cor`\nscore_cor_pearson |>\n fit(Sale_Price ~ ., data = ames)\nscore_cor_spearman |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\n## Documenting S7 methods \n\nDocumentation for S7 methods is still a work in progress, and it seems no one currently knows the right approach. Here’s how we tackle it: \n\nWe re-export the `fit()` generic from another package. Instead of documenting each individual `fit()` method, we provide the details in the \"Details\" section and the \"Estimating the scores\" subsection of the documentation for the corresponding object. 
\n\nThe code below opens the help page for the `fit()` generic: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Help page for `fit()` generic\n?fit\n```\n:::\n\n\nThe code below opens the help page for specific `fit()` method: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Help page for `fit()` method along with the documentation for the specific object\n?score_aov_pval\n?score_aov_fstat\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Help page for `fit()` method along with the documentation for the specific object\n?score_cor_pearson\n?score_cor_spearman\n```\n:::\n\n\n## Accessing results after fitting\n\nOnce the method has been fitted via `fit()`, the data frame of results can be accessed via `object@results`. \n\nWe use a subset of the Ames data set from the {modeldata} package for demonstration. The goal is to predict housing sale price. `Sale_Price` is the outcome and is numeric. \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(modeldata)\names_subset <- modeldata::ames |>\n # Use a subset of data for demonstration\n dplyr::select(\n Sale_Price,\n MS_SubClass,\n MS_Zoning,\n Lot_Frontage,\n Lot_Area,\n Street\n )\names_subset <- ames_subset |>\n dplyr::mutate(Sale_Price = log10(Sale_Price))\n```\n:::\n\n\nNext, we fit the score as we discuss before: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Specify ANOVA p-value and fit score\names_aov_pval_res <-\n score_aov_pval |>\n fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Specify ANOVA F-statistic and fit score\names_aov_fstat_res <-\n score_aov_fstat |>\n fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\nRecall that individual properties of an object can be accessed using `object@`. 
Once the method has been fitted, the resulting data frame can be accessed via `object@results`:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\names_aov_pval_res@results\n#> # A tibble: 5 × 4\n#> name score outcome predictor \n#> \n#> 1 aov_pval 237. Sale_Price MS_SubClass \n#> 2 aov_pval 130. Sale_Price MS_Zoning \n#> 3 aov_pval NA Sale_Price Lot_Frontage\n#> 4 aov_pval NA Sale_Price Lot_Area \n#> 5 aov_pval 5.75 Sale_Price Street\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\names_aov_fstat_res@results\n#> # A tibble: 5 × 4\n#> name score outcome predictor \n#> \n#> 1 aov_fstat 94.5 Sale_Price MS_SubClass \n#> 2 aov_fstat 115. Sale_Price MS_Zoning \n#> 3 aov_fstat NA Sale_Price Lot_Frontage\n#> 4 aov_fstat NA Sale_Price Lot_Area \n#> 5 aov_fstat 22.9 Sale_Price Street\n```\n:::\n\n\n## Session information {#session-info}\n\n\n::: {.cell layout-align=\"center\"}\n\n```\n#> \n#> Attaching package: 'dplyr'\n#> The following objects are masked from 'package:stats':\n#> \n#> filter, lag\n#> The following objects are masked from 'package:base':\n#> \n#> intersect, setdiff, setequal, union\n#> ─ Session info ─────────────────────────────────────────────────────\n#> version R version 4.5.0 (2025-04-11)\n#> language (EN)\n#> date 2025-08-20\n#> pandoc 3.6.3\n#> quarto 1.7.32\n#> \n#> ─ Packages ─────────────────────────────────────────────────────────\n#> package version date (UTC) source\n#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0)\n#> filtro 0.1.0.9000 2025-08-19 Github (tidymodels/filtro@441d235)\n#> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0)\n#> purrr 1.0.4 2025-02-05 CRAN (R 4.5.0)\n#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0)\n#> tibble 3.2.1 2023-03-20 CRAN (R 4.5.0)\n#> \n#> ────────────────────────────────────────────────────────────────────\n```\n:::\n\n", + "markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new score 
class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro and modeldata.\n\nYou can construct new scoring objects using `class_score()`. This article is a guide to creating new scoring objects. \n\n## Scoring object\n\nAll subclasses specific to the scoring method have a parent class named `class_score`. There are a few properties (attributes) for this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(filtro)\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), sorts = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\n- `outcome_type`: What types of outcome can the method handle? The options are `numeric`, `factor`, or both. \n\n- `predictor_type`: What types of predictor can the method handle? The options are `numeric`, `factor`, or both. \n\n- `case_weights`: Does the method accept case weights? It is `TRUE` or `FALSE`.\n\n- `range`: Are there known ranges for the statistic? For example, `c(0, Inf)`, `c(0, 1)`. \n\n- `inclusive`: Are these ranges inclusive at the bounds? For example, `c(FALSE, FALSE)`, `c(TRUE, TRUE)`. \n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`.\n\n- `score_type`: What is the column name that will be used for the statistic values? For example, `aov_pval`, `aov_fstat`. \n\n- `direction`: What direction of values indicates the most important values? 
For example, `maximize`, `minimize`.\n\n- `deterministic`: Does the fitting process use random numbers? It is `TRUE` or `FALSE`.\n\n- `tuning`: Does the method have tuning parameters? It is `TRUE` or `FALSE`.\n\n- `calculating_fn`: What function, if any, is used to estimate the values from data?\n\n- `label`: What label to use when printing? For example, `ANOVA p-values`, `ANOVA F-statistics`.\n\n- `packages`: What packages, if any, are required to train the method?\n\n- `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame.\n\n## Scoring object specific to the scoring method\n\nWe demonstrate how to create a custom scoring object specific to the given scoring method.\n\nAs an example, let’s consider the ANOVA F-test filter. \n\n`class_score_aov` is a subclass of `class_score`. Any additional properties specific to the implementation can be added in the subclass. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a subclass named 'class_score_aov'\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\nIn addition to the properties inherited from the parent class, `class_score_aov` also includes:\n\n- `neg_log10`: Represent the score as `-log10(p_value)`? It is `TRUE` or `FALSE`.\n\nFor the ANOVA F-test filter, users can represent the score using either\n\n- p-value or \n\n- F-statistic. \n\nWe demonstrate how to create these instances (objects) accordingly. 
\n\n`score_aov_pval` is created as an instance of the `class_score_aov` subclass by calling its constructor and specifying its properties:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA p-value\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nIndividual properties can be accessed via `object@`. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@fallback_value\n#> [1] Inf\nscore_aov_pval@direction\n#> [1] \"maximize\"\n```\n:::\n\n\n`score_aov_fstat` is another instance of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA F-statistic\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Fitting (or estimating) score\n\nSo far, we have discussed how to create a subclass and construct instances (objects) for the ANOVA F-test filter. \n\nWe now discuss the dual role of `fit()`, both as a generic and as methods used to fit (or estimate) score.\n\nThe generic named `fit()` is re-exported and dispatches to the appropriate method based on the class of the object passed. We define multiple methods named `fit()`. Each of these methods performs the actual fitting or score estimation for a specific class of object. 
\n\nThe ANOVA F-test filter, for example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Check the class of the object\nclass(score_aov_pval)\n#> [1] \"filtro::class_score_aov\" \"filtro::class_score\" \n#> [3] \"S7_object\"\nclass(score_aov_fstat)\n#> [1] \"filtro::class_score_aov\" \"filtro::class_score\" \n#> [3] \"S7_object\"\n```\n:::\n\n\nBoth objects belong to the class `class_score_aov`. Therefore, when `fit()` is called, the method for `class_score_aov` is dispatched, performing the actual fitting using the ANOVA F-test filter:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Method dispatch for objects of class `class_score_aov`\nscore_aov_pval |>\n fit(Sale_Price ~ ., data = ames)\nscore_aov_fstat |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\nThe correlation filter, for another example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Check the class of the object\nclass(score_cor_pearson)\n#> [1] \"filtro::class_score_cor\" \"filtro::class_score\" \n#> [3] \"S7_object\"\nclass(score_cor_spearman)\n#> [1] \"filtro::class_score_cor\" \"filtro::class_score\" \n#> [3] \"S7_object\"\n```\n:::\n\n\nBoth objects belong to the class `class_score_cor`. Therefore, when `fit()` is called, the method for `class_score_cor` is dispatched, performing the actual fitting using the correlation filter:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Method dispatch for objects of class `class_score_cor`\nscore_cor_pearson |>\n fit(Sale_Price ~ ., data = ames)\nscore_cor_spearman |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\n## Documenting S7 methods \n\nDocumentation for S7 methods is still a work in progress, and it seems no one currently knows the right approach. Here’s how we tackle it: \n\nWe re-export the `fit()` generic from another package. 
Instead of documenting each individual `fit()` method, we provide the details in the \"Details\" section and the \"Estimating the scores\" subsection of the documentation for the corresponding object. \n\nThe code below opens the help page for the `fit()` generic: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Help page for `fit()` generic\n?fit\n```\n:::\n\n\nThe code below opens the help page for a specific `fit()` method: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Help page for `fit()` method along with the documentation for the specific object\n?score_aov_pval\n?score_aov_fstat\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Help page for `fit()` method along with the documentation for the specific object\n?score_cor_pearson\n?score_cor_spearman\n```\n:::\n\n\n## Accessing results after fitting\n\nOnce the method has been fitted via `fit()`, the data frame of results can be accessed via `object@results`. \n\nWe use a subset of the Ames data set from the {modeldata} package for demonstration. The goal is to predict housing sale price. `Sale_Price` is the outcome and is numeric. 
\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(modeldata)\names_subset <- modeldata::ames |>\n # Use a subset of data for demonstration\n dplyr::select(\n Sale_Price,\n MS_SubClass,\n MS_Zoning,\n Lot_Frontage,\n Lot_Area,\n Street\n )\names_subset <- ames_subset |>\n dplyr::mutate(Sale_Price = log10(Sale_Price))\n```\n:::\n\n\nNext, we fit the score as we discussed before: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Specify ANOVA p-value and fit score\names_aov_pval_res <-\n score_aov_pval |>\n fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Specify ANOVA F-statistic and fit score\names_aov_fstat_res <-\n score_aov_fstat |>\n fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\nRecall that individual properties of an object can be accessed using `object@`. Once the method has been fitted, the resulting data frame can be accessed via `object@results`:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\names_aov_pval_res@results\n#> # A tibble: 5 × 4\n#> name score outcome predictor \n#> \n#> 1 aov_pval 237. Sale_Price MS_SubClass \n#> 2 aov_pval 130. Sale_Price MS_Zoning \n#> 3 aov_pval NA Sale_Price Lot_Frontage\n#> 4 aov_pval NA Sale_Price Lot_Area \n#> 5 aov_pval 5.75 Sale_Price Street\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\names_aov_fstat_res@results\n#> # A tibble: 5 × 4\n#> name score outcome predictor \n#> \n#> 1 aov_fstat 94.5 Sale_Price MS_SubClass \n#> 2 aov_fstat 115. 
Sale_Price MS_Zoning \n#> 3 aov_fstat NA Sale_Price Lot_Frontage\n#> 4 aov_fstat NA Sale_Price Lot_Area \n#> 5 aov_fstat 22.9 Sale_Price Street\n```\n:::\n\n\n## Session information {#session-info}\n\n\n::: {.cell layout-align=\"center\"}\n\n```\n#> \n#> Attaching package: 'dplyr'\n#> The following objects are masked from 'package:stats':\n#> \n#> filter, lag\n#> The following objects are masked from 'package:base':\n#> \n#> intersect, setdiff, setequal, union\n#> ─ Session info ─────────────────────────────────────────────────────\n#> version R version 4.5.0 (2025-04-11)\n#> language (EN)\n#> date 2025-08-20\n#> pandoc 3.6.3\n#> quarto 1.7.32\n#> \n#> ─ Packages ─────────────────────────────────────────────────────────\n#> package version date (UTC) source\n#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0)\n#> filtro 0.1.0.9000 2025-08-19 Github (tidymodels/filtro@441d235)\n#> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0)\n#> purrr 1.0.4 2025-02-05 CRAN (R 4.5.0)\n#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0)\n#> tibble 3.2.1 2023-03-20 CRAN (R 4.5.0)\n#> \n#> ────────────────────────────────────────────────────────────────────\n```\n:::\n\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/learn/develop/filtro/index.html.md b/learn/develop/filtro/index.html.md index 3b8aef06..e2dad99e 100644 --- a/learn/develop/filtro/index.html.md +++ b/learn/develop/filtro/index.html.md @@ -11,362 +11,3 @@ toc-depth: 3 include-after-body: ../../../resources.html --- -## Introduction - -To use code in this article, you will need to install the following packages: filtro and modeldata. - -You can construct new scoring objects using `class_score()`. This article is a guide to creating new scoring objects. - -## Scoring object - -All subclasses specific to the scoring method have a parent class named `class_score`. 
There are a few properties (attributes) for this object: - -::: {.cell layout-align="center"} - -```{.r .cell-code} -library(filtro) -args(class_score) -#> function (outcome_type = c("numeric", "factor"), predictor_type = c("numeric", -#> "factor"), case_weights = logical(0), range = integer(0), inclusive = logical(0), -#> fallback_value = integer(0), score_type = character(0), sorts = function() NULL, -#> direction = character(0), deterministic = logical(0), tuning = logical(0), -#> calculating_fn = function() NULL, label = character(0), packages = character(0), -#> results = data.frame()) -#> NULL -``` -::: - -- `outcome_type`: What types of outcome can the method handle? The options are `numeric`, `factor`, or both. - -- `predictor_type`: What types of predictor can the method handle? The options are `numeric`, `factor`, or both. - -- `case_weights`: Does the method accpet case weights? It is `TRUE` or `FALSE`. - -- `range`: Are there known ranges for the statistic? For example, `c(0, Inf)`, `c(0, 1)`. - -- `inclusive`: Are these ranges inclusive at the bounds? For example, `c(FALSE, FALSE)`, `c(TRUE, TRUE)`. - -- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`. - -- `score_type`: What is the column name that will be used for the statistic values? For example, `aov_pval`, `aov_fstat`. - -- `direction`: What direction of values indicates the most important values? For example, `maximum`, `minimize`. - -- `deterministic`: Does the fitting process use random numbers? It is `TRUE` or `FALSE`. - -- `tuning`: Does the method have tuning parameters? It is `TRUE` or `FALSE`. - -- `calculating_fn`: What function, if any, is used to estimate the values from data? - -- `label`: What label to use when printing? For example, `ANOVA p-values`, `ANOVA F-statistics`. - -- `packages`: What packages, if any, are required to train the method? 
- -- `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame. - -## Scoring object specific to the scoring method - -We demonstrate how to create a custom scoring object specific to a given feature selection method. - -As an example, let’s consider the ANOVA F-test filter. - -`class_score_aov` is a subclass of `class_score`. Any additional properties specific to the implementation can be added in the subclass. For example: - -::: {.cell layout-align="center"} - -```{.r .cell-code} -# Create a subclass named 'class_score_aov' -class_score_aov <- S7::new_class( - "class_score_aov", - parent = class_score, - properties = list( - neg_log10 = S7::new_property(S7::class_logical, default = TRUE) - ) -) -``` -::: - -In addition to the properties inherited from the parent class, `class_score_aov` also includes: - -- `neg_log10`: Represent the score as `-log10(p_value)`? It is `TRUE` or `FALSE`. - -For the ANOVA F-test filter, users can represent the score using either - -- p-value or - -- F-statistic. - -We demonstrate how to create these instances (objects) accordingly. - -We create `score_aov_pval` as an instance of the `class_score_aov` subclass by calling its constructor and specifying its properties: - -::: {.cell layout-align="center"} - -```{.r .cell-code} -# ANOVA p-value -score_aov_pval <- - class_score_aov( - outcome_type = c("numeric", "factor"), - predictor_type = c("numeric", "factor"), - case_weights = TRUE, - range = c(0, Inf), - inclusive = c(FALSE, FALSE), - fallback_value = Inf, - score_type = "aov_pval", - direction = "maximize", - deterministic = TRUE, - tuning = FALSE, - label = "ANOVA p-values" - ) -``` -::: - -Individual properties can be accessed via `object@`. 
For example: - -::: {.cell layout-align="center"} - -```{.r .cell-code} -score_aov_pval@case_weights -#> [1] TRUE -score_aov_pval@fallback_value -#> [1] Inf -score_aov_pval@direction -#> [1] "maximize" -``` -::: - -`score_aov_fstat` is another instance of the `class_score_aov` subclass: - -::: {.cell layout-align="center"} - -```{.r .cell-code} -# ANOVA F-statistic -score_aov_fstat <- - class_score_aov( - outcome_type = c("numeric", "factor"), - predictor_type = c("numeric", "factor"), - case_weights = TRUE, - range = c(0, Inf), - inclusive = c(FALSE, FALSE), - fallback_value = Inf, - score_type = "aov_fstat", - direction = "maximize", - deterministic = TRUE, - tuning = FALSE, - label = "ANOVA F-statistics" - ) -``` -::: - -## Fitting (or estimating) score - -So far, we have discussed how to create a subclass and construct instances (objects) for the ANOVA F-test filter. - -We now discuss the dual role of `fit()`, both as a generic and as methods used to fit (or estimate) score. - -The generic named `fit()` is re-exported and dispatches to the appropriate method based on the class of the object passed. We define multiple methods named `fit()`. Each of these methods performs the actual fitting or score estimation for a specific class of object. - -The ANOVA F-test filter, for example: - -::: {.cell layout-align="center"} - -```{.r .cell-code} -# Check the class of the object -class(score_aov_pval) -#> [1] "filtro::class_score_aov" "filtro::class_score" -#> [3] "S7_object" -class(score_aov_fstat) -#> [1] "filtro::class_score_aov" "filtro::class_score" -#> [3] "S7_object" -``` -::: - -Both objects belong to the class `class_score_aov`. 
Therefore, when `fit()` is called, the method for `class_score_aov` is dispatched, performing the actual fitting using the ANOVA F-test filter: - -::: {.cell layout-align="center"} - -```{.r .cell-code} -# Method dispatch for objects of class `class_score_aov` -score_aov_pval |> - fit(Sale_Price ~ ., data = ames) -score_aov_fstat |> - fit(Sale_Price ~ ., data = ames) -``` -::: - -The correlation filter, for another example: - -::: {.cell layout-align="center"} - -```{.r .cell-code} -# Check the class of the object -class(score_cor_pearson) -#> [1] "filtro::class_score_cor" "filtro::class_score" -#> [3] "S7_object" -class(score_cor_spearman) -#> [1] "filtro::class_score_cor" "filtro::class_score" -#> [3] "S7_object" -``` -::: - -Both objects belong to the class `class_score_cor`. Therefore, when `fit()` is called, the method for `class_score_cor` is dispatched, performing the actual fitting using the correlation filter: - -::: {.cell layout-align="center"} - -```{.r .cell-code} -# Method dispatch for objects of class `class_score_cor` -score_cor_pearson |> - fit(Sale_Price ~ ., data = ames) -score_cor_spearman |> - fit(Sale_Price ~ ., data = ames) -``` -::: - -## Documenting S7 methods - -Documentation for S7 methods is still a work in progress, and it seems no one currently knows the right approach. Here’s how we tackle it: - -We re-export the `fit()` generic from another package. Instead of documenting each individual `fit()` method, we provide the details in the "Details" section and the "Estimating the scores" subsection of the documentation for the corresponding object. 
- -The code below opens the help page for the `fit()` generic: - -::: {.cell layout-align="center"} - -```{.r .cell-code} -# Help page for `fit()` generic -?fit -``` -::: - -The code below opens the help page for specific `fit()` method: - -::: {.cell layout-align="center"} - -```{.r .cell-code} -# Help page for `fit()` method along with the documentation for the specific object -?score_aov_pval -?score_aov_fstat -``` -::: - -::: {.cell layout-align="center"} - -```{.r .cell-code} -# Help page for `fit()` method along with the documentation for the specific object -?score_cor_pearson -?score_cor_spearman -``` -::: - -## Accessing results after fitting - -Once the method has been fitted via `fit()`, the data frame of results can be accessed via `object@results`. - -We use a subset of the Ames data set from the {modeldata} package for demonstration. The goal is to predict housing sale price. `Sale_Price` is the outcome and is numeric. - -::: {.cell layout-align="center"} - -```{.r .cell-code} -library(modeldata) -ames_subset <- modeldata::ames |> - # Use a subset of data for demonstration - dplyr::select( - Sale_Price, - MS_SubClass, - MS_Zoning, - Lot_Frontage, - Lot_Area, - Street - ) -ames_subset <- ames_subset |> - dplyr::mutate(Sale_Price = log10(Sale_Price)) -``` -::: - -Next, we fit the score as we discuss before: - -::: {.cell layout-align="center"} - -```{.r .cell-code} -# Specify ANOVA p-value and fit score -ames_aov_pval_res <- - score_aov_pval |> - fit(Sale_Price ~ ., data = ames_subset) -``` -::: - -::: {.cell layout-align="center"} - -```{.r .cell-code} -# Specify ANOVA F-statistic and fit score -ames_aov_fstat_res <- - score_aov_fstat |> - fit(Sale_Price ~ ., data = ames_subset) -``` -::: - -Recall that individual properties of an object can be accessed using `object@`. 
Once the method has been fitted, the resulting data frame can be accessed via `object@results`: - -::: {.cell layout-align="center"} - -```{.r .cell-code} -ames_aov_pval_res@results -#> # A tibble: 5 × 4 -#> name score outcome predictor -#> -#> 1 aov_pval 237. Sale_Price MS_SubClass -#> 2 aov_pval 130. Sale_Price MS_Zoning -#> 3 aov_pval NA Sale_Price Lot_Frontage -#> 4 aov_pval NA Sale_Price Lot_Area -#> 5 aov_pval 5.75 Sale_Price Street -``` -::: - -::: {.cell layout-align="center"} - -```{.r .cell-code} -ames_aov_fstat_res@results -#> # A tibble: 5 × 4 -#> name score outcome predictor -#> -#> 1 aov_fstat 94.5 Sale_Price MS_SubClass -#> 2 aov_fstat 115. Sale_Price MS_Zoning -#> 3 aov_fstat NA Sale_Price Lot_Frontage -#> 4 aov_fstat NA Sale_Price Lot_Area -#> 5 aov_fstat 22.9 Sale_Price Street -``` -::: - -## Session information {#session-info} - -::: {.cell layout-align="center"} - -``` -#> -#> Attaching package: 'dplyr' -#> The following objects are masked from 'package:stats': -#> -#> filter, lag -#> The following objects are masked from 'package:base': -#> -#> intersect, setdiff, setequal, union -#> ─ Session info ───────────────────────────────────────────────────── -#> version R version 4.5.0 (2025-04-11) -#> language (EN) -#> date 2025-08-20 -#> pandoc 3.6.3 -#> quarto 1.7.32 -#> -#> ─ Packages ───────────────────────────────────────────────────────── -#> package version date (UTC) source -#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0) -#> filtro 0.1.0.9000 2025-08-19 Github (tidymodels/filtro@441d235) -#> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0) -#> purrr 1.0.4 2025-02-05 CRAN (R 4.5.0) -#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0) -#> tibble 3.2.1 2023-03-20 CRAN (R 4.5.0) -#> -#> ──────────────────────────────────────────────────────────────────── -``` -::: - diff --git a/learn/develop/filtro/index.qmd b/learn/develop/filtro/index.qmd index 887b550f..d0fb7323 100644 --- a/learn/develop/filtro/index.qmd +++ b/learn/develop/filtro/index.qmd @@ -76,7 +76,7 @@ 
args(class_score) ## Scoring object specific to the scoring method -We demonstrate how to create a custom scoring object specific to a given feature selection method. +We demonstrate how to create a custom scoring object specific to the given scoring method. As an example, let’s consider the ANOVA F-test filter. @@ -106,7 +106,7 @@ For the ANOVA F-test filter, users can represent the score using either We demonstrate how to create these instances (objects) accordingly. -We create `score_aov_pval` as an instance of the `class_score_aov` subclass by calling its constructor and specifying its properties: +`score_aov_pval` is created as an instance of the `class_score_aov` subclass by calling its constructor and specifying its properties: ```{r} #| eval: false From 549822bcc32e7cd5b4f737a781e508630dc50c89 Mon Sep 17 00:00:00 2001 From: Frances Lin <37535633+franceslinyc@users.noreply.github.com> Date: Thu, 28 Aug 2025 12:01:57 -0700 Subject: [PATCH 16/21] Revising based on Hannah's feedback; WIP --- .../filtro/index/execute-results/html.json | 4 +- learn/develop/filtro/index.html.md | 352 ++++++++++++++++++ learn/develop/filtro/index.qmd | 114 +++--- ...hting-303f0b4dceb2a814a9bbef461efe1684.css | 236 ------------ 4 files changed, 412 insertions(+), 294 deletions(-) delete mode 100644 site_libs/quarto-html/quarto-syntax-highlighting-303f0b4dceb2a814a9bbef461efe1684.css diff --git a/_freeze/learn/develop/filtro/index/execute-results/html.json b/_freeze/learn/develop/filtro/index/execute-results/html.json index 16fd2bec..79f96651 100644 --- a/_freeze/learn/develop/filtro/index/execute-results/html.json +++ b/_freeze/learn/develop/filtro/index/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "220fde2de89ece42c0a6700dfa1a6754", + "hash": "f6439c05f8af265cd2e66c48970479ea", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new 
score class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro and modeldata.\n\nYou can construct new scoring objects using `class_score()`. This article is a guide to creating new scoring objects. \n\n## Scoring object\n\nAll subclasses specific to the scoring method have a parent class named `class_score`. There are a few properties (attributes) for this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(filtro)\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), sorts = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\n- `outcome_type`: What types of outcome can the method handle? The options are `numeric`, `factor`, or both. \n\n- `predictor_type`: What types of predictor can the method handle? The options are `numeric`, `factor`, or both. \n\n- `case_weights`: Does the method accpet case weights? It is `TRUE` or `FALSE`.\n\n- `range`: Are there known ranges for the statistic? For example, `c(0, Inf)`, `c(0, 1)`. \n\n- `inclusive`: Are these ranges inclusive at the bounds? For example, `c(FALSE, FALSE)`, `c(TRUE, TRUE)`. \n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`.\n\n- `score_type`: What is the column name that will be used for the statistic values? For example, `aov_pval`, `aov_fstat`. \n\n- `direction`: What direction of values indicates the most important values? 
For example, `maximum`, `minimize`.\n\n- `deterministic`: Does the fitting process use random numbers? It is `TRUE` or `FALSE`.\n\n- `tuning`: Does the method have tuning parameters? It is `TRUE` or `FALSE`.\n\n- `calculating_fn`: What function, if any, is used to estimate the values from data?\n\n- `label`: What label to use when printing? For example, `ANOVA p-values`, `ANOVA F-statistics`.\n\n- `packages`: What packages, if any, are required to train the method?\n\n- `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame.\n\n## Scoring object specific to the scoring method\n\nWe demonstrate how to create a custom scoring object specific to the given scoring method.\n\nAs an example, let’s consider the ANOVA F-test filter. \n\n`class_score_aov` is a subclass of `class_score`. Any additional properties specific to the implementation can be added in the subclass. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a subclass named 'class_score_aov'\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\nIn addition to the properties inherited from the parent class, `class_score_aov` also includes:\n\n- `neg_log10`: Represent the score as `-log10(p_value)`? It is `TRUE` or `FALSE`.\n\nFor the ANOVA F-test filter, users can represent the score using either\n\n- p-value or \n\n- F-statistic. \n\nWe demonstrate how to create these instances (objects) accordingly. 
\n\n`score_aov_pval` is created as an instance of the `class_score_aov` subclass by calling its constructor and specifying its properties:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA p-value\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nIndividual properties can be accessed via `object@`. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@fallback_value\n#> [1] Inf\nscore_aov_pval@direction\n#> [1] \"maximize\"\n```\n:::\n\n\n`score_aov_fstat` is another instance of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA F-statistic\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Fitting (or estimating) score\n\nSo far, we have discussed how to create a subclass and construct instances (objects) for the ANOVA F-test filter. \n\nWe now discuss the dual role of `fit()`, both as a generic and as methods used to fit (or estimate) score.\n\nThe generic named `fit()` is re-exported and dispatches to the appropriate method based on the class of the object passed. We define multiple methods named `fit()`. Each of these methods performs the actual fitting or score estimation for a specific class of object. 
\n\nThe ANOVA F-test filter, for example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Check the class of the object\nclass(score_aov_pval)\n#> [1] \"filtro::class_score_aov\" \"filtro::class_score\" \n#> [3] \"S7_object\"\nclass(score_aov_fstat)\n#> [1] \"filtro::class_score_aov\" \"filtro::class_score\" \n#> [3] \"S7_object\"\n```\n:::\n\n\nBoth objects belong to the class `class_score_aov`. Therefore, when `fit()` is called, the method for `class_score_aov` is dispatched, performing the actual fitting using the ANOVA F-test filter:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Method dispatch for objects of class `class_score_aov`\nscore_aov_pval |>\n fit(Sale_Price ~ ., data = ames)\nscore_aov_fstat |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\nThe correlation filter, for another example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Check the class of the object\nclass(score_cor_pearson)\n#> [1] \"filtro::class_score_cor\" \"filtro::class_score\" \n#> [3] \"S7_object\"\nclass(score_cor_spearman)\n#> [1] \"filtro::class_score_cor\" \"filtro::class_score\" \n#> [3] \"S7_object\"\n```\n:::\n\n\nBoth objects belong to the class `class_score_cor`. Therefore, when `fit()` is called, the method for `class_score_cor` is dispatched, performing the actual fitting using the correlation filter:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Method dispatch for objects of class `class_score_cor`\nscore_cor_pearson |>\n fit(Sale_Price ~ ., data = ames)\nscore_cor_spearman |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\n## Documenting S7 methods \n\nDocumentation for S7 methods is still a work in progress, and it seems no one currently knows the right approach. Here’s how we tackle it: \n\nWe re-export the `fit()` generic from another package. 
Instead of documenting each individual `fit()` method, we provide the details in the \"Details\" section and the \"Estimating the scores\" subsection of the documentation for the corresponding object. \n\nThe code below opens the help page for the `fit()` generic: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Help page for `fit()` generic\n?fit\n```\n:::\n\n\nThe code below opens the help page for specific `fit()` method: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Help page for `fit()` method along with the documentation for the specific object\n?score_aov_pval\n?score_aov_fstat\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Help page for `fit()` method along with the documentation for the specific object\n?score_cor_pearson\n?score_cor_spearman\n```\n:::\n\n\n## Accessing results after fitting\n\nOnce the method has been fitted via `fit()`, the data frame of results can be accessed via `object@results`. \n\nWe use a subset of the Ames data set from the {modeldata} package for demonstration. The goal is to predict housing sale price. `Sale_Price` is the outcome and is numeric. 
\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(modeldata)\names_subset <- modeldata::ames |>\n # Use a subset of data for demonstration\n dplyr::select(\n Sale_Price,\n MS_SubClass,\n MS_Zoning,\n Lot_Frontage,\n Lot_Area,\n Street\n )\names_subset <- ames_subset |>\n dplyr::mutate(Sale_Price = log10(Sale_Price))\n```\n:::\n\n\nNext, we fit the score as we discuss before: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Specify ANOVA p-value and fit score\names_aov_pval_res <-\n score_aov_pval |>\n fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Specify ANOVA F-statistic and fit score\names_aov_fstat_res <-\n score_aov_fstat |>\n fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\nRecall that individual properties of an object can be accessed using `object@`. Once the method has been fitted, the resulting data frame can be accessed via `object@results`:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\names_aov_pval_res@results\n#> # A tibble: 5 × 4\n#> name score outcome predictor \n#> \n#> 1 aov_pval 237. Sale_Price MS_SubClass \n#> 2 aov_pval 130. Sale_Price MS_Zoning \n#> 3 aov_pval NA Sale_Price Lot_Frontage\n#> 4 aov_pval NA Sale_Price Lot_Area \n#> 5 aov_pval 5.75 Sale_Price Street\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\names_aov_fstat_res@results\n#> # A tibble: 5 × 4\n#> name score outcome predictor \n#> \n#> 1 aov_fstat 94.5 Sale_Price MS_SubClass \n#> 2 aov_fstat 115. 
Sale_Price MS_Zoning \n#> 3 aov_fstat NA Sale_Price Lot_Frontage\n#> 4 aov_fstat NA Sale_Price Lot_Area \n#> 5 aov_fstat 22.9 Sale_Price Street\n```\n:::\n\n\n## Session information {#session-info}\n\n\n::: {.cell layout-align=\"center\"}\n\n```\n#> \n#> Attaching package: 'dplyr'\n#> The following objects are masked from 'package:stats':\n#> \n#> filter, lag\n#> The following objects are masked from 'package:base':\n#> \n#> intersect, setdiff, setequal, union\n#> ─ Session info ─────────────────────────────────────────────────────\n#> version R version 4.5.0 (2025-04-11)\n#> language (EN)\n#> date 2025-08-20\n#> pandoc 3.6.3\n#> quarto 1.7.32\n#> \n#> ─ Packages ─────────────────────────────────────────────────────────\n#> package version date (UTC) source\n#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0)\n#> filtro 0.1.0.9000 2025-08-19 Github (tidymodels/filtro@441d235)\n#> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0)\n#> purrr 1.0.4 2025-02-05 CRAN (R 4.5.0)\n#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0)\n#> tibble 3.2.1 2023-03-20 CRAN (R 4.5.0)\n#> \n#> ────────────────────────────────────────────────────────────────────\n```\n:::\n\n", + "markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new score class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro and modeldata.\n\nfiltro is tidy tools to apply filter-based supervised feature selection methods. It provides functions to rank and select a specified proportion or a fixed number of features using built-in methods and the desirability function. \n\nCurrently, there are 6 filters in filtro and many existing score objects. 
A list of existing scoring objects [can be found here](https://filtro.tidymodels.org/articles/filtro.html#available-score-objects-and-filter-methods). \n\nHowever, you might need to define your own scoring objects. \n\nThis article serves as a guide to creating new scoring objects and computing feature scores before performing ranking and selection. \n\nThe general procedure is to:\n\n1. Construct a parent scoring object using `class_score()`, specifying fixed properties. \n\n2. Construct a custom scoring object using `class_score_*()`, defining additional properties.\n\n3. Define the scoring method in `fit()` to compute feature scores. `fit()` refers to the custom scoring object from Step 2 to determine which method to dispatch.\n\nAs an example, we will walk through the steps to create an ANOVA F-test filter.\n\n## Scoring object\n\nAll the custom scoring objects share the same parent class named `class_score`. Therefore, we start by constructing a new scoring object using `class_score()`. \n\nThese are the fixed properties (attributes) for this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(filtro)\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), transform_fn = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\nFor example: \n\n- `case_weights`: Does the method accept case weights? It is `TRUE` or `FALSE`.\n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`.\n\n- `direction`: What direction of values indicates the most important values? 
For example, `maximize`, `minimize`.\n\n- `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame.\n\nFor details on the remaining properties, please refer to the package documentation.\n\n## Custom scoring object\n\nNext, we demonstrate how to create a custom scoring object. \n\nAs an example, let’s create a custom scoring object for the ANOVA F-test named `class_score_aov`. This filter computes feature scores using analysis of variance (ANOVA) hypothesis tests, powered by `lm()`. \n\n`class_score_aov` is a subclass of `class_score`. It inherits all fixed properties from the parent class, while allowing additional implementation-specific properties to be added in the subclass. For example:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a subclass named 'class_score_aov'\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\nIn addition to the properties inherited from the parent class (discussed in the previous section), `class_score_aov` also includes: \n\n- `neg_log10`: Represent the score as `-log10(p_value)`? It is `TRUE` or `FALSE`.\n\nFor the ANOVA F-test filter, users can represent the score using either\n\n- p-value or \n\n- F-statistic. \n\nWe demonstrate how to create these instances (objects) accordingly. 
\n\n`score_aov_pval` is created as an instance of the `class_score_aov` subclass by calling its constructor and specifying its properties:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA p-value\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nIndividual properties can be accessed via `object@`. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@fallback_value\n#> [1] Inf\nscore_aov_pval@direction\n#> [1] \"maximize\"\n```\n:::\n\n\n`score_aov_fstat` is another instance of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA F-statistic\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Fitting (or estimating) feature score\n\nSo far, we have covered how to construct a parent class, create a custom subclass, and instantiate objects for the ANOVA F-test filter. \n\nWe now discuss the dual role of `fit()`: it functions both as a generic and as the methods used to fit (or estimate) feature score. \n\n1. The generic `fit()` is re-exported from generics. It inspects the class of the object passed and dispatches to the appropriate method. \n\n2. We also define multiple methods named `fit()`. 
Each method `fit()` performs the actual fitting or score estimation for a specific class of object. \n\nIn other words, when `fit()` is called, the generic refers to the custom scoring object (rather than the parent scoring object) to determine which method to dispatch. The actual scoring computation is performed within the dispatched method. \n\nThe ANOVA F-test filter, for example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Check the class of the object\nclass(score_aov_pval)\n#> [1] \"filtro::class_score_aov\" \"filtro::class_score\" \n#> [3] \"S7_object\"\nclass(score_aov_fstat)\n#> [1] \"filtro::class_score_aov\" \"filtro::class_score\" \n#> [3] \"S7_object\"\n```\n:::\n\n\nBoth instances (objects) belong to the custom scoring object `class_score_aov`. Therefore, when `fit()` is called, the method for `class_score_aov` is dispatched, performing the actual fitting using the ANOVA F-test:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Method dispatch for objects of class `class_score_aov`\nscore_aov_pval |>\n fit(Sale_Price ~ ., data = ames)\nscore_aov_fstat |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\n## Defining S7 methods \n\nTo use the `fit()` method, we need to define S7 method that implements the scoring logic: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Define the scoring method for `class_score_aov`\n#' @export\nS7::method(fit, class_score_aov) <- function(\n object,\n formula,\n data,\n case_weights = NULL,\n ...\n) {\n # TODO finish the rest of the function\n\n object@results <- res\n object\n}\n```\n:::\n\n\n## Documenting S7 methods \n\nDocumentation for S7 methods is still a work in progress, and it seems no one currently knows the right approach. Here’s how we tackle it: \n\nWe re-export the `fit()` generic from generics. 
Instead of documenting each `fit()` method, we provide the details in the \"Details\" section and the \"Estimating the scores\" subsection of the documentation for the corresponding object. \n\nThe code below opens the help page for the `fit()` generic: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Help page for `fit()` generic\n?fit\n```\n:::\n\n\nThe code below opens the help page for specific `fit()` method: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Help page for `fit()` method along with the documentation for the specific object\n?score_aov_pval\n?score_aov_fstat\n```\n:::\n\n\n## Accessing results after fitting\n\nOnce the method has been fitted via `fit()`, the data frame of results can be accessed via `object@results`. \n\nWe use a subset of the Ames data set from the {modeldata} package for demonstration. The goal is to predict housing sale price. `Sale_Price` is the outcome and is numeric. \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(modeldata)\names_subset <- modeldata::ames |>\n # Use a subset of data for demonstration\n dplyr::select(\n Sale_Price,\n MS_SubClass,\n MS_Zoning,\n Lot_Frontage,\n Lot_Area,\n Street\n )\names_subset <- ames_subset |>\n dplyr::mutate(Sale_Price = log10(Sale_Price))\n```\n:::\n\n\nNext, we fit the score as we discuss before: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Specify ANOVA p-value and fit score\names_aov_pval_res <-\n score_aov_pval |>\n fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Specify ANOVA F-statistic and fit score\names_aov_fstat_res <-\n score_aov_fstat |>\n fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\nRecall that individual properties of an object can be accessed using `object@`. 
Once the method has been fitted, the resulting data frame can be accessed via `object@results`:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\names_aov_pval_res@results\n#> # A tibble: 5 × 4\n#> name score outcome predictor \n#> \n#> 1 aov_pval 237. Sale_Price MS_SubClass \n#> 2 aov_pval 130. Sale_Price MS_Zoning \n#> 3 aov_pval NA Sale_Price Lot_Frontage\n#> 4 aov_pval NA Sale_Price Lot_Area \n#> 5 aov_pval 5.75 Sale_Price Street\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\names_aov_fstat_res@results\n#> # A tibble: 5 × 4\n#> name score outcome predictor \n#> \n#> 1 aov_fstat 94.5 Sale_Price MS_SubClass \n#> 2 aov_fstat 115. Sale_Price MS_Zoning \n#> 3 aov_fstat NA Sale_Price Lot_Frontage\n#> 4 aov_fstat NA Sale_Price Lot_Area \n#> 5 aov_fstat 22.9 Sale_Price Street\n```\n:::\n\n\n## Session information {#session-info}\n\n\n::: {.cell layout-align=\"center\"}\n\n```\n#> \n#> Attaching package: 'dplyr'\n#> The following objects are masked from 'package:stats':\n#> \n#> filter, lag\n#> The following objects are masked from 'package:base':\n#> \n#> intersect, setdiff, setequal, union\n#> ─ Session info ─────────────────────────────────────────────────────\n#> version R version 4.5.0 (2025-04-11)\n#> language (EN)\n#> date 2025-08-28\n#> pandoc 3.6.3\n#> quarto 1.7.32\n#> \n#> ─ Packages ─────────────────────────────────────────────────────────\n#> package version date (UTC) source\n#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0)\n#> filtro 0.1.0.9000 2025-08-26 Github (tidymodels/filtro@f8ffd50)\n#> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0)\n#> purrr 1.0.4 2025-02-05 CRAN (R 4.5.0)\n#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0)\n#> tibble 3.2.1 2023-03-20 CRAN (R 4.5.0)\n#> \n#> ────────────────────────────────────────────────────────────────────\n```\n:::\n\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/learn/develop/filtro/index.html.md b/learn/develop/filtro/index.html.md index e2dad99e..e0054011 
100644 --- a/learn/develop/filtro/index.html.md +++ b/learn/develop/filtro/index.html.md @@ -11,3 +11,355 @@ toc-depth: 3 include-after-body: ../../../resources.html --- +## Introduction + +To use code in this article, you will need to install the following packages: filtro and modeldata. + +filtro is tidy tools to apply filter-based supervised feature selection methods. It provides functions to rank and select a specified proportion or a fixed number of features using built-in methods and the desirability function. + +Currently, there are 6 filters in filtro and many existing score objects. A list of existing scoring objects [can be found here](https://filtro.tidymodels.org/articles/filtro.html#available-score-objects-and-filter-methods). + +However, you might need to define your own scoring objects. + +This article serves as a guide to creating new scoring objects and computing feature scores before performing ranking and selection. + +The general procedure is to: + +1. Construct a parent scoring object using `class_score()`, specifying fixed properties. + +2. Construct a custom scoring object using `class_score_*()`, defining additional properties. + +3. Define the scoring method in `fit()` to compute feature score. `fit()` refers to the custom scoring object from Step 2 to determine which method to dispatch. + +As an example, we will walk through the steps to create an ANOVA F-test filter. + +## Scoring object + +All the custom scoring objects share the same parent class named `class_score`. Therefore, we start by constructing a new scoring object using `class_score()`. 
+ +These are the fixed properties (attributes) for this object: + +::: {.cell layout-align="center"} + +```{.r .cell-code} +library(filtro) +args(class_score) +#> function (outcome_type = c("numeric", "factor"), predictor_type = c("numeric", +#> "factor"), case_weights = logical(0), range = integer(0), inclusive = logical(0), +#> fallback_value = integer(0), score_type = character(0), transform_fn = function() NULL, +#> direction = character(0), deterministic = logical(0), tuning = logical(0), +#> calculating_fn = function() NULL, label = character(0), packages = character(0), +#> results = data.frame()) +#> NULL +``` +::: + +For example: + +- `case_weights`: Does the method accept case weights? It is `TRUE` or `FALSE`. + +- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`. + +- `direction`: What direction of values indicates the most important values? For example, `maximize`, `minimize`. + +- `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame. + +For details on the remaining properties, please refer to the package documentation. + +## Custom scoring object + +Next, we demonstrate how to create a custom scoring object. + +As an example, let’s create a custom scoring object for ANOVA F-test named `class_score_aov`. This filter computes feature score using analysis of variance (ANOVA) hypothesis tests, powered by `lm()`. + +`class_score_aov` is a subclass of `class_score`. It inherits all fixed properties from the parent class, while allowing additional implementation-specific properties to be added in the subclass. 
For example: + +::: {.cell layout-align="center"} + +```{.r .cell-code} +# Create a subclass named 'class_score_aov' +class_score_aov <- S7::new_class( + "class_score_aov", + parent = class_score, + properties = list( + neg_log10 = S7::new_property(S7::class_logical, default = TRUE) + ) +) +``` +::: + +In addition to the properties inherited from the parent class (discussed in the previous section), `class_score_aov` also includes: + +- `neg_log10`: Represent the score as `-log10(p_value)`? It is `TRUE` or `FALSE`. + +For the ANOVA F-test filter, users can represent the score using either + +- p-value or + +- F-statistic. + +We demonstrate how to create these instances (objects) accordingly. + +`score_aov_pval` is created as an instance of the `class_score_aov` subclass by calling its constructor and specifying its properties: + +::: {.cell layout-align="center"} + +```{.r .cell-code} +# ANOVA p-value +score_aov_pval <- + class_score_aov( + outcome_type = c("numeric", "factor"), + predictor_type = c("numeric", "factor"), + case_weights = TRUE, + range = c(0, Inf), + inclusive = c(FALSE, FALSE), + fallback_value = Inf, + score_type = "aov_pval", + direction = "maximize", + deterministic = TRUE, + tuning = FALSE, + label = "ANOVA p-values" + ) +``` +::: + +Individual properties can be accessed via `object@`. 
For example: + +::: {.cell layout-align="center"} + +```{.r .cell-code} +score_aov_pval@case_weights +#> [1] TRUE +score_aov_pval@fallback_value +#> [1] Inf +score_aov_pval@direction +#> [1] "maximize" +``` +::: + +`score_aov_fstat` is another instance of the `class_score_aov` subclass: + +::: {.cell layout-align="center"} + +```{.r .cell-code} +# ANOVA F-statistic +score_aov_fstat <- + class_score_aov( + outcome_type = c("numeric", "factor"), + predictor_type = c("numeric", "factor"), + case_weights = TRUE, + range = c(0, Inf), + inclusive = c(FALSE, FALSE), + fallback_value = Inf, + score_type = "aov_fstat", + direction = "maximize", + deterministic = TRUE, + tuning = FALSE, + label = "ANOVA F-statistics" + ) +``` +::: + +## Fitting (or estimating) feature score + +So far, we have covered how to construct a parent class, create a custom subclass, and instantiate objects for the ANOVA F-test filter. + +We now discuss the dual role of `fit()`: it functions both as a generic and as the methods used to fit (or estimate) feature score. + +1. The generic `fit()` is re-exported from generics. It inspects the class of the object passed and dispatches to the appropriate method. + +2. We also define multiple methods named `fit()`. Each method `fit()` performs the actual fitting or score estimation for a specific class of object. + +In other words, when `fit()` is called, the generic refers to the custom scoring object (rather than the parent scoring object) to determine which method to dispatch. The actual scoring computation is performed within the dispatched method. 
+ +The ANOVA F-test filter, for example: + +::: {.cell layout-align="center"} + +```{.r .cell-code} +# User-level example: Check the class of the object +class(score_aov_pval) +#> [1] "filtro::class_score_aov" "filtro::class_score" +#> [3] "S7_object" +class(score_aov_fstat) +#> [1] "filtro::class_score_aov" "filtro::class_score" +#> [3] "S7_object" +``` +::: + +Both instances (objects) belong to the custom scoring object `class_score_aov`. Therefore, when `fit()` is called, the method for `class_score_aov` is dispatched, performing the actual fitting using the ANOVA F-test: + +::: {.cell layout-align="center"} + +```{.r .cell-code} +# User-level example: Method dispatch for objects of class `class_score_aov` +score_aov_pval |> + fit(Sale_Price ~ ., data = ames) +score_aov_fstat |> + fit(Sale_Price ~ ., data = ames) +``` +::: + +## Defining S7 methods + +To use the `fit()` method, we need to define S7 method that implements the scoring logic: + +::: {.cell layout-align="center"} + +```{.r .cell-code} +# Define the scoring method for `class_score_aov` +#' @export +S7::method(fit, class_score_aov) <- function( + object, + formula, + data, + case_weights = NULL, + ... +) { + # TODO finish the rest of the function + + object@results <- res + object +} +``` +::: + +## Documenting S7 methods + +Documentation for S7 methods is still a work in progress, and it seems no one currently knows the right approach. Here’s how we tackle it: + +We re-export the `fit()` generic from generics. Instead of documenting each `fit()` method, we provide the details in the "Details" section and the "Estimating the scores" subsection of the documentation for the corresponding object. 
+ +The code below opens the help page for the `fit()` generic: + +::: {.cell layout-align="center"} + +```{.r .cell-code} +# User-level example: Help page for `fit()` generic +?fit +``` +::: + +The code below opens the help page for specific `fit()` method: + +::: {.cell layout-align="center"} + +```{.r .cell-code} +# User-level example: Help page for `fit()` method along with the documentation for the specific object +?score_aov_pval +?score_aov_fstat +``` +::: + +## Accessing results after fitting + +Once the method has been fitted via `fit()`, the data frame of results can be accessed via `object@results`. + +We use a subset of the Ames data set from the {modeldata} package for demonstration. The goal is to predict housing sale price. `Sale_Price` is the outcome and is numeric. + +::: {.cell layout-align="center"} + +```{.r .cell-code} +library(modeldata) +ames_subset <- modeldata::ames |> + # Use a subset of data for demonstration + dplyr::select( + Sale_Price, + MS_SubClass, + MS_Zoning, + Lot_Frontage, + Lot_Area, + Street + ) +ames_subset <- ames_subset |> + dplyr::mutate(Sale_Price = log10(Sale_Price)) +``` +::: + +Next, we fit the score as we discuss before: + +::: {.cell layout-align="center"} + +```{.r .cell-code} +# Specify ANOVA p-value and fit score +ames_aov_pval_res <- + score_aov_pval |> + fit(Sale_Price ~ ., data = ames_subset) +``` +::: + +::: {.cell layout-align="center"} + +```{.r .cell-code} +# Specify ANOVA F-statistic and fit score +ames_aov_fstat_res <- + score_aov_fstat |> + fit(Sale_Price ~ ., data = ames_subset) +``` +::: + +Recall that individual properties of an object can be accessed using `object@`. Once the method has been fitted, the resulting data frame can be accessed via `object@results`: + +::: {.cell layout-align="center"} + +```{.r .cell-code} +ames_aov_pval_res@results +#> # A tibble: 5 × 4 +#> name score outcome predictor +#> +#> 1 aov_pval 237. Sale_Price MS_SubClass +#> 2 aov_pval 130. 
Sale_Price MS_Zoning +#> 3 aov_pval NA Sale_Price Lot_Frontage +#> 4 aov_pval NA Sale_Price Lot_Area +#> 5 aov_pval 5.75 Sale_Price Street +``` +::: + +::: {.cell layout-align="center"} + +```{.r .cell-code} +ames_aov_fstat_res@results +#> # A tibble: 5 × 4 +#> name score outcome predictor +#> +#> 1 aov_fstat 94.5 Sale_Price MS_SubClass +#> 2 aov_fstat 115. Sale_Price MS_Zoning +#> 3 aov_fstat NA Sale_Price Lot_Frontage +#> 4 aov_fstat NA Sale_Price Lot_Area +#> 5 aov_fstat 22.9 Sale_Price Street +``` +::: + +## Session information {#session-info} + +::: {.cell layout-align="center"} + +``` +#> +#> Attaching package: 'dplyr' +#> The following objects are masked from 'package:stats': +#> +#> filter, lag +#> The following objects are masked from 'package:base': +#> +#> intersect, setdiff, setequal, union +#> ─ Session info ───────────────────────────────────────────────────── +#> version R version 4.5.0 (2025-04-11) +#> language (EN) +#> date 2025-08-28 +#> pandoc 3.6.3 +#> quarto 1.7.32 +#> +#> ─ Packages ───────────────────────────────────────────────────────── +#> package version date (UTC) source +#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0) +#> filtro 0.1.0.9000 2025-08-26 Github (tidymodels/filtro@f8ffd50) +#> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0) +#> purrr 1.0.4 2025-02-05 CRAN (R 4.5.0) +#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0) +#> tibble 3.2.1 2023-03-20 CRAN (R 4.5.0) +#> +#> ──────────────────────────────────────────────────────────────────── +``` +::: + diff --git a/learn/develop/filtro/index.qmd b/learn/develop/filtro/index.qmd index d0fb7323..53657aea 100644 --- a/learn/develop/filtro/index.qmd +++ b/learn/develop/filtro/index.qmd @@ -34,53 +34,55 @@ pkgs <- c("filtro", "modeldata") `r article_req_pkgs(pkgs)` -You can construct new scoring objects using `class_score()`. This article is a guide to creating new scoring objects. +filtro is tidy tools to apply filter-based supervised feature selection methods. 
It provides functions to rank and select a specified proportion or a fixed number of features using built-in methods and the desirability function. -## Scoring object +Currently, there are 6 filters in filtro and many existing score objects. A list of existing scoring objects [can be found here](https://filtro.tidymodels.org/articles/filtro.html#available-score-objects-and-filter-methods). -All subclasses specific to the scoring method have a parent class named `class_score`. There are a few properties (attributes) for this object: +However, you might need to define your own scoring objects. -```{r} -#| label: "class_score" -library(filtro) -args(class_score) -``` +This article serves as a guide to creating new scoring objects and computing feature scores before performing ranking and selection. -- `outcome_type`: What types of outcome can the method handle? The options are `numeric`, `factor`, or both. +The general procedure is to: -- `predictor_type`: What types of predictor can the method handle? The options are `numeric`, `factor`, or both. +1. Construct a parent scoring object using `class_score()`, specifying fixed properties. -- `case_weights`: Does the method accpet case weights? It is `TRUE` or `FALSE`. +2. Construct a custom scoring object using `class_score_*()`, defining additional properties. -- `range`: Are there known ranges for the statistic? For example, `c(0, Inf)`, `c(0, 1)`. +3. Define the scoring method in `fit()` to compute feature score. `fit()` refers to the custom scoring object from Step 2 to determine which method to dispatch. -- `inclusive`: Are these ranges inclusive at the bounds? For example, `c(FALSE, FALSE)`, `c(TRUE, TRUE)`. +As an example, we will walk through the steps to create an ANOVA F-test filter. -- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`. 
+## Scoring object -- `score_type`: What is the column name that will be used for the statistic values? For example, `aov_pval`, `aov_fstat`. +All the custom scoring objects share the same parent class named `class_score`. Therefore, we start by constructing a new scoring object using `class_score()`. -- `direction`: What direction of values indicates the most important values? For example, `maximum`, `minimize`. +These are the fixed properties (attributes) for this object: -- `deterministic`: Does the fitting process use random numbers? It is `TRUE` or `FALSE`. +```{r} +#| label: "class_score" +library(filtro) +args(class_score) +``` -- `tuning`: Does the method have tuning parameters? It is `TRUE` or `FALSE`. +For example: -- `calculating_fn`: What function, if any, is used to estimate the values from data? +- `case_weights`: Does the method accept case weights? It is `TRUE` or `FALSE`. -- `label`: What label to use when printing? For example, `ANOVA p-values`, `ANOVA F-statistics`. +- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`. -- `packages`: What packages, if any, are required to train the method? +- `direction`: What direction of values indicates the most important values? For example, `maximize`, `minimize`. - `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame. -## Scoring object specific to the scoring method +For details on the remaining properties, please refer to the package documentation. + +## Custom scoring object -We demonstrate how to create a custom scoring object specific to the given scoring method. +Next, we demonstrate how to create a custom scoring object. -As an example, let’s consider the ANOVA F-test filter. +As an example, let’s create a custom scoring object for ANOVA F-test named `class_score_aov`. This filter computes feature score using analysis of variance (ANOVA) hypothesis tests, powered by `lm()`. 
-`class_score_aov` is a subclass of `class_score`. Any additional properties specific to the implementation can be added in the subclass. For example: +`class_score_aov` is a subclass of `class_score`. It inherits all fixed properties from the parent class, while allowing additional implementation-specific properties to be added in the subclass. For example: ```{r} #| eval: false @@ -94,7 +96,7 @@ class_score_aov <- S7::new_class( ) ``` -In addition to the properties inherited from the parent class, `class_score_aov` also includes: +In addition to the properties inherited from the parent class (discussed in the previous section), `class_score_aov` also includes: - `neg_log10`: Represent the score as `-log10(p_value)`? It is `TRUE` or `FALSE`. @@ -156,63 +158,70 @@ score_aov_fstat <- ) ``` -## Fitting (or estimating) score +## Fitting (or estimating) feature score -So far, we have discussed how to create a subclass and construct instances (objects) for the ANOVA F-test filter. +So far, we have covered how to construct a parent class, create a custom subclass, and instantiate objects for the ANOVA F-test filter. -We now discuss the dual role of `fit()`, both as a generic and as methods used to fit (or estimate) score. +We now discuss the dual role of `fit()`: it functions both as a generic and as the methods used to fit (or estimate) feature score. -The generic named `fit()` is re-exported and dispatches to the appropriate method based on the class of the object passed. We define multiple methods named `fit()`. Each of these methods performs the actual fitting or score estimation for a specific class of object. +1. The generic `fit()` is re-exported from generics. It inspects the class of the object passed and dispatches to the appropriate method. + +2. We also define multiple methods named `fit()`. Each method `fit()` performs the actual fitting or score estimation for a specific class of object. 
+ +In other words, when `fit()` is called, the generic refers to the custom scoring object (rather than the parent scoring object) to determine which method to dispatch. The actual scoring computation is performed within the dispatched method. The ANOVA F-test filter, for example: ```{r} -# Check the class of the object +# User-level example: Check the class of the object class(score_aov_pval) class(score_aov_fstat) ``` -Both objects belong to the class `class_score_aov`. Therefore, when `fit()` is called, the method for `class_score_aov` is dispatched, performing the actual fitting using the ANOVA F-test filter: +Both instances (objects) belong to the custom scoring object `class_score_aov`. Therefore, when `fit()` is called, the method for `class_score_aov` is dispatched, performing the actual fitting using the ANOVA F-test: ```{r} #| eval: false -# Method dispatch for objects of class `class_score_aov` +# User-level example: Method dispatch for objects of class `class_score_aov` score_aov_pval |> fit(Sale_Price ~ ., data = ames) score_aov_fstat |> fit(Sale_Price ~ ., data = ames) ``` -The correlation filter, for another example: +## Defining S7 methods -```{r} -# Check the class of the object -class(score_cor_pearson) -class(score_cor_spearman) -``` - -Both objects belong to the class `class_score_cor`. Therefore, when `fit()` is called, the method for `class_score_cor` is dispatched, performing the actual fitting using the correlation filter: +To use the `fit()` method, we need to define S7 method that implements the scoring logic: ```{r} #| eval: false -# Method dispatch for objects of class `class_score_cor` -score_cor_pearson |> - fit(Sale_Price ~ ., data = ames) -score_cor_spearman |> - fit(Sale_Price ~ ., data = ames) +# Define the scoring method for `class_score_aov` +#' @export +S7::method(fit, class_score_aov) <- function( + object, + formula, + data, + case_weights = NULL, + ... 
+) { + # TODO finish the rest of the function + + object@results <- res + object +} ``` ## Documenting S7 methods Documentation for S7 methods is still a work in progress, and it seems no one currently knows the right approach. Here’s how we tackle it: -We re-export the `fit()` generic from another package. Instead of documenting each individual `fit()` method, we provide the details in the "Details" section and the "Estimating the scores" subsection of the documentation for the corresponding object. +We re-export the `fit()` generic from generics. Instead of documenting each `fit()` method, we provide the details in the "Details" section and the "Estimating the scores" subsection of the documentation for the corresponding object. The code below opens the help page for the `fit()` generic: ```{r} #| eval: false -# Help page for `fit()` generic +# User-level example: Help page for `fit()` generic ?fit ``` @@ -220,18 +229,11 @@ The code below opens the help page for specific `fit()` method: ```{r} #| eval: false -# Help page for `fit()` method along with the documentation for the specific object +# User-level example: Help page for `fit()` method along with the documentation for the specific object ?score_aov_pval ?score_aov_fstat ``` -```{r} -#| eval: false -# Help page for `fit()` method along with the documentation for the specific object -?score_cor_pearson -?score_cor_spearman -``` - ## Accessing results after fitting Once the method has been fitted via `fit()`, the data frame of results can be accessed via `object@results`. 
diff --git a/site_libs/quarto-html/quarto-syntax-highlighting-303f0b4dceb2a814a9bbef461efe1684.css b/site_libs/quarto-html/quarto-syntax-highlighting-303f0b4dceb2a814a9bbef461efe1684.css deleted file mode 100644 index ee5a58e3..00000000 --- a/site_libs/quarto-html/quarto-syntax-highlighting-303f0b4dceb2a814a9bbef461efe1684.css +++ /dev/null @@ -1,236 +0,0 @@ -/* quarto syntax highlight colors */ -:root { - --quarto-hl-ot-color: #003B4F; - --quarto-hl-at-color: #657422; - --quarto-hl-ss-color: #20794D; - --quarto-hl-an-color: #5E5E5E; - --quarto-hl-fu-color: #4758AB; - --quarto-hl-st-color: #20794D; - --quarto-hl-cf-color: #003B4F; - --quarto-hl-op-color: #5E5E5E; - --quarto-hl-er-color: #AD0000; - --quarto-hl-bn-color: #AD0000; - --quarto-hl-al-color: #AD0000; - --quarto-hl-va-color: #111111; - --quarto-hl-bu-color: inherit; - --quarto-hl-ex-color: inherit; - --quarto-hl-pp-color: #AD0000; - --quarto-hl-in-color: #5E5E5E; - --quarto-hl-vs-color: #20794D; - --quarto-hl-wa-color: #5E5E5E; - --quarto-hl-do-color: #5E5E5E; - --quarto-hl-im-color: #00769E; - --quarto-hl-ch-color: #20794D; - --quarto-hl-dt-color: #AD0000; - --quarto-hl-fl-color: #AD0000; - --quarto-hl-co-color: #5E5E5E; - --quarto-hl-cv-color: #5E5E5E; - --quarto-hl-cn-color: #8f5902; - --quarto-hl-sc-color: #5E5E5E; - --quarto-hl-dv-color: #AD0000; - --quarto-hl-kw-color: #003B4F; -} - -/* other quarto variables */ -:root { - --quarto-font-monospace: "Source Code Pro", monospace; -} - -/* syntax highlight based on Pandoc's rules */ -pre > code.sourceCode > span { - color: #003B4F; -} - -code.sourceCode > span { - color: #003B4F; -} - -div.sourceCode, -div.sourceCode pre.sourceCode { - color: #003B4F; -} - -/* Normal */ -code span { - color: #003B4F; -} - -/* Alert */ -code span.al { - color: #AD0000; - font-style: inherit; -} - -/* Annotation */ -code span.an { - color: #5E5E5E; - font-style: inherit; -} - -/* Attribute */ -code span.at { - color: #657422; - font-style: inherit; -} - -/* BaseN */ -code 
span.bn { - color: #AD0000; - font-style: inherit; -} - -/* BuiltIn */ -code span.bu { - font-style: inherit; -} - -/* ControlFlow */ -code span.cf { - color: #003B4F; - font-weight: bold; - font-style: inherit; -} - -/* Char */ -code span.ch { - color: #20794D; - font-style: inherit; -} - -/* Constant */ -code span.cn { - color: #8f5902; - font-style: inherit; -} - -/* Comment */ -code span.co { - color: #5E5E5E; - font-style: inherit; -} - -/* CommentVar */ -code span.cv { - color: #5E5E5E; - font-style: italic; -} - -/* Documentation */ -code span.do { - color: #5E5E5E; - font-style: italic; -} - -/* DataType */ -code span.dt { - color: #AD0000; - font-style: inherit; -} - -/* DecVal */ -code span.dv { - color: #AD0000; - font-style: inherit; -} - -/* Error */ -code span.er { - color: #AD0000; - font-style: inherit; -} - -/* Extension */ -code span.ex { - font-style: inherit; -} - -/* Float */ -code span.fl { - color: #AD0000; - font-style: inherit; -} - -/* Function */ -code span.fu { - color: #4758AB; - font-style: inherit; -} - -/* Import */ -code span.im { - color: #00769E; - font-style: inherit; -} - -/* Information */ -code span.in { - color: #5E5E5E; - font-style: inherit; -} - -/* Keyword */ -code span.kw { - color: #003B4F; - font-weight: bold; - font-style: inherit; -} - -/* Operator */ -code span.op { - color: #5E5E5E; - font-style: inherit; -} - -/* Other */ -code span.ot { - color: #003B4F; - font-style: inherit; -} - -/* Preprocessor */ -code span.pp { - color: #AD0000; - font-style: inherit; -} - -/* SpecialChar */ -code span.sc { - color: #5E5E5E; - font-style: inherit; -} - -/* SpecialString */ -code span.ss { - color: #20794D; - font-style: inherit; -} - -/* String */ -code span.st { - color: #20794D; - font-style: inherit; -} - -/* Variable */ -code span.va { - color: #111111; - font-style: inherit; -} - -/* VerbatimString */ -code span.vs { - color: #20794D; - font-style: inherit; -} - -/* Warning */ -code span.wa { - color: #5E5E5E; - 
font-style: italic; -} - -.prevent-inlining { - content: " Date: Thu, 28 Aug 2025 14:43:34 -0700 Subject: [PATCH 17/21] Finalizing --- .../filtro/index/execute-results/html.json | 4 +- learn/develop/filtro/index.html.md | 177 +++++++++++++++--- learn/develop/filtro/index.qmd | 169 ++++++++++++++--- 3 files changed, 306 insertions(+), 44 deletions(-) diff --git a/_freeze/learn/develop/filtro/index/execute-results/html.json b/_freeze/learn/develop/filtro/index/execute-results/html.json index 79f96651..d6042e53 100644 --- a/_freeze/learn/develop/filtro/index/execute-results/html.json +++ b/_freeze/learn/develop/filtro/index/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "f6439c05f8af265cd2e66c48970479ea", + "hash": "b9bf1801286839c3a840f868009a058f", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new score class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro and modeldata.\n\nfiltro is tidy tools to apply filter-based supervised feature selection methods. It provides functions to rank and select a specified proportion or a fixed number of features using built-in methods and the desirability function. \n\nCurrently, there are 6 filters in filtro and many existing score objects. A list of existing scoring objects [can be found here](https://filtro.tidymodels.org/articles/filtro.html#available-score-objects-and-filter-methods). \n\nHowever, you might need to define your own scoring objects. \n\nThis article serves as a guide to creating new scoring objects and computing feature scores before performing ranking and selection. \n\nThe general procedure is to:\n\n1. Construct a parent scoring object using `class_score()`, specifying fixed properties. 
\n\n2. Construct a custom scoring object using `class_score_*()`, defining additional properties.\n\n3. Define the scoring method in `fit()` to compute feature score. `fit()` refers to the custom scoring object from Step 2 to determine which method to dispatch.\n\nAs an example, we will walk through the steps to create an ANOVA F-test filter.\n\n## Scoring object\n\nAll the custom scoring objects share the same parent class named `class_score`. Therefore, we start by constructing a new scoring object using `class_score()`. \n\nThese are the fixed properties (attributes) for this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(filtro)\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), transform_fn = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\nFor example: \n\n- `case_weights`: Does the method accpet case weights? It is `TRUE` or `FALSE`.\n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`.\n\n- `direction`: What direction of values indicates the most important values? For example, `maximum`, `minimize`.\n\n- `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame.\n\nFor details on the remaining properties, please refer to the package documentation.\n\n## Custom scoring object\n\nNext, we demonstrate how to create a custom scoring object. \n\nAs an example, let’s create a custom scoring object for ANOVA F-test named `class_score_aov`. 
This filter computes feature score using analysis of variance (ANOVA) hypothesis tests, powered by `lm()`. \n\n`class_score_aov` is a subclass of `class_score`. It inherits all fixed properties from the parent class, while allowing additional implementation-specific properties to be added in the subclass. For example:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a subclass named 'class_score_aov'\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\nIn addition to the properties inherited from the parent class (discussed in the previous section), `class_score_aov` also includes: \n\n- `neg_log10`: Represent the score as `-log10(p_value)`? It is `TRUE` or `FALSE`.\n\nFor the ANOVA F-test filter, users can represent the score using either\n\n- p-value or \n\n- F-statistic. \n\nWe demonstrate how to create these instances (objects) accordingly. \n\n`score_aov_pval` is created as an instance of the `class_score_aov` subclass by calling its constructor and specifying its properties:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA p-value\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nIndividual properties can be accessed via `object@`. 
For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@fallback_value\n#> [1] Inf\nscore_aov_pval@direction\n#> [1] \"maximize\"\n```\n:::\n\n\n`score_aov_fstat` is another instance of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA F-statistic\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Fitting (or estimating) feature score\n\nSo far, we have covered how to construct a parent class, create a custom subclass, and instantiate objects for the ANOVA F-test filter. \n\nWe now discuss the dual role of `fit()`: it functions both as a generic and as the methods used to fit (or estimate) feature score. \n\n1. The generic `fit()` is re-exported from generics. It inspects the class of the object passed and dispatches to the appropriate method. \n\n2. We also define multiple methods named `fit()`. Each method `fit()` performs the actual fitting or score estimation for a specific class of object. \n\nIn other words, when `fit()` is called, the generic refers to the custom scoring object (rather than the parent scoring object) to determine which method to dispatch. The actual scoring computation is performed within the dispatched method. 
\n\nThe ANOVA F-test filter, for example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Check the class of the object\nclass(score_aov_pval)\n#> [1] \"filtro::class_score_aov\" \"filtro::class_score\" \n#> [3] \"S7_object\"\nclass(score_aov_fstat)\n#> [1] \"filtro::class_score_aov\" \"filtro::class_score\" \n#> [3] \"S7_object\"\n```\n:::\n\n\nBoth instances (objects) belong to the custom scoring object `class_score_aov`. Therefore, when `fit()` is called, the method for `class_score_aov` is dispatched, performing the actual fitting using the ANOVA F-test:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Method dispatch for objects of class `class_score_aov`\nscore_aov_pval |>\n fit(Sale_Price ~ ., data = ames)\nscore_aov_fstat |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\n## Defining S7 methods \n\nTo use the `fit()` method, we need to define S7 method that implements the scoring logic: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Define the scoring method for `class_score_aov`\n#' @export\nS7::method(fit, class_score_aov) <- function(\n object,\n formula,\n data,\n case_weights = NULL,\n ...\n) {\n # TODO finish the rest of the function\n\n object@results <- res\n object\n}\n```\n:::\n\n\n## Documenting S7 methods \n\nDocumentation for S7 methods is still a work in progress, and it seems no one currently knows the right approach. Here’s how we tackle it: \n\nWe re-export the `fit()` generic from generics. Instead of documenting each `fit()` method, we provide the details in the \"Details\" section and the \"Estimating the scores\" subsection of the documentation for the corresponding object. 
\n\nThe code below opens the help page for the `fit()` generic: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Help page for `fit()` generic\n?fit\n```\n:::\n\n\nThe code below opens the help page for specific `fit()` method: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Help page for `fit()` method along with the documentation for the specific object\n?score_aov_pval\n?score_aov_fstat\n```\n:::\n\n\n## Accessing results after fitting\n\nOnce the method has been fitted via `fit()`, the data frame of results can be accessed via `object@results`. \n\nWe use a subset of the Ames data set from the {modeldata} package for demonstration. The goal is to predict housing sale price. `Sale_Price` is the outcome and is numeric. \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(modeldata)\names_subset <- modeldata::ames |>\n # Use a subset of data for demonstration\n dplyr::select(\n Sale_Price,\n MS_SubClass,\n MS_Zoning,\n Lot_Frontage,\n Lot_Area,\n Street\n )\names_subset <- ames_subset |>\n dplyr::mutate(Sale_Price = log10(Sale_Price))\n```\n:::\n\n\nNext, we fit the score as we discuss before: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Specify ANOVA p-value and fit score\names_aov_pval_res <-\n score_aov_pval |>\n fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Specify ANOVA F-statistic and fit score\names_aov_fstat_res <-\n score_aov_fstat |>\n fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\nRecall that individual properties of an object can be accessed using `object@`. Once the method has been fitted, the resulting data frame can be accessed via `object@results`:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\names_aov_pval_res@results\n#> # A tibble: 5 × 4\n#> name score outcome predictor \n#> \n#> 1 aov_pval 237. 
Sale_Price MS_SubClass \n#> 2 aov_pval 130. Sale_Price MS_Zoning \n#> 3 aov_pval NA Sale_Price Lot_Frontage\n#> 4 aov_pval NA Sale_Price Lot_Area \n#> 5 aov_pval 5.75 Sale_Price Street\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\names_aov_fstat_res@results\n#> # A tibble: 5 × 4\n#> name score outcome predictor \n#> \n#> 1 aov_fstat 94.5 Sale_Price MS_SubClass \n#> 2 aov_fstat 115. Sale_Price MS_Zoning \n#> 3 aov_fstat NA Sale_Price Lot_Frontage\n#> 4 aov_fstat NA Sale_Price Lot_Area \n#> 5 aov_fstat 22.9 Sale_Price Street\n```\n:::\n\n\n## Session information {#session-info}\n\n\n::: {.cell layout-align=\"center\"}\n\n```\n#> \n#> Attaching package: 'dplyr'\n#> The following objects are masked from 'package:stats':\n#> \n#> filter, lag\n#> The following objects are masked from 'package:base':\n#> \n#> intersect, setdiff, setequal, union\n#> ─ Session info ─────────────────────────────────────────────────────\n#> version R version 4.5.0 (2025-04-11)\n#> language (EN)\n#> date 2025-08-28\n#> pandoc 3.6.3\n#> quarto 1.7.32\n#> \n#> ─ Packages ─────────────────────────────────────────────────────────\n#> package version date (UTC) source\n#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0)\n#> filtro 0.1.0.9000 2025-08-26 Github (tidymodels/filtro@f8ffd50)\n#> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0)\n#> purrr 1.0.4 2025-02-05 CRAN (R 4.5.0)\n#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0)\n#> tibble 3.2.1 2023-03-20 CRAN (R 4.5.0)\n#> \n#> ────────────────────────────────────────────────────────────────────\n```\n:::\n\n", + "markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new score class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro and modeldata.\n\nfiltro is tidy tools to 
apply filter-based supervised feature selection methods. It provides functions to rank and select a specified proportion or a fixed number of features using built-in methods and the desirability function. \n\nCurrently, there are 6 filters in filtro and many existing score objects. A list of existing scoring objects [can be found here](https://filtro.tidymodels.org/articles/filtro.html#available-score-objects-and-filter-methods). However, you might need to define your own scoring objects. This article serves as a guide to creating new scoring objects and computing feature scores before performing ranking and selection. \n\nThe general procedure is to:\n\n1. Create a parent scoring object `class_score`, specifying fixed properties that are shared across all custom scoring objects. \n\n2. Construct a custom scoring object `class_score_*`, adding additional, implementation-specific properties. \n\n3. Define the scoring method in `fit()`, which computes feature score. `fit()` refers to the custom scoring object from step 2 to use the appropriate method.\n\nThe hierarchy can be visualized as:\n\n```\nclass_score\n└─> class_score_* \n └─> fit()\n```\n\nAs an example, we will walk through the steps to create an ANOVA F-test filter.\n\n## Scoring object\n\nAll the custom scoring objects share the same parent class named `class_score`. 
Therefore, we start by creating a parent class: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a parent class\nlibrary(filtro) \nclass_score\n```\n:::\n\n\nThese are the fixed properties (attributes) for this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), transform_fn = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\nFor example: \n\n- `case_weights`: Does the method accept case weights? It is `TRUE` or `FALSE`.\n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`.\n\n- `direction`: What direction of values indicates the most important values? For example, `maximize`, `minimize`.\n\n- `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame.\n\nFor details on its constructor and its remaining properties, please refer to the package documentation.\n\n## Custom scoring object\n\n```\nclass_score\n└─> class_score_aov (example shown)\n└─> class_score_cor\n└─> ... \n```\n\nNext, we demonstrate how to create a custom scoring object `class_score_*`. \n\nAs an example, let’s create a custom scoring object for ANOVA F-test named `class_score_aov`. This filter computes feature score using analysis of variance (ANOVA) hypothesis tests, powered by `lm()`. \n\nBy setting `parent = class_score`, the subclass `class_score_aov` inherits all fixed properties from the parent class. 
Additional, implementation-specific properties can be added using the `properties = ` argument. For example:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a subclass named 'class_score_aov'\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\nIn addition to the properties inherited from the parent class (discussed in the previous section), `class_score_aov` also includes the following property:\n\n- `neg_log10`: Represent the score as `-log10(p_value)`? It is `TRUE` or `FALSE`.\n\nFor the ANOVA F-test filter, users can represent the score using either\n\n- p-value or \n\n- F-statistic. \n\nWe demonstrate how to create these instances (objects) accordingly. \n\n`score_aov_pval` is created as an instance of the `class_score_aov` subclass by calling its constructor and specifying its properties:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA p-value\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nIndividual properties can be accessed via `object@`. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@fallback_value\n#> [1] Inf\nscore_aov_pval@direction\n#> [1] \"maximize\"\n```\n:::\n\n\nNote that by default, the returned p-value is transformed to `-log10(p_value)`, which means larger values correspond to more important predictors. This is why the fallback value is set to `Inf` and the direction is set to `\"maximize\"`. 
\n\n`score_aov_fstat` is another instance of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA F-statistic\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Fitting (or estimating) feature score\n\n```\nclass_score\n└─> class_score_aov (example shown)\n └─> fit()\n└─> class_score_cor\n └─> fit()\n└─> ... \n```\n\nSo far, we have covered how to construct a parent class, create a custom subclass, and instantiate objects for the ANOVA F-test filter. \n\nWe now discuss the dual role of `fit()`: it functions both as a *generic* and as the *methods* used to fit (or estimate) feature score. \n\n1. The `fit()` generic is re-exported from generics. It inspects the class of the object passed and dispatches to the appropriate method. \n\n2. We also define multiple methods named `fit()`. Each `fit()` method performs the actual fitting or score estimation for a specific class of object. \n\nIn other words, when `fit()` is called, the generic refers to the custom scoring object `class_score_*` to determine which method to dispatch. The actual scoring computation is performed within the dispatched method. \n\nThe ANOVA F-test filter, for example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Check the class of the object\nclass(score_aov_pval)\n#> [1] \"filtro::class_score_aov\" \"filtro::class_score\" \n#> [3] \"S7_object\"\nclass(score_aov_fstat)\n#> [1] \"filtro::class_score_aov\" \"filtro::class_score\" \n#> [3] \"S7_object\"\n```\n:::\n\n\nBoth instances (objects) belong to the custom scoring object `class_score_aov`. 
Therefore, when `fit()` is called, the method for `class_score_aov` is dispatched, performing the actual fitting using the ANOVA F-test:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Method dispatch for objects of class `class_score_aov`\nscore_aov_pval |>\n fit(Sale_Price ~ ., data = ames)\nscore_aov_fstat |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\n## Defining S7 methods \n\nTo use the `fit()` method above, we need to define an S7 method that implements the scoring logic. \n\nThe following code defines the `fit()` method specifically for the `class_score_aov` subclass, specifying how the feature score should be computed using the ANOVA F-test:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Define the scoring method for `class_score_aov`\nS7::method(fit, class_score_aov) <- function(\n object,\n formula,\n data,\n case_weights = NULL,\n ...\n) {\n # TODO Finish the rest of the function\n\n object@results <- res\n object\n}\n```\n:::\n\n\nWe would do something similar for the other `class_score_*` subclasses. \n\n## Documenting S7 methods \n\nDocumentation for S7 methods is still a work in progress, and it seems no one currently knows the right approach. Here’s how we tackle it: \n\n- We re-export the `fit()` generic from generics. \n\n- Instead of documenting each `fit()` method, we provide the details in the \"Details\" section and the \"Estimating the scores\" subsection of the documentation for the corresponding object. 
\n\nThe code below opens the help page for the `fit()` generic: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Help page for `fit()` generic\n?fit\n```\n:::\n\n\nThe code below opens the help page for specific `fit()` method: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Help page for `fit()` method along with the documentation for the specific object\n?score_aov_pval\n?score_aov_fstat\n```\n:::\n\n\nTo enable the `?` help page above, the `fit()` method is exported so that users can call it, but it is not documented directly.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n#' @export\nS7::method(fit, class_score_aov) <- function(\n object,\n formula,\n data,\n case_weights = NULL,\n ...\n) {\n # TODO Finish the rest of the function using lm() and anova()\n\n object@results <- res\n object\n}\n```\n:::\n\n\nInstead, documentation is provided in the \"Details\" section and the \"Estimating the scores\" subsection of the documentation for the `score_aov_pval` object. \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n#' Scoring via analysis of variance hypothesis tests\n#'\n#' @description\n#' \n#' @name score_aov_pval\n#' @family class score metrics\n#'\n#' @details\n#'\n#' These objects are used when either:\n#'\n#' ...\n#'\n#' ## Estimating the scores\n#'\n#' In \\pkg{filtro}, the `score_*` objects define a scoring method (e.g., data\n#' input requirements, package dependencies, etc.). To compute the scores for\n#' a specific data set, the `fit()` method is used. The main arguments for\n#' these functions are:\n#'\n#' \\describe{\n#' \\item{`object`}{A score class object (e.g., `score_aov_pval`).}\n#' \\item{`formula`}{A standard R formula with a single outcome on the left-hand side and one or more predictors (or `.`) on the right-hand side. 
The data are processed via [stats::model.frame()]}\n#' \\item{`data`}{A data frame containing the relevant columns defined by the formula.}\n#' \\item{`...`}{Further arguments passed to or from other methods.}\n#' \\item{`case_weights`}{A quantitative vector of case weights that is the same length as the number of rows in `data`. The default of `NULL` indicates that there are no case weights.}\n#' }\n#'\n#' ...\n#' \n#' @export\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n transform_fn = function(x) x,\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nWe can have the `score_aov_fstat` object share the same help page as `score_aov_pval` by using `#' @name`. This avoids repeated documentation for similar or related objects.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n#' @name score_aov_pval\n#' @export\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n transform_fn = function(x) x,\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Accessing results after fitting\n\nOnce the method has been fitted via `fit()`, the data frame of results can be accessed via `object@results`. \n\nWe use a subset of the Ames data set from the {modeldata} package for demonstration. The goal is to predict housing sale price. `Sale_Price` is the outcome and is numeric. 
\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(modeldata)\names_subset <- modeldata::ames |>\n # Use a subset of data for demonstration\n dplyr::select(\n Sale_Price,\n MS_SubClass,\n MS_Zoning,\n Lot_Frontage,\n Lot_Area,\n Street\n )\names_subset <- ames_subset |>\n dplyr::mutate(Sale_Price = log10(Sale_Price))\n```\n:::\n\n\nNext, we fit the score as discussed before: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Specify ANOVA p-value and fit score\names_aov_pval_res <-\n score_aov_pval |>\n fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Specify ANOVA F-statistic and fit score\names_aov_fstat_res <-\n score_aov_fstat |>\n fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\nRecall that individual properties of an object can be accessed using `object@`. Once the method has been fitted, the resulting data frame can be accessed via `object@results`:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\names_aov_pval_res@results\n#> # A tibble: 5 × 4\n#> name score outcome predictor \n#> \n#> 1 aov_pval 237. Sale_Price MS_SubClass \n#> 2 aov_pval 130. Sale_Price MS_Zoning \n#> 3 aov_pval NA Sale_Price Lot_Frontage\n#> 4 aov_pval NA Sale_Price Lot_Area \n#> 5 aov_pval 5.75 Sale_Price Street\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\names_aov_fstat_res@results\n#> # A tibble: 5 × 4\n#> name score outcome predictor \n#> \n#> 1 aov_fstat 94.5 Sale_Price MS_SubClass \n#> 2 aov_fstat 115. 
Sale_Price MS_Zoning \n#> 3 aov_fstat NA Sale_Price Lot_Frontage\n#> 4 aov_fstat NA Sale_Price Lot_Area \n#> 5 aov_fstat 22.9 Sale_Price Street\n```\n:::\n\n\n## Session information {#session-info}\n\n\n::: {.cell layout-align=\"center\"}\n\n```\n#> \n#> Attaching package: 'dplyr'\n#> The following objects are masked from 'package:stats':\n#> \n#> filter, lag\n#> The following objects are masked from 'package:base':\n#> \n#> intersect, setdiff, setequal, union\n#> ─ Session info ─────────────────────────────────────────────────────\n#> version R version 4.5.0 (2025-04-11)\n#> language (EN)\n#> date 2025-08-28\n#> pandoc 3.6.3\n#> quarto 1.7.32\n#> \n#> ─ Packages ─────────────────────────────────────────────────────────\n#> package version date (UTC) source\n#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0)\n#> filtro 0.1.0.9000 2025-08-26 Github (tidymodels/filtro@f8ffd50)\n#> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0)\n#> purrr 1.0.4 2025-02-05 CRAN (R 4.5.0)\n#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0)\n#> tibble 3.2.1 2023-03-20 CRAN (R 4.5.0)\n#> \n#> ────────────────────────────────────────────────────────────────────\n```\n:::\n\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/learn/develop/filtro/index.html.md b/learn/develop/filtro/index.html.md index e0054011..75a0509b 100644 --- a/learn/develop/filtro/index.html.md +++ b/learn/develop/filtro/index.html.md @@ -17,32 +17,44 @@ To use code in this article, you will need to install the following packages: f filtro is tidy tools to apply filter-based supervised feature selection methods. It provides functions to rank and select a specified proportion or a fixed number of features using built-in methods and the desirability function. -Currently, there are 6 filters in filtro and many existing score objects. A list of existing scoring objects [can be found here](https://filtro.tidymodels.org/articles/filtro.html#available-score-objects-and-filter-methods). 
+Currently, there are 6 filters in filtro and many existing score objects. A list of existing scoring objects [can be found here](https://filtro.tidymodels.org/articles/filtro.html#available-score-objects-and-filter-methods). However, you might need to define your own scoring objects. This article serves as a guide to creating new scoring objects and computing feature scores before performing ranking and selection. -However, you might need to define your own scoring objects. +The general procedure is to: -This article serves as a guide to creating new scoring objects and computing feature scores before performing ranking and selection. +1. Create a parent scoring object `class_score`, specifying fixed properties that are shared across all custom scoring objects. -The general procedure is to: +2. Construct a custom scoring object `class_score_*`, adding additional, implementation-specific properties. -1. Construct a parent scoring object using `class_score()`, specifying fixed properties. +3. Define the scoring method in `fit()`, which computes feature score. `fit()` refers to the custom scoring object from step 2 to use the appropriate method. -2. Construct a custom scoring object using `class_score_*()`, defining additional properties. +The hierarchy can be visualized as: -3. Define the scoring method in `fit()` to compute feature score. `fit()` refers to the custom scoring object from Step 2 to determine which method to dispatch. +``` +class_score +└─> class_score_* + └─> fit() +``` As an example, we will walk through the steps to create an ANOVA F-test filter. ## Scoring object -All the custom scoring objects share the same parent class named `class_score`. Therefore, we start by constructing a new scoring object using `class_score()`. +All the custom scoring objects share the same parent class named `class_score`. 
Therefore, we start by creating a parent class: + +::: {.cell layout-align="center"} + +```{.r .cell-code} +# Create a parent class +library(filtro) +class_score +``` +::: These are the fixed properties (attributes) for this object: ::: {.cell layout-align="center"} ```{.r .cell-code} -library(filtro) args(class_score) #> function (outcome_type = c("numeric", "factor"), predictor_type = c("numeric", #> "factor"), case_weights = logical(0), range = integer(0), inclusive = logical(0), @@ -64,15 +76,22 @@ For example: - `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame. -For details on the remaining properties, please refer to the package documentation. +For details on its constructor and its remaining properties, please refer to the package documentation. ## Custom scoring object -Next, we demonstrate how to create a custom scoring object. +``` +class_score +└─> class_score_aov (example shown) +└─> class_score_cor +└─> ... +``` + +Next, we demonstrate how to create a custom scoring object `class_score_*`. As an example, let’s create a custom scoring object for ANOVA F-test named `class_score_aov`. This filter computes feature score using analysis of variance (ANOVA) hypothesis tests, powered by `lm()`. -`class_score_aov` is a subclass of `class_score`. It inherits all fixed properties from the parent class, while allowing additional implementation-specific properties to be added in the subclass. For example: +By setting `parent = class_score`, the subclass `class_score_aov` inherits all fixed properties from the parent class. Additional, implementation-specific properties can be added using the `properties = ` argument. 
For example: ::: {.cell layout-align="center"} @@ -88,7 +107,7 @@ class_score_aov <- S7::new_class( ``` ::: -In addition to the properties inherited from the parent class (discussed in the previous section), `class_score_aov` also includes: +In addition to the properties inherited from the parent class (discussed in the previous section), `class_score_aov` also includes the following property: - `neg_log10`: Represent the score as `-log10(p_value)`? It is `TRUE` or `FALSE`. @@ -137,6 +156,8 @@ score_aov_pval@direction ``` ::: +Note that by default, the returned p-value is transformed to `-log10(p_value)`, which means larger values correspond to more important predictors. This is why the fallback value is set to `Inf` and the direction is set to `"maximize"`. + `score_aov_fstat` is another instance of the `class_score_aov` subclass: ::: {.cell layout-align="center"} @@ -162,15 +183,24 @@ score_aov_fstat <- ## Fitting (or estimating) feature score +``` +class_score +└─> class_score_aov (example shown) + └─> fit() +└─> class_score_cor + └─> fit() +└─> ... +``` + So far, we have covered how to construct a parent class, create a custom subclass, and instantiate objects for the ANOVA F-test filter. -We now discuss the dual role of `fit()`: it functions both as a generic and as the methods used to fit (or estimate) feature score. +We now discuss the dual role of `fit()`: it functions both as a *generic* and as the *methods* used to fit (or estimate) feature score. -1. The generic `fit()` is re-exported from generics. It inspects the class of the object passed and dispatches to the appropriate method. +1. The `fit()` generic is re-exported from generics. It inspects the class of the object passed and dispatches to the appropriate method. -2. We also define multiple methods named `fit()`. Each method `fit()` performs the actual fitting or score estimation for a specific class of object. +2. We also define multiple methods named `fit()`. 
Each `fit()` method performs the actual fitting or score estimation for a specific class of object. -In other words, when `fit()` is called, the generic refers to the custom scoring object (rather than the parent scoring object) to determine which method to dispatch. The actual scoring computation is performed within the dispatched method. +In other words, when `fit()` is called, the generic refers to the custom scoring object `class_score_*` to determine which method to dispatch. The actual scoring computation is performed within the dispatched method. The ANOVA F-test filter, for example: @@ -202,13 +232,14 @@ score_aov_fstat |> ## Defining S7 methods -To use the `fit()` method, we need to define S7 method that implements the scoring logic: +To use the `fit()` method above, we need to define an S7 method that implements the scoring logic. + +The following code defines the `fit()` method specifically for the `class_score_aov` subclass, specifying how the feature score should be computed using the ANOVA F-test: ::: {.cell layout-align="center"} ```{.r .cell-code} # Define the scoring method for `class_score_aov` -#' @export S7::method(fit, class_score_aov) <- function( object, formula, @@ -216,7 +247,7 @@ S7::method(fit, class_score_aov) <- function( case_weights = NULL, ... ) { - # TODO finish the rest of the function + # TODO Finish the rest of the function object@results <- res object @@ -224,11 +255,15 @@ S7::method(fit, class_score_aov) <- function( ``` ::: +We would do something similar for the other `class_score_*` subclasses. + ## Documenting S7 methods Documentation for S7 methods is still a work in progress, and it seems no one currently knows the right approach. Here’s how we tackle it: -We re-export the `fit()` generic from generics. Instead of documenting each `fit()` method, we provide the details in the "Details" section and the "Estimating the scores" subsection of the documentation for the corresponding object. 
+- We re-export the `fit()` generic from generics. + +- Instead of documenting each `fit()` method, we provide the details in the "Details" section and the "Estimating the scores" subsection of the documentation for the corresponding object. The code below opens the help page for the `fit()` generic: @@ -251,6 +286,106 @@ The code below opens the help page for specific `fit()` method: ``` ::: +To enable the `?` help page above, the `fit()` method is exported so that users can call it, but it is not documented directly. + +::: {.cell layout-align="center"} + +```{.r .cell-code} +#' @export +S7::method(fit, class_score_aov) <- function( + object, + formula, + data, + case_weights = NULL, + ... +) { + # TODO Finish the rest of the function using lm() and anova() + + object@results <- res + object +} +``` +::: + +Instead, documentation is provided in the "Details" section and the "Estimating the scores" subsection of the documentation for the `score_aov_pval` object. + +::: {.cell layout-align="center"} + +```{.r .cell-code} +#' Scoring via analysis of variance hypothesis tests +#' +#' @description +#' +#' @name score_aov_pval +#' @family class score metrics +#' +#' @details +#' +#' These objects are used when either: +#' +#' ... +#' +#' ## Estimating the scores +#' +#' In \pkg{filtro}, the `score_*` objects define a scoring method (e.g., data +#' input requirements, package dependencies, etc.). To compute the scores for +#' a specific data set, the `fit()` method is used. The main arguments for +#' these functions are: +#' +#' \describe{ +#' \item{`object`}{A score class object (e.g., `score_aov_pval`).} +#' \item{`formula`}{A standard R formula with a single outcome on the left-hand side and one or more predictors (or `.`) on the right-hand side. 
The data are processed via [stats::model.frame()]} +#' \item{`data`}{A data frame containing the relevant columns defined by the formula.} +#' \item{`...`}{Further arguments passed to or from other methods.} +#' \item{`case_weights`}{A quantitative vector of case weights that is the same length as the number of rows in `data`. The default of `NULL` indicates that there are no case weights.} +#' } +#' +#' ... +#' +#' @export +score_aov_pval <- + class_score_aov( + outcome_type = c("numeric", "factor"), + predictor_type = c("numeric", "factor"), + case_weights = TRUE, + range = c(0, Inf), + inclusive = c(FALSE, FALSE), + fallback_value = Inf, + score_type = "aov_pval", + transform_fn = function(x) x, + direction = "maximize", + deterministic = TRUE, + tuning = FALSE, + label = "ANOVA p-values" + ) +``` +::: + +We can have the `score_aov_fstat` object share the same help page as `score_aov_pval` by using `#' @name`. This avoids repeated documentation for similar or related objects. + +::: {.cell layout-align="center"} + +```{.r .cell-code} +#' @name score_aov_pval +#' @export +score_aov_fstat <- + class_score_aov( + outcome_type = c("numeric", "factor"), + predictor_type = c("numeric", "factor"), + case_weights = TRUE, + range = c(0, Inf), + inclusive = c(FALSE, FALSE), + fallback_value = Inf, + score_type = "aov_fstat", + transform_fn = function(x) x, + direction = "maximize", + deterministic = TRUE, + tuning = FALSE, + label = "ANOVA F-statistics" + ) +``` +::: + ## Accessing results after fitting Once the method has been fitted via `fit()`, the data frame of results can be accessed via `object@results`. diff --git a/learn/develop/filtro/index.qmd b/learn/develop/filtro/index.qmd index 53657aea..466e4218 100644 --- a/learn/develop/filtro/index.qmd +++ b/learn/develop/filtro/index.qmd @@ -36,31 +36,41 @@ pkgs <- c("filtro", "modeldata") filtro is tidy tools to apply filter-based supervised feature selection methods. 
It provides functions to rank and select a specified proportion or a fixed number of features using built-in methods and the desirability function. -Currently, there are 6 filters in filtro and many existing score objects. A list of existing scoring objects [can be found here](https://filtro.tidymodels.org/articles/filtro.html#available-score-objects-and-filter-methods). +Currently, there are 6 filters in filtro and many existing score objects. A list of existing scoring objects [can be found here](https://filtro.tidymodels.org/articles/filtro.html#available-score-objects-and-filter-methods). However, you might need to define your own scoring objects. This article serves as a guide to creating new scoring objects and computing feature scores before performing ranking and selection. -However, you might need to define your own scoring objects. +The general procedure is to: -This article serves as a guide to creating new scoring objects and computing feature scores before performing ranking and selection. +1. Create a parent scoring object `class_score`, specifying fixed properties that are shared across all custom scoring objects. -The general procedure is to: +2. Construct a custom scoring object `class_score_*`, adding additional, implementation-specific properties. -1. Construct a parent scoring object using `class_score()`, specifying fixed properties. +3. Define the scoring method in `fit()`, which computes feature score. `fit()` refers to the custom scoring object from step 2 to use the appropriate method. -2. Construct a custom scoring object using `class_score_*()`, defining additional properties. +The hierarchy can be visualized as: -3. Define the scoring method in `fit()` to compute feature score. `fit()` refers to the custom scoring object from Step 2 to determine which method to dispatch. +``` +class_score +└─> class_score_* + └─> fit() +``` As an example, we will walk through the steps to create an ANOVA F-test filter. 
## Scoring object -All the custom scoring objects share the same parent class named `class_score`. Therefore, we start by constructing a new scoring object using `class_score()`. +All the custom scoring objects share the same parent class named `class_score`. Therefore, we start by creating a parent class: + +```{r} +#| eval: false +# Create a parent class +library(filtro) +class_score +``` These are the fixed properties (attributes) for this object: ```{r} #| label: "class_score" -library(filtro) args(class_score) ``` @@ -74,15 +84,22 @@ For example: - `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame. -For details on the remaining properties, please refer to the package documentation. +For details on its constructor and its remaining properties, please refer to the package documentation. ## Custom scoring object -Next, we demonstrate how to create a custom scoring object. +``` +class_score +└─> class_score_aov (example shown) +└─> class_score_cor +└─> ... +``` + +Next, we demonstrate how to create a custom scoring object `class_score_*`. As an example, let’s create a custom scoring object for ANOVA F-test named `class_score_aov`. This filter computes feature score using analysis of variance (ANOVA) hypothesis tests, powered by `lm()`. -`class_score_aov` is a subclass of `class_score`. It inherits all fixed properties from the parent class, while allowing additional implementation-specific properties to be added in the subclass. For example: +By setting `parent = class_score`, the subclass `class_score_aov` inherits all fixed properties from the parent class. Additional, implementation-specific properties can be added using the `properties = ` argument. 
For example: ```{r} #| eval: false @@ -96,7 +113,7 @@ class_score_aov <- S7::new_class( ) ``` -In addition to the properties inherited from the parent class (discussed in the previous section), `class_score_aov` also includes: +In addition to the properties inherited from the parent class (discussed in the previous section), `class_score_aov` also includes the following property: - `neg_log10`: Represent the score as `-log10(p_value)`? It is `TRUE` or `FALSE`. @@ -137,6 +154,8 @@ score_aov_pval@fallback_value score_aov_pval@direction ``` +Note that by default, the returned p-value is transformed to `-log10(p_value)`, which means larger values correspond to more important predictors. This is why the fallback value is set to `Inf` and the direction is set to `"maximize"`. + `score_aov_fstat` is another instance of the `class_score_aov` subclass: ```{r} @@ -160,15 +179,24 @@ score_aov_fstat <- ## Fitting (or estimating) feature score +``` +class_score +└─> class_score_aov (example shown) + └─> fit() +└─> class_score_cor + └─> fit() +└─> ... +``` + So far, we have covered how to construct a parent class, create a custom subclass, and instantiate objects for the ANOVA F-test filter. -We now discuss the dual role of `fit()`: it functions both as a generic and as the methods used to fit (or estimate) feature score. +We now discuss the dual role of `fit()`: it functions both as a *generic* and as the *methods* used to fit (or estimate) feature score. -1. The generic `fit()` is re-exported from generics. It inspects the class of the object passed and dispatches to the appropriate method. +1. The `fit()` generic is re-exported from generics. It inspects the class of the object passed and dispatches to the appropriate method. -2. We also define multiple methods named `fit()`. Each method `fit()` performs the actual fitting or score estimation for a specific class of object. +2. We also define multiple methods named `fit()`. 
Each `fit()` method performs the actual fitting or score estimation for a specific class of object. -In other words, when `fit()` is called, the generic refers to the custom scoring object (rather than the parent scoring object) to determine which method to dispatch. The actual scoring computation is performed within the dispatched method. +In other words, when `fit()` is called, the generic refers to the custom scoring object `class_score_*` to determine which method to dispatch. The actual scoring computation is performed within the dispatched method. The ANOVA F-test filter, for example: @@ -191,12 +219,13 @@ score_aov_fstat |> ## Defining S7 methods -To use the `fit()` method, we need to define S7 method that implements the scoring logic: +To use the `fit()` method above, we need to define an S7 method that implements the scoring logic. + +The following code defines the `fit()` method specifically for the `class_score_aov` subclass, specifying how feature scores should be computed using the ANOVA F-test: ```{r} #| eval: false # Define the scoring method for `class_score_aov` -#' @export S7::method(fit, class_score_aov) <- function( object, formula, @@ -204,18 +233,22 @@ S7::method(fit, class_score_aov) <- function( case_weights = NULL, ... ) { - # TODO finish the rest of the function + # TODO Finish the rest of the function object@results <- res object } ``` +We would do something similar for each of the other `class_score_*` subclasses. + ## Documenting S7 methods Documentation for S7 methods is still a work in progress, and it seems no one currently knows the right approach. Here’s how we tackle it: -We re-export the `fit()` generic from generics. Instead of documenting each `fit()` method, we provide the details in the "Details" section and the "Estimating the scores" subsection of the documentation for the corresponding object. +- We re-export the `fit()` generic from generics.
+ +- Instead of documenting each `fit()` method, we provide the details in the "Details" section and the "Estimating the scores" subsection of the documentation for the corresponding object. The code below opens the help page for the `fit()` generic: @@ -234,6 +267,100 @@ The code below opens the help page for specific `fit()` method: ?score_aov_fstat ``` +To enable the `?` help page above, the `fit()` method is exported so it can be called by the users, but it is not documented directly. + +```{r} +#| eval: false +#' @export +S7::method(fit, class_score_aov) <- function( + object, + formula, + data, + case_weights = NULL, + ... +) { + # TODO Finish the rest of the function using lm() and anova() + + object@results <- res + object +} +``` + +Instead, documentation is provided in the "Details" section and the "Estimating the scores" subsection of the documentation for the `score_aov_pval` object. + +```{r} +#| eval: false +#' Scoring via analysis of variance hypothesis tests +#' +#' @description +#' +#' @name score_aov_pval +#' @family class score metrics +#' +#' @details +#' +#' These objects are used when either: +#' +#' ... +#' +#' ## Estimating the scores +#' +#' In \pkg{filtro}, the `score_*` objects define a scoring method (e.g., data +#' input requirements, package dependencies, etc). To compute the scores for +#' a specific data set, the `fit()` method is used. The main arguments for +#' these functions are: +#' +#' \describe{ +#' \item{`object`}{A score class object (e.g., `score_aov_pval`).} +#' \item{`formula`}{A standard R formula with a single outcome on the left-hand side and one or more predictors (or `.`) on the right-hand side.
The data are processed via [stats::model.frame()]} +#' \item{`data`}{A data frame containing the relevant columns defined by the formula.} +#' \item{`...`}{Further arguments passed to or from other methods.} +#' \item{`case_weights`}{A quantitative vector of case weights that is the same length as the number of rows in `data`. The default of `NULL` indicates that there are no case weights.} +#' } +#' +#' ... +#' +#' @export +score_aov_pval <- + class_score_aov( + outcome_type = c("numeric", "factor"), + predictor_type = c("numeric", "factor"), + case_weights = TRUE, + range = c(0, Inf), + inclusive = c(FALSE, FALSE), + fallback_value = Inf, + score_type = "aov_pval", + transform_fn = function(x) x, + direction = "maximize", + deterministic = TRUE, + tuning = FALSE, + label = "ANOVA p-values" + ) +``` + +We can have the `score_aov_fstat` object share the same help page as `score_aov_pval` by using `#' @name`. This avoids repeated documentation for similar or related objects. + +```{r} +#| eval: false +#' @name score_aov_pval +#' @export +score_aov_fstat <- + class_score_aov( + outcome_type = c("numeric", "factor"), + predictor_type = c("numeric", "factor"), + case_weights = TRUE, + range = c(0, Inf), + inclusive = c(FALSE, FALSE), + fallback_value = Inf, + score_type = "aov_fstat", + transform_fn = function(x) x, + direction = "maximize", + deterministic = TRUE, + tuning = FALSE, + label = "ANOVA F-statistics" + ) +``` + ## Accessing results after fitting Once the method has been fitted via `fit()`, the data frame of results can be accessed via `object@results`. 
From e05fc226fd1873471bad818189f7bb0d00e6defb Mon Sep 17 00:00:00 2001 From: Frances Lin <37535633+franceslinyc@users.noreply.github.com> Date: Thu, 28 Aug 2025 14:54:11 -0700 Subject: [PATCH 18/21] Elaborate on ANOVA F-test filter --- _freeze/learn/develop/filtro/index/execute-results/html.json | 4 ++-- learn/develop/filtro/index.html.md | 4 ++-- learn/develop/filtro/index.qmd | 4 ++-- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/_freeze/learn/develop/filtro/index/execute-results/html.json b/_freeze/learn/develop/filtro/index/execute-results/html.json index d6042e53..019c0a09 100644 --- a/_freeze/learn/develop/filtro/index/execute-results/html.json +++ b/_freeze/learn/develop/filtro/index/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "b9bf1801286839c3a840f868009a058f", + "hash": "695ada7097e0889c5edc080187b655f6", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new score class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro and modeldata.\n\nfiltro is tidy tools to apply filter-based supervised feature selection methods. It provides functions to rank and select a specified proportion or a fixed number of features using built-in methods and the desirability function. \n\nCurrently, there are 6 filters in filtro and many existing score objects. A list of existing scoring objects [can be found here](https://filtro.tidymodels.org/articles/filtro.html#available-score-objects-and-filter-methods). However, you might need to define your own scoring objects. This article serves as a guide to creating new scoring objects and computing feature scores before performing ranking and selection. \n\nThe general procedure is to:\n\n1. 
Create a parent scoring object `class_score`, specifying fixed properties that are shared across all custom scoring objects. \n\n2. Construct a custom scoring object `class_score_*`, adding additional, implementation-specific properties. \n\n3. Define the scoring method in `fit()`, which computes feature score. `fit()` refers to the custom scoring object from step 2 to use the appropriate method.\n\nThe hierarchy can be visualized as:\n\n```\nclass_score\n└─> class_score_* \n └─> fit()\n```\n\nAs an example, we will walk through the steps to create an ANOVA F-test filter.\n\n## Scoring object\n\nAll the custom scoring objects share the same parent class named `class_score`. Therefore, we start by creating a parent class: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a parent class\nlibrary(filtro) \nclass_score\n```\n:::\n\n\nThese are the fixed properties (attributes) for this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), transform_fn = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\nFor example: \n\n- `case_weights`: Does the method accpet case weights? It is `TRUE` or `FALSE`.\n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`.\n\n- `direction`: What direction of values indicates the most important values? For example, `maximum`, `minimize`.\n\n- `results`: A slot for the results once the method is fitted. 
Initially, this is an empty data frame.\n\nFor details on its constructor and its remaining properties, please refer to the package documentation.\n\n## Custom scoring object\n\n```\nclass_score\n└─> class_score_aov (example shown)\n└─> class_score_cor\n└─> ... \n```\n\nNext, we demonstrate how to create a custom scoring object `class_score_*`. \n\nAs an example, let’s create a custom scoring object for ANOVA F-test named `class_score_aov`. This filter computes feature score using analysis of variance (ANOVA) hypothesis tests, powered by `lm()`. \n\nBy setting `parent = class_score`, the subclass `class_score_aov` inherits all fixed properties from the parent class. Additional, implementation-specific properties can be added using the `properties = ` argument. For example:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a subclass named 'class_score_aov'\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\nIn addition to the properties inherited from the parent class (discussed in the previous section), `class_score_aov` also includes the following property:\n\n- `neg_log10`: Represent the score as `-log10(p_value)`? It is `TRUE` or `FALSE`.\n\nFor the ANOVA F-test filter, users can represent the score using either\n\n- p-value or \n\n- F-statistic. \n\nWe demonstrate how to create these instances (objects) accordingly. 
\n\n`score_aov_pval` is created as an instance of the `class_score_aov` subclass by calling its constructor and specifying its properties:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA p-value\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nIndividual properties can be accessed via `object@`. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@fallback_value\n#> [1] Inf\nscore_aov_pval@direction\n#> [1] \"maximize\"\n```\n:::\n\n\nNote that by default, the returned p-value is transformed to `-log10(p_value)`, which means larger values correspond to more important predictors. This is why the fallback value is set to `Inf` and the direction is set to `\"maximize\"`. \n\n`score_aov_fstat` is another instance of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA F-statistic\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Fitting (or estimating) feature score\n\n```\nclass_score\n└─> class_score_aov (example shown)\n └─> fit()\n└─> class_score_cor\n └─> fit()\n└─> ... \n```\n\nSo far, we have covered how to construct a parent class, create a custom subclass, and instantiate objects for the ANOVA F-test filter. 
\n\nWe now discuss the dual role of `fit()`: it functions both as a *generic* and as the *methods* used to fit (or estimate) feature score. \n\n1. The `fit()` generic is re-exported from generics. It inspects the class of the object passed and dispatches to the appropriate method. \n\n2. We also define multiple methods named `fit()`. Each `fit()` method performs the actual fitting or score estimation for a specific class of object. \n\nIn other words, when `fit()` is called, the generic refers to the custom scoring object `class_score_*` to determine which method to dispatch. The actual scoring computation is performed within the dispatched method. \n\nThe ANOVA F-test filter, for example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Check the class of the object\nclass(score_aov_pval)\n#> [1] \"filtro::class_score_aov\" \"filtro::class_score\" \n#> [3] \"S7_object\"\nclass(score_aov_fstat)\n#> [1] \"filtro::class_score_aov\" \"filtro::class_score\" \n#> [3] \"S7_object\"\n```\n:::\n\n\nBoth instances (objects) belong to the custom scoring object `class_score_aov`. Therefore, when `fit()` is called, the method for `class_score_aov` is dispatched, performing the actual fitting using the ANOVA F-test:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Method dispatch for objects of class `class_score_aov`\nscore_aov_pval |>\n fit(Sale_Price ~ ., data = ames)\nscore_aov_fstat |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\n## Defining S7 methods \n\nTo use the `fit()` method above, we need to define a S7 method that implements the scoring logic. 
\n\nThe following code defines the `fit()` method specifically for the `class_score_aov` subclass, specifying how feature score should be computed using ANOVA F-test:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Define the scoring method for `class_score_aov`\nS7::method(fit, class_score_aov) <- function(\n object,\n formula,\n data,\n case_weights = NULL,\n ...\n) {\n # TODO Finish the rest of the function\n\n object@results <- res\n object\n}\n```\n:::\n\n\nWe would want to do something similar for other `class_score_*` subclass. \n\n## Documenting S7 methods \n\nDocumentation for S7 methods is still a work in progress, and it seems no one currently knows the right approach. Here’s how we tackle it: \n\n- We re-export the `fit()` generic from generics. \n\n- Instead of documenting each `fit()` method, we provide the details in the \"Details\" section and the \"Estimating the scores\" subsection of the documentation for the corresponding object. \n\nThe code below opens the help page for the `fit()` generic: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Help page for `fit()` generic\n?fit\n```\n:::\n\n\nThe code below opens the help page for specific `fit()` method: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Help page for `fit()` method along with the documentation for the specific object\n?score_aov_pval\n?score_aov_fstat\n```\n:::\n\n\nTo enable the `?` help page above, the `fit()` method is exported so it can be called by the users, but it is not documented directly.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n#' @export\nS7::method(fit, class_score_aov) <- function(\n object,\n formula,\n data,\n case_weights = NULL,\n ...\n) {\n # TODO Finish the rest of the function using lm() and anova()\n\n object@results <- res\n object\n}\n```\n:::\n\n\nInstead, documentation is provided in the \"Details\" section and the \"Estimating the scores\" 
subsection of the documentation for the `score_aov_pval` object. \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n#' Scoring via analysis of variance hypothesis tests\n#'\n#' @description\n#' \n#' @name score_aov_pval\n#' @family class score metrics\n#'\n#' @details\n#'\n#' These objects are used when either:\n#'\n#' ...\n#'\n#' ## Estimating the scores\n#'\n#' In \\pkg{filtro}, the `score_*` objects define a scoring method (e.g., data\n#' input requirements, package dependencies, etc). To compute the scores for\n#' a specific data set, the `fit()` method is used. The main arguments for\n#' these functions are:\n#'\n#' \\describe{\n#' \\item{`object`}{A score class object (e.g., `score_aov_pval`).}\n#' \\item{`formula`}{A standard R formula with a single outcome on the right-hand side and one or more predictors (or `.`) on the left-hand side. The data are processed via [stats::model.frame()]}\n#' \\item{`data`}{A data frame containing the relevant columns defined by the formula.}\n#' \\item{`...`}{Further arguments passed to or from other methods.}\n#' \\item{`case_weights`}{A quantitative vector of case weights that is the same length as the number of rows in `data`. The default of `NULL` indicates that there are no case weights.}\n#' }\n#'\n#' ...\n#' \n#' @export\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n transform_fn = function(x) x,\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nWe can have the `score_aov_fstat` object share the same help page as `score_aov_pval` by using `#' @name`. 
This avoids repeated documentation for similar or related objects.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n#' @name score_aov_pval\n#' @export\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n transform_fn = function(x) x,\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Accessing results after fitting\n\nOnce the method has been fitted via `fit()`, the data frame of results can be accessed via `object@results`. \n\nWe use a subset of the Ames data set from the {modeldata} package for demonstration. The goal is to predict housing sale price. `Sale_Price` is the outcome and is numeric. \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(modeldata)\names_subset <- modeldata::ames |>\n # Use a subset of data for demonstration\n dplyr::select(\n Sale_Price,\n MS_SubClass,\n MS_Zoning,\n Lot_Frontage,\n Lot_Area,\n Street\n )\names_subset <- ames_subset |>\n dplyr::mutate(Sale_Price = log10(Sale_Price))\n```\n:::\n\n\nNext, we fit the score as we discuss before: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Specify ANOVA p-value and fit score\names_aov_pval_res <-\n score_aov_pval |>\n fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Specify ANOVA F-statistic and fit score\names_aov_fstat_res <-\n score_aov_fstat |>\n fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\nRecall that individual properties of an object can be accessed using `object@`. 
Once the method has been fitted, the resulting data frame can be accessed via `object@results`:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\names_aov_pval_res@results\n#> # A tibble: 5 × 4\n#> name score outcome predictor \n#> \n#> 1 aov_pval 237. Sale_Price MS_SubClass \n#> 2 aov_pval 130. Sale_Price MS_Zoning \n#> 3 aov_pval NA Sale_Price Lot_Frontage\n#> 4 aov_pval NA Sale_Price Lot_Area \n#> 5 aov_pval 5.75 Sale_Price Street\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\names_aov_fstat_res@results\n#> # A tibble: 5 × 4\n#> name score outcome predictor \n#> \n#> 1 aov_fstat 94.5 Sale_Price MS_SubClass \n#> 2 aov_fstat 115. Sale_Price MS_Zoning \n#> 3 aov_fstat NA Sale_Price Lot_Frontage\n#> 4 aov_fstat NA Sale_Price Lot_Area \n#> 5 aov_fstat 22.9 Sale_Price Street\n```\n:::\n\n\n## Session information {#session-info}\n\n\n::: {.cell layout-align=\"center\"}\n\n```\n#> \n#> Attaching package: 'dplyr'\n#> The following objects are masked from 'package:stats':\n#> \n#> filter, lag\n#> The following objects are masked from 'package:base':\n#> \n#> intersect, setdiff, setequal, union\n#> ─ Session info ─────────────────────────────────────────────────────\n#> version R version 4.5.0 (2025-04-11)\n#> language (EN)\n#> date 2025-08-28\n#> pandoc 3.6.3\n#> quarto 1.7.32\n#> \n#> ─ Packages ─────────────────────────────────────────────────────────\n#> package version date (UTC) source\n#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0)\n#> filtro 0.1.0.9000 2025-08-26 Github (tidymodels/filtro@f8ffd50)\n#> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0)\n#> purrr 1.0.4 2025-02-05 CRAN (R 4.5.0)\n#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0)\n#> tibble 3.2.1 2023-03-20 CRAN (R 4.5.0)\n#> \n#> ────────────────────────────────────────────────────────────────────\n```\n:::\n\n", + "markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new score 
class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro and modeldata.\n\nfiltro is tidy tools to apply filter-based supervised feature selection methods. It provides functions to rank and select a specified proportion or a fixed number of features using built-in methods and the desirability function. \n\nCurrently, there are 6 filters in filtro and many existing score objects. A list of existing scoring objects [can be found here](https://filtro.tidymodels.org/articles/filtro.html#available-score-objects-and-filter-methods). However, you might need to define your own scoring objects. This article serves as a guide to creating new scoring objects and computing feature scores before performing ranking and selection. \n\nThe general procedure is to:\n\n1. Create a parent scoring object `class_score`, specifying fixed properties that are shared across all custom scoring objects. \n\n2. Construct a custom scoring object `class_score_*`, adding additional, implementation-specific properties. \n\n3. Define the scoring method in `fit()`, which computes feature score. `fit()` refers to the custom scoring object from step 2 to use the appropriate method.\n\nThe hierarchy can be visualized as:\n\n```\nclass_score\n└─> class_score_* \n └─> fit()\n```\n\nAs an example, we will walk through the steps to create an ANOVA F-test filter.\n\n## Scoring object\n\nAll the custom scoring objects share the same parent class named `class_score`. 
Therefore, we start by creating a parent class: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a parent class\nlibrary(filtro) \nclass_score\n```\n:::\n\n\nThese are the fixed properties (attributes) for this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), transform_fn = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\nFor example: \n\n- `case_weights`: Does the method accept case weights? It is `TRUE` or `FALSE`.\n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`.\n\n- `direction`: What direction of values indicates the most important values? For example, `maximize`, `minimize`.\n\n- `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame.\n\nFor details on its constructor and its remaining properties, please refer to the package documentation.\n\n## Custom scoring object\n\n```\nclass_score\n└─> class_score_aov (example shown)\n└─> class_score_cor\n└─> ... \n```\n\nNext, we demonstrate how to create a custom scoring object `class_score_*`. \n\nAs an example, let’s create a custom scoring object for the ANOVA F-test named `class_score_aov`. This filter computes feature scores using analysis of variance (ANOVA) hypothesis tests, powered by `lm()`. The `lm()` function fits a linear model, and applying `anova()` to that fit supplies the F-statistic and p-value, which can be used to evaluate feature importance.
\n\nBy setting `parent = class_score`, the subclass `class_score_aov` inherits all fixed properties from the parent class. Additional, implementation-specific properties can be added using the `properties = ` argument. For example:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a subclass named 'class_score_aov'\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\nIn addition to the properties inherited from the parent class (discussed in the previous section), `class_score_aov` also includes the following property:\n\n- `neg_log10`: Represent the score as `-log10(p_value)`? It is `TRUE` or `FALSE`.\n\nFor the ANOVA F-test filter, users can represent the score using either the \n\n- p-value or \n\n- F-statistic. \n\nWe demonstrate how to create these instances (objects) accordingly. \n\n`score_aov_pval` is created as an instance of the `class_score_aov` subclass by calling its constructor and specifying its properties:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA p-value\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nIndividual properties can be accessed via `object@`. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@fallback_value\n#> [1] Inf\nscore_aov_pval@direction\n#> [1] \"maximize\"\n```\n:::\n\n\nNote that by default, the returned p-value is transformed to `-log10(p_value)`, which means larger values correspond to more important predictors. 
This is why the fallback value is set to `Inf` and the direction is set to `\"maximize\"`. \n\n`score_aov_fstat` is another instance of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA F-statistic\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Fitting (or estimating) feature score\n\n```\nclass_score\n└─> class_score_aov (example shown)\n └─> fit()\n└─> class_score_cor\n └─> fit()\n└─> ... \n```\n\nSo far, we have covered how to construct a parent class, create a custom subclass, and instantiate objects for the ANOVA F-test filter. \n\nWe now discuss the dual role of `fit()`: it functions both as a *generic* and as the *methods* used to fit (or estimate) feature score. \n\n1. The `fit()` generic is re-exported from generics. It inspects the class of the object passed and dispatches to the appropriate method. \n\n2. We also define multiple methods named `fit()`. Each `fit()` method performs the actual fitting or score estimation for a specific class of object. \n\nIn other words, when `fit()` is called, the generic refers to the custom scoring object `class_score_*` to determine which method to dispatch. The actual scoring computation is performed within the dispatched method. 
\n\nTake the ANOVA F-test filter as an example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Check the class of the object\nclass(score_aov_pval)\n#> [1] \"filtro::class_score_aov\" \"filtro::class_score\" \n#> [3] \"S7_object\"\nclass(score_aov_fstat)\n#> [1] \"filtro::class_score_aov\" \"filtro::class_score\" \n#> [3] \"S7_object\"\n```\n:::\n\n\nBoth instances (objects) belong to the custom scoring object `class_score_aov`. Therefore, when `fit()` is called, the method for `class_score_aov` is dispatched, performing the actual fitting using the ANOVA F-test:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Method dispatch for objects of class `class_score_aov`\nscore_aov_pval |>\n fit(Sale_Price ~ ., data = ames)\nscore_aov_fstat |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\n## Defining S7 methods \n\nTo use the `fit()` method above, we need to define an S7 method that implements the scoring logic. \n\nThe following code defines the `fit()` method specifically for the `class_score_aov` subclass, specifying how feature scores should be computed using the ANOVA F-test:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Define the scoring method for `class_score_aov`\nS7::method(fit, class_score_aov) <- function(\n object,\n formula,\n data,\n case_weights = NULL,\n ...\n) {\n # TODO Finish the rest of the function\n\n object@results <- res\n object\n}\n```\n:::\n\n\nWe would do something similar for the other `class_score_*` subclasses. \n\n## Documenting S7 methods \n\nDocumentation for S7 methods is still a work in progress, and there does not yet seem to be an established approach. Here’s how we tackle it: \n\n- We re-export the `fit()` generic from generics. \n\n- Instead of documenting each `fit()` method, we provide the details in the \"Details\" section and the \"Estimating the scores\" subsection of the documentation for the corresponding object. 
\n\nThe code below opens the help page for the `fit()` generic: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Help page for `fit()` generic\n?fit\n```\n:::\n\n\nThe code below opens the help page for a specific `fit()` method: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Help page for `fit()` method along with the documentation for the specific object\n?score_aov_pval\n?score_aov_fstat\n```\n:::\n\n\nTo enable the `?` help page above, the `fit()` method is exported so it can be called by users, but it is not documented directly.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n#' @export\nS7::method(fit, class_score_aov) <- function(\n object,\n formula,\n data,\n case_weights = NULL,\n ...\n) {\n # TODO Finish the rest of the function using lm() and anova()\n\n object@results <- res\n object\n}\n```\n:::\n\n\nInstead, documentation is provided in the \"Details\" section and the \"Estimating the scores\" subsection of the documentation for the `score_aov_pval` object. \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n#' Scoring via analysis of variance hypothesis tests\n#'\n#' @description\n#' \n#' @name score_aov_pval\n#' @family class score metrics\n#'\n#' @details\n#'\n#' These objects are used when either:\n#'\n#' ...\n#'\n#' ## Estimating the scores\n#'\n#' In \\pkg{filtro}, the `score_*` objects define a scoring method (e.g., data\n#' input requirements, package dependencies, etc). To compute the scores for\n#' a specific data set, the `fit()` method is used. The main arguments for\n#' these functions are:\n#'\n#' \\describe{\n#' \\item{`object`}{A score class object (e.g., `score_aov_pval`).}\n#' \\item{`formula`}{A standard R formula with a single outcome on the left-hand side and one or more predictors (or `.`) on the right-hand side. 
The data are processed via [stats::model.frame()]}\n#' \\item{`data`}{A data frame containing the relevant columns defined by the formula.}\n#' \\item{`...`}{Further arguments passed to or from other methods.}\n#' \\item{`case_weights`}{A quantitative vector of case weights that is the same length as the number of rows in `data`. The default of `NULL` indicates that there are no case weights.}\n#' }\n#'\n#' ...\n#' \n#' @export\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n transform_fn = function(x) x,\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nWe can have the `score_aov_fstat` object share the same help page as `score_aov_pval` by using `#' @name`. This avoids repeated documentation for similar or related objects.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n#' @name score_aov_pval\n#' @export\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n transform_fn = function(x) x,\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Accessing results after fitting\n\nOnce the method has been fitted via `fit()`, the data frame of results can be accessed via `object@results`. \n\nWe use a subset of the Ames data set from the {modeldata} package for demonstration. The goal is to predict housing sale price. `Sale_Price` is the outcome and is numeric. 
\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(modeldata)\names_subset <- modeldata::ames |>\n # Use a subset of data for demonstration\n dplyr::select(\n Sale_Price,\n MS_SubClass,\n MS_Zoning,\n Lot_Frontage,\n Lot_Area,\n Street\n )\names_subset <- ames_subset |>\n dplyr::mutate(Sale_Price = log10(Sale_Price))\n```\n:::\n\n\nNext, we fit the scores as discussed earlier: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Specify ANOVA p-value and fit score\names_aov_pval_res <-\n score_aov_pval |>\n fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Specify ANOVA F-statistic and fit score\names_aov_fstat_res <-\n score_aov_fstat |>\n fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\nRecall that individual properties of an object can be accessed using `object@`. 
Sale_Price MS_Zoning \n#> 3 aov_fstat NA Sale_Price Lot_Frontage\n#> 4 aov_fstat NA Sale_Price Lot_Area \n#> 5 aov_fstat 22.9 Sale_Price Street\n```\n:::\n\n\n## Session information {#session-info}\n\n\n::: {.cell layout-align=\"center\"}\n\n```\n#> \n#> Attaching package: 'dplyr'\n#> The following objects are masked from 'package:stats':\n#> \n#> filter, lag\n#> The following objects are masked from 'package:base':\n#> \n#> intersect, setdiff, setequal, union\n#> ─ Session info ─────────────────────────────────────────────────────\n#> version R version 4.5.0 (2025-04-11)\n#> language (EN)\n#> date 2025-08-28\n#> pandoc 3.6.3\n#> quarto 1.7.32\n#> \n#> ─ Packages ─────────────────────────────────────────────────────────\n#> package version date (UTC) source\n#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0)\n#> filtro 0.1.0.9000 2025-08-26 Github (tidymodels/filtro@f8ffd50)\n#> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0)\n#> purrr 1.0.4 2025-02-05 CRAN (R 4.5.0)\n#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0)\n#> tibble 3.2.1 2023-03-20 CRAN (R 4.5.0)\n#> \n#> ────────────────────────────────────────────────────────────────────\n```\n:::\n\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/learn/develop/filtro/index.html.md b/learn/develop/filtro/index.html.md index 75a0509b..bc9c13e3 100644 --- a/learn/develop/filtro/index.html.md +++ b/learn/develop/filtro/index.html.md @@ -89,7 +89,7 @@ class_score Next, we demonstrate how to create a custom scoring object `class_score_*`. -As an example, let’s create a custom scoring object for ANOVA F-test named `class_score_aov`. This filter computes feature score using analysis of variance (ANOVA) hypothesis tests, powered by `lm()`. +As an example, let’s create a custom scoring object for ANOVA F-test named `class_score_aov`. This filter computes feature score using analysis of variance (ANOVA) hypothesis tests, powered by `lm()`. 
The `lm()` function fits a linear model and returns a summary containing the F-statistic and p-value, which can be used to evaluate feature importance. By setting `parent = class_score`, the subclass `class_score_aov` inherits all fixed properties from the parent class. Additional, implementation-specific properties can be added using the `properties = ` argument. For example: @@ -111,7 +111,7 @@ In addition to the properties inherited from the parent class (discussed in the - `neg_log10`: Represent the score as `-log10(p_value)`? It is `TRUE` or `FALSE`. -For the ANOVA F-test filter, users can represent the score using either +For the ANOVA F-test filter, users can represent the score using either the - p-value or diff --git a/learn/develop/filtro/index.qmd b/learn/develop/filtro/index.qmd index 466e4218..57f1bf66 100644 --- a/learn/develop/filtro/index.qmd +++ b/learn/develop/filtro/index.qmd @@ -97,7 +97,7 @@ class_score Next, we demonstrate how to create a custom scoring object `class_score_*`. -As an example, let’s create a custom scoring object for ANOVA F-test named `class_score_aov`. This filter computes feature score using analysis of variance (ANOVA) hypothesis tests, powered by `lm()`. +As an example, let’s create a custom scoring object for ANOVA F-test named `class_score_aov`. This filter computes feature score using analysis of variance (ANOVA) hypothesis tests, powered by `lm()`. The `lm()` function fits a linear model and returns a summary containing the F-statistic and p-value, which can be used to evaluate feature importance. By setting `parent = class_score`, the subclass `class_score_aov` inherits all fixed properties from the parent class. Additional, implementation-specific properties can be added using the `properties = ` argument. For example: @@ -117,7 +117,7 @@ In addition to the properties inherited from the parent class (discussed in the - `neg_log10`: Represent the score as `-log10(p_value)`? It is `TRUE` or `FALSE`. 
-For the ANOVA F-test filter, users can represent the score using either +For the ANOVA F-test filter, users can represent the score using either the - p-value or From ffa95a29ade2b86efdf8c95df2e9fe27fddfae17 Mon Sep 17 00:00:00 2001 From: Frances Lin <37535633+franceslinyc@users.noreply.github.com> Date: Fri, 29 Aug 2025 09:29:38 -0700 Subject: [PATCH 19/21] Finalize --- .../filtro/index/execute-results/html.json | 4 ++-- learn/develop/filtro/index.html.md | 22 +++++++++---------- learn/develop/filtro/index.qmd | 20 ++++++++--------- 3 files changed, 21 insertions(+), 25 deletions(-) diff --git a/_freeze/learn/develop/filtro/index/execute-results/html.json b/_freeze/learn/develop/filtro/index/execute-results/html.json index 019c0a09..9c99c3fc 100644 --- a/_freeze/learn/develop/filtro/index/execute-results/html.json +++ b/_freeze/learn/develop/filtro/index/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "695ada7097e0889c5edc080187b655f6", + "hash": "ff15073a1448f1c25afdf22e5284f8cb", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new score class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro and modeldata.\n\nfiltro is tidy tools to apply filter-based supervised feature selection methods. It provides functions to rank and select a specified proportion or a fixed number of features using built-in methods and the desirability function. \n\nCurrently, there are 6 filters in filtro and many existing score objects. A list of existing scoring objects [can be found here](https://filtro.tidymodels.org/articles/filtro.html#available-score-objects-and-filter-methods). However, you might need to define your own scoring objects. 
This article serves as a guide to creating new scoring objects and computing feature scores before performing ranking and selection. \n\nThe general procedure is to:\n\n1. Create a parent scoring object `class_score`, specifying fixed properties that are shared across all custom scoring objects. \n\n2. Construct a custom scoring object `class_score_*`, adding additional, implementation-specific properties. \n\n3. Define the scoring method in `fit()`, which computes feature score. `fit()` refers to the custom scoring object from step 2 to use the appropriate method.\n\nThe hierarchy can be visualized as:\n\n```\nclass_score\n└─> class_score_* \n └─> fit()\n```\n\nAs an example, we will walk through the steps to create an ANOVA F-test filter.\n\n## Scoring object\n\nAll the custom scoring objects share the same parent class named `class_score`. Therefore, we start by creating a parent class: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a parent class\nlibrary(filtro) \nclass_score\n```\n:::\n\n\nThese are the fixed properties (attributes) for this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), transform_fn = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\nFor example: \n\n- `case_weights`: Does the method accpet case weights? It is `TRUE` or `FALSE`.\n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`.\n\n- `direction`: What direction of values indicates the most important values? 
For example, `maximum`, `minimize`.\n\n- `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame.\n\nFor details on its constructor and its remaining properties, please refer to the package documentation.\n\n## Custom scoring object\n\n```\nclass_score\n└─> class_score_aov (example shown)\n└─> class_score_cor\n└─> ... \n```\n\nNext, we demonstrate how to create a custom scoring object `class_score_*`. \n\nAs an example, let’s create a custom scoring object for ANOVA F-test named `class_score_aov`. This filter computes feature score using analysis of variance (ANOVA) hypothesis tests, powered by `lm()`. The `lm()` function fits a linear model and returns a summary containing the F-statistic and p-value, which can be used to evaluate feature importance. \n\nBy setting `parent = class_score`, the subclass `class_score_aov` inherits all fixed properties from the parent class. Additional, implementation-specific properties can be added using the `properties = ` argument. For example:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a subclass named 'class_score_aov'\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\nIn addition to the properties inherited from the parent class (discussed in the previous section), `class_score_aov` also includes the following property:\n\n- `neg_log10`: Represent the score as `-log10(p_value)`? It is `TRUE` or `FALSE`.\n\nFor the ANOVA F-test filter, users can represent the score using either the \n\n- p-value or \n\n- F-statistic. \n\nWe demonstrate how to create these instances (objects) accordingly. 
\n\n`score_aov_pval` is created as an instance of the `class_score_aov` subclass by calling its constructor and specifying its properties:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA p-value\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nIndividual properties can be accessed via `object@`. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@fallback_value\n#> [1] Inf\nscore_aov_pval@direction\n#> [1] \"maximize\"\n```\n:::\n\n\nNote that by default, the returned p-value is transformed to `-log10(p_value)`, which means larger values correspond to more important predictors. This is why the fallback value is set to `Inf` and the direction is set to `\"maximize\"`. \n\n`score_aov_fstat` is another instance of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA F-statistic\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Fitting (or estimating) feature score\n\n```\nclass_score\n└─> class_score_aov (example shown)\n └─> fit()\n└─> class_score_cor\n └─> fit()\n└─> ... \n```\n\nSo far, we have covered how to construct a parent class, create a custom subclass, and instantiate objects for the ANOVA F-test filter. 
\n\nWe now discuss the dual role of `fit()`: it functions both as a *generic* and as the *methods* used to fit (or estimate) feature score. \n\n1. The `fit()` generic is re-exported from generics. It inspects the class of the object passed and dispatches to the appropriate method. \n\n2. We also define multiple methods named `fit()`. Each `fit()` method performs the actual fitting or score estimation for a specific class of object. \n\nIn other words, when `fit()` is called, the generic refers to the custom scoring object `class_score_*` to determine which method to dispatch. The actual scoring computation is performed within the dispatched method. \n\nThe ANOVA F-test filter, for example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Check the class of the object\nclass(score_aov_pval)\n#> [1] \"filtro::class_score_aov\" \"filtro::class_score\" \n#> [3] \"S7_object\"\nclass(score_aov_fstat)\n#> [1] \"filtro::class_score_aov\" \"filtro::class_score\" \n#> [3] \"S7_object\"\n```\n:::\n\n\nBoth instances (objects) belong to the custom scoring object `class_score_aov`. Therefore, when `fit()` is called, the method for `class_score_aov` is dispatched, performing the actual fitting using the ANOVA F-test:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Method dispatch for objects of class `class_score_aov`\nscore_aov_pval |>\n fit(Sale_Price ~ ., data = ames)\nscore_aov_fstat |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\n## Defining S7 methods \n\nTo use the `fit()` method above, we need to define a S7 method that implements the scoring logic. 
\n\nThe following code defines the `fit()` method specifically for the `class_score_aov` subclass, specifying how feature score should be computed using ANOVA F-test:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Define the scoring method for `class_score_aov`\nS7::method(fit, class_score_aov) <- function(\n object,\n formula,\n data,\n case_weights = NULL,\n ...\n) {\n # TODO Finish the rest of the function\n\n object@results <- res\n object\n}\n```\n:::\n\n\nWe would want to do something similar for other `class_score_*` subclass. \n\n## Documenting S7 methods \n\nDocumentation for S7 methods is still a work in progress, and it seems no one currently knows the right approach. Here’s how we tackle it: \n\n- We re-export the `fit()` generic from generics. \n\n- Instead of documenting each `fit()` method, we provide the details in the \"Details\" section and the \"Estimating the scores\" subsection of the documentation for the corresponding object. \n\nThe code below opens the help page for the `fit()` generic: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Help page for `fit()` generic\n?fit\n```\n:::\n\n\nThe code below opens the help page for specific `fit()` method: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Help page for `fit()` method along with the documentation for the specific object\n?score_aov_pval\n?score_aov_fstat\n```\n:::\n\n\nTo enable the `?` help page above, the `fit()` method is exported so it can be called by the users, but it is not documented directly.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n#' @export\nS7::method(fit, class_score_aov) <- function(\n object,\n formula,\n data,\n case_weights = NULL,\n ...\n) {\n # TODO Finish the rest of the function using lm() and anova()\n\n object@results <- res\n object\n}\n```\n:::\n\n\nInstead, documentation is provided in the \"Details\" section and the \"Estimating the scores\" 
subsection of the documentation for the `score_aov_pval` object. \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n#' Scoring via analysis of variance hypothesis tests\n#'\n#' @description\n#' \n#' @name score_aov_pval\n#' @family class score metrics\n#'\n#' @details\n#'\n#' These objects are used when either:\n#'\n#' ...\n#'\n#' ## Estimating the scores\n#'\n#' In \\pkg{filtro}, the `score_*` objects define a scoring method (e.g., data\n#' input requirements, package dependencies, etc). To compute the scores for\n#' a specific data set, the `fit()` method is used. The main arguments for\n#' these functions are:\n#'\n#' \\describe{\n#' \\item{`object`}{A score class object (e.g., `score_aov_pval`).}\n#' \\item{`formula`}{A standard R formula with a single outcome on the right-hand side and one or more predictors (or `.`) on the left-hand side. The data are processed via [stats::model.frame()]}\n#' \\item{`data`}{A data frame containing the relevant columns defined by the formula.}\n#' \\item{`...`}{Further arguments passed to or from other methods.}\n#' \\item{`case_weights`}{A quantitative vector of case weights that is the same length as the number of rows in `data`. The default of `NULL` indicates that there are no case weights.}\n#' }\n#'\n#' ...\n#' \n#' @export\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n transform_fn = function(x) x,\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nWe can have the `score_aov_fstat` object share the same help page as `score_aov_pval` by using `#' @name`. 
This avoids repeated documentation for similar or related objects.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n#' @name score_aov_pval\n#' @export\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n transform_fn = function(x) x,\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Accessing results after fitting\n\nOnce the method has been fitted via `fit()`, the data frame of results can be accessed via `object@results`. \n\nWe use a subset of the Ames data set from the {modeldata} package for demonstration. The goal is to predict housing sale price. `Sale_Price` is the outcome and is numeric. \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(modeldata)\names_subset <- modeldata::ames |>\n # Use a subset of data for demonstration\n dplyr::select(\n Sale_Price,\n MS_SubClass,\n MS_Zoning,\n Lot_Frontage,\n Lot_Area,\n Street\n )\names_subset <- ames_subset |>\n dplyr::mutate(Sale_Price = log10(Sale_Price))\n```\n:::\n\n\nNext, we fit the score as we discuss before: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Specify ANOVA p-value and fit score\names_aov_pval_res <-\n score_aov_pval |>\n fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Specify ANOVA F-statistic and fit score\names_aov_fstat_res <-\n score_aov_fstat |>\n fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\nRecall that individual properties of an object can be accessed using `object@`. 
Once the method has been fitted, the resulting data frame can be accessed via `object@results`:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\names_aov_pval_res@results\n#> # A tibble: 5 × 4\n#> name score outcome predictor \n#> \n#> 1 aov_pval 237. Sale_Price MS_SubClass \n#> 2 aov_pval 130. Sale_Price MS_Zoning \n#> 3 aov_pval NA Sale_Price Lot_Frontage\n#> 4 aov_pval NA Sale_Price Lot_Area \n#> 5 aov_pval 5.75 Sale_Price Street\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\names_aov_fstat_res@results\n#> # A tibble: 5 × 4\n#> name score outcome predictor \n#> \n#> 1 aov_fstat 94.5 Sale_Price MS_SubClass \n#> 2 aov_fstat 115. Sale_Price MS_Zoning \n#> 3 aov_fstat NA Sale_Price Lot_Frontage\n#> 4 aov_fstat NA Sale_Price Lot_Area \n#> 5 aov_fstat 22.9 Sale_Price Street\n```\n:::\n\n\n## Session information {#session-info}\n\n\n::: {.cell layout-align=\"center\"}\n\n```\n#> \n#> Attaching package: 'dplyr'\n#> The following objects are masked from 'package:stats':\n#> \n#> filter, lag\n#> The following objects are masked from 'package:base':\n#> \n#> intersect, setdiff, setequal, union\n#> ─ Session info ─────────────────────────────────────────────────────\n#> version R version 4.5.0 (2025-04-11)\n#> language (EN)\n#> date 2025-08-28\n#> pandoc 3.6.3\n#> quarto 1.7.32\n#> \n#> ─ Packages ─────────────────────────────────────────────────────────\n#> package version date (UTC) source\n#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0)\n#> filtro 0.1.0.9000 2025-08-26 Github (tidymodels/filtro@f8ffd50)\n#> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0)\n#> purrr 1.0.4 2025-02-05 CRAN (R 4.5.0)\n#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0)\n#> tibble 3.2.1 2023-03-20 CRAN (R 4.5.0)\n#> \n#> ────────────────────────────────────────────────────────────────────\n```\n:::\n\n", + "markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new score 
class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use the code in this article, you will need to install the following packages: filtro and modeldata.\n\nfiltro provides tidy tools for applying filter-based supervised feature selection methods. It provides functions to rank and select a specified proportion or a fixed number of features using built-in methods and the desirability function. \n\nCurrently, there are 6 filters in filtro and many existing score objects. A list of existing scoring objects [can be found here](https://filtro.tidymodels.org/articles/filtro.html#available-score-objects-and-filter-methods). However, you might need to define your own scoring objects. This article serves as a guide to creating new scoring objects and computing feature scores before performing ranking and selection. \n\nThe general procedure is to:\n\n1. Create a parent scoring object `class_score`, specifying fixed properties that are shared across all custom scoring objects. \n\n2. Construct a custom scoring object `class_score_*`, adding additional, implementation-specific properties. \n\n3. Define the scoring method in `fit()`, which computes feature scores. The `fit()` generic refers to the custom scoring object from step 2 to use the appropriate `fit()` method.\n\nThe hierarchy can be visualized as:\n\n```\nclass_score\n└─> class_score_* \n └─> fit()\n```\n\nAs an example, we will walk through the steps to create an ANOVA F-test filter. This filter computes feature scores using analysis of variance (ANOVA) hypothesis tests, powered by `lm()`. The `lm()` function fits a linear model from which the F-statistic and p-value can be obtained and used to evaluate feature importance. \n\n## Scoring object\n\nAll the custom scoring objects share the same parent class named `class_score`. 
Therefore, we start by creating a parent class `class_score`: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a parent class\nlibrary(filtro) \nclass_score\n```\n:::\n\n\nThese are the fixed properties (attributes) for this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), transform_fn = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\nFor example: \n\n- `case_weights`: Does the method accept case weights? It is `TRUE` or `FALSE`.\n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`.\n\n- `direction`: What direction of values indicates the most important values? For example, `maximize`, `minimize`.\n\n- `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame.\n\nFor details on its constructor and its remaining properties, please refer to the package documentation.\n\n## Custom scoring object\n\n```\nclass_score\n└─> class_score_aov (example shown)\n└─> class_score_cor\n└─> ... \n```\n\nNext, we demonstrate how to create a custom scoring object for the ANOVA F-test named `class_score_aov`. \n\nBy setting `parent = class_score`, the subclass `class_score_aov` inherits all fixed properties from the parent class. Additional, implementation-specific properties can be added using the `properties =` argument. 
For example:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a subclass named 'class_score_aov'\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\nIn addition to the properties inherited from the parent class (discussed in the previous section), `class_score_aov` also includes the following property:\n\n- `neg_log10`: Represent the score as `-log10(p_value)`? It is `TRUE` or `FALSE`.\n\nFor the ANOVA F-test filter, users can represent the score using either the \n\n- p-value or \n\n- F-statistic. \n\nWe demonstrate how to create these instances (objects) accordingly. \n\n`score_aov_pval` is created as an instance of the `class_score_aov` subclass by calling its constructor and specifying its properties:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA p-value\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nOnce instantiated, individual properties can be accessed via `object@`. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@fallback_value\n#> [1] Inf\nscore_aov_pval@direction\n#> [1] \"maximize\"\n```\n:::\n\n\nNote that by default, the returned p-value is transformed to `-log10(p_value)`, which means larger values correspond to more important predictors. This is why the fallback value is set to `Inf` and the direction is set to `\"maximize\"`. 
\n\n`score_aov_fstat` is another instance of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA F-statistic\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Fitting (or estimating) feature score\n\n```\nclass_score\n└─> class_score_aov (example shown)\n └─> fit()\n└─> class_score_cor\n └─> fit()\n└─> ... \n```\n\nSo far, we have covered how to construct a parent class, create a custom subclass, and instantiate objects for the ANOVA F-test filter. \n\nWe now discuss the dual role of `fit()`: it functions both as a *generic* and as the *methods* used to fit (or estimate) feature score. \n\n1. The `fit()` generic is re-exported from generics. It inspects the class of the object passed and dispatches to the appropriate method. \n\n2. We also define multiple methods named `fit()`. Each `fit()` method performs the actual fitting or score estimation for a specific class of object. \n\nIn other words, when `fit()` is called, the generic refers to the custom scoring object `class_score_*` to determine which method to dispatch. The actual scoring computation is performed within the dispatched method. \n\nThe ANOVA F-test filter, for example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Check the class of the object\nclass(score_aov_pval)\n#> [1] \"filtro::class_score_aov\" \"filtro::class_score\" \n#> [3] \"S7_object\"\nclass(score_aov_fstat)\n#> [1] \"filtro::class_score_aov\" \"filtro::class_score\" \n#> [3] \"S7_object\"\n```\n:::\n\n\nBoth instances (objects) belong to the custom scoring object `class_score_aov`. 
Therefore, when `fit()` is called, the method for `class_score_aov` is dispatched, performing the actual fitting using the ANOVA F-test:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Method dispatch for objects of class `class_score_aov`\nscore_aov_pval |>\n fit(Sale_Price ~ ., data = ames)\nscore_aov_fstat |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\n## Defining S7 methods \n\nTo use the `fit()` method above, we need to define a S7 method that implements the scoring logic. \n\nThe following code defines the `fit()` method specifically for the `class_score_aov` subclass, specifying how feature score should be computed using ANOVA F-test:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Define the scoring method for `class_score_aov`\nS7::method(fit, class_score_aov) <- function(\n object,\n formula,\n data,\n case_weights = NULL,\n ...\n) {\n # TODO Finish the rest of the function using lm(), anova()\n\n object@results <- res\n object\n}\n```\n:::\n\n\nWe would want to do something similar to define a S7 method for other `class_score_*` subclass. \n\n## Documenting S7 methods \n\nDocumentation for S7 methods is still a work in progress, and it seems no one currently knows the right approach. Here’s how we tackle it: \n\n- We re-export the `fit()` generic from generics. \n\n- Instead of documenting each `fit()` method, we document it in the \"Details\" section and the \"Estimating the scores\" subsection of the documentation for the corresponding object (instance) `score_*`. 
\n\nThe code below opens the help page for the `fit()` generic: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Help page for `fit()` generic\n?fit\n```\n:::\n\n\nThe code below opens the help page for specific `fit()` method: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Help page for `fit()` method along with the documentation for the specific object\n?score_aov_pval\n?score_aov_fstat\n```\n:::\n\n\nTo enable the `?` help page above, the `fit()` method is exported so it can be called by the users, but it is not documented directly.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n#' @export\nS7::method(fit, class_score_aov) <- function(\n object,\n formula,\n data,\n case_weights = NULL,\n ...\n) {\n # TODO Finish the rest of the function using lm() and anova()\n\n object@results <- res\n object\n}\n```\n:::\n\n\nInstead, documentation is provided in the \"Details\" section and the \"Estimating the scores\" subsection of the documentation for the `score_aov_pval` object. \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n#' Scoring via analysis of variance hypothesis tests\n#'\n#' @description\n#' \n#' @name score_aov_pval\n#' @family class score metrics\n#'\n#' @details\n#'\n#' These objects are used when either:\n#'\n#' ...\n#'\n#' ## Estimating the scores\n#'\n#' In \\pkg{filtro}, the `score_*` objects define a scoring method (e.g., data\n#' input requirements, package dependencies, etc). To compute the scores for\n#' a specific data set, the `fit()` method is used. The main arguments for\n#' these functions are:\n#'\n#' \\describe{\n#' \\item{`object`}{A score class object (e.g., `score_aov_pval`).}\n#' \\item{`formula`}{A standard R formula with a single outcome on the right-hand side and one or more predictors (or `.`) on the left-hand side. 
The data are processed via [stats::model.frame()]}\n#' \\item{`data`}{A data frame containing the relevant columns defined by the formula.}\n#' \\item{`...`}{Further arguments passed to or from other methods.}\n#' \\item{`case_weights`}{A quantitative vector of case weights that is the same length as the number of rows in `data`. The default of `NULL` indicates that there are no case weights.}\n#' }\n#'\n#' ...\n#' \n#' @export\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n transform_fn = function(x) x,\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nWe can have the `score_aov_fstat` object share the same help page as `score_aov_pval` by using `#' @name`. This avoids repeated documentation for similar or related objects.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n#' @name score_aov_pval\n#' @export\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n transform_fn = function(x) x,\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Accessing results after fitting\n\nOnce the method has been fitted via `fit()`, the data frame of results can be accessed via `object@results`. \n\nWe use a subset of the Ames data set from the {modeldata} package for demonstration. The goal is to predict housing sale price. `Sale_Price` is the outcome and is numeric. 
\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(modeldata)\names_subset <- modeldata::ames |>\n # Use a subset of data for demonstration\n dplyr::select(\n Sale_Price,\n MS_SubClass,\n MS_Zoning,\n Lot_Frontage,\n Lot_Area,\n Street\n )\names_subset <- ames_subset |>\n dplyr::mutate(Sale_Price = log10(Sale_Price))\n```\n:::\n\n\nNext, we fit the score as we discuss before: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Specify ANOVA p-value and fit score\names_aov_pval_res <-\n score_aov_pval |>\n fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Specify ANOVA F-statistic and fit score\names_aov_fstat_res <-\n score_aov_fstat |>\n fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\nRecall that individual properties of an object can be accessed using `object@`. Once the method has been fitted, the resulting data frame can be accessed via `object@results`:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\names_aov_pval_res@results\n#> # A tibble: 5 × 4\n#> name score outcome predictor \n#> \n#> 1 aov_pval 237. Sale_Price MS_SubClass \n#> 2 aov_pval 130. Sale_Price MS_Zoning \n#> 3 aov_pval NA Sale_Price Lot_Frontage\n#> 4 aov_pval NA Sale_Price Lot_Area \n#> 5 aov_pval 5.75 Sale_Price Street\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\names_aov_fstat_res@results\n#> # A tibble: 5 × 4\n#> name score outcome predictor \n#> \n#> 1 aov_fstat 94.5 Sale_Price MS_SubClass \n#> 2 aov_fstat 115. 
Sale_Price MS_Zoning \n#> 3 aov_fstat NA Sale_Price Lot_Frontage\n#> 4 aov_fstat NA Sale_Price Lot_Area \n#> 5 aov_fstat 22.9 Sale_Price Street\n```\n:::\n\n\n## Session information {#session-info}\n\n\n::: {.cell layout-align=\"center\"}\n\n```\n#> \n#> Attaching package: 'dplyr'\n#> The following objects are masked from 'package:stats':\n#> \n#> filter, lag\n#> The following objects are masked from 'package:base':\n#> \n#> intersect, setdiff, setequal, union\n#> ─ Session info ─────────────────────────────────────────────────────\n#> version R version 4.5.0 (2025-04-11)\n#> language (EN)\n#> date 2025-08-29\n#> pandoc 3.6.3\n#> quarto 1.7.32\n#> \n#> ─ Packages ─────────────────────────────────────────────────────────\n#> package version date (UTC) source\n#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0)\n#> filtro 0.1.0.9000 2025-08-26 Github (tidymodels/filtro@f8ffd50)\n#> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0)\n#> purrr 1.0.4 2025-02-05 CRAN (R 4.5.0)\n#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0)\n#> tibble 3.2.1 2023-03-20 CRAN (R 4.5.0)\n#> \n#> ────────────────────────────────────────────────────────────────────\n```\n:::\n\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/learn/develop/filtro/index.html.md b/learn/develop/filtro/index.html.md index bc9c13e3..3dcf936e 100644 --- a/learn/develop/filtro/index.html.md +++ b/learn/develop/filtro/index.html.md @@ -25,7 +25,7 @@ The general procedure is to: 2. Construct a custom scoring object `class_score_*`, adding additional, implementation-specific properties. -3. Define the scoring method in `fit()`, which computes feature score. `fit()` refers to the custom scoring object from step 2 to use the appropriate method. +3. Define the scoring method in `fit()`, which computes feature score. The `fit()` generic refers to the custom scoring object from step 2 to use the appropriate `fit()` method . 
The hierarchy can be visualized as: @@ -35,11 +35,11 @@ class_score └─> fit() ``` -As an example, we will walk through the steps to create an ANOVA F-test filter. +As an example, we will walk through the steps to create an ANOVA F-test filter. This filter computes feature score using analysis of variance (ANOVA) hypothesis tests, powered by `lm()`. The `lm()` function fits a linear model and returns a summary containing the F-statistic and p-value, which can be used to evaluate feature importance. ## Scoring object -All the custom scoring objects share the same parent class named `class_score`. Therefore, we start by creating a parent class: +All the custom scoring objects share the same parent class named `class_score`. Therefore, we start by creating a parent class `class_score`: ::: {.cell layout-align="center"} @@ -87,11 +87,9 @@ class_score └─> ... ``` -Next, we demonstrate how to create a custom scoring object `class_score_*`. +Next, we demonstrate how to create a custom scoring object for ANOVA F-test named `class_score_aov`. -As an example, let’s create a custom scoring object for ANOVA F-test named `class_score_aov`. This filter computes feature score using analysis of variance (ANOVA) hypothesis tests, powered by `lm()`. The `lm()` function fits a linear model and returns a summary containing the F-statistic and p-value, which can be used to evaluate feature importance. - -By setting `parent = class_score`, the subclass `class_score_aov` inherits all fixed properties from the parent class. Additional, implementation-specific properties can be added using the `properties = ` argument. For example: +By setting `parent = class_score`, the subclass `class_score_aov` inherits all fixed properties from the parent class. Additional, implementation-specific properties can be added using the `properties =` argument. For example: ::: {.cell layout-align="center"} @@ -142,7 +140,7 @@ score_aov_pval <- ``` ::: -Individual properties can be accessed via `object@`. 
For example: +Once instantiated, individual properties can be accessed via `object@`. For example: ::: {.cell layout-align="center"} @@ -247,7 +245,7 @@ S7::method(fit, class_score_aov) <- function( case_weights = NULL, ... ) { - # TODO Finish the rest of the function + # TODO Finish the rest of the function using lm(), anova() object@results <- res object @@ -255,7 +253,7 @@ S7::method(fit, class_score_aov) <- function( ``` ::: -We would want to do something similar for other `class_score_*` subclass. +We would want to do something similar to define a S7 method for other `class_score_*` subclass. ## Documenting S7 methods @@ -263,7 +261,7 @@ Documentation for S7 methods is still a work in progress, and it seems no one cu - We re-export the `fit()` generic from generics. -- Instead of documenting each `fit()` method, we provide the details in the "Details" section and the "Estimating the scores" subsection of the documentation for the corresponding object. +- Instead of documenting each `fit()` method, we document it in the "Details" section and the "Estimating the scores" subsection of the documentation for the corresponding object (instance) `score_*`. The code below opens the help page for the `fit()` generic: @@ -481,7 +479,7 @@ ames_aov_fstat_res@results #> ─ Session info ───────────────────────────────────────────────────── #> version R version 4.5.0 (2025-04-11) #> language (EN) -#> date 2025-08-28 +#> date 2025-08-29 #> pandoc 3.6.3 #> quarto 1.7.32 #> diff --git a/learn/develop/filtro/index.qmd b/learn/develop/filtro/index.qmd index 57f1bf66..664a4ac9 100644 --- a/learn/develop/filtro/index.qmd +++ b/learn/develop/filtro/index.qmd @@ -44,7 +44,7 @@ The general procedure is to: 2. Construct a custom scoring object `class_score_*`, adding additional, implementation-specific properties. -3. Define the scoring method in `fit()`, which computes feature score. `fit()` refers to the custom scoring object from step 2 to use the appropriate method. +3. 
Define the scoring method in `fit()`, which computes feature score. The `fit()` generic refers to the custom scoring object from step 2 to use the appropriate `fit()` method . The hierarchy can be visualized as: @@ -54,11 +54,11 @@ class_score └─> fit() ``` -As an example, we will walk through the steps to create an ANOVA F-test filter. +As an example, we will walk through the steps to create an ANOVA F-test filter. This filter computes feature score using analysis of variance (ANOVA) hypothesis tests, powered by `lm()`. The `lm()` function fits a linear model and returns a summary containing the F-statistic and p-value, which can be used to evaluate feature importance. ## Scoring object -All the custom scoring objects share the same parent class named `class_score`. Therefore, we start by creating a parent class: +All the custom scoring objects share the same parent class named `class_score`. Therefore, we start by creating a parent class `class_score`: ```{r} #| eval: false @@ -95,11 +95,9 @@ class_score └─> ... ``` -Next, we demonstrate how to create a custom scoring object `class_score_*`. +Next, we demonstrate how to create a custom scoring object for ANOVA F-test named `class_score_aov`. -As an example, let’s create a custom scoring object for ANOVA F-test named `class_score_aov`. This filter computes feature score using analysis of variance (ANOVA) hypothesis tests, powered by `lm()`. The `lm()` function fits a linear model and returns a summary containing the F-statistic and p-value, which can be used to evaluate feature importance. - -By setting `parent = class_score`, the subclass `class_score_aov` inherits all fixed properties from the parent class. Additional, implementation-specific properties can be added using the `properties = ` argument. For example: +By setting `parent = class_score`, the subclass `class_score_aov` inherits all fixed properties from the parent class. 
Additional, implementation-specific properties can be added using the `properties =` argument. For example: ```{r} #| eval: false @@ -146,7 +144,7 @@ score_aov_pval <- ) ``` -Individual properties can be accessed via `object@`. For example: +Once instantiated, individual properties can be accessed via `object@`. For example: ```{r} score_aov_pval@case_weights @@ -233,14 +231,14 @@ S7::method(fit, class_score_aov) <- function( case_weights = NULL, ... ) { - # TODO Finish the rest of the function + # TODO Finish the rest of the function using lm(), anova() object@results <- res object } ``` -We would want to do something similar for other `class_score_*` subclass. +We would want to do something similar to define a S7 method for other `class_score_*` subclass. ## Documenting S7 methods @@ -248,7 +246,7 @@ Documentation for S7 methods is still a work in progress, and it seems no one cu - We re-export the `fit()` generic from generics. -- Instead of documenting each `fit()` method, we provide the details in the "Details" section and the "Estimating the scores" subsection of the documentation for the corresponding object. +- Instead of documenting each `fit()` method, we document it in the "Details" section and the "Estimating the scores" subsection of the documentation for the corresponding object (instance) `score_*`. 
The code below opens the help page for the `fit()` generic: From ace06ad11aae85997c77bafb554bc99457f58d39 Mon Sep 17 00:00:00 2001 From: Frances Lin <37535633+franceslinyc@users.noreply.github.com> Date: Fri, 29 Aug 2025 13:41:16 -0700 Subject: [PATCH 20/21] Finalizing --- .../filtro/index/execute-results/html.json | 4 +- learn/develop/filtro/index.html.md | 79 ++++++++++--------- learn/develop/filtro/index.qmd | 65 +++++++-------- 3 files changed, 77 insertions(+), 71 deletions(-) diff --git a/_freeze/learn/develop/filtro/index/execute-results/html.json b/_freeze/learn/develop/filtro/index/execute-results/html.json index 9c99c3fc..5d4833d3 100644 --- a/_freeze/learn/develop/filtro/index/execute-results/html.json +++ b/_freeze/learn/develop/filtro/index/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "ff15073a1448f1c25afdf22e5284f8cb", + "hash": "fdfe5f16b0e0bf204e522365fe024e30", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new score class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro and modeldata.\n\nfiltro is tidy tools to apply filter-based supervised feature selection methods. It provides functions to rank and select a specified proportion or a fixed number of features using built-in methods and the desirability function. \n\nCurrently, there are 6 filters in filtro and many existing score objects. A list of existing scoring objects [can be found here](https://filtro.tidymodels.org/articles/filtro.html#available-score-objects-and-filter-methods). However, you might need to define your own scoring objects. This article serves as a guide to creating new scoring objects and computing feature scores before performing ranking and selection. 
\n\nThe general procedure is to:\n\n1. Create a parent scoring object `class_score`, specifying fixed properties that are shared across all custom scoring objects. \n\n2. Construct a custom scoring object `class_score_*`, adding additional, implementation-specific properties. \n\n3. Define the scoring method in `fit()`, which computes feature score. The `fit()` generic refers to the custom scoring object from step 2 to use the appropriate `fit()` method .\n\nThe hierarchy can be visualized as:\n\n```\nclass_score\n└─> class_score_* \n └─> fit()\n```\n\nAs an example, we will walk through the steps to create an ANOVA F-test filter. This filter computes feature score using analysis of variance (ANOVA) hypothesis tests, powered by `lm()`. The `lm()` function fits a linear model and returns a summary containing the F-statistic and p-value, which can be used to evaluate feature importance. \n\n## Scoring object\n\nAll the custom scoring objects share the same parent class named `class_score`. Therefore, we start by creating a parent class `class_score`: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a parent class\nlibrary(filtro) \nclass_score\n```\n:::\n\n\nThese are the fixed properties (attributes) for this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), transform_fn = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\nFor example: \n\n- `case_weights`: Does the method accpet case weights? 
It is `TRUE` or `FALSE`.\n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`.\n\n- `direction`: What direction of values indicates the most important values? For example, `maximum`, `minimize`.\n\n- `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame.\n\nFor details on its constructor and its remaining properties, please refer to the package documentation.\n\n## Custom scoring object\n\n```\nclass_score\n└─> class_score_aov (example shown)\n└─> class_score_cor\n└─> ... \n```\n\nNext, we demonstrate how to create a custom scoring object for ANOVA F-test named `class_score_aov`. \n\nBy setting `parent = class_score`, the subclass `class_score_aov` inherits all fixed properties from the parent class. Additional, implementation-specific properties can be added using the `properties =` argument. For example:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a subclass named 'class_score_aov'\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\nIn addition to the properties inherited from the parent class (discussed in the previous section), `class_score_aov` also includes the following property:\n\n- `neg_log10`: Represent the score as `-log10(p_value)`? It is `TRUE` or `FALSE`.\n\nFor the ANOVA F-test filter, users can represent the score using either the \n\n- p-value or \n\n- F-statistic. \n\nWe demonstrate how to create these instances (objects) accordingly. 
\n\n`score_aov_pval` is created as an instance of the `class_score_aov` subclass by calling its constructor and specifying its properties:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA p-value\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nOnce instantiated, individual properties can be accessed via `object@`. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@fallback_value\n#> [1] Inf\nscore_aov_pval@direction\n#> [1] \"maximize\"\n```\n:::\n\n\nNote that by default, the returned p-value is transformed to `-log10(p_value)`, which means larger values correspond to more important predictors. This is why the fallback value is set to `Inf` and the direction is set to `\"maximize\"`. \n\n`score_aov_fstat` is another instance of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA F-statistic\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Fitting (or estimating) feature score\n\n```\nclass_score\n└─> class_score_aov (example shown)\n └─> fit()\n└─> class_score_cor\n └─> fit()\n└─> ... \n```\n\nSo far, we have covered how to construct a parent class, create a custom subclass, and instantiate objects for the ANOVA F-test filter. 
\n\nWe now discuss the dual role of `fit()`: it functions both as a *generic* and as the *methods* used to fit (or estimate) feature score. \n\n1. The `fit()` generic is re-exported from generics. It inspects the class of the object passed and dispatches to the appropriate method. \n\n2. We also define multiple methods named `fit()`. Each `fit()` method performs the actual fitting or score estimation for a specific class of object. \n\nIn other words, when `fit()` is called, the generic refers to the custom scoring object `class_score_*` to determine which method to dispatch. The actual scoring computation is performed within the dispatched method. \n\nThe ANOVA F-test filter, for example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Check the class of the object\nclass(score_aov_pval)\n#> [1] \"filtro::class_score_aov\" \"filtro::class_score\" \n#> [3] \"S7_object\"\nclass(score_aov_fstat)\n#> [1] \"filtro::class_score_aov\" \"filtro::class_score\" \n#> [3] \"S7_object\"\n```\n:::\n\n\nBoth instances (objects) belong to the custom scoring object `class_score_aov`. Therefore, when `fit()` is called, the method for `class_score_aov` is dispatched, performing the actual fitting using the ANOVA F-test:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Method dispatch for objects of class `class_score_aov`\nscore_aov_pval |>\n fit(Sale_Price ~ ., data = ames)\nscore_aov_fstat |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\n## Defining S7 methods \n\nTo use the `fit()` method above, we need to define a S7 method that implements the scoring logic. 
\n\nThe following code defines the `fit()` method specifically for the `class_score_aov` subclass, specifying how feature score should be computed using ANOVA F-test:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Define the scoring method for `class_score_aov`\nS7::method(fit, class_score_aov) <- function(\n object,\n formula,\n data,\n case_weights = NULL,\n ...\n) {\n # TODO Finish the rest of the function using lm(), anova()\n\n object@results <- res\n object\n}\n```\n:::\n\n\nWe would want to do something similar to define a S7 method for other `class_score_*` subclass. \n\n## Documenting S7 methods \n\nDocumentation for S7 methods is still a work in progress, and it seems no one currently knows the right approach. Here’s how we tackle it: \n\n- We re-export the `fit()` generic from generics. \n\n- Instead of documenting each `fit()` method, we document it in the \"Details\" section and the \"Estimating the scores\" subsection of the documentation for the corresponding object (instance) `score_*`. 
\n\nThe code below opens the help page for the `fit()` generic: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Help page for `fit()` generic\n?fit\n```\n:::\n\n\nThe code below opens the help page for specific `fit()` method: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Help page for `fit()` method along with the documentation for the specific object\n?score_aov_pval\n?score_aov_fstat\n```\n:::\n\n\nTo enable the `?` help page above, the `fit()` method is exported so it can be called by the users, but it is not documented directly.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n#' @export\nS7::method(fit, class_score_aov) <- function(\n object,\n formula,\n data,\n case_weights = NULL,\n ...\n) {\n # TODO Finish the rest of the function using lm() and anova()\n\n object@results <- res\n object\n}\n```\n:::\n\n\nInstead, documentation is provided in the \"Details\" section and the \"Estimating the scores\" subsection of the documentation for the `score_aov_pval` object. \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n#' Scoring via analysis of variance hypothesis tests\n#'\n#' @description\n#' \n#' @name score_aov_pval\n#' @family class score metrics\n#'\n#' @details\n#'\n#' These objects are used when either:\n#'\n#' ...\n#'\n#' ## Estimating the scores\n#'\n#' In \\pkg{filtro}, the `score_*` objects define a scoring method (e.g., data\n#' input requirements, package dependencies, etc). To compute the scores for\n#' a specific data set, the `fit()` method is used. The main arguments for\n#' these functions are:\n#'\n#' \\describe{\n#' \\item{`object`}{A score class object (e.g., `score_aov_pval`).}\n#' \\item{`formula`}{A standard R formula with a single outcome on the right-hand side and one or more predictors (or `.`) on the left-hand side. 
The data are processed via [stats::model.frame()]}\n#' \\item{`data`}{A data frame containing the relevant columns defined by the formula.}\n#' \\item{`...`}{Further arguments passed to or from other methods.}\n#' \\item{`case_weights`}{A quantitative vector of case weights that is the same length as the number of rows in `data`. The default of `NULL` indicates that there are no case weights.}\n#' }\n#'\n#' ...\n#' \n#' @export\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n transform_fn = function(x) x,\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nWe can have the `score_aov_fstat` object share the same help page as `score_aov_pval` by using `#' @name`. This avoids repeated documentation for similar or related objects.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n#' @name score_aov_pval\n#' @export\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n transform_fn = function(x) x,\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Accessing results after fitting\n\nOnce the method has been fitted via `fit()`, the data frame of results can be accessed via `object@results`. \n\nWe use a subset of the Ames data set from the {modeldata} package for demonstration. The goal is to predict housing sale price. `Sale_Price` is the outcome and is numeric. 
\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(modeldata)\names_subset <- modeldata::ames |>\n # Use a subset of data for demonstration\n dplyr::select(\n Sale_Price,\n MS_SubClass,\n MS_Zoning,\n Lot_Frontage,\n Lot_Area,\n Street\n )\names_subset <- ames_subset |>\n dplyr::mutate(Sale_Price = log10(Sale_Price))\n```\n:::\n\n\nNext, we fit the score as we discuss before: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Specify ANOVA p-value and fit score\names_aov_pval_res <-\n score_aov_pval |>\n fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Specify ANOVA F-statistic and fit score\names_aov_fstat_res <-\n score_aov_fstat |>\n fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\nRecall that individual properties of an object can be accessed using `object@`. Once the method has been fitted, the resulting data frame can be accessed via `object@results`:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\names_aov_pval_res@results\n#> # A tibble: 5 × 4\n#> name score outcome predictor \n#> \n#> 1 aov_pval 237. Sale_Price MS_SubClass \n#> 2 aov_pval 130. Sale_Price MS_Zoning \n#> 3 aov_pval NA Sale_Price Lot_Frontage\n#> 4 aov_pval NA Sale_Price Lot_Area \n#> 5 aov_pval 5.75 Sale_Price Street\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\names_aov_fstat_res@results\n#> # A tibble: 5 × 4\n#> name score outcome predictor \n#> \n#> 1 aov_fstat 94.5 Sale_Price MS_SubClass \n#> 2 aov_fstat 115. 
Sale_Price MS_Zoning \n#> 3 aov_fstat NA Sale_Price Lot_Frontage\n#> 4 aov_fstat NA Sale_Price Lot_Area \n#> 5 aov_fstat 22.9 Sale_Price Street\n```\n:::\n\n\n## Session information {#session-info}\n\n\n::: {.cell layout-align=\"center\"}\n\n```\n#> \n#> Attaching package: 'dplyr'\n#> The following objects are masked from 'package:stats':\n#> \n#> filter, lag\n#> The following objects are masked from 'package:base':\n#> \n#> intersect, setdiff, setequal, union\n#> ─ Session info ─────────────────────────────────────────────────────\n#> version R version 4.5.0 (2025-04-11)\n#> language (EN)\n#> date 2025-08-29\n#> pandoc 3.6.3\n#> quarto 1.7.32\n#> \n#> ─ Packages ─────────────────────────────────────────────────────────\n#> package version date (UTC) source\n#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0)\n#> filtro 0.1.0.9000 2025-08-26 Github (tidymodels/filtro@f8ffd50)\n#> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0)\n#> purrr 1.0.4 2025-02-05 CRAN (R 4.5.0)\n#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0)\n#> tibble 3.2.1 2023-03-20 CRAN (R 4.5.0)\n#> \n#> ────────────────────────────────────────────────────────────────────\n```\n:::\n\n", + "markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new score class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro and modeldata.\n\nCurrently, there are 6 filters in filtro and many existing score objects. A list of existing scoring objects [can be found here](https://filtro.tidymodels.org/articles/filtro.html#available-score-objects-and-filter-methods). However, you might need to define your own scoring objects. This article serves as a guide to creating new scoring objects and computing feature scores before performing ranking and selection. 
\n\nFor reference, filtro provides tidy tools for applying filter-based supervised feature selection methods. It provides functions to rank and select a specified proportion or a fixed number of features using built-in methods and the desirability function. \n\nThere is a parent class `class_score`, which defines the fixed properties that are shared across all subclasses. The parent class is already implemented, and serves as the infrastructure we build on when we make our own score class object.\n\nThe general procedure is to:\n\n1. Create a subclass `class_score_*` that inherits from `class_score`. This subclass introduces additional, method-specific properties, as opposed to the general characteristics already defined in the parent class. \n\n2. Implement the scoring method in `fit()`, which computes feature scores. The `fit()` generic refers to the subclass from step 1 to use the appropriate `fit()` method.\n\nThe hierarchy can be visualized as:\n\n```\nclass_score\n└─> class_score_* \n └─> fit()\n```\n\nAdditionally, we provide guidance on documenting an S7 method.\n\n## Parent class (General scoring object)\n\nAll the subclasses (custom scoring objects) share the same parent class named `class_score`. 
The parent class is already implemented: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a parent class\nlibrary(filtro) \nclass_score\n```\n:::\n\n\nThese are the fixed properties (attributes) for this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), transform_fn = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\nFor example: \n\n- `case_weights`: Does the method accept case weights? It is `TRUE` or `FALSE`.\n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`.\n\n- `direction`: What direction of values indicates the most important values? For example, `minimize`, `maximize`.\n\n- `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame.\n\nFor details on its constructor and its remaining properties, please refer to the package documentation or [check it out here](https://github.com/tidymodels/filtro/blob/main/R/class_score.R). \n\n## Subclass (Custom scoring object)\n\nAll custom scoring objects implemented in filtro are subclasses of the parent class `class_score`, meaning that they all inherit the parent class's fixed properties. When creating a new scoring object, we do so by defining another subclass of this parent class.\n\n```\nclass_score\n└─> class_score_aov (example shown)\n└─> class_score_cor\n└─> ... \n```\n\nWe demonstrate how to create a custom scoring object for ANOVA F-test named `class_score_aov`, as an example. 
\n\nFor reference, the ANOVA F-test filter computes feature score using analysis of variance (ANOVA) hypothesis tests, powered by `lm()`. The `lm()` function fits a linear model and returns a summary containing the F-statistic and p-value, which can be used to evaluate feature importance. \n\nBy setting `parent = class_score`, the subclass `class_score_aov` inherits all of the fixed properties from the parent class. Additional, implementation-specific properties can be added using the `properties =` argument. For example:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a subclass named 'class_score_aov'\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\nIn addition to the properties inherited from the parent class (discussed in the previous section), `class_score_aov` also includes the following property:\n\n- `neg_log10`: Represent the score as `-log10(p_value)`? It is `TRUE` or `FALSE`.\n\nFor the ANOVA F-test filter, users can represent the score using either the \n\n- p-value or \n\n- F-statistic. \n\nWe demonstrate how to create these instances (objects) accordingly. \n\n`score_aov_pval` is created as an instance of the `class_score_aov` subclass by calling its constructor and specifying its properties:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA p-value\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nOnce instantiated, individual properties can be accessed via `object@`. 
For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@fallback_value\n#> [1] Inf\nscore_aov_pval@direction\n#> [1] \"maximize\"\n```\n:::\n\n\nNote that by default, the returned p-value is transformed to `-log10(p_value)`, which means larger values correspond to more important predictors. This is why the fallback value is set to `Inf` and the direction is set to `\"maximize\"`. \n\n`score_aov_fstat` is another instance of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA F-statistic\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\nThe F-statistic is not transformed, nor does it provide an option for transformation. Nevertheless, it also uses a fallback value of `Inf` with the direction set to `\"maximize\"`, since larger F-statistic values indicate more important predictors. \n\n## Fitting (or estimating) feature score\n\nSo far, we have covered how to construct a parent class, create a custom subclass, and instantiate objects for the ANOVA F-test filter. \n\nWe now discuss the dual role of `fit()`: it functions both as a *generic* and as the *methods* used to fit (or estimate) feature score. \n\n1. The `fit()` generic is re-exported from generics. It inspects the class of the object passed and dispatches to the appropriate method. \n\n2. We also define multiple methods named `fit()`. Each `fit()` method performs the actual fitting or score estimation for a specific class of object. 
\n\nIn other words, when `fit()` is called, the generic refers to the custom scoring object `class_score_*` to determine which method to dispatch. The actual scoring computation is performed within the dispatched method. \n\n```\nclass_score\n└─> class_score_aov (example shown)\n └─> fit()\n└─> class_score_cor\n └─> fit()\n└─> ... \n```\n\nTake the ANOVA F-test filter, for example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Check the class of the object\nclass(score_aov_pval)\n#> [1] \"filtro::class_score_aov\" \"filtro::class_score\" \n#> [3] \"S7_object\"\nclass(score_aov_fstat)\n#> [1] \"filtro::class_score_aov\" \"filtro::class_score\" \n#> [3] \"S7_object\"\n```\n:::\n\n\nBoth instances (objects) belong to the custom scoring object `class_score_aov`. Therefore, when `fit()` is called, the method for `class_score_aov` is dispatched, performing the actual fitting using the ANOVA F-test:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Method dispatch for objects of class `class_score_aov`\nscore_aov_pval |>\n fit(Sale_Price ~ ., data = ames)\nscore_aov_fstat |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\n## Defining S7 methods \n\nFor users to use the `fit()` method described above, we need to define an S7 method that implements the scoring logic. \n\nThe following code defines the `fit()` method specifically for the `class_score_aov` subclass, specifying how feature scores should be computed using the ANOVA F-test:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Define the scoring method for `class_score_aov`\nS7::method(fit, class_score_aov) <- function(\n object,\n formula,\n data,\n case_weights = NULL,\n ...\n) {\n # This is where you include the rest of the function \n\n object@results <- res\n object\n}\n```\n:::\n\n\nWe would want to do something similar to define an S7 method for other `class_score_*` subclasses. 
\n\n## Documenting S7 methods \n\nDocumentation for S7 methods is still a work in progress, but our current best approach is as follows: \n\n- We re-export the `fit()` generic from generics. \n\n- Instead of documenting each `fit()` method, we document it in the \"Details\" section and the \"Estimating the scores\" subsection of the documentation for the corresponding object (instance) `score_*`. \n\nThe code below opens the help page for the `fit()` generic: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Help page for `fit()` generic\n?fit\n```\n:::\n\n\nThe code below opens the help page for specific `fit()` method: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Help page for `fit()` method along with the documentation for the specific object\n?score_aov_pval\n?score_aov_fstat\n```\n:::\n\n\nFor users to access the help page using `?` as described above, the `fit()` method needs to be exported using `#' @export`, but it is not documented directly.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n#' @export\nS7::method(fit, class_score_aov) <- function(\n object,\n ...\n) {\n # Include the rest of the function here\n\n object@results <- res\n object\n}\n```\n:::\n\n\nInstead, documentation is provided in the \"Details\" section and the \"Estimating the scores\" subsection of the documentation for the `score_aov_pval` object. \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n#' Scoring via analysis of variance hypothesis tests\n#'\n#' @description\n#' \n#' @name score_aov_pval\n#' @family class score metrics\n#'\n#' @details\n#'\n#' These objects are used when either:\n#'\n#' ...\n#'\n#' ## Estimating the scores\n#'\n#' In \\pkg{filtro}, the `score_*` objects define a scoring method (e.g., data\n#' input requirements, package dependencies, etc). To compute the scores for\n#' a specific data set, the `fit()` method is used. 
The main arguments for\n#' these functions are:\n#'\n#' \\describe{\n#' \\item{`object`}{A score class object (e.g., `score_aov_pval`).}\n#' \\item{`formula`}{A standard R formula with a single outcome on the left-hand side and one or more predictors (or `.`) on the right-hand side. The data are processed via [stats::model.frame()].}\n#' \\item{`data`}{A data frame containing the relevant columns defined by the formula.}\n#' \\item{`...`}{Further arguments passed to or from other methods.}\n#' \\item{`case_weights`}{A quantitative vector of case weights that is the same length as the number of rows in `data`. The default of `NULL` indicates that there are no case weights.}\n#' }\n#'\n#' ...\n#' \n#' @export\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n transform_fn = function(x) x,\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nWe can have the `score_aov_fstat` object share the same help page as `score_aov_pval` by using `#' @name`. This avoids repeated documentation for similar or related objects.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n#' @name score_aov_pval\n#' @export\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n transform_fn = function(x) x,\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Accessing results after fitting\n\nOnce the method has been fitted via `fit()`, the data frame of results can be accessed via `object@results`. 
\n\nWe use a subset of the Ames data set from the {modeldata} package for demonstration. The goal is to predict housing sale price. `Sale_Price` is the outcome and is numeric. \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(modeldata)\names_subset <- modeldata::ames |>\n # Use a subset of data for demonstration\n dplyr::select(\n Sale_Price,\n MS_SubClass,\n MS_Zoning,\n Lot_Frontage,\n Lot_Area,\n Street\n )\names_subset <- ames_subset |>\n dplyr::mutate(Sale_Price = log10(Sale_Price))\n```\n:::\n\n\nNext, we fit the score as we discuss before: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Specify ANOVA p-value and fit score\names_aov_pval_res <-\n score_aov_pval |>\n fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Specify ANOVA F-statistic and fit score\names_aov_fstat_res <-\n score_aov_fstat |>\n fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\nRecall that individual properties of an object can be accessed using `object@`. Once the method has been fitted, the resulting data frame can be accessed via `object@results`:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\names_aov_pval_res@results\n#> # A tibble: 5 × 4\n#> name score outcome predictor \n#> \n#> 1 aov_pval 237. Sale_Price MS_SubClass \n#> 2 aov_pval 130. Sale_Price MS_Zoning \n#> 3 aov_pval NA Sale_Price Lot_Frontage\n#> 4 aov_pval NA Sale_Price Lot_Area \n#> 5 aov_pval 5.75 Sale_Price Street\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\names_aov_fstat_res@results\n#> # A tibble: 5 × 4\n#> name score outcome predictor \n#> \n#> 1 aov_fstat 94.5 Sale_Price MS_SubClass \n#> 2 aov_fstat 115. 
Sale_Price MS_Zoning \n#> 3 aov_fstat NA Sale_Price Lot_Frontage\n#> 4 aov_fstat NA Sale_Price Lot_Area \n#> 5 aov_fstat 22.9 Sale_Price Street\n```\n:::\n\n\n## Session information {#session-info}\n\n\n::: {.cell layout-align=\"center\"}\n\n```\n#> \n#> Attaching package: 'dplyr'\n#> The following objects are masked from 'package:stats':\n#> \n#> filter, lag\n#> The following objects are masked from 'package:base':\n#> \n#> intersect, setdiff, setequal, union\n#> ─ Session info ─────────────────────────────────────────────────────\n#> version R version 4.5.0 (2025-04-11)\n#> language (EN)\n#> date 2025-08-29\n#> pandoc 3.6.3\n#> quarto 1.7.32\n#> \n#> ─ Packages ─────────────────────────────────────────────────────────\n#> package version date (UTC) source\n#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0)\n#> filtro 0.2.0 2025-08-26 CRAN (R 4.5.0)\n#> modeldata 1.5.1 2025-08-22 CRAN (R 4.5.0)\n#> purrr 1.1.0 2025-07-10 CRAN (R 4.5.0)\n#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0)\n#> tibble 3.3.0 2025-06-08 CRAN (R 4.5.0)\n#> \n#> ────────────────────────────────────────────────────────────────────\n```\n:::\n\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/learn/develop/filtro/index.html.md b/learn/develop/filtro/index.html.md index 3dcf936e..cc88d408 100644 --- a/learn/develop/filtro/index.html.md +++ b/learn/develop/filtro/index.html.md @@ -15,17 +15,17 @@ include-after-body: ../../../resources.html To use code in this article, you will need to install the following packages: filtro and modeldata. -filtro is tidy tools to apply filter-based supervised feature selection methods. It provides functions to rank and select a specified proportion or a fixed number of features using built-in methods and the desirability function. - Currently, there are 6 filters in filtro and many existing score objects. A list of existing scoring objects [can be found here](https://filtro.tidymodels.org/articles/filtro.html#available-score-objects-and-filter-methods). 
However, you might need to define your own scoring objects. This article serves as a guide to creating new scoring objects and computing feature scores before performing ranking and selection. -The general procedure is to: +For reference, filtro provides tidy tools for applying filter-based supervised feature selection methods. It provides functions to rank and select a specified proportion or a fixed number of features using built-in methods and the desirability function. -1. Create a parent scoring object `class_score`, specifying fixed properties that are shared across all custom scoring objects. +There is a parent class `class_score`, which defines the fixed properties that are shared across all subclasses. The parent class is already implemented, and serves as the infrastructure we build on when we make our own score class object. + +The general procedure is to: -2. Construct a custom scoring object `class_score_*`, adding additional, implementation-specific properties. +1. Create a subclass `class_score_*` that inherits from `class_score`. This subclass introduces additional, method-specific properties, as opposed to the general characteristics already defined in the parent class. -3. Define the scoring method in `fit()`, which computes feature score. The `fit()` generic refers to the custom scoring object from step 2 to use the appropriate `fit()` method . +2. Implement the scoring method in `fit()`, which computes feature scores. The `fit()` generic refers to the subclass from step 1 to use the appropriate `fit()` method. The hierarchy can be visualized as: @@ -35,11 +35,11 @@ class_score └─> fit() ``` -As an example, we will walk through the steps to create an ANOVA F-test filter. This filter computes feature score using analysis of variance (ANOVA) hypothesis tests, powered by `lm()`. The `lm()` function fits a linear model and returns a summary containing the F-statistic and p-value, which can be used to evaluate feature importance. 
+Additionally, we provide guidance on documenting an S7 method. -## Scoring object +## Parent class (General scoring object) -All the custom scoring objects share the same parent class named `class_score`. Therefore, we start by creating a parent class `class_score`: +All the subclasses (custom scoring objects) share the same parent class named `class_score`. The parent class is already implemented: ::: {.cell layout-align="center"} @@ -72,13 +72,15 @@ For example: - `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`. -- `direction`: What direction of values indicates the most important values? For example, `maximum`, `minimize`. +- `direction`: What direction of values indicates the most important values? For example, `minimize`, `maximize`. - `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame. -For details on its constructor and its remaining properties, please refer to the package documentation. +For details on its constructor and its remaining properties, please refer to the package documentation or [check it out here](https://github.com/tidymodels/filtro/blob/main/R/class_score.R). -## Custom scoring object +## Subclass (Custom scoring object) + +All custom scoring objects implemented in filtro are subclasses of the parent class `class_score`, meaning that they all inherit the parent class's fixed properties. When creating a new scoring object, we do so by defining another subclass of this parent class. ``` class_score @@ -87,9 +89,11 @@ class_score └─> ... ``` -Next, we demonstrate how to create a custom scoring object for ANOVA F-test named `class_score_aov`. +We demonstrate how to create a custom scoring object for ANOVA F-test named `class_score_aov`, as an example. + +For reference, the ANOVA F-test filter computes feature score using analysis of variance (ANOVA) hypothesis tests, powered by `lm()`. 
The `lm()` function fits a linear model and returns a summary containing the F-statistic and p-value, which can be used to evaluate feature importance. -By setting `parent = class_score`, the subclass `class_score_aov` inherits all fixed properties from the parent class. Additional, implementation-specific properties can be added using the `properties =` argument. For example: +By setting `parent = class_score`, the subclass `class_score_aov` inherits all of the fixed properties from the parent class. Additional, implementation-specific properties can be added using the `properties =` argument. For example: ::: {.cell layout-align="center"} @@ -179,16 +183,9 @@ score_aov_fstat <- ``` ::: -## Fitting (or estimating) feature score +The F-statistic is not transformed, nor does it provide an option for transformation. Nevertheless, it also uses a fallback value of `Inf` with the direction set to `"maximize"`, since larger F-statistic values indicate more important predictors. -``` -class_score -└─> class_score_aov (example shown) - └─> fit() -└─> class_score_cor - └─> fit() -└─> ... -``` +## Fitting (or estimating) feature score So far, we have covered how to construct a parent class, create a custom subclass, and instantiate objects for the ANOVA F-test filter. @@ -200,6 +197,15 @@ We now discuss the dual role of `fit()`: it functions both as a *generic* and as In other words, when `fit()` is called, the generic refers to the custom scoring object `class_score_*` to determine which method to dispatch. The actual scoring computation is performed within the dispatched method. +``` +class_score +└─> class_score_aov (example shown) + └─> fit() +└─> class_score_cor + └─> fit() +└─> ... +``` + The ANOVA F-test filter, for example: ::: {.cell layout-align="center"} @@ -230,7 +236,7 @@ score_aov_fstat |> ## Defining S7 methods -To use the `fit()` method above, we need to define a S7 method that implements the scoring logic. 
+For users to use the `fit()` method described above, we need to define an S7 method that implements the scoring logic. The following code defines the `fit()` method specifically for the `class_score_aov` subclass, specifying how feature score should be computed using ANOVA F-test: @@ -245,7 +251,7 @@ S7::method(fit, class_score_aov) <- function( case_weights = NULL, ... ) { - # TODO Finish the rest of the function using lm(), anova() + # This is where you include the rest of the function object@results <- res object @@ -257,7 +263,7 @@ We would want to do something similar to define a S7 method for other `class_sco ## Documenting S7 methods -Documentation for S7 methods is still a work in progress, and it seems no one currently knows the right approach. Here’s how we tackle it: +Documentation for S7 methods is still a work in progress, but our current best approach is as follows: - We re-export the `fit()` generic from generics. @@ -284,7 +290,7 @@ The code below opens the help page for specific `fit()` method: ``` ::: -To enable the `?` help page above, the `fit()` method is exported so it can be called by the users, but it is not documented directly. +For users to access the help page using `?` as described above, the `fit()` method needs to be exported using `#' @export`, but it is not documented directly. ::: {.cell layout-align="center"} @@ -292,12 +298,9 @@ To enable the `?` help page above, the `fit()` method is exported so it can be c #' @export S7::method(fit, class_score_aov) <- function( object, - formula, - data, - case_weights = NULL, ... 
) { - # TODO Finish the rest of the function using lm() and anova() + # Include the rest of the function here object@results <- res object @@ -484,13 +487,13 @@ ames_aov_fstat_res@results #> quarto 1.7.32 #> #> ─ Packages ───────────────────────────────────────────────────────── -#> package version date (UTC) source -#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0) -#> filtro 0.1.0.9000 2025-08-26 Github (tidymodels/filtro@f8ffd50) -#> modeldata 1.4.0 2024-06-19 CRAN (R 4.5.0) -#> purrr 1.0.4 2025-02-05 CRAN (R 4.5.0) -#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0) -#> tibble 3.2.1 2023-03-20 CRAN (R 4.5.0) +#> package version date (UTC) source +#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0) +#> filtro 0.2.0 2025-08-26 CRAN (R 4.5.0) +#> modeldata 1.5.1 2025-08-22 CRAN (R 4.5.0) +#> purrr 1.1.0 2025-07-10 CRAN (R 4.5.0) +#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0) +#> tibble 3.3.0 2025-06-08 CRAN (R 4.5.0) #> #> ──────────────────────────────────────────────────────────────────── ``` diff --git a/learn/develop/filtro/index.qmd b/learn/develop/filtro/index.qmd index 664a4ac9..86f61906 100644 --- a/learn/develop/filtro/index.qmd +++ b/learn/develop/filtro/index.qmd @@ -34,17 +34,17 @@ pkgs <- c("filtro", "modeldata") `r article_req_pkgs(pkgs)` -filtro is tidy tools to apply filter-based supervised feature selection methods. It provides functions to rank and select a specified proportion or a fixed number of features using built-in methods and the desirability function. - Currently, there are 6 filters in filtro and many existing score objects. A list of existing scoring objects [can be found here](https://filtro.tidymodels.org/articles/filtro.html#available-score-objects-and-filter-methods). However, you might need to define your own scoring objects. This article serves as a guide to creating new scoring objects and computing feature scores before performing ranking and selection. 
-The general procedure is to: +For reference, filtro provides tidy tools for applying filter-based supervised feature selection methods. It provides functions to rank and select a specified proportion or a fixed number of features using built-in methods and the desirability function. -1. Create a parent scoring object `class_score`, specifying fixed properties that are shared across all custom scoring objects. +There is a parent class `class_score`, which defines the fixed properties that are shared across all subclasses. The parent class is already implemented, and serves as the infrastructure we build on when we make our own score class object. + +The general procedure is to: -2. Construct a custom scoring object `class_score_*`, adding additional, implementation-specific properties. +1. Create a subclass `class_score_*` that inherits from `class_score`. This subclass introduces additional, method-specific properties, as opposed to the general characteristics already defined in the parent class. -3. Define the scoring method in `fit()`, which computes feature score. The `fit()` generic refers to the custom scoring object from step 2 to use the appropriate `fit()` method . +2. Implement the scoring method in `fit()`, which computes feature scores. The `fit()` generic refers to the subclass from step 1 to use the appropriate `fit()` method. The hierarchy can be visualized as: @@ -54,11 +54,11 @@ class_score └─> fit() ``` -As an example, we will walk through the steps to create an ANOVA F-test filter. This filter computes feature score using analysis of variance (ANOVA) hypothesis tests, powered by `lm()`. The `lm()` function fits a linear model and returns a summary containing the F-statistic and p-value, which can be used to evaluate feature importance. +Additionally, we provide guidance on documenting an S7 method. -## Scoring object +## Parent class (General scoring object) -All the custom scoring objects share the same parent class named `class_score`. 
Therefore, we start by creating a parent class `class_score`: +All the subclasses (custom scoring objects) share the same parent class named `class_score`. The parent class is already implemented: ```{r} #| eval: false @@ -80,13 +80,15 @@ For example: - `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`. -- `direction`: What direction of values indicates the most important values? For example, `maximum`, `minimize`. +- `direction`: What direction of values indicates the most important values? For example, `minimize`, `maximize`. - `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame. -For details on its constructor and its remaining properties, please refer to the package documentation. +For details on its constructor and its remaining properties, please refer to the package documentation or [check it out here](https://github.com/tidymodels/filtro/blob/main/R/class_score.R). -## Custom scoring object +## Subclass (Custom scoring object) + +All custom scoring objects implemented in filtro are subclasses of the parent class `class_score`, meaning that they all inherit the parent class's fixed properties. When creating a new scoring object, we do so by defining another subclass of this parent class. ``` class_score @@ -95,9 +97,11 @@ class_score └─> ... ``` -Next, we demonstrate how to create a custom scoring object for ANOVA F-test named `class_score_aov`. +We demonstrate how to create a custom scoring object for ANOVA F-test named `class_score_aov`, as an example. + +For reference, the ANOVA F-test filter computes feature score using analysis of variance (ANOVA) hypothesis tests, powered by `lm()`. The `lm()` function fits a linear model and returns a summary containing the F-statistic and p-value, which can be used to evaluate feature importance. 
-By setting `parent = class_score`, the subclass `class_score_aov` inherits all fixed properties from the parent class. Additional, implementation-specific properties can be added using the `properties =` argument. For example: +By setting `parent = class_score`, the subclass `class_score_aov` inherits all of the fixed properties from the parent class. Additional, implementation-specific properties can be added using the `properties =` argument. For example: ```{r} #| eval: false @@ -175,16 +179,9 @@ score_aov_fstat <- ) ``` -## Fitting (or estimating) feature score +The F-statistic is not transformed, nor does it provide an option for transformation. Nevertheless, it also uses a fallback value of `Inf` with the direction set to `"maximize"`, since larger F-statistic values indicate more important predictors. -``` -class_score -└─> class_score_aov (example shown) - └─> fit() -└─> class_score_cor - └─> fit() -└─> ... -``` +## Fitting (or estimating) feature score So far, we have covered how to construct a parent class, create a custom subclass, and instantiate objects for the ANOVA F-test filter. @@ -196,6 +193,15 @@ We now discuss the dual role of `fit()`: it functions both as a *generic* and as In other words, when `fit()` is called, the generic refers to the custom scoring object `class_score_*` to determine which method to dispatch. The actual scoring computation is performed within the dispatched method. +``` +class_score +└─> class_score_aov (example shown) + └─> fit() +└─> class_score_cor + └─> fit() +└─> ... +``` + The ANOVA F-test filter, for example: ```{r} @@ -217,7 +223,7 @@ score_aov_fstat |> ## Defining S7 methods -To use the `fit()` method above, we need to define a S7 method that implements the scoring logic. +For users to use the `fit()` method described above, we need to define an S7 method that implements the scoring logic. 
The following code defines the `fit()` method specifically for the `class_score_aov` subclass, specifying how feature score should be computed using ANOVA F-test: @@ -231,7 +237,7 @@ S7::method(fit, class_score_aov) <- function( case_weights = NULL, ... ) { - # TODO Finish the rest of the function using lm(), anova() + # This is where you include the rest of the function object@results <- res object @@ -242,7 +248,7 @@ We would want to do something similar to define a S7 method for other `class_sco ## Documenting S7 methods -Documentation for S7 methods is still a work in progress, and it seems no one currently knows the right approach. Here’s how we tackle it: +Documentation for S7 methods is still a work in progress, but our current best approach is as follows: - We re-export the `fit()` generic from generics. @@ -265,19 +271,16 @@ The code below opens the help page for specific `fit()` method: ?score_aov_fstat ``` -To enable the `?` help page above, the `fit()` method is exported so it can be called by the users, but it is not documented directly. +For users to access the help page using `?` as described above, the `fit()` method needs to be exported using `#' @export`, but it is not documented directly. ```{r} #| eval: false #' @export S7::method(fit, class_score_aov) <- function( object, - formula, - data, - case_weights = NULL, ... ) { - # TODO Finish the rest of the function using lm() and anova() + # Include the rest of the function here object@results <- res object From a652720446c06296db3c2583efd09088bc68b494 Mon Sep 17 00:00:00 2001 From: Frances Lin <37535633+franceslinyc@users.noreply.github.com> Date: Fri, 29 Aug 2025 13:59:59 -0700 Subject: [PATCH 21/21] READY! 
--- .../filtro/index/execute-results/html.json | 4 ++-- learn/develop/filtro/index.html.md | 24 ++++++++++--------- learn/develop/filtro/index.qmd | 24 ++++++++++--------- 3 files changed, 28 insertions(+), 24 deletions(-) diff --git a/_freeze/learn/develop/filtro/index/execute-results/html.json b/_freeze/learn/develop/filtro/index/execute-results/html.json index 5d4833d3..209513d0 100644 --- a/_freeze/learn/develop/filtro/index/execute-results/html.json +++ b/_freeze/learn/develop/filtro/index/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "fdfe5f16b0e0bf204e522365fe024e30", + "hash": "72caae83bac60ced40baca72547a173e", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new score class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro and modeldata.\n\nCurrently, there are 6 filters in filtro and many existing score objects. A list of existing scoring objects [can be found here](https://filtro.tidymodels.org/articles/filtro.html#available-score-objects-and-filter-methods). However, you might need to define your own scoring objects. This article serves as a guide to creating new scoring objects and computing feature scores before performing ranking and selection. \n\nFor reference, filtro is tidy tools to apply filter-based supervised feature selection methods. It provides functions to rank and select a specified proportion or a fixed number of features using built-in methods and the desirability function. \n\nThere is a parent class `class_score`, which defines the fixed properties that are shared across all subclasses. 
The parent class is already implemented, and serves as the infrastructure we build on when we make our own score class object.\n\nThe general procedure is to:\n\n1. Create a subclass `class_score_*` that inherts from `class_score`. This subclass introduces additional, method-specific properties, as opposed to the general characteristics already defined in the parent class. \n\n2. Implement the scoring method in `fit()`, which computes feature score. The `fit()` generic refers to the subclass from step 1 to use the appropriate `fit()` method .\n\nThe hierarchy can be visualized as:\n\n```\nclass_score\n└─> class_score_* \n └─> fit()\n```\n\nAdditionally, we provide guidance on documenting an S7 method.\n\n## Parent class (General scoring object)\n\nAll the subclasses (custom scoring objects) share the same parent class named `class_score`. The parent class is already implemented: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a parent class\nlibrary(filtro) \nclass_score\n```\n:::\n\n\nThese are the fixed properties (attributes) for this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), transform_fn = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\nFor example: \n\n- `case_weights`: Does the method accpet case weights? It is `TRUE` or `FALSE`.\n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`.\n\n- `direction`: What direction of values indicates the most important values? 
For example, `minimize`, `maximum`.\n\n- `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame.\n\nFor details on its constructor and its remaining properties, please refer to the package documentation or [check it out here](https://github.com/tidymodels/filtro/blob/main/R/class_score.R). \n\n## Subclass (Custom scoring object)\n\nAll custom scoring objects implemented in filtro are subclasses of the parent class `class_score`, meaning that they all inherit the parent class's fixed properties. When creating a new scoring object, we do so by defining another subclass of this parent class.\n\n```\nclass_score\n└─> class_score_aov (example shown)\n└─> class_score_cor\n└─> ... \n```\n\nWe demonstrate how to create a custom scoring object for ANOVA F-test named `class_score_aov`, as an example. \n\nFor reference, the ANOVA F-test filter computes feature score using analysis of variance (ANOVA) hypothesis tests, powered by `lm()`. The `lm()` function fits a linear model and returns a summary containing the F-statistic and p-value, which can be used to evaluate feature importance. \n\nBy setting `parent = class_score`, the subclass `class_score_aov` inherits all of the fixed properties from the parent class. Additional, implementation-specific properties can be added using the `properties =` argument. For example:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a subclass named 'class_score_aov'\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\nIn addition to the properties inherited from the parent class (discussed in the previous section), `class_score_aov` also includes the following property:\n\n- `neg_log10`: Represent the score as `-log10(p_value)`? 
It is `TRUE` or `FALSE`.\n\nFor the ANOVA F-test filter, users can represent the score using either the \n\n- p-value or \n\n- F-statistic. \n\nWe demonstrate how to create these instances (objects) accordingly. \n\n`score_aov_pval` is created as an instance of the `class_score_aov` subclass by calling its constructor and specifying its properties:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA p-value\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nOnce instantiated, individual properties can be accessed via `object@`. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@fallback_value\n#> [1] Inf\nscore_aov_pval@direction\n#> [1] \"maximize\"\n```\n:::\n\n\nNote that by default, the returned p-value is transformed to `-log10(p_value)`, which means larger values correspond to more important predictors. This is why the fallback value is set to `Inf` and the direction is set to `\"maximize\"`. \n\n`score_aov_fstat` is another instance of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA F-statistic\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\nThe F-statistic is not transformed, nor does it provide an option for transformation. 
Nevertheless, it also uses a fallback value of `Inf` with the direction set to `\"maximize\"`, since larger F-statistic values indicate more important predictors. \n\n## Fitting (or estimating) feature score\n\nSo far, we have covered how to construct a parent class, create a custom subclass, and instantiate objects for the ANOVA F-test filter. \n\nWe now discuss the dual role of `fit()`: it functions both as a *generic* and as the *methods* used to fit (or estimate) feature score. \n\n1. The `fit()` generic is re-exported from generics. It inspects the class of the object passed and dispatches to the appropriate method. \n\n2. We also define multiple methods named `fit()`. Each `fit()` method performs the actual fitting or score estimation for a specific class of object. \n\nIn other words, when `fit()` is called, the generic refers to the custom scoring object `class_score_*` to determine which method to dispatch. The actual scoring computation is performed within the dispatched method. \n\n```\nclass_score\n└─> class_score_aov (example shown)\n └─> fit()\n└─> class_score_cor\n └─> fit()\n└─> ... \n```\n\nThe ANOVA F-test filter, for example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Check the class of the object\nclass(score_aov_pval)\n#> [1] \"filtro::class_score_aov\" \"filtro::class_score\" \n#> [3] \"S7_object\"\nclass(score_aov_fstat)\n#> [1] \"filtro::class_score_aov\" \"filtro::class_score\" \n#> [3] \"S7_object\"\n```\n:::\n\n\nBoth instances (objects) belong to the custom scoring object `class_score_aov`. 
Therefore, when `fit()` is called, the method for `class_score_aov` is dispatched, performing the actual fitting using the ANOVA F-test:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Method dispatch for objects of class `class_score_aov`\nscore_aov_pval |>\n fit(Sale_Price ~ ., data = ames)\nscore_aov_fstat |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\n## Defining S7 methods \n\nFor users to use the `fit()` method described above, we need to define a S7 method that implements the scoring logic. \n\nThe following code defines the `fit()` method specifically for the `class_score_aov` subclass, specifying how feature score should be computed using ANOVA F-test:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Define the scoring method for `class_score_aov`\nS7::method(fit, class_score_aov) <- function(\n object,\n formula,\n data,\n case_weights = NULL,\n ...\n) {\n # This is where you include the rest of the function \n\n object@results <- res\n object\n}\n```\n:::\n\n\nWe would want to do something similar to define a S7 method for other `class_score_*` subclass. \n\n## Documenting S7 methods \n\nDocumentation for S7 methods is still a work in progress, but our current best approach is as follows: \n\n- We re-export the `fit()` generic from generics. \n\n- Instead of documenting each `fit()` method, we document it in the \"Details\" section and the \"Estimating the scores\" subsection of the documentation for the corresponding object (instance) `score_*`. 
\n\nThe code below opens the help page for the `fit()` generic: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Help page for `fit()` generic\n?fit\n```\n:::\n\n\nThe code below opens the help page for specific `fit()` method: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Help page for `fit()` method along with the documentation for the specific object\n?score_aov_pval\n?score_aov_fstat\n```\n:::\n\n\nFor users to access the help page using `?` as described above, the `fit()` method needs to be exported using `#' @export`, but it is not documented directly.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n#' @export\nS7::method(fit, class_score_aov) <- function(\n object,\n ...\n) {\n # Include the rest of the function here\n\n object@results <- res\n object\n}\n```\n:::\n\n\nInstead, documentation is provided in the \"Details\" section and the \"Estimating the scores\" subsection of the documentation for the `score_aov_pval` object. \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n#' Scoring via analysis of variance hypothesis tests\n#'\n#' @description\n#' \n#' @name score_aov_pval\n#' @family class score metrics\n#'\n#' @details\n#'\n#' These objects are used when either:\n#'\n#' ...\n#'\n#' ## Estimating the scores\n#'\n#' In \\pkg{filtro}, the `score_*` objects define a scoring method (e.g., data\n#' input requirements, package dependencies, etc). To compute the scores for\n#' a specific data set, the `fit()` method is used. The main arguments for\n#' these functions are:\n#'\n#' \\describe{\n#' \\item{`object`}{A score class object (e.g., `score_aov_pval`).}\n#' \\item{`formula`}{A standard R formula with a single outcome on the right-hand side and one or more predictors (or `.`) on the left-hand side. 
The data are processed via [stats::model.frame()]}\n#' \\item{`data`}{A data frame containing the relevant columns defined by the formula.}\n#' \\item{`...`}{Further arguments passed to or from other methods.}\n#' \\item{`case_weights`}{A quantitative vector of case weights that is the same length as the number of rows in `data`. The default of `NULL` indicates that there are no case weights.}\n#' }\n#'\n#' ...\n#' \n#' @export\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n transform_fn = function(x) x,\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nWe can have the `score_aov_fstat` object share the same help page as `score_aov_pval` by using `#' @name`. This avoids repeated documentation for similar or related objects.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n#' @name score_aov_pval\n#' @export\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n transform_fn = function(x) x,\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Accessing results after fitting\n\nOnce the method has been fitted via `fit()`, the data frame of results can be accessed via `object@results`. \n\nWe use a subset of the Ames data set from the {modeldata} package for demonstration. The goal is to predict housing sale price. `Sale_Price` is the outcome and is numeric. 
\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(modeldata)\names_subset <- modeldata::ames |>\n # Use a subset of data for demonstration\n dplyr::select(\n Sale_Price,\n MS_SubClass,\n MS_Zoning,\n Lot_Frontage,\n Lot_Area,\n Street\n )\names_subset <- ames_subset |>\n dplyr::mutate(Sale_Price = log10(Sale_Price))\n```\n:::\n\n\nNext, we fit the score as we discuss before: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Specify ANOVA p-value and fit score\names_aov_pval_res <-\n score_aov_pval |>\n fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Specify ANOVA F-statistic and fit score\names_aov_fstat_res <-\n score_aov_fstat |>\n fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\nRecall that individual properties of an object can be accessed using `object@`. Once the method has been fitted, the resulting data frame can be accessed via `object@results`:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\names_aov_pval_res@results\n#> # A tibble: 5 × 4\n#> name score outcome predictor \n#> \n#> 1 aov_pval 237. Sale_Price MS_SubClass \n#> 2 aov_pval 130. Sale_Price MS_Zoning \n#> 3 aov_pval NA Sale_Price Lot_Frontage\n#> 4 aov_pval NA Sale_Price Lot_Area \n#> 5 aov_pval 5.75 Sale_Price Street\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\names_aov_fstat_res@results\n#> # A tibble: 5 × 4\n#> name score outcome predictor \n#> \n#> 1 aov_fstat 94.5 Sale_Price MS_SubClass \n#> 2 aov_fstat 115. 
Sale_Price MS_Zoning \n#> 3 aov_fstat NA Sale_Price Lot_Frontage\n#> 4 aov_fstat NA Sale_Price Lot_Area \n#> 5 aov_fstat 22.9 Sale_Price Street\n```\n:::\n\n\n## Session information {#session-info}\n\n\n::: {.cell layout-align=\"center\"}\n\n```\n#> \n#> Attaching package: 'dplyr'\n#> The following objects are masked from 'package:stats':\n#> \n#> filter, lag\n#> The following objects are masked from 'package:base':\n#> \n#> intersect, setdiff, setequal, union\n#> ─ Session info ─────────────────────────────────────────────────────\n#> version R version 4.5.0 (2025-04-11)\n#> language (EN)\n#> date 2025-08-29\n#> pandoc 3.6.3\n#> quarto 1.7.32\n#> \n#> ─ Packages ─────────────────────────────────────────────────────────\n#> package version date (UTC) source\n#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0)\n#> filtro 0.2.0 2025-08-26 CRAN (R 4.5.0)\n#> modeldata 1.5.1 2025-08-22 CRAN (R 4.5.0)\n#> purrr 1.1.0 2025-07-10 CRAN (R 4.5.0)\n#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0)\n#> tibble 3.3.0 2025-06-08 CRAN (R 4.5.0)\n#> \n#> ────────────────────────────────────────────────────────────────────\n```\n:::\n\n", + "markdown": "---\ntitle: \"Create your own score class object\"\ncategories:\n - developer tools\ntype: learn-subsection\nweight: 1\ndescription: | \n Create a new score class object for feature selection.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\n---\n\n\n\n\n\n## Introduction\n\nTo use code in this article, you will need to install the following packages: filtro and modeldata.\n\nCurrently, there are 6 filters in filtro and many existing score objects. A list of existing scoring objects [can be found here](https://filtro.tidymodels.org/articles/filtro.html#available-score-objects-and-filter-methods). However, you might need to define your own scoring objects. This article serves as a guide to creating new scoring objects and computing feature scores before performing ranking and selection. 
\n\nFor reference, filtro provides tidy tools for applying filter-based supervised feature selection methods. It provides functions to rank and select a specified proportion or a fixed number of features using built-in methods and the desirability function. \n\nRegarding scoring objects: \n\nThere is a parent class `class_score`, which defines the fixed properties that are shared across all subclasses. The parent class is already implemented, and serves as the infrastructure we build on when we make our own scoring class object.\n\nTherefore, the general procedure is to:\n\n1. Create a subclass `class_score_*` that inherits from `class_score`. This subclass introduces additional, method- or score-specific properties, as opposed to the general characteristics already defined in the parent class. \n\n2. Implement the scoring method in `fit()`, which computes feature score. The `fit()` generic refers to the subclass from step 1 to use the appropriate `fit()` method. \n\nThe hierarchy can be visualized as:\n\n```\nclass_score\n└─> class_score_* \n └─> fit()\n```\n\nAdditionally, we provide guidance on documenting an S7 method.\n\n## Parent class (General scoring object)\n\nAll the subclasses (custom scoring objects) share the same parent class named `class_score`. 
The parent class is already implemented, and we need this to build our own scoring class object: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Call the parent class\nlibrary(filtro) \nclass_score\n```\n:::\n\n\nThese are the fixed properties (attributes) for this object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nargs(class_score)\n#> function (outcome_type = c(\"numeric\", \"factor\"), predictor_type = c(\"numeric\", \n#> \"factor\"), case_weights = logical(0), range = integer(0), inclusive = logical(0), \n#> fallback_value = integer(0), score_type = character(0), transform_fn = function() NULL, \n#> direction = character(0), deterministic = logical(0), tuning = logical(0), \n#> calculating_fn = function() NULL, label = character(0), packages = character(0), \n#> results = data.frame()) \n#> NULL\n```\n:::\n\n\nFor example: \n\n- `case_weights`: Does the method accept case weights? It is `TRUE` or `FALSE`.\n\n- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0` or `Inf`.\n\n- `direction`: What direction of values indicates the most important values? For example, `minimize` or `maximize`.\n\n- `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame.\n\nFor details on its constructor and its remaining properties, please refer to the package documentation or [check it out here](https://github.com/tidymodels/filtro/blob/main/R/class_score.R). \n\n## Subclass (Custom scoring object)\n\nAll custom scoring objects implemented in filtro are subclasses of the parent class `class_score`, meaning that they all inherit the parent class's fixed properties. When creating a new scoring object, we do so by defining another subclass of this parent class.\n\n```\nclass_score\n└─> class_score_aov (example shown)\n└─> class_score_cor\n└─> ... 
\n```\n\nAs an example, we demonstrate how to create a custom scoring object for ANOVA F-test named `class_score_aov`. \n\nFor reference, the ANOVA F-test filter computes feature score using analysis of variance (ANOVA) hypothesis tests, powered by `lm()`. The `lm()` function fits a linear model and returns a summary containing the F-statistic and p-value, which can be used to evaluate feature importance. \n\nBy setting `parent = class_score`, the subclass `class_score_aov` inherits all of the fixed properties from the parent class. Additional, implementation-specific properties can be added using the `properties =` argument. For example:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Create a subclass named 'class_score_aov'\nclass_score_aov <- S7::new_class(\n \"class_score_aov\",\n parent = class_score,\n properties = list(\n neg_log10 = S7::new_property(S7::class_logical, default = TRUE)\n )\n)\n```\n:::\n\n\nIn addition to the properties inherited from the parent class (discussed in the previous section), `class_score_aov` also includes the following property:\n\n- `neg_log10`: Represent the score as `-log10(p_value)`? It is `TRUE` or `FALSE`.\n\nFor the ANOVA F-test filter, users can represent the score using either the \n\n- p-value or \n\n- F-statistic. \n\nWe demonstrate how to create these instances (objects) accordingly. 
\n\n`score_aov_pval` is created as an instance of the `class_score_aov` subclass by calling its constructor and specifying its properties:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA p-value\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nOnce instantiated, individual properties can be accessed via `object@`. For example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nscore_aov_pval@case_weights\n#> [1] TRUE\nscore_aov_pval@fallback_value\n#> [1] Inf\nscore_aov_pval@direction\n#> [1] \"maximize\"\n```\n:::\n\n\nNote that by default, the returned p-value is transformed to `-log10(p_value)`, which means larger values correspond to more important predictors. This is why the fallback value is set to `Inf` and the direction is set to `\"maximize\"`. \n\n`score_aov_fstat` is another instance of the `class_score_aov` subclass: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# ANOVA F-statistic\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\nThe F-statistic is not transformed, nor does it provide an option for transformation. Nevertheless, it also uses a fallback value of `Inf` with the direction set to `\"maximize\"`, since larger F-statistic values indicate more important predictors. 
\n\n## Fitting (or estimating) feature score\n\nSo far, we have covered how to construct a parent class, create a custom subclass, and instantiate objects for the ANOVA F-test filter. \n\nWe now discuss the dual role of `fit()`: it functions both as a *generic* and as the *methods* used to fit (or estimate) feature score. \n\n1. The `fit()` generic is re-exported from generics. It inspects the class of the object passed and dispatches to the appropriate method. \n\n2. We also define multiple methods named `fit()`. Each `fit()` method performs the actual fitting or score estimation for a specific class of object. \n\nIn other words, when `fit()` is called, the generic refers to the custom scoring object `class_score_*` to determine which method to dispatch. The actual scoring computation is performed within the dispatched method. \n\n```\nclass_score\n└─> class_score_aov (example shown)\n └─> fit()\n└─> class_score_cor\n └─> fit()\n└─> ... \n```\n\nLet’s use the ANOVA F-test filter again as an example: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Check the class of the object\nclass(score_aov_pval)\n#> [1] \"filtro::class_score_aov\" \"filtro::class_score\" \n#> [3] \"S7_object\"\nclass(score_aov_fstat)\n#> [1] \"filtro::class_score_aov\" \"filtro::class_score\" \n#> [3] \"S7_object\"\n```\n:::\n\n\nBoth instances (objects) belong to the custom scoring object `class_score_aov`. Therefore, when `fit()` is called, the method for `class_score_aov` is dispatched, performing the actual fitting using the ANOVA F-test:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Method dispatch for objects of class `class_score_aov`\nscore_aov_pval |>\n fit(Sale_Price ~ ., data = ames)\nscore_aov_fstat |>\n fit(Sale_Price ~ ., data = ames)\n```\n:::\n\n\n## Defining S7 methods \n\nFor users to use the `fit()` method described above, we need to define an S7 method that implements the scoring logic. 
\n\nThe following code defines the `fit()` method specifically for the `class_score_aov` subclass, specifying how feature score should be computed using ANOVA F-test:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Define the scoring method for `class_score_aov`\nS7::method(fit, class_score_aov) <- function(\n object,\n formula,\n data,\n case_weights = NULL,\n ...\n) {\n # This is where you add the rest of the code for this implementation \n\n object@results <- res\n object\n}\n```\n:::\n\n\nWe would do something similar to define an S7 method for each of the other `class_score_*` subclasses. \n\n## Documenting S7 methods \n\nDocumentation for S7 methods is still a work in progress, but our current best approach is as follows: \n\n- We re-export the `fit()` generic from generics. \n\n- Instead of documenting each `fit()` method, we document it in the \"Details\" section and the \"Estimating the scores\" subsection of the documentation for the corresponding object (instance) `score_*`. 
\n\nThe code below opens the help page for the `fit()` generic: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Help page for `fit()` generic\n?fit\n```\n:::\n\n\nThe code below opens the help page for a specific `fit()` method: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# User-level example: Help page for `fit()` method along with the documentation for the specific object\n?score_aov_pval\n?score_aov_fstat\n```\n:::\n\n\nFor users to access the help page using `?` as described above, the `fit()` method needs to be exported using `#' @export`, but it is not documented directly.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n#' @export\nS7::method(fit, class_score_aov) <- function(\n object,\n ...\n) {\n # Include the rest of the function here\n\n object@results <- res\n object\n}\n```\n:::\n\n\nInstead, documentation is provided in the \"Details\" section and the \"Estimating the scores\" subsection of the documentation for the `score_aov_pval` object. \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n#' Scoring via analysis of variance hypothesis tests\n#'\n#' @description\n#' \n#' @name score_aov_pval\n#' @family class score metrics\n#'\n#' @details\n#'\n#' These objects are used when either:\n#'\n#' ...\n#'\n#' ## Estimating the scores\n#'\n#' In \\pkg{filtro}, the `score_*` objects define a scoring method (e.g., data\n#' input requirements, package dependencies, etc). To compute the scores for\n#' a specific data set, the `fit()` method is used. The main arguments for\n#' these functions are:\n#'\n#' \\describe{\n#' \\item{`object`}{A score class object (e.g., `score_aov_pval`).}\n#' \\item{`formula`}{A standard R formula with a single outcome on the left-hand side and one or more predictors (or `.`) on the right-hand side. 
The data are processed via [stats::model.frame()]}\n#' \\item{`data`}{A data frame containing the relevant columns defined by the formula.}\n#' \\item{`...`}{Further arguments passed to or from other methods.}\n#' \\item{`case_weights`}{A quantitative vector of case weights that is the same length as the number of rows in `data`. The default of `NULL` indicates that there are no case weights.}\n#' }\n#'\n#' ...\n#' \n#' @export\nscore_aov_pval <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_pval\",\n transform_fn = function(x) x,\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA p-values\"\n )\n```\n:::\n\n\nWe can have the `score_aov_fstat` object share the same help page as `score_aov_pval` by using `#' @name`. This avoids repeated documentation for similar or related objects.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n#' @name score_aov_pval\n#' @export\nscore_aov_fstat <-\n class_score_aov(\n outcome_type = c(\"numeric\", \"factor\"),\n predictor_type = c(\"numeric\", \"factor\"),\n case_weights = TRUE,\n range = c(0, Inf),\n inclusive = c(FALSE, FALSE),\n fallback_value = Inf,\n score_type = \"aov_fstat\",\n transform_fn = function(x) x,\n direction = \"maximize\",\n deterministic = TRUE,\n tuning = FALSE,\n label = \"ANOVA F-statistics\"\n )\n```\n:::\n\n\n## Accessing results after fitting\n\nOnce the method has been fitted via `fit()`, the data frame of results can be accessed via `object@results`. \n\nWe use a subset of the Ames data set from the {modeldata} package for demonstration. The goal is to predict housing sale price. `Sale_Price` is the outcome and is numeric. 
\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(modeldata)\names_subset <- modeldata::ames |>\n # Use a subset of data for demonstration\n dplyr::select(\n Sale_Price,\n MS_SubClass,\n MS_Zoning,\n Lot_Frontage,\n Lot_Area,\n Street\n )\names_subset <- ames_subset |>\n dplyr::mutate(Sale_Price = log10(Sale_Price))\n```\n:::\n\n\nNext, we fit the score as discussed earlier: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Specify ANOVA p-value and fit score\names_aov_pval_res <-\n score_aov_pval |>\n fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Specify ANOVA F-statistic and fit score\names_aov_fstat_res <-\n score_aov_fstat |>\n fit(Sale_Price ~ ., data = ames_subset)\n```\n:::\n\n\nRecall that individual properties of an object can be accessed using `object@`. Once the method has been fitted, the resulting data frame can be accessed via `object@results`:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\names_aov_pval_res@results\n#> # A tibble: 5 × 4\n#> name score outcome predictor \n#> \n#> 1 aov_pval 237. Sale_Price MS_SubClass \n#> 2 aov_pval 130. Sale_Price MS_Zoning \n#> 3 aov_pval NA Sale_Price Lot_Frontage\n#> 4 aov_pval NA Sale_Price Lot_Area \n#> 5 aov_pval 5.75 Sale_Price Street\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\names_aov_fstat_res@results\n#> # A tibble: 5 × 4\n#> name score outcome predictor \n#> \n#> 1 aov_fstat 94.5 Sale_Price MS_SubClass \n#> 2 aov_fstat 115. 
Sale_Price MS_Zoning \n#> 3 aov_fstat NA Sale_Price Lot_Frontage\n#> 4 aov_fstat NA Sale_Price Lot_Area \n#> 5 aov_fstat 22.9 Sale_Price Street\n```\n:::\n\n\n## Session information {#session-info}\n\n\n::: {.cell layout-align=\"center\"}\n\n```\n#> \n#> Attaching package: 'dplyr'\n#> The following objects are masked from 'package:stats':\n#> \n#> filter, lag\n#> The following objects are masked from 'package:base':\n#> \n#> intersect, setdiff, setequal, union\n#> ─ Session info ─────────────────────────────────────────────────────\n#> version R version 4.5.0 (2025-04-11)\n#> language (EN)\n#> date 2025-08-29\n#> pandoc 3.6.3\n#> quarto 1.7.32\n#> \n#> ─ Packages ─────────────────────────────────────────────────────────\n#> package version date (UTC) source\n#> dplyr 1.1.4 2023-11-17 CRAN (R 4.5.0)\n#> filtro 0.2.0 2025-08-26 CRAN (R 4.5.0)\n#> modeldata 1.5.1 2025-08-22 CRAN (R 4.5.0)\n#> purrr 1.1.0 2025-07-10 CRAN (R 4.5.0)\n#> rlang 1.1.6 2025-04-11 CRAN (R 4.5.0)\n#> tibble 3.3.0 2025-06-08 CRAN (R 4.5.0)\n#> \n#> ────────────────────────────────────────────────────────────────────\n```\n:::\n\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/learn/develop/filtro/index.html.md b/learn/develop/filtro/index.html.md index cc88d408..ea3dbb75 100644 --- a/learn/develop/filtro/index.html.md +++ b/learn/develop/filtro/index.html.md @@ -19,13 +19,15 @@ Currently, there are 6 filters in filtro and many existing score objects. A list For reference, filtro is tidy tools to apply filter-based supervised feature selection methods. It provides functions to rank and select a specified proportion or a fixed number of features using built-in methods and the desirability function. -There is a parent class `class_score`, which defines the fixed properties that are shared across all subclasses. The parent class is already implemented, and serves as the infrastructure we build on when we make our own score class object. 
+Regarding scoring objects: -The general procedure is to: +There is a parent class `class_score`, which defines the fixed properties that are shared across all subclasses. The parent class is already implemented, and serves as the infrastructure we build on when we make our own scoring class object. -1. Create a subclass `class_score_*` that inherts from `class_score`. This subclass introduces additional, method-specific properties, as opposed to the general characteristics already defined in the parent class. +Therefore, the general procedure is to: -2. Implement the scoring method in `fit()`, which computes feature score. The `fit()` generic refers to the subclass from step 1 to use the appropriate `fit()` method . +1. Create a subclass `class_score_*` that inherits from `class_score`. This subclass introduces additional, method- or score-specific properties, as opposed to the general characteristics already defined in the parent class. + +2. Implement the scoring method in `fit()`, which computes feature score. The `fit()` generic refers to the subclass from step 1 to use the appropriate `fit()` method. The hierarchy can be visualized as: @@ -39,12 +41,12 @@ Additionally, we provide guidance on documenting an S7 method. ## Parent class (General scoring object) -All the subclasses (custom scoring objects) share the same parent class named `class_score`. The parent class is already implemented: +All the subclasses (custom scoring objects) share the same parent class named `class_score`. The parent class is already implemented, and we need this to build our own scoring class object: ::: {.cell layout-align="center"} ```{.r .cell-code} -# Create a parent class +# Call the parent class library(filtro) class_score ``` @@ -70,9 +72,9 @@ For example: - `case_weights`: Does the method accpet case weights? It is `TRUE` or `FALSE`. -- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`. 
+- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0` or `Inf`. -- `direction`: What direction of values indicates the most important values? For example, `minimize`, `maximum`. +- `direction`: What direction of values indicates the most important values? For example, `minimize` or `maximize`. - `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame. @@ -89,7 +91,7 @@ class_score └─> ... ``` -We demonstrate how to create a custom scoring object for ANOVA F-test named `class_score_aov`, as an example. +As an example, we demonstrate how to create a custom scoring object for the ANOVA F-test, named `class_score_aov`. For reference, the ANOVA F-test filter computes feature score using analysis of variance (ANOVA) hypothesis tests, powered by `lm()`. The `lm()` function fits a linear model and returns a summary containing the F-statistic and p-value, which can be used to evaluate feature importance. @@ -206,7 +208,7 @@ class_score └─> ... ``` -The ANOVA F-test filter, for example: +Let’s use the ANOVA F-test filter again as an example: ::: {.cell layout-align="center"} @@ -251,7 +253,7 @@ S7::method(fit, class_score_aov) <- function( case_weights = NULL, ... ) { - # This is where you include the rest of the function + # This is where you add the rest of the code for this implementation object@results <- res object diff --git a/learn/develop/filtro/index.qmd b/learn/develop/filtro/index.qmd index 86f61906..a254879b 100644 --- a/learn/develop/filtro/index.qmd +++ b/learn/develop/filtro/index.qmd @@ -38,13 +38,15 @@ Currently, there are 6 filters in filtro and many existing score objects. A list For reference, filtro is tidy tools to apply filter-based supervised feature selection methods. It provides functions to rank and select a specified proportion or a fixed number of features using built-in methods and the desirability function. 
-There is a parent class `class_score`, which defines the fixed properties that are shared across all subclasses. The parent class is already implemented, and serves as the infrastructure we build on when we make our own score class object. +Regarding scoring objects: -The general procedure is to: +There is a parent class `class_score`, which defines the fixed properties that are shared across all subclasses. The parent class is already implemented, and serves as the infrastructure we build on when we make our own scoring class object. -1. Create a subclass `class_score_*` that inherts from `class_score`. This subclass introduces additional, method-specific properties, as opposed to the general characteristics already defined in the parent class. +Therefore, the general procedure is to: -2. Implement the scoring method in `fit()`, which computes feature score. The `fit()` generic refers to the subclass from step 1 to use the appropriate `fit()` method . +1. Create a subclass `class_score_*` that inherits from `class_score`. This subclass introduces additional, method- or score-specific properties, as opposed to the general characteristics already defined in the parent class. + +2. Implement the scoring method in `fit()`, which computes feature score. The `fit()` generic refers to the subclass from step 1 to use the appropriate `fit()` method. The hierarchy can be visualized as: @@ -58,11 +60,11 @@ Additionally, we provide guidance on documenting an S7 method. ## Parent class (General scoring object) -All the subclasses (custom scoring objects) share the same parent class named `class_score`. The parent class is already implemented: +All the subclasses (custom scoring objects) share the same parent class named `class_score`. 
The parent class is already implemented, and we need this to build our own scoring class object: ```{r} #| eval: false -# Create a parent class +# Call the parent class library(filtro) class_score ``` @@ -78,9 +80,9 @@ For example: - `case_weights`: Does the method accpet case weights? It is `TRUE` or `FALSE`. -- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0`, `Inf`. +- `fallback_value`: What is a value that can be used for the statistic so that it will never be eliminated? For example, `0` or `Inf`. -- `direction`: What direction of values indicates the most important values? For example, `minimize`, `maximum`. +- `direction`: What direction of values indicates the most important values? For example, `minimize` or `maximize`. - `results`: A slot for the results once the method is fitted. Initially, this is an empty data frame. @@ -97,7 +99,7 @@ class_score └─> ... ``` -We demonstrate how to create a custom scoring object for ANOVA F-test named `class_score_aov`, as an example. +As an example, we demonstrate how to create a custom scoring object for the ANOVA F-test, named `class_score_aov`. For reference, the ANOVA F-test filter computes feature score using analysis of variance (ANOVA) hypothesis tests, powered by `lm()`. The `lm()` function fits a linear model and returns a summary containing the F-statistic and p-value, which can be used to evaluate feature importance. @@ -202,7 +204,7 @@ class_score └─> ... ``` -The ANOVA F-test filter, for example: +Let’s use the ANOVA F-test filter again as an example: ```{r} # User-level example: Check the class of the object @@ -237,7 +239,7 @@ S7::method(fit, class_score_aov) <- function( case_weights = NULL, ... ) { - # This is where you include the rest of the function + # This is where you add the rest of the code for this implementation object@results <- res object