6 changes: 5 additions & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,11 @@ Authors@R:
person(given = "Lona",
family = "Koers",
role = "ctb",
email = "[email protected]"))
email = "[email protected]"),
person(given = "Alexander",
family = "Winterstetter",
role = "ctb",
email = "[email protected]"))
Description: Dataflow programming toolkit that enriches 'mlr3' with a diverse
set of pipelining operators ('PipeOps') that can be composed into graphs.
Operations exist for data preprocessing, model fitting, and ensemble
Expand Down
4 changes: 3 additions & 1 deletion R/PipeOpImpute.R
Original file line number Diff line number Diff line change
Expand Up @@ -329,7 +329,9 @@ PipeOpImpute = R6Class("PipeOpImpute",
logical = c(TRUE, FALSE),
numeric = 0, # see PipeOpImputeMean and PipeOpImputeMedian
ordered = levels(feature), # see above
character = ""
character = "",
Date = as.Date(0),
POSIXct = as.POSIXct(0)
)
},

Expand Down
5 changes: 3 additions & 2 deletions R/PipeOpImputeConstant.R
Original file line number Diff line number Diff line change
Expand Up @@ -83,7 +83,7 @@ PipeOpImputeConstant = R6Class("PipeOpImputeConstant",
check_levels = p_lgl(init = TRUE, tags = c("train", "required"))
)
super$initialize(id, param_set = ps, param_vals = param_vals, empty_level_control = "always",
feature_types = c("logical", "integer", "numeric", "character", "factor", "ordered", "POSIXct"))
feature_types = c("logical", "integer", "numeric", "character", "factor", "ordered", "POSIXct", "Date"))
}
),
private = list(
Expand All @@ -96,7 +96,8 @@ PipeOpImputeConstant = R6Class("PipeOpImputeConstant",
"character" = assert_string(constant),
"factor" = assert_string_or_factor(constant),
"ordered" = assert_string_or_factor(constant),
"POSIXct" = assert_posixct(constant, any.missing = FALSE, len = 1L)
"POSIXct" = assert_posixct(constant, any.missing = FALSE, len = 1L),
"Date" = assert_date(constant, any.missing = FALSE, len = 1L)
)
if (type %in% c("ordered", "factor") && self$param_set$values$check_levels) {
if (!isTRUE(check_choice(as.character(constant), levels(feature)))) {
Expand Down
24 changes: 19 additions & 5 deletions R/PipeOpImputeHist.R
Original file line number Diff line number Diff line change
@@ -1,11 +1,11 @@
#' @title Impute Numerical Features by Histogram
#' @title Impute Numeric, Integer, POSIXct or Date Features by Histogram
#'
#' @usage NULL
#' @name mlr_pipeops_imputehist
#' @format [`R6Class`][R6::R6Class] object inheriting from [`PipeOpImpute`]/[`PipeOp`].
#'
#' @description
#' Impute numerical features by histogram.
#' Impute numeric, integer, POSIXct or Date features by histogram.
#'
#' During training, a histogram is fitted on each column using R's [`hist()`][graphics::hist] function.
#' The fitted histogram is then sampled from for imputation. Sampling happens in a two-step process:
Expand All @@ -27,7 +27,7 @@
#' @section Input and Output Channels:
#' Input and output channels are inherited from [`PipeOpImpute`].
#'
#' The output is the input [`Task`][mlr3::Task] with all affected numeric features missing values imputed by (column-wise) histogram; see Description for details.
#' The output is the input [`Task`][mlr3::Task] with all affected numeric, integer, POSIXct or Date features missing values imputed by (column-wise) histogram; see Description for details.
#'
#' @section State:
#' The `$state` is a named `list` with the `$state` elements inherited from [`PipeOpImpute`].
Expand Down Expand Up @@ -66,13 +66,27 @@ PipeOpImputeHist = R6Class("PipeOpImputeHist",
inherit = PipeOpImpute,
public = list(
initialize = function(id = "imputehist", param_vals = list()) {
super$initialize(id, param_vals = param_vals, packages = "graphics", feature_types = c("integer", "numeric"))
super$initialize(id, param_vals = param_vals, packages = "graphics", feature_types = c("integer", "numeric", "Date", "POSIXct"))
}
),
private = list(

.train_imputer = function(feature, type, context) {
graphics::hist(feature, plot = FALSE)[c("counts", "breaks")]
if (inherits(feature, c("POSIXct", "Date"))) {
# hist() for POSIXct/Date does not do "Sturges" breaks automatically, so we compute it explicitly
n_breaks = ceiling(log2(length(feature)) + 1)
# If we pass the number of breaks, hist() does some computation that results in integer overflow
if (inherits(feature, "POSIXct")) {
breaks = as.POSIXct(as.numeric(pretty(range(feature, na.rm = TRUE), n = n_breaks, min.n = 1)))
} else {
breaks = as.Date(as.numeric(pretty(range(feature, na.rm = TRUE), n = n_breaks, min.n = 1)))
}
# pretty() does not return values of length < 2, so the special case where `breaks` gets
# interpreted differently does not need to be handled here.
graphics::hist(feature, breaks = breaks, plot = FALSE)[c("counts", "breaks")]
} else {
graphics::hist(feature, plot = FALSE)[c("counts", "breaks")]
}
},

.impute = function(feature, type, model, context) {
Expand Down
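
The break computation in the `.train_imputer` hunk above can be tried out in isolation. The following is a minimal base-R sketch, not the package code: the `feature` vector is made up for illustration, and the explicit `origin` argument is an addition so that the snippet also runs on R versions before 4.3, where `as.Date(<numeric>)` required it.

```r
# Hypothetical Date feature with one missing value (not taken from the PR).
feature = as.Date("2024-01-01") + c(0, 3, 7, 12, 20, 31, 45, NA)

# Sturges-style bin count as computed in the diff; note that length() counts the NA too.
n_breaks = ceiling(log2(length(feature)) + 1)

# pretty() picks "round" break points that cover the observed range.
# The explicit origin is an assumption added for this sketch (needed on R < 4.3).
breaks = as.Date(
  as.numeric(pretty(range(feature, na.rm = TRUE), n = n_breaks, min.n = 1)),
  origin = "1970-01-01"
)

# Passing a vector of breaks (instead of a single number) to hist() for Dates.
model = graphics::hist(feature, breaks = breaks, plot = FALSE)[c("counts", "breaks")]
print(model)
```

The returned `breaks` are plain numbers (days since 1970-01-01), so an imputation step that samples from the bins has to convert back to `Date` or `POSIXct` afterwards.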
8 changes: 6 additions & 2 deletions R/PipeOpImputeLearner.R
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@
#' Note this parameter is part of the [`PipeOpImpute`] base class and explained there.
#'
#' Additionally, only features supported by the learner can be imputed; i.e. learners of type
#' `regr` can only impute features of type `integer` and `numeric`, while `classif` can impute
#' `regr` can only impute features of type `integer`, `numeric`, `POSIXct` and `Date`, while `classif` can impute
#' features of type `factor`, `ordered` and `logical`.
#'
#' The [`Learner`][mlr3::Learner] used for imputation is trained on all `context_columns`; if these contain missing values,
Expand Down Expand Up @@ -105,7 +105,7 @@ PipeOpImputeLearner = R6Class("PipeOpImputeLearner",
private$.learner = as_learner(learner, clone = TRUE)
id = id %??% private$.learner$id
feature_types = switch(private$.learner$task_type,
regr = c("integer", "numeric"),
regr = c("integer", "numeric", "POSIXct", "Date"),
classif = c("logical", "factor", "ordered"),
stop("Only `classif` or `regr` Learners are currently supported by PipeOpImputeLearner.")
# FIXME: at least ordinal should also be possible. When Moore's law catches up with us we could even do `character`
Expand Down Expand Up @@ -183,6 +183,8 @@ PipeOpImputeLearner = R6Class("PipeOpImputeLearner",
# Convert non-factor imputation targets to a factor
if (is.numeric(feature)) {
feature
} else if (any(class(feature) %in% c("POSIXct", "Date"))) {
as.numeric(feature)
} else {
if (!is.null(levels(feature))) {
factor(feature, levels = levels(feature), ordered = FALSE)
Expand All @@ -198,6 +200,8 @@ PipeOpImputeLearner = R6Class("PipeOpImputeLearner",
feature = round(feature)
}
if (type == "logical") feature = as.logical(feature) # FIXME mlr-org/mlr3#475
if (type == "POSIXct") feature = as.POSIXct(feature)
if (type == "Date") feature = as.Date(feature)
auto_convert(feature, "feature to be imputed", type, levels = levels(feature))
},
.additional_phash_input = function() private$.learner$phash
Expand Down
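
The two `PipeOpImputeLearner` hunks above amount to a round trip: the date-like feature is turned into a number for the regression learner, and the prediction is turned back into the original type. A minimal base-R sketch of that idea follows; the data are hypothetical, the mean is only a stand-in for a learner's prediction, and the explicit `origin` and `tz` arguments are assumptions added here for portability rather than something the diff contains.

```r
# Hypothetical features; the values are illustrative only.
dates = as.Date(c("2024-01-01", "2024-01-11", NA))
times = as.POSIXct(c("2024-01-01 00:00:00", "2024-01-01 02:00:00", NA), tz = "UTC")

# Step 1: the regression target is the underlying number
# (days since 1970-01-01 for Date, seconds since the epoch for POSIXct).
date_target = as.numeric(dates)
time_target = as.numeric(times)

# Step 2: stand-in for a learner prediction; a real learner would use context columns.
date_pred = mean(date_target, na.rm = TRUE)
time_pred = mean(time_target, na.rm = TRUE)

# Step 3: convert the numeric prediction back to the feature's type.
dates[is.na(dates)] = as.Date(date_pred, origin = "1970-01-01")
times[is.na(times)] = as.POSIXct(time_pred, origin = "1970-01-01", tz = "UTC")

print(dates)
print(times)
```

Note that converting back without a `tz` argument, as in the diff, yields values displayed in the session's time zone; whether the original `tzone` attribute should be preserved is a design question the hunks shown here do not settle.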
10 changes: 5 additions & 5 deletions R/PipeOpImputeMean.R
Original file line number Diff line number Diff line change
@@ -1,11 +1,11 @@
#' @title Impute Numerical Features by their Mean
#' @title Impute Numeric, Integer, POSIXct or Date Features by their Mean
#'
#' @usage NULL
#' @name mlr_pipeops_imputemean
#' @format [`R6Class`][R6::R6Class] object inheriting from [`PipeOpImpute`]/[`PipeOp`].
#'
#' @description
#' Impute numerical features by their mean.
#' Impute numeric, integer, POSIXct or Date features by their mean.
#'
#' @section Construction:
#' ```
Expand All @@ -20,12 +20,12 @@
#' @section Input and Output Channels:
#' Input and output channels are inherited from [`PipeOpImpute`].
#'
#' The output is the input [`Task`][mlr3::Task] with all affected numeric features missing values imputed by (column-wise) mean.
#' The output is the input [`Task`][mlr3::Task] with all affected numeric, integer, POSIXct and Date features missing values imputed by (column-wise) mean.
#'
#' @section State:
#' The `$state` is a named `list` with the `$state` elements inherited from [`PipeOpImpute`].
#'
#' The `$state$model` is a named `list` of `numeric(1)` indicating the mean of the respective feature.
#' The `$state$model` is a named `list` of either `numeric(1)`, `integer(1)`, `POSIXct(1)` or `Date(1)` indicating the mean of the respective feature.
#'
#' @section Parameters:
#' The parameters are the parameters inherited from [`PipeOpImpute`].
Expand Down Expand Up @@ -59,7 +59,7 @@ PipeOpImputeMean = R6Class("PipeOpImputeMean",
inherit = PipeOpImpute,
public = list(
initialize = function(id = "imputemean", param_vals = list()) {
super$initialize(id, param_vals = param_vals, feature_types= c("numeric", "integer"))
super$initialize(id, param_vals = param_vals, feature_types= c("numeric", "integer", "POSIXct", "Date"))
}
),
private = list(
Expand Down
10 changes: 5 additions & 5 deletions R/PipeOpImputeMedian.R
Original file line number Diff line number Diff line change
@@ -1,11 +1,11 @@
#' @title Impute Numerical Features by their Median
#' @title Impute Numeric, Integer, POSIXct or Date Features by their Median
#'
#' @usage NULL
#' @name mlr_pipeops_imputemedian
#' @format [`R6Class`][R6::R6Class] object inheriting from [`PipeOpImpute`]/[`PipeOp`].
#'
#' @description
#' Impute numerical features by their median.
#' Impute numeric, integer, POSIXct or Date features by their median.
#'
#' @section Construction:
#' ```
Expand All @@ -20,12 +20,12 @@
#' @section Input and Output Channels:
#' Input and output channels are inherited from [`PipeOpImpute`].
#'
#' The output is the input [`Task`][mlr3::Task] with all affected numeric features missing values imputed by (column-wise) median.
#' The output is the input [`Task`][mlr3::Task] with all affected numeric, integer, POSIXct and Date features missing values imputed by (column-wise) median.
#'
#' @section State:
#' The `$state` is a named `list` with the `$state` elements inherited from [`PipeOpImpute`].
#'
#' The `$state$model` is a named `list` of `numeric(1)` indicating the median of the respective feature.
#' The `$state$model` is a named `list` of `numeric(1)`, `integer(1)`, `POSIXct(1)` or `Date(1)` indicating the median of the respective feature.
#'
#' @section Parameters:
#' The parameters are the parameters inherited from [`PipeOpImpute`].
Expand Down Expand Up @@ -59,7 +59,7 @@ PipeOpImputeMedian = R6Class("PipeOpImputeMedian",
inherit = PipeOpImpute,
public = list(
initialize = function(id = "imputemedian", param_vals = list()) {
super$initialize(id, param_vals = param_vals, packages = "stats", feature_types = c("numeric", "integer"))
super$initialize(id, param_vals = param_vals, packages = "stats", feature_types = c("numeric", "integer", "POSIXct", "Date"))
}
),
private = list(
Expand Down
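
The updated State sections for `PipeOpImputeMean` and `PipeOpImputeMedian` say the stored model can be a `Date(1)` or `POSIXct(1)`. That matches how base R behaves: `mean()` and `stats::median()` keep the class of date-like inputs. A small sketch with made-up values (not the package's own code path):

```r
# Hypothetical Date feature with one missing value.
x = as.Date(c("2024-01-01", "2024-01-03", "2024-01-11", NA))

m = mean(x, na.rm = TRUE)            # arithmetic mean of the day counts, returned as Date
md = stats::median(x, na.rm = TRUE)  # middle observation, returned as Date

# The same holds for POSIXct (seconds since the epoch).
tm = mean(as.POSIXct(c("2024-01-01 00:00:00", "2024-01-01 02:00:00"), tz = "UTC"))

print(m)
print(md)
print(tm)
```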
4 changes: 2 additions & 2 deletions R/PipeOpImputeMode.R
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@
#' @format [`R6Class`][R6::R6Class] object inheriting from [`PipeOpImpute`]/[`PipeOp`].
#'
#' @description
#' Impute features by their mode. Supports factors as well as logical and numerical features.
#' Impute features by their mode. Supports factor, ordered, logical, numeric, integer, POSIXct and Date features.
#' If multiple modes are present then imputed values are sampled randomly from them.
#'
#' @section Construction:
Expand Down Expand Up @@ -66,7 +66,7 @@ PipeOpImputeMode = R6Class("PipeOpImputeMode",
inherit = PipeOpImpute,
public = list(
initialize = function(id = "imputemode", param_vals = list()) {
super$initialize(id, param_vals = param_vals, feature_types = c("factor", "integer", "logical", "numeric", "ordered"))
super$initialize(id, param_vals = param_vals, feature_types = c("factor", "integer", "logical", "numeric", "ordered", "POSIXct", "Date"))
}
),
private = list(
Expand Down
17 changes: 10 additions & 7 deletions R/PipeOpImputeOOR.R
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
#' @description
#' Impute factorial features by adding a new level `".MISSING"`.
#'
#' Impute numerical features by constant values shifted below the minimum or above the maximum by
#' Impute numeric, integer, POSIXct or Date features by constant values shifted below the minimum or above the maximum by
#' using \eqn{min(x) - offset - multiplier * diff(range(x))} or
#' \eqn{max(x) + offset + multiplier * diff(range(x))}.
#'
Expand Down Expand Up @@ -51,25 +51,25 @@
#'
#' The `$state$model` contains either `".MISSING"` used for `character` and `factor` (also
#' `ordered`) features or `numeric(1)` indicating the constant value used for imputation of
#' `integer` and `numeric` features.
#' `integer`, `numeric`, `POSIXct` or `Date` features.
#'
#' @section Parameters:
#' The parameters are the parameters inherited from [`PipeOpImpute`], as well as:
#' * `min` :: `logical(1)` \cr
#' Should `integer` and `numeric` features be shifted below the minimum? Initialized to `TRUE`. If `FALSE`
#' they are shifted above the maximum. See also the description above.
#' * `offset` :: `numeric(1)` \cr
#' Numerical non-negative offset as used in the description above for `integer` and `numeric`
#' Numerical non-negative offset as used in the description above for `integer`, `numeric`, `POSIXct` and `Date`
#' features. Initialized to `1`.
#' * `multiplier` :: `numeric(1)` \cr
#' Numerical non-negative multiplier as used in the description above for `integer` and `numeric`
#' Numerical non-negative multiplier as used in the description above for `integer`, `numeric`, `POSIXct` and `Date`
#' features. Initialized to `1`.
#'
#' @section Internals:
#' Adds an explicit new `level()` to `factor` and `ordered` features, but not to `character` features.
#' For `integer` and `numeric` features uses the `min`, `max`, `diff` and `range` functions.
#' `integer` and `numeric` features that are entirely `NA` are imputed as `0`. `factor` and `ordered` features that are
#' entirely `NA` are imputed as `".MISSING"`.
#' entirely `NA` are imputed as `".MISSING"`. For `POSIXct` and `Date` features the value `0` is transformed into the respective data type.
#'
#' @section Fields:
#' Only fields inherited from [`PipeOp`].
Expand Down Expand Up @@ -119,7 +119,7 @@ PipeOpImputeOOR = R6Class("PipeOpImputeOOR",
)
# this is one of the few imputers that handles 'character' features!
super$initialize(id, param_set = ps, param_vals = param_vals, empty_level_control = "param",
feature_types = c("character", "factor", "integer", "numeric", "ordered"))
feature_types = c("character", "factor", "integer", "numeric", "ordered", "POSIXct", "Date"))
}
),
private = list(
Expand Down Expand Up @@ -153,10 +153,13 @@ PipeOpImputeOOR = R6Class("PipeOpImputeOOR",
logical = c(TRUE, FALSE),
numeric = 0,
ordered = ".MISSING",
character = ""
character = "",
POSIXct = as.POSIXct(0),
Date = as.Date(0)
)
}
)
)

mlr_pipeops$add("imputeoor", PipeOpImputeOOR)
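
The out-of-range formulas from the `PipeOpImputeOOR` description, `min(x) - offset - multiplier * diff(range(x))` and `max(x) + offset + multiplier * diff(range(x))`, can be worked through for a `Date` feature as follows. This is a sketch under assumptions: the hunks shown here do not include the code that applies the formula to `POSIXct` and `Date`, so the conversion via `as.numeric()` and back is illustrative, and the data are hypothetical.

```r
# Hypothetical Date feature with one missing value.
x = as.Date(c("2024-01-01", "2024-01-11", NA))
offset = 1      # same value the parameter is initialized to
multiplier = 1  # same value the parameter is initialized to

num = as.numeric(x)              # days since 1970-01-01
rng = range(num, na.rm = TRUE)   # observed minimum and maximum

below = rng[1] - offset - multiplier * diff(rng)  # used when min = TRUE
above = rng[2] + offset + multiplier * diff(rng)  # used when min = FALSE

# Explicit origin is an assumption added so the sketch runs on R < 4.3.
oor_low = as.Date(below, origin = "1970-01-01")
oor_high = as.Date(above, origin = "1970-01-01")

print(oor_low)
print(oor_high)
```

Because the arithmetic happens on the underlying number, `offset` is effectively measured in days for `Date` and in seconds for `POSIXct`.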

6 changes: 3 additions & 3 deletions R/PipeOpImputeSample.R
Original file line number Diff line number Diff line change
Expand Up @@ -33,8 +33,8 @@
#' @section Internals:
#' Uses the `sample()` function. Features that are entirely `NA` are imputed as
#' the following: For `factor` or `ordered`, random levels are sampled uniformly at random.
#' For logicals, `TRUE` or `FALSE` are sampled uniformly at random.
#' Numerics and integers are imputed as `0`.
#' For `logical`, `TRUE` or `FALSE` are sampled uniformly at random.
#' `numeric` and `integer` are imputed as `0`.
#'
#' @section Fields:
#' Only fields inherited from [`PipeOp`].
Expand All @@ -61,7 +61,7 @@ PipeOpImputeSample = R6Class("PipeOpImputeSample",
inherit = PipeOpImpute,
public = list(
initialize = function(id = "imputesample", param_vals = list()) {
super$initialize(id, param_vals = param_vals, feature_types = c("factor", "integer", "logical", "numeric", "ordered"))
super$initialize(id, param_vals = param_vals, feature_types = c("factor", "integer", "logical", "numeric", "ordered", "POSIXct", "Date"))
}
),
private = list(
Expand Down
2 changes: 1 addition & 1 deletion R/PipeOpTaskPreproc.R
Original file line number Diff line number Diff line change
Expand Up @@ -59,7 +59,7 @@
#' `"TaskRegr"` (or another subclass introduced by other packages). Default is `"Task"`.
#' * `tags` :: `character` | `NULL`\cr
#' Tags of the resulting `PipeOp`. This is added to the tag `"data transform"`. Default `NULL`.
#'* `feature_types` :: `character`\cr
#' * `feature_types` :: `character`\cr
#' Feature types affected by the `PipeOp`. See `private$.select_cols()` for more information.
#' Defaults to all available feature types.
#'
Expand Down
1 change: 1 addition & 0 deletions man/mlr3pipelines-package.Rd


6 changes: 3 additions & 3 deletions man/mlr_pipeops_imputehist.Rd


2 changes: 1 addition & 1 deletion man/mlr_pipeops_imputelearner.Rd


8 changes: 4 additions & 4 deletions man/mlr_pipeops_imputemean.Rd
