Merged

Commits (45)
- c29fe0d: PipeOpLearnerCV resampling ensemble prediction (Nov 4, 2025)
- 7ce6f20: refactor (Nov 4, 2025)
- 8acced0: Refine CV ensemble aggregation (Nov 4, 2025)
- d4f4e87: AGENTS (Nov 4, 2025)
- e156a6f: refactor (Nov 4, 2025)
- f396c38: CLAUDE.md (Nov 4, 2025)
- 0cdc2f8: comment (mb706, Nov 4, 2025)
- 4082c41: tests (mb706, Nov 4, 2025)
- eff004d: don't use weights any more (mb706, Nov 4, 2025)
- 4f75cfd: less agressive marshalling (mb706, Nov 4, 2025)
- 84a77bc: se aggregation for PipeOpRegrAvg (mb706, Nov 4, 2025)
- bded1d7: se aggregation (mb706, Nov 4, 2025)
- 75a5958: fixes (mb706, Nov 4, 2025)
- 95191fa: Add tests for PipeOpRegrAvg SE aggregation (mb706, Nov 4, 2025)
- eae89f5: Gate se_aggr_rho on se_aggr via depends (mb706, Nov 4, 2025)
- 9ab7012: Add SE aggregation controls to PipeOpLearnerCV (mb706, Nov 4, 2025)
- 3413fcd: small fixes (mb706, Nov 4, 2025)
- cce9f31: docs (mb706, Nov 4, 2025)
- b214f9b: se_aggr default (mb706, Nov 4, 2025)
- a26229a: regression tests (mb706, Nov 4, 2025)
- 260e480: se aggregation options only when present (mb706, Nov 4, 2025)
- 5869439: tests (mb706, Nov 4, 2025)
- efc7a8e: document (mb706, Nov 4, 2025)
- 8a6a2c3: NEWS (mb706, Nov 4, 2025)
- e8b3127: rbuildignore (mb706, Nov 4, 2025)
- f85c775: another doc (mb706, Nov 4, 2025)
- 8487c68: dict test (mb706, Nov 4, 2025)
- 655332b: missing test coverage (mb706, Nov 4, 2025)
- b9d651e: doc (mb706, Nov 4, 2025)
- 446bd3c: single SE prediction avg (mb706, Nov 4, 2025)
- 1d815c1: prob averaging (mb706, Nov 4, 2025)
- 0df72b2: Add log probability aggregation support and tests (mb706, Nov 4, 2025)
- 8c812d9: agents (mb706, Nov 4, 2025)
- 6585f51: cleaning up a bit (mb706, Nov 5, 2025)
- 28c1add: agents (mb706, Nov 5, 2025)
- 9fdf8dc: prompts (mb706, Nov 5, 2025)
- 8597fe1: GraphLearner keeps its (mb706, Nov 5, 2025)
- f254a5d: 0 prob log aggregation tests (mb706, Nov 5, 2025)
- b3c26be: Add revdep helper scripts (mb706, Nov 5, 2025)
- 4c6824e: revdepchecks (mb706, Nov 6, 2025)
- 5cf6638: Install reverse dependency test prerequisites (mb706, Nov 6, 2025)
- 1846a6c: amendment (mb706, Nov 6, 2025)
- 864a5e1: prepare directory structure (mb706, Nov 6, 2025)
- f88df1a: revdep check automation (mb706, Nov 6, 2025)
- c612020: don't check system clock in container (mb706, Nov 6, 2025)
2 changes: 2 additions & 0 deletions .Rbuildignore
@@ -25,3 +25,5 @@
^\.vscode$
^\.lintr$
^\.pre-commit-config\.yaml$
^AGENTS\.md$
^CLAUDE\.md$
104 changes: 104 additions & 0 deletions AGENTS.md
@@ -0,0 +1,104 @@

<persistence>
1. If the user asked you a question, try to gather information and answer the question to the best of your ability.
2. If the user asked you to review code, work and gather the required information to give a code review according to the `<guiding_principles>` and general best practices. Do not ask any more questions, just provide a best effort code review.
3. Otherwise:
- You are an agent - please keep going until the user's query is completely resolved, before ending your turn and yielding back to the user.
- If the instructions are unclear, try to think of what info you need and gather that info from the user *right away*, so you can then work autonomously for many turns.
- Be extra-autonomous. The user wants you to work on your own, once you started.
- Only terminate your turn when you are sure that the problem is solved.
- Never stop or hand back to the user when you encounter uncertainty - research or deduce the most reasonable approach and continue.
- Do not ask the human to confirm or clarify assumptions except at the very beginning; assumptions can always be adjusted later. Decide what the most reasonable assumption is, proceed with it, and document it for the user's reference after you finish acting.
- You are working inside a secure container, you cannot break anything vital, so do not ask for permission and be bold.
</persistence>
<work_loop>
- At the beginning:
- When asked a question about the code or in general, or asked for a code review, gather the necessary information, answer right away, and finish.
- When instructions are unclear, ask clarifying questions at the beginning.
- During work:
- Think before you act. Plan ahead. Feel free to think more than you would otherwise; look at things from different angles, consider different scenarios.
- If possible, write a few tests *before* implementing a feature or fixing a bug.
- For a bug fix, write a test that captures the bug before fixing the bug.
- For a feature, create tests to the degree it is possible. Try really hard. If it is not possible, at least create test-stubs in the form of empty `test_that()` blocks to be filled in later.
- Tests should be sensibly thorough. Write more thorough tests only when asked by the user to write tests.
- Work and solve upcoming issues independently, using your best judgment
- Package progress into organic git commits. You may overwrite commits that are not on 'origin' yet, but do so only if it has great benefit. If you are on git branch `master`, create a new aptly named branch; never commit into `master`. Otherwise, do not leave the current git branch.
- Again: create git commits at organic points. In the past, you tended to make too few git commits.
- If any issues pop up:
- If you noticed anything that surprised you, or anything that would have helped you substantially with your work had you known it right away, add it to the `<agent_notes>` section of the `AGENTS.md` file. Future agents will then have access to this information. Use it to capture technical insights, failed approaches, user preferences, and other things future agents should know.
- After feature implementation, write tests:
- If you were asked to implement a feature and have not yet done so, fill in the test_that stubs created earlier or create new tests, to the degree that they make sense.
- If you were asked to fix a bug, check again that there are regression tests.
- When you are done:
- Write a short summary of what you did, and what decisions you had to make that went beyond what the user asked of you, and other things the user should know about, as chat response to the user.
- Unless you were working on something minor, or you are leaving things as an obvious work-in-progress, do a git commit.
</work_loop>
<debugging>
When fixing problems, always make sure you know the actual cause of the problem first:

1. Form hypotheses about what the issue could be.
2. Find a way to test these hypotheses, and carry those tests out. If necessary, ask for assistance from the human, who e.g. may need to interact manually with the software.
3. If you accept a hypothesis, apply an appropriate fix. The fix may not work and the hypothesis may turn out to be false; in that case, undo the fix unless it actually improves code quality overall. Do not let unnecessary fixes for imaginary issues that never materialized clog up the code.
</debugging>
<guiding_principles>
Straightforwardness: Avoid ideological adherence to other programming principles when something can be solved in a simple, short, straightforward way. Otherwise:

- Simplicity: Favor small, focused components and avoid unnecessary complexity in design or logic.
- This also means: avoid overly defensive code. Observe the typical level of defensiveness when looking at the code.
- Idiomaticity: Solve problems the way they "should" be solved, in the respective language: the way a professional in that language would have approached it.
- Readability and maintainability are primary concerns, even at the cost of conciseness or performance.
- Doing it right is better than doing it fast. You are not in a rush. Never skip steps or take shortcuts.
- Tedious, systematic work is often the correct solution. Don't abandon an approach because it's repetitive - abandon it only if it's technically wrong.
- Honesty is a core value. Be honest about changes you have made and potential negative effects; these are okay. Be honest about shortcomings of other team members' plans and implementations; we all care more about the project than our egos. Be honest if you don't know something: say "I don't know" when appropriate.
</guiding_principles>
<project_info>

`mlr3pipelines` is a package that extends the `mlr3` ecosystem by adding preprocessing operations and a way to compose them into computational graphs.

- The package is very object-oriented; most things use R6.
- Coding style: we use `snake_case` for variables, `UpperCamelCase` for R6 classes. We use `=` for assignment and mostly use the tidyverse style guide otherwise. We use block-indent (two spaces), *not* visual indent; i.e., we don't align code with opening parentheses in function calls, we align by block depth.
- User-facing API (`@export`ed things, public R6 methods) always needs checkmate `assert_*()` argument checks. Otherwise don't be overly defensive; look at the other code in the project to see our desired level of paranoia.
- Always read at least `R/PipeOp.R` and `R/PipeOpTaskPreproc.R` to see the base classes you will need in almost every task.
- Read `R/Graph.R` and `R/GraphLearner.R` to understand the Graph architecture.
- Before you start coding, look at other relevant `.R` files that do something similar to what you are supposed to implement.
- We use `testthat`, and most test files are in `tests/testthat/`. Read the additional important helpers in `inst/testthat/helper_functions.R` to understand our `PipeOpTaskPreproc` auto-test framework.
- Always write tests and execute them with `devtools::test(filter = )`; the entirety of our tests takes a long time, so only run tests for what you just wrote.
- Tests involving the `$man` field, and tests involving parallelization, do not work well when the package is loaded with `devtools::load_all()`, because of conflicts with the installed version. Ignore these failures, CI will take care of this.
- The quality of our tests is lower than it ideally should be. We are in the process of improving this over time. Always leave the `tests/testthat/` folder in a better state than what you found it in!
- If `roxygenize()` / `document()` produce warnings that are unrelated to the code you wrote, ignore them. Do not fix code or formatting that is unrelated to what you are working on, but *do* mention bugs or problems that you noticed in your final report.
- When you write examples, make sure they work.
- A very small number of the packages listed in `Suggests:` and used by some tests / examples are missing; ignore warnings in that regard. You will never be asked to work on things that require these packages.
- Packages that we rely on; they generally have good documentation that can be queried, or they can be looked up on GitHub:
- `mlr3`, provides `Task`, `Learner`, `Measure`, `Prediction`, various `***Result` classes; basically the foundation on which we build. <https://github.com/mlr-org/mlr3>
- `mlr3misc`, provides a lot of helper functions that we prefer to use over base-R when available. <https://github.com/mlr-org/mlr3misc>
- `paradox`, provides the hyperparameters-/configuration space: `ps()`, `p_int()`, `p_lgl()`, `p_fct()`, `p_uty()` etc. <https://github.com/mlr-org/paradox>
- For the mlr3-ecosystem as a whole, also consider the "mlr3 Book" as a reference, <https://mlr3book.mlr-org.com/>
- Semantics of paradox ParamSet parameters to pay attention to:
- there is a distinction between "default" values and values that a parameter is initialized to: a "default" is the behaviour that happens when the parameter is not given at all; e.g. PipeOpPCA `center` defaults to `TRUE`, since the underlying function (`prcomp`) does centering when its `center` argument is not given at all. In contrast, a parameter is "initialized" to some value if it is set to some value upon construction of a PipeOp. In rare cases, this can differ from the default, e.g. if the underlying default behaviour is suboptimal for use in preprocessing (e.g. it stores training data unnecessarily by default).
- a parameter can be marked as "required" by having the tag `"required"`. It is a special tag that causes an error if the value is not set. A "required" parameter *can not* have a "default", since semantically this is a contradiction: "default" would describe what happens when the param is not set, but param-not-set is an error.
- When we write a preprocessing method ourselves, we usually don't provide "default" behaviour and instead mark most things as "required". "default" is mostly for when we wrap some other library's function which itself has function-argument default values.
- We initialize a parameter by giving the `p_xxx(init = )` argument. Some old code does `param_set$values = list(...)` or `param_set$values$param = ...` in the constructor. This is deprecated; we do not unnecessarily change it in old code, but new code should have `init = `. A parameter should be documented as "initialized to" something if and only if the value is set through one of these methods in the constructor.
- Inside the train / predict functions of PipeOps, hyperparameter values should be obtained through `pv = self$param_set$get_values(tags = )`, where `tags` is often `"train"`, `"predict"`, or some custom tag that groups hyperparameters by meaning (e.g. everything that should be passed to a specific function). A nice pattern is to call a function `fname` with many options configured through `pv` while also explicitly passing some arguments, as `invoke(fname, arg1 = val1, arg2 = val2, .args = pv)`, using `invoke` from `mlr3misc`. A sketch of these conventions follows after this list.
- paradox does type-checking and range-checking automatically; `get_values()` automatically checks that `"required"` params are present and not `NULL`. Therefore, we only do additional parameter feasibility checks in the rarest of cases.
- Minor things to be aware of:
- Errors that are thrown in PipeOps are automatically wrapped by Graph to also mention the PipeOp ID, so it is not necessary to include that in error messages.
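
A minimal sketch of these parameter conventions; the names `center`, `rank`, `method`, and `some_fun` are illustrative, not taken from any particular PipeOp:

```r
library(paradox)
library(mlr3misc)

param_set = ps(
  # "default" documents what the wrapped function does when the param is absent:
  center = p_lgl(default = TRUE, tags = "train"),
  # `init =` actually sets a value at construction time ("initialized to"):
  rank = p_int(lower = 1L, init = 2L, tags = "train"),
  # "required" params error when unset, and therefore cannot have a "default":
  method = p_fct(levels = c("a", "b"), tags = c("train", "required"))
)
param_set$values$method = "a"

# inside a PipeOp's train/predict function:
pv = param_set$get_values(tags = "train")  # also verifies "required" params are set
# invoke(some_fun, x = data, .args = pv)   # explicit args plus configured options
```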

</project_info>
<agent_notes>

# Notes by Agents to other Agents

- R unit tests in this repo assume helper `expect_man_exists()` is available. If you need to call it in a new test and you are working without mlr3pipelines installed, define a local fallback at the top of that test file before `expect_learner()` is used.
- Revdep helper scripts live in `attic/revdeps/`. `download_revdeps.R` downloads reverse dependency source tarballs; `install_revdep_suggests.R` installs Suggests for those revdeps without pulling the revdeps themselves.

</agent_notes>
<your_task>
Again, when implementing something, focus on:

1. Think things through and plan ahead.
2. Tests before implementation, if possible. In any case, write high quality tests, try to be better than the tests you find in this project.
3. Once you started, work independently; we can always undo things if necessary.
4. Create sensible intermediate commits.
5. Check your work, make sure tests pass. But do not run *all* tests, they take a long time.
6. Write a report to the user at the end, informing about decisions that were made autonomously, unexpected issues etc.
</your_task>
1 change: 1 addition & 0 deletions CLAUDE.md
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -107,7 +107,7 @@ Config/testthat/edition: 3
Config/testthat/parallel: true
NeedsCompilation: no
Roxygen: list(markdown = TRUE, r6 = FALSE)
RoxygenNote: 7.3.2
RoxygenNote: 7.3.3
VignetteBuilder: knitr, rmarkdown
Collate:
'CnfAtom.R'
3 changes: 2 additions & 1 deletion NEWS.md
@@ -4,6 +4,8 @@
* Fix: Added internal workaround for `PipeOpNMF` attaching `Biobase`, `BiocGenerics`, and `generics` to the search path during training, prediction or when printing its `$state`.
* feat: allow dates in datefeatures pipe op and use data.table for date feature generation.
* Added support for internal validation tasks to `PipeOpFeatureUnion`.
* feat: `PipeOpLearnerCV` can reuse the cross-validation models during prediction by averaging their outputs (`resampling.predict_method = "cv_ensemble"`).
* feat: `PipeOpRegrAvg` gets new `se_aggr` and `se_aggr_rho` hyperparameters and now allows various forms of SE aggregation.

# mlr3pipelines 0.9.0

@@ -304,4 +306,3 @@
# mlr3pipelines 0.1.0

* Initial upload to CRAN.
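
A hedged usage sketch of the two new features announced above; the hyperparameter name `resampling.predict_method` is taken from the NEWS entry, the rest is ordinary mlr3/mlr3pipelines usage and may differ in detail:

```r
library(mlr3)
library(mlr3pipelines)

# reuse the CV models during prediction by averaging their outputs
polcv = po("learner_cv", lrn("classif.rpart"),
  resampling.predict_method = "cv_ensemble")

task = tsk("iris")
polcv$train(list(task))[[1L]]    # Task with cross-validated predictions as features
polcv$predict(list(task))[[1L]]  # Task with predictions averaged over the CV models
```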

42 changes: 37 additions & 5 deletions R/PipeOpClassifAvg.R
@@ -11,8 +11,23 @@
#' Always returns a `"prob"` prediction, regardless of the incoming [`Learner`][mlr3::Learner]'s
#' `$predict_type`. The label of the class with the highest predicted probability is selected as the
#' `"response"` prediction. If the [`Learner`][mlr3::Learner]'s `$predict_type` is set to `"prob"`,
#' the prediction obtained is also a `"prob"` type prediction with the probability predicted to be a
#' weighted average of incoming predictions.
#' the probability aggregation is controlled by `prob_aggr` (see below). If `$predict_type = "response"`,
#' predictions are internally converted to one-hot probability vectors (point mass on the predicted class) before aggregation.
#'
#' ### `"prob"` aggregation:
#'
#' * **`prob_aggr = "mean"`** -- *Linear opinion pool (arithmetic mean of probabilities; default)*.
#' **Interpretation.** Mixture semantics: choose a base model with probability `w[i]`, then draw from its class distribution.
#' Decision-theoretically, this is the minimizer of `sum(w[i] * KL(p[i] || p))` over probability vectors `p`, where `KL(x || y)` is the Kullback-Leibler divergence.
#' **Typical behavior.** Conservative / better calibrated and robust to near-zero probabilities (never assigns zero unless all do).
#' This is the standard choice for probability averaging in ensembles and stacking.
#'
#' * **`prob_aggr = "log"`** -- *Log opinion pool / product of experts (geometric mean in probability space)*:
#' Average the per-model log-probabilities (or, equivalently, the logits) and apply softmax.
#' **Interpretation.** Product semantics: `p_ens` is proportional to `prod(p[i]^w[i])`; this minimizes `sum(w[i] * KL(p || p[i]))`.
#' **Typical behavior.** Sharper / lower entropy (emphasizes consensus regions), but can be **overconfident** and is sensitive
#' to zeros; use `prob_aggr_eps` to clip small probabilities for numerical stability. Often beneficial with strong, similarly
#' calibrated members (e.g., neural networks), less so when calibration is the priority. A toy numeric comparison of the two pools follows below.
#'
#' All incoming [`Learner`][mlr3::Learner]'s `$predict_type` must agree.
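
(Reviewer note, not part of the diff: the two pooling rules differ as in this toy computation with two models, three classes, and equal weights.)

```r
p1 = c(0.6, 0.3, 0.1)  # model 1 class probabilities
p2 = c(0.2, 0.7, 0.1)  # model 2 class probabilities
w = c(0.5, 0.5)        # equal weights

# linear opinion pool: weighted arithmetic mean
w[1] * p1 + w[2] * p2                     # 0.400 0.500 0.100

# log opinion pool: weighted geometric mean, renormalized to sum to 1
g = exp(w[1] * log(p1) + w[2] * log(p2))
g / sum(g)                                # approx. 0.383 0.507 0.111
```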
#'
Expand Down Expand Up @@ -45,7 +60,14 @@
#' The `$state` is left empty (`list()`).
#'
#' @section Parameters:
#' The parameters are the parameters inherited from the [`PipeOpEnsemble`].
#' The parameters are the parameters inherited from the [`PipeOpEnsemble`], as well as:
#' * `prob_aggr` :: `character(1)`\cr
#' Controls how incoming class probabilities are aggregated. One of `"mean"` (linear opinion pool; default) or
#' `"log"` (log opinion pool / product of experts). See the description above for definitions and interpretation.
#' Only has an effect if the incoming predictions have `"prob"` values.
#' * `prob_aggr_eps` :: `numeric(1)`\cr
#' Small positive constant used only for `prob_aggr = "log"` to clamp probabilities before taking logs, improving numerical
#' stability and avoiding `-Inf`. Ignored for `prob_aggr = "mean"`. Default is `1e-12`.
#'
#' @section Internals:
#' Inherits from [`PipeOpEnsemble`] by implementing the `private$weighted_avg_predictions()` method.
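
(A construction sketch using the hyperparameters documented above; the choice of learners is arbitrary.)

```r
library(mlr3)
library(mlr3pipelines)

gr = gunion(list(
  po("learner", lrn("classif.rpart", predict_type = "prob")),
  po("learner", lrn("classif.featureless", predict_type = "prob")),
  po("learner", lrn("classif.debug", predict_type = "prob"))
)) %>>% po("classifavg", innum = 3, prob_aggr = "log", prob_aggr_eps = 1e-9)

glrn = as_learner(gr)
```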
@@ -81,7 +103,11 @@ PipeOpClassifAvg = R6Class("PipeOpClassifAvg",
inherit = PipeOpEnsemble,
public = list(
initialize = function(innum = 0, collect_multiplicity = FALSE, id = "classifavg", param_vals = list()) {
super$initialize(innum, collect_multiplicity, id, param_vals = param_vals, prediction_type = "PredictionClassif", packages = "stats")
param_set = ps(
prob_aggr = p_fct(levels = c("mean", "log"), init = "mean", tags = c("predict", "prob_aggr")),
prob_aggr_eps = p_dbl(lower = 0, upper = 1, default = 1e-12, tags = c("predict", "prob_aggr"), depends = quote(prob_aggr == "log"))
)
super$initialize(innum, collect_multiplicity, id, param_set = param_set, param_vals = param_vals, prediction_type = "PredictionClassif", packages = "stats")
}
),
private = list(
@@ -96,7 +122,13 @@ PipeOpClassifAvg = R6Class("PipeOpClassifAvg",

prob = NULL
if (every(inputs, function(x) !is.null(x$prob))) {
prob = weighted_matrix_sum(map(inputs, "prob"), weights)
pv = self$param_set$get_values(tags = "prob_aggr")
if (pv$prob_aggr == "mean") {
prob = weighted_matrix_sum(map(inputs, "prob"), weights)
} else { # prob_aggr == "log"
epsilon = pv$prob_aggr_eps %??% 1e-12
prob = weighted_matrix_logpool(map(inputs, "prob"), weights, epsilon = epsilon)
}
} else if (every(inputs, function(x) !is.null(x$response))) {
prob = weighted_factor_mean(map(inputs, "response"), weights, lvls)
} else {
19 changes: 19 additions & 0 deletions R/PipeOpEnsemble.R
@@ -178,6 +178,25 @@ weighted_matrix_sum = function(matrices, weights) {
accmat
}

# Weighted log-opinion pool (geometric) aggregation of probability matrices
# Rows = samples, columns = classes. Each matrix must have the same shape.
# @param matrices list of matrices: per-learner probabilities
# @param weights numeric: weights, same length as `matrices` (assumed to sum to 1 upstream)
# @param epsilon numeric(1): small positive constant to clamp probabilities before log
# @return matrix: row-normalized aggregated probabilities (same shape as inputs)
weighted_matrix_logpool = function(matrices, weights, epsilon = 1e-12) {
assert_list(matrices, types = "matrix", min.len = 1)
assert_numeric(weights, len = length(matrices), any.missing = FALSE, finite = TRUE)
assert_number(epsilon, lower = 0, upper = 1)
acc = weights[1] * log(pmax(matrices[[1]], epsilon))
for (idx in seq_along(matrices)[-1]) {
acc = acc + weights[idx] * log(pmax(matrices[[idx]], epsilon))
}
P = exp(acc)
sweep(P, 1L, rowSums(P), "/")
}
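
(Sanity check one could run after `devtools::load_all()`; with equal weights the log pool reduces to the row-normalized geometric mean. Sketch only, not part of the diff.)

```r
m1 = matrix(c(0.6, 0.3, 0.1,
              0.2, 0.7, 0.1), nrow = 2, byrow = TRUE)
m2 = matrix(c(0.5, 0.4, 0.1,
              0.1, 0.8, 0.1), nrow = 2, byrow = TRUE)

res = weighted_matrix_logpool(list(m1, m2), weights = c(0.5, 0.5))
g = sqrt(m1 * m2)  # element-wise geometric mean for equal weights
stopifnot(isTRUE(all.equal(res, g / rowSums(g))))
```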


# For a set of n `factor` vectors each of length l with the same k levels and a
# numeric weight vector of length n, returns a matrix of dimension l times k.
# Each cell contains the weighted relative frequency of the respective factor
Expand Down
Loading