Merged

Commits (45)
- c29fe0d: PipeOpLearnerCV resampling ensemble prediction (Nov 4, 2025)
- 7ce6f20: refactor (Nov 4, 2025)
- 8acced0: Refine CV ensemble aggregation (Nov 4, 2025)
- d4f4e87: AGENTS (Nov 4, 2025)
- e156a6f: refactor (Nov 4, 2025)
- f396c38: CLAUDE.md (Nov 4, 2025)
- 0cdc2f8: comment (mb706, Nov 4, 2025)
- 4082c41: tests (mb706, Nov 4, 2025)
- eff004d: don't use weights any more (mb706, Nov 4, 2025)
- 4f75cfd: less agressive marshalling (mb706, Nov 4, 2025)
- 84a77bc: se aggregation for PipeOpRegrAvg (mb706, Nov 4, 2025)
- bded1d7: se aggregation (mb706, Nov 4, 2025)
- 75a5958: fixes (mb706, Nov 4, 2025)
- 95191fa: Add tests for PipeOpRegrAvg SE aggregation (mb706, Nov 4, 2025)
- eae89f5: Gate se_aggr_rho on se_aggr via depends (mb706, Nov 4, 2025)
- 9ab7012: Add SE aggregation controls to PipeOpLearnerCV (mb706, Nov 4, 2025)
- 3413fcd: small fixes (mb706, Nov 4, 2025)
- cce9f31: docs (mb706, Nov 4, 2025)
- b214f9b: se_aggr default (mb706, Nov 4, 2025)
- a26229a: regression tests (mb706, Nov 4, 2025)
- 260e480: se aggregation options only when present (mb706, Nov 4, 2025)
- 5869439: tests (mb706, Nov 4, 2025)
- efc7a8e: document (mb706, Nov 4, 2025)
- 8a6a2c3: NEWS (mb706, Nov 4, 2025)
- e8b3127: rbuildignore (mb706, Nov 4, 2025)
- f85c775: another doc (mb706, Nov 4, 2025)
- 8487c68: dict test (mb706, Nov 4, 2025)
- 655332b: missing test coverage (mb706, Nov 4, 2025)
- b9d651e: doc (mb706, Nov 4, 2025)
- 446bd3c: single SE prediction avg (mb706, Nov 4, 2025)
- 1d815c1: prob averaging (mb706, Nov 4, 2025)
- 0df72b2: Add log probability aggregation support and tests (mb706, Nov 4, 2025)
- 8c812d9: agents (mb706, Nov 4, 2025)
- 6585f51: cleaning up a bit (mb706, Nov 5, 2025)
- 28c1add: agents (mb706, Nov 5, 2025)
- 9fdf8dc: prompts (mb706, Nov 5, 2025)
- 8597fe1: GraphLearner keeps its (mb706, Nov 5, 2025)
- f254a5d: 0 prob log aggregation tests (mb706, Nov 5, 2025)
- b3c26be: Add revdep helper scripts (mb706, Nov 5, 2025)
- 4c6824e: revdepchecks (mb706, Nov 6, 2025)
- 5cf6638: Install reverse dependency test prerequisites (mb706, Nov 6, 2025)
- 1846a6c: amendment (mb706, Nov 6, 2025)
- 864a5e1: prepare directory structure (mb706, Nov 6, 2025)
- f88df1a: revdep check automation (mb706, Nov 6, 2025)
- c612020: don't check system clock in container (mb706, Nov 6, 2025)
2 changes: 2 additions & 0 deletions .Rbuildignore
@@ -25,3 +25,5 @@
^\.vscode$
^\.lintr$
^\.pre-commit-config\.yaml$
^AGENTS\.md$
^CLAUDE\.md$
104 changes: 104 additions & 0 deletions AGENTS.md
@@ -0,0 +1,104 @@

<persistence>
1. If the user asked you a question, try to gather information and answer the question to the best of your ability.
2. If the user asked you to review code, work and gather the required information to give a code review according to the `<guiding_principles>` and general best practices. Do not ask any more questions, just provide a best effort code review.
3. Otherwise:
- You are an agent - please keep going until the user's query is completely resolved, before ending your turn and yielding back to the user.
- If the instructions are unclear, try to think of what info you need and gather that info from the user *right away*, so you can then work autonomously for many turns.
- Be extra-autonomous. The user wants you to work on your own, once you started.
- Only terminate your turn when you are sure that the problem is solved.
- Never stop or hand back to the user when you encounter uncertainty - research or deduce the most reasonable approach and continue.
- Do not ask the human to confirm or clarify assumptions except at the very beginning; assumptions can always be adjusted later. Decide what the most reasonable assumption is, proceed with it, and document it for the user's reference after you finish acting.
- You are working inside a secure container, you cannot break anything vital, so do not ask for permission and be bold.
</persistence>
<work_loop>
- At the beginning:
- When asked a question about the code or in general, or asked for a code review, gather the necessary information, answer right away, and finish.
- When instructions are unclear, ask clarifying questions at the beginning.
- During work:
- Think before you act. Plan ahead. Feel free to think more than you would otherwise; look at things from different angles, consider different scenarios.
- If possible, write a few tests *before* implementing a feature or fixing a bug.
- For a bug fix, write a test that captures the bug before fixing the bug.
- For a feature, create tests to the degree it is possible. Try really hard. If it is not possible, at least create test-stubs in the form of empty `test_that()` blocks to be filled in later.
- Tests should be sensibly thorough. Write more thorough tests only when asked by the user to write tests.
- Work and solve upcoming issues independently, using your best judgment
- Package progress into organic git commits. You may overwrite commits that are not on 'origin' yet, but do so only if it has great benefit. If you are on git branch `master`, create a new aptly named branch; never commit into `master`. Otherwise, do not leave the current git branch.
- Again: create git commits at organic points. In the past, you tended to make too few git commits.
- If any issues pop up:
- If you noticed anything that surprised you, or anything that would have helped you substantially with your work had you known it right away, add it to the `<agent_notes>` section of the `AGENTS.md` file. Future agents will then have access to this information. Use it to capture technical insights, failed approaches, user preferences, and other things future agents should know.
- After feature implementation, write tests:
- If you were asked to implement a feature and have not yet done so, fill in the test_that stubs created earlier or create new tests, to the degree that they make sense.
- If you were asked to fix a bug, check again that there are regression tests.
- When you are done:
- Write a short summary of what you did, and what decisions you had to make that went beyond what the user asked of you, and other things the user should know about, as chat response to the user.
- Unless you were working on something minor, or you are leaving things as an obvious work-in-progress, do a git commit.
</work_loop>
<debugging>
When fixing problems, always make sure you know the actual cause of the problem first:

1. Form hypotheses about what the issue could be.
2. Find a way to test these hypotheses, and carry those tests out. If necessary, ask for assistance from the human, who e.g. may need to interact manually with the software.
3. If you accept a hypothesis, apply an appropriate fix. The fix may not work and the hypothesis may turn out to be false; in that case, undo the fix unless it actually improves code quality overall. Do not let unnecessary fixes for imaginary issues that never materialized clog up the code.
</debugging>
<guiding_principles>
Straightforwardness: Avoid ideological adherence to other programming principles when something can be solved in a simple, short, straightforward way. Otherwise:

- Simplicity: Favor small, focused components and avoid unnecessary complexity in design or logic.
- This also means: avoid overly defensive code. Observe the typical level of defensiveness when looking at the code.
- Idiomaticity: Solve problems the way they "should" be solved, in the respective language: the way a professional in that language would have approached it.
- Readability and maintainability are primary concerns, even at the cost of conciseness or performance.
- Doing it right is better than doing it fast. You are not in a rush. Never skip steps or take shortcuts.
- Tedious, systematic work is often the correct solution. Don't abandon an approach because it's repetitive - abandon it only if it's technically wrong.
- Honesty is a core value. Be honest about changes you have made and potential negative effects; these are okay. Be honest about shortcomings of other team members' plans and implementations; we all care more about the project than our egos. Be honest if you don't know something: say "I don't know" when appropriate.
</guiding_principles>
<project_info>

`mlr3pipelines` is a package that extends the `mlr3` ecosystem by adding preprocessing operations and a way to compose them into computational graphs.

- The package is very object-oriented; most things use R6.
- Coding style: we use `snake_case` for variables, `UpperCamelCase` for R6 classes. We use `=` for assignment and mostly use the tidyverse style guide otherwise. We use block-indent (two spaces), *not* visual indent; i.e., we don't align code with opening parentheses in function calls, we align by block depth.
- User-facing API (`@export`ed things, public R6 methods) always needs checkmate `assert_*()` argument checks. Otherwise don't be overly defensive; look at the other code in the project to see our desired level of paranoia.
- Always read at least `R/PipeOp.R` and `R/PipeOpTaskPreproc.R` to see the base classes you will need in almost every task.
- Read `R/Graph.R` and `R/GraphLearner.R` to understand the Graph architecture.
- Before you start coding, look at other relevant `.R` files that do something similar to what you are supposed to implement.
- We use `testthat`, and most test files are in `tests/testthat/`. Read the additional important helpers in `inst/testthat/helper_functions.R` to understand our `PipeOpTaskPreproc` auto-test framework.
- Always write tests and execute them with `devtools::test(filter = )`; the entirety of our tests takes a long time, so only run tests for what you just wrote.
- Tests involving the `$man` field, and tests involving parallelization, do not work well when the package is loaded with `devtools::load_all()`, because of conflicts with the installed version. Ignore these failures, CI will take care of this.
- The quality of our tests is lower than it ideally should be. We are in the process of improving this over time. Always leave the `tests/testthat/` folder in a better state than what you found it in!
- If `roxygenize()` / `document()` produce warnings that are unrelated to the code you wrote, ignore them. Do not fix code or formatting that is unrelated to what you are working on, but *do* mention bugs or problems that you noticed in your final report.
- When you write examples, make sure they work.
- A very small number of the packages listed in `Suggests:` and used by some tests / examples are missing; ignore warnings in that regard. You will never be asked to work on things that require these packages.
- Packages that we rely on; they generally have good documentation that can be queried, or they can be looked up on GitHub:
- `mlr3`, provides `Task`, `Learner`, `Measure`, `Prediction`, various `***Result` classes; basically the foundation on which we build. <https://github.com/mlr-org/mlr3>
- `mlr3misc`, provides a lot of helper functions that we prefer to use over base-R when available. <https://github.com/mlr-org/mlr3misc>
- `paradox`, provides the hyperparameters-/configuration space: `ps()`, `p_int()`, `p_lgl()`, `p_fct()`, `p_uty()` etc. <https://github.com/mlr-org/paradox>
- For the mlr3-ecosystem as a whole, also consider the "mlr3 Book" as a reference, <https://mlr3book.mlr-org.com/>
- Semantics of paradox ParamSet parameters to pay attention to:
- there is a distinction between "default" values and values that a parameter is initialized to: a "default" is the behaviour that happens when the parameter is not given at all; e.g. PipeOpPCA `center` defaults to `TRUE`, since the underlying function (`prcomp`) does centering when its `center` argument is not given at all. In contrast, a parameter is "initialized" to some value if it is set to some value upon construction of a PipeOp. In rare cases, this can differ from the default, e.g. if the underlying default behaviour is suboptimal for use in preprocessing (e.g. it stores training data unnecessarily by default).
- a parameter can be marked as "required" by having the tag `"required"`. It is a special tag that causes an error if the value is not set. A "required" parameter *can not* have a "default", since semantically this is a contradiction: "default" would describe what happens when the param is not set, but param-not-set is an error.
- When we write a preprocessing method ourselves, we usually don't provide "default" behaviour and instead mark most things as "required". "default" is mostly for when we wrap some other library's function which itself has function-argument default values.
- We initialize a parameter by giving the `p_xxx(init = )` argument. Some old code does `param_set$values = list(...)` or `param_set$values$param = ...` in the constructor. This is deprecated; we do not unnecessarily change it in old code, but new code should have `init = `. A parameter should be documented as "initialized to" something if and only if the value is set through one of these methods in the constructor.
- Inside the train / predict functions of PipeOps, hyperparameter values should be obtained through `pv = self$param_set$get_values(tags = )`, where `tags` is often `"train"`, `"predict"`, or some custom tag that groups hyperparameters by meaning (e.g. everything that should be passed to a specific function). A nice pattern is to call a function `fname` with many options configured through `pv` while also explicitly passing some arguments, as `invoke(fname, arg1 = val1, arg2 = val2, .args = pv)`, using `invoke` from `mlr3misc`. A sketch of these conventions follows after this list.
- paradox does type-checking and range-checking automatically; `get_values()` automatically checks that `"required"` params are present and not `NULL`. Therefore, we only do additional parameter feasibility checks in the rarest of cases.
- Minor things to be aware of:
- Errors that are thrown in PipeOps are automatically wrapped by Graph to also mention the PipeOp ID, so it is not necessary to include that in error messages.
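
A minimal sketch of these parameter conventions; the names `center`, `rank`, `method`, and `some_fun` are illustrative, not taken from any particular PipeOp:

```r
library(paradox)
library(mlr3misc)

param_set = ps(
  # "default" documents what the wrapped function does when the param is absent:
  center = p_lgl(default = TRUE, tags = "train"),
  # `init =` actually sets a value at construction time ("initialized to"):
  rank = p_int(lower = 1L, init = 2L, tags = "train"),
  # "required" params error when unset, and therefore cannot have a "default":
  method = p_fct(levels = c("a", "b"), tags = c("train", "required"))
)
param_set$values$method = "a"

# inside a PipeOp's train/predict function:
pv = param_set$get_values(tags = "train")  # also verifies "required" params are set
# invoke(some_fun, x = data, .args = pv)   # explicit args plus configured options
```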

</project_info>
<agent_notes>

# Notes by Agents to other Agents

- R unit tests in this repo assume helper `expect_man_exists()` is available. If you need to call it in a new test and you are working without mlr3pipelines installed, define a local fallback at the top of that test file before `expect_learner()` is used.
- Revdep helper scripts live in `attic/revdeps/`. `download_revdeps.R` downloads reverse dependency source tarballs; `install_revdep_suggests.R` installs Suggests for those revdeps without pulling the revdeps themselves.

</agent_notes>
<your_task>
Again, when implementing something, focus on:

1. Think things through and plan ahead.
2. Tests before implementation, if possible. In any case, write high quality tests, try to be better than the tests you find in this project.
3. Once you started, work independently; we can always undo things if necessary.
4. Create sensible intermediate commits.
5. Check your work, make sure tests pass. But do not run *all* tests, they take a long time.
6. Write a report to the user at the end, informing about decisions that were made autonomously, unexpected issues etc.
</your_task>
1 change: 1 addition & 0 deletions CLAUDE.md
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -107,7 +107,7 @@ Config/testthat/edition: 3
Config/testthat/parallel: true
NeedsCompilation: no
Roxygen: list(markdown = TRUE, r6 = FALSE)
RoxygenNote: 7.3.2
RoxygenNote: 7.3.3
VignetteBuilder: knitr, rmarkdown
Collate:
'CnfAtom.R'
3 changes: 2 additions & 1 deletion NEWS.md
@@ -4,6 +4,8 @@
* Fix: Added internal workaround for `PipeOpNMF` attaching `Biobase`, `BiocGenerics`, and `generics` to the search path during training, prediction or when printing its `$state`.
* feat: allow dates in datefeatures pipe op and use data.table for date feature generation.
* Added support for internal validation tasks to `PipeOpFeatureUnion`.
* feat: `PipeOpLearnerCV` can reuse the cross-validation models during prediction by averaging their outputs (`resampling.predict_method = "cv_ensemble"`).
* feat: `PipeOpRegrAvg` gets new `se_aggr` and `se_aggr_rho` hyperparameters and now allows various forms of SE aggregation.

# mlr3pipelines 0.9.0

@@ -304,4 +306,3 @@
# mlr3pipelines 0.1.0

* Initial upload to CRAN.
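
A hedged usage sketch of the two new features announced above; the hyperparameter name `resampling.predict_method` is taken from the NEWS entry, the rest is ordinary mlr3/mlr3pipelines usage and may differ in detail:

```r
library(mlr3)
library(mlr3pipelines)

# reuse the CV models during prediction by averaging their outputs
polcv = po("learner_cv", lrn("classif.rpart"),
  resampling.predict_method = "cv_ensemble")

task = tsk("iris")
polcv$train(list(task))[[1L]]    # Task with cross-validated predictions as features
polcv$predict(list(task))[[1L]]  # Task with predictions averaged over the CV models
```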

42 changes: 37 additions & 5 deletions R/PipeOpClassifAvg.R
@@ -11,8 +11,23 @@
#' Always returns a `"prob"` prediction, regardless of the incoming [`Learner`][mlr3::Learner]'s
#' `$predict_type`. The label of the class with the highest predicted probability is selected as the
#' `"response"` prediction. If the [`Learner`][mlr3::Learner]'s `$predict_type` is set to `"prob"`,
#' the prediction obtained is also a `"prob"` type prediction with the probability predicted to be a
#' weighted average of incoming predictions.
#' the probability aggregation is controlled by `prob_aggr` (see below). If `$predict_type = "response"`,
#' predictions are internally converted to one-hot probability vectors (point mass on the predicted class) before aggregation.
#'
#' ### `"prob"` aggregation:
#'
#' * **`prob_aggr = "mean"`** -- *Linear opinion pool (arithmetic mean of probabilities; default)*.
#' **Interpretation.** Mixture semantics: choose a base model with probability `w[i]`, then draw from its class distribution.
#' Decision-theoretically, this is the minimizer of `sum(w[i] * KL(p[i] || p))` over probability vectors `p`, where `KL(x || y)` is the Kullback-Leibler divergence.
#' **Typical behavior.** Conservative / better calibrated and robust to near-zero probabilities (never assigns zero unless all do).
#' This is the standard choice for probability averaging in ensembles and stacking.
#'
#' * **`prob_aggr = "log"`** -- *Log opinion pool / product of experts (geometric mean in probability space)*:
#' Average the per-model log-probabilities (or, equivalently, the logits) and apply softmax.
#' **Interpretation.** Product semantics: `p_ens` is proportional to `prod(p[i]^w[i])`; this minimizes `sum(w[i] * KL(p || p[i]))`.
#' **Typical behavior.** Sharper / lower entropy (emphasizes consensus regions), but can be **overconfident** and is sensitive
#' to zeros; use `prob_aggr_eps` to clip small probabilities for numerical stability. Often beneficial with strong, similarly
#' calibrated members (e.g., neural networks), less so when calibration is the priority. A toy numeric comparison of the two pools follows below.
#'
#' All incoming [`Learner`][mlr3::Learner]'s `$predict_type` must agree.
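
(Reviewer note, not part of the diff: the two pooling rules differ as in this toy computation with two models, three classes, and equal weights.)

```r
p1 = c(0.6, 0.3, 0.1)  # model 1 class probabilities
p2 = c(0.2, 0.7, 0.1)  # model 2 class probabilities
w = c(0.5, 0.5)        # equal weights

# linear opinion pool: weighted arithmetic mean
w[1] * p1 + w[2] * p2                     # 0.400 0.500 0.100

# log opinion pool: weighted geometric mean, renormalized to sum to 1
g = exp(w[1] * log(p1) + w[2] * log(p2))
g / sum(g)                                # approx. 0.383 0.507 0.111
```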
#'
Expand Down Expand Up @@ -45,7 +60,14 @@
#' The `$state` is left empty (`list()`).
#'
#' @section Parameters:
#' The parameters are the parameters inherited from the [`PipeOpEnsemble`].
#' The parameters are the parameters inherited from the [`PipeOpEnsemble`], as well as:
#' * `prob_aggr` :: `character(1)`\cr
#' Controls how incoming class probabilities are aggregated. One of `"mean"` (linear opinion pool; default) or
#' `"log"` (log opinion pool / product of experts). See the description above for definitions and interpretation.
#' Only has an effect if the incoming predictions have `"prob"` values.
#' * `prob_aggr_eps` :: `numeric(1)`\cr
#' Small positive constant used only for `prob_aggr = "log"` to clamp probabilities before taking logs, improving numerical
#' stability and avoiding `-Inf`. Ignored for `prob_aggr = "mean"`. Default is `1e-12`.
#'
#' @section Internals:
#' Inherits from [`PipeOpEnsemble`] by implementing the `private$weighted_avg_predictions()` method.
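
(A construction sketch using the hyperparameters documented above; the choice of learners is arbitrary.)

```r
library(mlr3)
library(mlr3pipelines)

gr = gunion(list(
  po("learner", lrn("classif.rpart", predict_type = "prob")),
  po("learner", lrn("classif.featureless", predict_type = "prob")),
  po("learner", lrn("classif.debug", predict_type = "prob"))
)) %>>% po("classifavg", innum = 3, prob_aggr = "log", prob_aggr_eps = 1e-9)

glrn = as_learner(gr)
```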
@@ -81,7 +103,11 @@ PipeOpClassifAvg = R6Class("PipeOpClassifAvg",
inherit = PipeOpEnsemble,
public = list(
initialize = function(innum = 0, collect_multiplicity = FALSE, id = "classifavg", param_vals = list()) {
super$initialize(innum, collect_multiplicity, id, param_vals = param_vals, prediction_type = "PredictionClassif", packages = "stats")
param_set = ps(
prob_aggr = p_fct(levels = c("mean", "log"), init = "mean", tags = c("predict", "prob_aggr")),
prob_aggr_eps = p_dbl(lower = 0, upper = 1, default = 1e-12, tags = c("predict", "prob_aggr"), depends = quote(prob_aggr == "log"))
)
super$initialize(innum, collect_multiplicity, id, param_set = param_set, param_vals = param_vals, prediction_type = "PredictionClassif", packages = "stats")
}
),
private = list(
@@ -96,7 +122,13 @@ PipeOpClassifAvg = R6Class("PipeOpClassifAvg",

prob = NULL
if (every(inputs, function(x) !is.null(x$prob))) {
prob = weighted_matrix_sum(map(inputs, "prob"), weights)
pv = self$param_set$get_values(tags = "prob_aggr")
if (pv$prob_aggr == "mean") {
prob = weighted_matrix_sum(map(inputs, "prob"), weights)
} else { # prob_aggr == "log"
epsilon = pv$prob_aggr_eps %??% 1e-12
prob = weighted_matrix_logpool(map(inputs, "prob"), weights, epsilon = epsilon)
}
} else if (every(inputs, function(x) !is.null(x$response))) {
prob = weighted_factor_mean(map(inputs, "response"), weights, lvls)
} else {
19 changes: 19 additions & 0 deletions R/PipeOpEnsemble.R
@@ -178,6 +178,25 @@ weighted_matrix_sum = function(matrices, weights) {
accmat
}

# Weighted log-opinion pool (geometric) aggregation of probability matrices
# Rows = samples, columns = classes. Each matrix must have the same shape.
# @param matrices list of matrices: per-learner probabilities
# @param weights numeric: weights, same length as `matrices` (assumed to sum to 1 upstream)
# @param epsilon numeric(1): small positive constant to clamp probabilities before log
# @return matrix: row-normalized aggregated probabilities (same shape as inputs)
weighted_matrix_logpool = function(matrices, weights, epsilon = 1e-12) {
assert_list(matrices, types = "matrix", min.len = 1)
assert_numeric(weights, len = length(matrices), any.missing = FALSE, finite = TRUE)
assert_number(epsilon, lower = 0, upper = 1)
acc = weights[1] * log(pmax(matrices[[1]], epsilon))
for (idx in seq_along(matrices)[-1]) {
acc = acc + weights[idx] * log(pmax(matrices[[idx]], epsilon))
}
P = exp(acc)
sweep(P, 1L, rowSums(P), "/")
}
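
(Sanity check one could run after `devtools::load_all()`; with equal weights the log pool reduces to the row-normalized geometric mean. Sketch only, not part of the diff.)

```r
m1 = matrix(c(0.6, 0.3, 0.1,
              0.2, 0.7, 0.1), nrow = 2, byrow = TRUE)
m2 = matrix(c(0.5, 0.4, 0.1,
              0.1, 0.8, 0.1), nrow = 2, byrow = TRUE)

res = weighted_matrix_logpool(list(m1, m2), weights = c(0.5, 0.5))
g = sqrt(m1 * m2)  # element-wise geometric mean for equal weights
stopifnot(isTRUE(all.equal(res, g / rowSums(g))))
```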


# For a set of n `factor` vectors each of length l with the same k levels and a
# numeric weight vector of length n, returns a matrix of dimension l times k.
# Each cell contains the weighted relative frequency of the respective factor
Expand Down
Loading