
Commit 74b57b4

Merge branch 'master' into cli
2 parents b0df928 + 617a995 commit 74b57b4


262 files changed (+8637, -1742 lines changed)

.Rbuildignore

Lines changed: 2 additions & 0 deletions
@@ -25,3 +25,5 @@
 ^\.vscode$
 ^\.lintr$
 ^\.pre-commit-config\.yaml$
+^AGENTS\.md$
+^CLAUDE\.md$

.github/workflows/dev-cmd-check.yml

Lines changed: 2 additions & 2 deletions
@@ -24,10 +24,10 @@ jobs:
 fail-fast: false
 matrix:
 config:
-- {os: ubuntu-latest, r: 'release', dev-package: "mlr-org/bbotk', 'mlr-org/mlr3learners', 'mlr-org/paradox"}
+- {os: ubuntu-latest, r: 'release', dev-package: "mlr-org/mlr3', 'mlr-org/bbotk', 'mlr-org/mlr3learners', 'mlr-org/paradox"}
 
 steps:
-- uses: actions/checkout@v4
+- uses: actions/checkout@v5
 
 - uses: r-lib/actions/setup-r@v2
 with:

.github/workflows/pkgdown.yml

Lines changed: 2 additions & 2 deletions
@@ -23,7 +23,7 @@ jobs:
 env:
 GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
 steps:
-- uses: actions/checkout@v4
+- uses: actions/checkout@v5
 
 - uses: r-lib/actions/setup-pandoc@v2
 
@@ -44,7 +44,7 @@ jobs:
 
 - name: Deploy
 if: github.event_name != 'pull_request'
-uses: JamesIves/github-pages-deploy-action@v4.6.9
+uses: JamesIves/github-pages-deploy-action@v4.7.3
 with:
 clean: false
 branch: gh-pages

.github/workflows/r-cmd-check.yml

Lines changed: 6 additions & 2 deletions
@@ -15,7 +15,7 @@ jobs:
 r-cmd-check:
 runs-on: ${{ matrix.config.os }}
 
-name: ${{ matrix.config.os }} (${{ matrix.config.r }})
+name: ${{ matrix.config.os }} (${{ matrix.config.r }})${{ matrix.config.depends_only && ' – noSuggests' || '' }}
 
 env:
 GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
@@ -26,9 +26,10 @@ jobs:
 config:
 - {os: ubuntu-latest, r: 'devel'}
 - {os: ubuntu-latest, r: 'release'}
+- {os: ubuntu-latest, r: 'release', depends_only: true}
 
 steps:
-- uses: actions/checkout@v4
+- uses: actions/checkout@v5
 
 - uses: r-lib/actions/setup-r@v2
 with:
@@ -40,3 +41,6 @@ jobs:
 needs: check
 
 - uses: r-lib/actions/check-r-package@v2
+env:
+_R_CHECK_DEPENDS_ONLY_: ${{ matrix.config.depends_only && 'TRUE' || 'FALSE' }}
+NOT_CRAN: ${{ matrix.config.depends_only && 'FALSE' || 'TRUE' }}

AGENTS.md

Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
+
+<persistence>
+1. If the user asked you a question, try to gather information and answer the question to the best of your ability.
+2. If the user asked you to review code, work and gather the required information to give a code review according to the `<guiding_principles>` and general best practices. Do not ask any more questions, just provide a best effort code review.
+3. Otherwise:
+- You are an agent - please keep going until the user's query is completely resolved, before ending your turn and yielding back to the user.
+- If the instructions are unclear, try to think of what info you need and gather that info from the user *right away*, so you can then work autonomously for many turns.
+- Be extra-autonomous. The user wants you to work on your own once you have started.
+- Only terminate your turn when you are sure that the problem is solved.
+- Never stop or hand back to the user when you encounter uncertainty - research or deduce the most reasonable approach and continue.
+- Do not ask the human to confirm or clarify assumptions except at the very beginning, as this can always be adjusted later - decide what the most reasonable assumption is, proceed with it, and document it for the user's reference after you finish acting.
+- You are working inside a secure container; you cannot break anything vital, so do not ask for permission and be bold.
+</persistence>
+<work_loop>
+- At the beginning:
+- When asked a question about the code or in general, or asked for code review, gather the necessary information and answer right away and finish.
+- When instructions are unclear, ask clarifying questions at the beginning.
+- During work:
+- Think before you act. Plan ahead. Feel free to think more than you would otherwise; look at things from different angles, consider different scenarios.
+- If possible, write a few tests *before* implementing a feature or fixing a bug.
+- For a bug fix, write a test that captures the bug before fixing the bug.
+- For a feature, create tests to the degree it is possible. Try really hard. If it is not possible, at least create test-stubs in the form of empty `test_that()` blocks to be filled in later.
+- Tests should be sensibly thorough. Write more thorough tests only when asked by the user to write tests.
+- Work and solve upcoming issues independently, using your best judgment.
+- Package progress into organic git commits. You may overwrite commits that are not on 'origin' yet, but do so only if it has great benefit. If you are on git branch `master`, create a new aptly named branch; never commit into `master`. Otherwise, do not leave the current git branch.
+- Again: create git commits at organic points. In the past, you tended to make too few git commits.
+- If any issues pop up:
+- If you noticed anything that surprised you, or anything that would have helped you substantially with your work had you known it right away, add it to the `<agent_notes>` section of the `AGENTS.md` file. Future agents will then have access to this information. Use it to capture technical insights, failed approaches, user preferences, and other things future agents should know.
+- After feature implementation, write tests:
+- If you were asked to implement a feature and have not yet done so, fill in the `test_that()` stubs created earlier or create new tests, to the degree that they make sense.
+- If you were asked to fix a bug, check again that there are regression tests.
+- When you are done:
+- Write a short summary as a chat response to the user: what you did, which decisions you had to make that went beyond what the user asked of you, and anything else the user should know about.
+- Unless you were working on something minor, or you are leaving things as an obvious work-in-progress, do a git commit.
+</work_loop>
+<debugging>
+When fixing problems, always make sure you know the actual cause of the problem first:
+
+1. Form hypotheses about what the issue could be.
+2. Find a way to test these hypotheses and test them. If necessary, ask for assistance from the human, who e.g. may need to interact manually with the software.
+3. If you accept a hypothesis, apply an appropriate fix. The fix may not work and the hypothesis may turn out to be false; in that case, undo the fix unless it actually improves code quality overall. Do not let unnecessary fixes for imaginary issues that never materialized clog up the code.
+</debugging>
+<guiding_principles>
+Straightforwardness: Avoid ideological adherence to other programming principles when something can be solved in a simple, short, straightforward way. Otherwise:
+
+- Simplicity: Favor small, focused components and avoid unnecessary complexity in design or logic.
+- This also means: avoid overly defensive code. Observe the typical level of defensiveness when looking at the code.
+- Idiomaticity: Solve problems the way they "should" be solved, in the respective language: the way a professional in that language would have approached it.
+- Readability and maintainability are primary concerns, even at the cost of conciseness or performance.
+- Doing it right is better than doing it fast. You are not in a rush. Never skip steps or take shortcuts.
+- Tedious, systematic work is often the correct solution. Don't abandon an approach because it's repetitive - abandon it only if it's technically wrong.
+- Honesty is a core value. Be honest about changes you have made and their potential negative effects; these are okay. Be honest about shortcomings of other team members' plans and implementations; we all care more about the project than our egos. Be honest if you don't know something: say "I don't know" when appropriate.
+</guiding_principles>
+<project_info>
+
+`mlr3pipelines` is a package that extends the `mlr3` ecosystem by adding preprocessing operations and a way to compose them into computational graphs.
+
+- The package is very object-oriented; most things use R6.
+- Coding style: we use `snake_case` for variables, `UpperCamelCase` for R6 classes. We use `=` for assignment and mostly use the tidyverse style guide otherwise. We use block-indent (two spaces), *not* visual indent; i.e., we don't align code with opening parentheses in function calls, we align by block depth.
+- User-facing API (`@export`ed things, public R6 methods) always needs checkmate `assert_***()` argument checks. Otherwise don't be overly defensive; look at the other code in the project to see our desired level of paranoia.
+- Always read at least `R/PipeOp.R` and `R/PipeOpTaskPreproc.R` to see the base classes you will need in almost every task.
+- Read `R/Graph.R` and `R/GraphLearner.R` to understand the Graph architecture.
+- Before you start coding, look at other relevant `.R` files that do something similar to what you are supposed to implement.
+- We use `testthat`, and most test files are in `tests/testthat/`. Read the additional important helpers in `inst/testthat/helper_functions.R` to understand our `PipeOpTaskPreproc` auto-test framework.
+- Always write tests and execute them with `devtools::test(filter = )`; running the entire test suite takes a long time, so only run tests for what you just wrote.
+- Tests involving the `$man` field, and tests involving parallelization, do not work well when the package is loaded with `devtools::load_all()`, because of conflicts with the installed version. Ignore these failures, CI will take care of this.
+- The quality of our tests is lower than it ideally should be. We are in the process of improving this over time. Always leave the `tests/testthat/` folder in a better state than you found it in!
+- If `roxygenize()` / `document()` produce warnings that are unrelated to the code you wrote, ignore them. Do not fix code or formatting that is unrelated to what you are working on, but *do* mention bugs or problems that you noticed in your final report.
+- When you write examples, make sure they work.
+- A very small number of packages listed in `Suggests:` used by some tests / examples is missing; ignore warnings in that regard. You will never be asked to work on things that require these packages.
+- Packages that we rely on (they generally have good documentation that can be queried, or they can be looked up on GitHub):
+- `mlr3`, provides `Task`, `Learner`, `Measure`, `Prediction`, various `***Result` classes; basically the foundation on which we build. <https://github.com/mlr-org/mlr3>
+- `mlr3misc`, provides a lot of helper functions that we prefer to use over base-R when available. <https://github.com/mlr-org/mlr3misc>
+- `paradox`, provides the hyperparameters-/configuration space: `ps()`, `p_int()`, `p_lgl()`, `p_fct()`, `p_uty()` etc. <https://github.com/mlr-org/paradox>
+- For the mlr3-ecosystem as a whole, also consider the "mlr3 Book" as a reference, <https://mlr3book.mlr-org.com/>
+- Semantics of paradox ParamSet parameters to pay attention to:
+- there is a distinction between "default" values and values that a parameter is initialized to: a "default" is the behaviour that happens when the parameter is not given at all; e.g. PipeOpPCA `center` defaults to `TRUE`, since the underlying function (`prcomp`) does centering when the `center` argument is not given at all. In contrast, a parameter is "initialized" to some value if it is set to some value upon construction of a PipeOp. In rare cases, this can differ from the default, e.g. if the underlying default behaviour is suboptimal for use in preprocessing (e.g. it stores training data unnecessarily by default).
+- a parameter can be marked as "required" by having the tag `"required"`. It is a special tag that causes an error if the value is not set. A "required" parameter *can not* have a "default", since semantically this is a contradiction: "default" would describe what happens when the param is not set, but param-not-set is an error.
+- When we write preprocessing methods ourselves, we usually don't do "default" behaviour and instead mark most things as "required". "default" is mostly used when we wrap some other library's function which itself has a function argument default value.
+- We initialize a parameter by giving the `p_xxx(init = )` argument. Some old code does `param_set$values = list(...)` or `param_set$values$param = ...` in the constructor. This is deprecated; we do not unnecessarily change it in old code, but new code should have `init = `. A parameter should be documented as "initialized to" something if and only if the value is set through one of these methods in the constructor.
+- Inside the train / predict functions of PipeOps, hyperparameter values should be obtained through `pv = self$param_set$get_values(tags = )`, where `tags` is often `"train"`, `"predict"`, or some custom tag that groups hyperparameters by meaning somehow (e.g. everything that should be passed to a specific function). A nice pattern is to call a function `fname` with many options configured through `pv` while also explicitly passing some arguments as `invoke(fname, arg1 = val1, arg2 = val2, .args = pv)`, using `invoke` from `mlr3misc`.
+- paradox does type-checking and range-checking automatically; `get_values()` automatically checks that `"required"` params are present and not `NULL`. Therefore, we only do additional parameter feasibility checks in the rarest of cases.
+- Minor things to be aware of:
+- Errors that are thrown in PipeOps are automatically wrapped by Graph to also mention the PipeOp ID, so it is not necessary to include that in error messages.
+
+</project_info>
+<agent_notes>
+
+# Notes by Agents to other Agents
+
+- R unit tests in this repo assume helper `expect_man_exists()` is available. If you need to call it in a new test and you are working without mlr3pipelines installed, define a local fallback at the top of that test file before `expect_learner()` is used.
+- Revdep helper scripts live in `attic/revdeps/`. `download_revdeps.R` downloads reverse dependency source tarballs; `install_revdep_suggests.R` installs Suggests for those revdeps without pulling the revdeps themselves.
+
+</agent_notes>
+<your_task>
+Again, when implementing something, focus on:
+
+1. Think things through and plan ahead.
+2. Tests before implementation, if possible. In any case, write high quality tests, try to be better than the tests you find in this project.
+3. Once you started, work independently; we can always undo things if necessary.
+4. Create sensible intermediate commits.
+5. Check your work, make sure tests pass. But do not run *all* tests, they take a long time.
+6. Write a report to the user at the end, informing about decisions that were made autonomously, unexpected issues etc.
+</your_task>
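The `paradox` conventions that AGENTS.md describes can be illustrated with a small sketch. It covers defaults versus initialized values, the `"required"` tag, and `get_values()` combined with `mlr3misc::invoke()`. This is not code from the commit. The parameter set is hypothetical and only loosely modelled on the `prcomp()` example, and the sketch assumes that `paradox` (>= 1.0.0) and `mlr3misc` are installed.

```r
library(paradox)
library(mlr3misc)

# Hypothetical parameter set, loosely modelled on the prcomp() example.
param_set = ps(
  # "default": only documents what prcomp() does when the argument is omitted
  center = p_lgl(default = TRUE, tags = "train"),
  # "initialized": the value is set when the parameter set is constructed
  scale. = p_lgl(init = FALSE, tags = "train"),
  # neither default nor init: not passed on unless the user sets it
  rank. = p_int(lower = 1L, tags = "train")
)

# Only values that are actually set are returned, so center and rank. are absent.
pv = param_set$get_values(tags = "train")

# Pass some arguments explicitly and the configured rest through .args.
pca = invoke(stats::prcomp, x = as.matrix(datasets::iris[, 1:4]), .args = pv)

# A "required" parameter has no default; get_values() errors while it is unset.
required_set = ps(k = p_int(lower = 1L, tags = c("train", "required")))
missing_required = tryCatch(required_set$get_values(), error = function(e) TRUE)

# Once the value is set, get_values() succeeds.
required_set$values$k = 3L
pv_required = required_set$get_values(tags = "train")
```

`center` and `rank.` are never set, so they are absent from `pv` and `prcomp()` falls back to its own defaults for them. The `default = TRUE` annotation on `center` only documents that behaviour.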

CLAUDE.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+AGENTS.md

DESCRIPTION

Lines changed: 12 additions & 10 deletions
@@ -1,6 +1,6 @@
 Package: mlr3pipelines
 Title: Preprocessing Operators and Pipelines for 'mlr3'
-Version: 0.7.0-9000
+Version: 0.9.0-9000
 Authors@R:
 c(person(given = "Martin",
 family = "Binder",
@@ -58,7 +58,7 @@ URL: https://mlr3pipelines.mlr-org.com,
 https://github.com/mlr-org/mlr3pipelines
 BugReports: https://github.com/mlr-org/mlr3pipelines/issues
 Depends:
-R (>= 3.1.0)
+R (>= 3.3.0)
 Imports:
 backports,
 checkmate,
@@ -67,10 +67,9 @@ Imports:
 digest,
 lgr,
 mlr3 (>= 0.20.0),
-mlr3misc (>= 0.9.0),
-paradox,
-R6,
-withr
+mlr3misc (>= 0.17.0),
+paradox (>= 1.0.0),
+R6
 Suggests:
 ggplot2,
 glmnet,
@@ -79,7 +78,7 @@ Suggests:
 lme4,
 mlbench,
 bbotk (>= 0.3.0),
-mlr3filters (>= 0.1.1),
+mlr3filters (>= 0.8.1),
 mlr3learners,
 mlr3measures,
 nloptr,
@@ -96,7 +95,6 @@ Suggests:
 evaluate,
 NMF,
 MASS,
-kknn,
 GenSA,
 methods,
 vtreat,
@@ -111,8 +109,8 @@ Config/testthat/edition: 3
 Config/testthat/parallel: true
 NeedsCompilation: no
 Roxygen: list(markdown = TRUE, r6 = FALSE)
-RoxygenNote: 7.3.2
-VignetteBuilder: knitr
+RoxygenNote: 7.3.3
+VignetteBuilder: knitr, rmarkdown
 Collate:
 'CnfAtom.R'
 'CnfClause.R'
@@ -143,9 +141,11 @@ Collate:
 'PipeOpCollapseFactors.R'
 'PipeOpCopy.R'
 'PipeOpDateFeatures.R'
+'PipeOpDecode.R'
 'PipeOpEncode.R'
 'PipeOpEncodeImpact.R'
 'PipeOpEncodeLmer.R'
+'PipeOpEncodePL.R'
 'PipeOpFeatureUnion.R'
 'PipeOpFilter.R'
 'PipeOpFixFactors.R'
@@ -199,6 +199,7 @@ Collate:
 'PipeOpVtreat.R'
 'PipeOpYeoJohnson.R'
 'Selector.R'
+'TaskRegr_boston_housing.R'
 'assert_graph.R'
 'bibentries.R'
 'greplicate.R'
@@ -215,6 +216,7 @@ Collate:
 'pipeline_targettrafo.R'
 'po.R'
 'ppl.R'
+'preproc.R'
 'reexports.R'
 'typecheck.R'
 'zzz.R'

NAMESPACE

Lines changed: 8 additions & 1 deletion
@@ -65,12 +65,15 @@ S3method(pos,"NULL")
 S3method(pos,character)
 S3method(pos,list)
 S3method(predict,Graph)
+S3method(preproc,Graph)
+S3method(preproc,PipeOp)
 S3method(print,CnfAtom)
 S3method(print,CnfClause)
 S3method(print,CnfFormula)
 S3method(print,CnfSymbol)
 S3method(print,CnfUniverse)
 S3method(print,Multiplicity)
+S3method(print,PipeOpNMFstate)
 S3method(print,Selector)
 S3method(set_validate,GraphLearner)
 S3method(set_validate,PipeOpLearner)
@@ -108,9 +111,13 @@ export(PipeOpColRoles)
 export(PipeOpCollapseFactors)
 export(PipeOpCopy)
 export(PipeOpDateFeatures)
+export(PipeOpDecode)
 export(PipeOpEncode)
 export(PipeOpEncodeImpact)
 export(PipeOpEncodeLmer)
+export(PipeOpEncodePL)
+export(PipeOpEncodePLQuantiles)
+export(PipeOpEncodePLTree)
 export(PipeOpEnsemble)
 export(PipeOpFeatureUnion)
 export(PipeOpFilter)
@@ -203,6 +210,7 @@ export(po)
 export(pos)
 export(ppl)
 export(ppls)
+export(preproc)
 export(register_autoconvert_function)
 export(reset_autoconvert_register)
 export(reset_class_hierarchy_cache)
@@ -233,4 +241,3 @@ importFrom(stats,setNames)
 importFrom(utils,bibentry)
 importFrom(utils,head)
 importFrom(utils,tail)
-importFrom(withr,with_options)
