Commit 15e3b7b

emigrate reformat doc-string detail into MLJ/docs
1 parent 917ec66 commit 15e3b7b

1 file changed: +14 -85 lines changed

src/model_api.jl

Lines changed: 14 additions & 85 deletions
@@ -34,91 +34,20 @@ function update_data end
 
 Models optionally overload `reformat` to define transformations of
 user-supplied data into some model-specific representation (e.g., from
-a table to a matrix). Computational overheads associated with multiple
-`fit!`/`predict`/`transform` calls are then avoided, when memory
-resources allow. The fallback returns `args` (no transformation).
-
-Here "user-supplied data" is what the MLJ user supplies when
-constructing a machine, as in `machine(model, args...)`, which
-coincides with the arguments expected by `fit(model, verbosity,
-args...)` when `reformat` is not overloaded.
-
-Implementing a `reformat` data front-end is permitted for any `Model`
-subtype, except for subtypes of `Static`. Here is a complete list of
-responsibilities for such an implementation, for some
-`model::SomeModelType`:
-
-- A `reformat(model::SomeModelType, args...) -> data` method must be
-  implemented for each form of `args...` appearing in a valid machine
-  construction `machine(model, args...)` (there will be one for each
-  possible signature of `fit(::SomeModelType, ...)`).
-
-- Additionally, if not included above, there must be a single argument
-  form of `reformat`, `reformat(model::SomeModelType, arg) -> (data,)`,
-  serving as a data front-end for operations like `predict`. It must
-  always hold that `reformat(model, args...)[1] = reformat(model,
-  args[1])`.
-
-**Warning.** `reformat(model::SomeModelType, args...)` must always
-  return a tuple of the same length as `args`, even if this is one.
-
-- `fit(model::SomeModelType, verbosity, data...)` should be
-  implemented as if `data` is the output of `reformat(model,
-  args...)`, where `args` is the data an MLJ user has bound to `model`
-  in some machine. The same applies to any overloading of `update`.
-
-- Each implemented operation, such as `predict` and `transform` - but
-  excluding `inverse_transform` - must be defined as if its data
-  arguments are `reformat`ed versions of user-supplied data. For
-  example, in the supervised case, `data_new` in
-  `predict(model::SomeModelType, fitresult, data_new)` is
-  `reformat(model, Xnew)`, where `Xnew` is the data provided by the MLJ
-  user in a call `predict(mach, Xnew)` (`mach.model == model`).
-
-- To specify how the model-specific representation of data is to be
-  resampled, implement `selectrows(model::SomeModelType, I, data...)
-  -> resampled_data` for each overloading of `reformat(model::SomeModelType,
-  args...) -> data` above. Here `I` is an arbitrary abstract integer
-  vector or `:` (type `Colon`).
-
-**Warning.** `selectrows(model::SomeModelType, I, args...)` must always
-return a tuple of the same length as `args`, even if this is one.
-
-The fallback for `selectrows` is described at [`selectrows`](@ref).
-
-
-### Example
-
-Suppose a supervised model type `SomeSupervised` supports sample
-weights, leading to two different `fit` signatures:
-
-    fit(model::SomeSupervised, verbosity, X, y)
-    fit(model::SomeSupervised, verbosity, X, y, w)
-
-    predict(model::SomeSupervised, fitresult, Xnew)
-
-Without a data front-end implemented, suppose `X` is expected to be a
-table and `y` a vector, but suppose the core algorithm always converts
-`X` to a matrix with features as rows (features corresponding to
-columns in the table). Then a new data front-end might look like
-this:
-
-    const MMI = MLJModelInterface
-
-    # for fit:
-    MMI.reformat(::SomeSupervised, X, y) = (MMI.matrix(X, transpose=true), y)
-    MMI.reformat(::SomeSupervised, X, y, w) = (MMI.matrix(X, transpose=true), y, w)
-    MMI.selectrows(::SomeSupervised, I, Xmatrix, y) =
-        (view(Xmatrix, :, I), view(y, I))
-    MMI.selectrows(::SomeSupervised, I, Xmatrix, y, w) =
-        (view(Xmatrix, :, I), view(y, I), view(w, I))
-
-    # for predict:
-    MMI.reformat(::SomeSupervised, X) = (MMI.matrix(X, transpose=true),)
-    MMI.selectrows(::SomeSupervised, I, Xmatrix) = (view(Xmatrix, :, I),)
-
-With these additions, `fit` and `predict` are refactored, so that `X`
-and `Xnew` represent matrices with features as rows.
+a table to a matrix). When implemented, the MLJ user can avoid
+repeating such transformations unnecessarily, and can additionally
+make use of more efficient row subsampling, which is then based on the
+model-specific representation of data, rather than the
+user-representation. When `reformat` is overloaded,
+`selectrows(::Model, ...)` must be as well (see
+[`selectrows`](@ref)). Furthermore, the model `fit` method(s), and
+operations such as `predict` and `transform`, must be refactored to
+act on the model-specific representations of the data.
+
+To implement the `reformat` data front-end for a model, refer to
+"Implementing a data front-end" in the [MLJ
+manual](https://alan-turing-institute.github.io/MLJ.jl/dev/adding_models_for_general_use/).
+
 
 """
 reformat(model::Model, args...) = args
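
For readers without the manual to hand, the contract summarized in the new docstring (every `reformat` overload is paired with a `selectrows` overload, and both return tuples) can be sketched without any packages. This is only an illustration under stated assumptions: the `reformat` and `selectrows` functions below are local stand-ins rather than the MLJModelInterface methods a real model would extend, `SomeSupervised` and `to_matrix` are hypothetical names, and a "table" is assumed to be a NamedTuple of equal-length column vectors.

```julia
# Hypothetical model type; a real one would subtype an MLJ model type.
struct SomeSupervised end

# Assumption: a table is a NamedTuple of column vectors. Build a matrix with
# features as rows and observations as columns.
to_matrix(X::NamedTuple) = permutedims(hcat(values(X)...))

# Local stand-ins for the data front-end methods (not the MMI functions).
# Each returns a tuple with one entry per data argument.
reformat(::SomeSupervised, X, y) = (to_matrix(X), y)
reformat(::SomeSupervised, X, y, w) = (to_matrix(X), y, w)
reformat(::SomeSupervised, X) = (to_matrix(X),)

# Row (observation) subsampling acts on the model-specific representation,
# so observations are selected as matrix columns.
selectrows(::SomeSupervised, I, Xmatrix, y) =
    (view(Xmatrix, :, I), view(y, I))
selectrows(::SomeSupervised, I, Xmatrix, y, w) =
    (view(Xmatrix, :, I), view(y, I), view(w, I))
selectrows(::SomeSupervised, I, Xmatrix) = (view(Xmatrix, :, I),)

model = SomeSupervised()
X = (x1 = [1.0, 2.0, 3.0], x2 = [4.0, 5.0, 6.0])
y = [10.0, 20.0, 30.0]

data = reformat(model, X, y)                      # convert once
Xsub, ysub = selectrows(model, [1, 3], data...)   # resample without reconverting
```

Because `selectrows` returns views into the already-converted matrix, repeated resampling does not repeat the table-to-matrix conversion, which is the saving the docstring describes.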
