@@ -34,91 +34,20 @@ function update_data end

Models optionally overload `reformat` to define transformations of
user-supplied data into some model-specific representation (e.g., from
- a table to a matrix). Computational overheads associated with multiple
- `fit!`/`predict`/`transform` calls are then avoided, when memory
- resources allow. The fallback returns `args` (no transformation).
-
- Here "user-supplied data" is what the MLJ user supplies when
- constructing a machine, as in `machine(models, args...)`, which
- coincides with the arguments expected by `fit(model, verbosity,
- args...)` when `reformat` is not overloaded.
-
- Implementing a `reformat` data front-end is permitted for any `Model`
- subtype, except for subtypes of `Static`. Here is a complete list of
- responsibilities for such an implementation, for some
- `model::SomeModelType`:
-
- - A `reformat(model::SomeModelType, args...) -> data` method must be
- implemented for each form of `args...` appearing in a valid machine
- construction `machine(model, args...)` (there will be one for each
- possible signature of `fit(::SomeModelType, ...)`).
-
- - Additionally, if not included above, there must be a single argument
- form of reformat, `reformat(model::SommeModelType, arg) -> (data,)`,
- serving as a data front-end for operations like `predict`. It must
- always hold that `reformat(model, args...)[1] = reformat(model,
- args[1])`.
-
- **Warning.** `reformat(model::SomeModelType, args...)` must always
- return a tuple of the same length as `args`, even if this is one.
-
- - `fit(model::SomeModelType, verbosity, data...)` should be
- implemented as if `data` is the output of `reformat(model,
- args...)`, where `args` is the data an MLJ user has bound to `model`
- in some machine. The same applies to any overloading of `update`.
-
- - Each implemented operation, such as `predict` and `transform` - but
- excluding `inverse_transform` - must be defined as if its data
- arguments are `reformat`ed versions of user-supplied data. For
- example, in the supervised case, `data_new` in
- `predict(model::SomeModelType, fitresult, data_new)` is
- `reformat(model, Xnew)`, where `Xnew is the data provided by the MLJ
- user in a call `predict(mach, Xnew)` (`mach.model == model`).
-
- - To specify how the model-specific representation of data is to be
- resampled, implement `selectrows(model::SomeModelType, I, data...)
- -> resampled_data` for each overloading of `reformat(model::SomeModel,
- args...) -> data` above. Here `I` is an arbitrary abstract integer
- vector or `:` (type `Colon`).
-
- **Warning.** `selectrows(model::SomeModelType, I, args...)` must always
- return a tuple of the same length as `args`, even if this is one.
-
- The fallback for `selectrows` is described at [`selectrows`](@ref).
-
-
- ### Example
-
- Suppose a supervised model type `SomeSupervised` supports sample
- weights, leading to two different `fit` signatures:
-
- fit(model::SomeSupervised, verbosity, X, y)
- fit(model::SomeSupervised, verbosity, X, y, w)
-
- predict(model::SomeSupervised, fitresult, Xnew)
-
- Without a data front-end implemented, suppose `X` is expected to be a
- table and `y` a vector, but suppose the core algorithm always converts
- `X` to a matrix with features as rows (features corresponding to
- columns in the table). Then a new data-front end might look like
- this:
-
- constant MMI = MLJModelInterface
-
- # for fit:
- MMI.reformat(::SomeSupervised, X, y) = (MMI.matrix(X, transpose=true), y)
- MMI.reformat(::SomeSupervised, X, y, w) = (MMI.matrix(X, transpose=true), y, w)
- MMI.selectrows(::SomeSupervised, I, Xmatrix, y) =
- (view(Xmatrix, :, I), view(y, I))
- MMI.selectrows(::SomeSupervised, I, Xmatrix, y, w) =
- (view(Xmatrix, :, I), view(y, I), view(w, I))
-
- # for predict:
- MMI.reformat(::SomeSupervised, X) = (MMI.matrix(X, transpose=true),)
- MMI.selectrows(::SomeSupervised, I, Xmatrix) = view(Xmatrix, I)
-
- With these additions, `fit` and `predict` are refactored, so that `X`
- and `Xnew` represent matrices with features as rows.
+ a table to a matrix). When this is implemented, the MLJ user can avoid
+ repeating such transformations unnecessarily, and can additionally
+ make use of more efficient row subsampling, which is then based on the
+ model-specific representation of the data rather than on the user's
+ representation. When `reformat` is overloaded,
+ `selectrows(::Model, ...)` must be overloaded as well (see
+ [`selectrows`](@ref)). Furthermore, the model's `fit` method(s), and
+ operations such as `predict` and `transform`, must be refactored to
+ act on the model-specific representations of the data.
+
+ To implement the `reformat` data front-end for a model, refer to
+ "Implementing a data front-end" in the [MLJ
+ manual](https://alan-turing-institute.github.io/MLJ.jl/dev/adding_models_for_general_use/).
+

"""
reformat(model::Model, args...) = args
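The docstring text requires that a model overloading `reformat` also overloads `selectrows`, and that `fit` and operations such as `predict` act on the reformatted data. The following is a minimal sketch of such a front-end under stated assumptions. `SomeSupervised` is a hypothetical deterministic model whose core algorithm is assumed to want a matrix with features as rows. `X` is assumed to be a column table given as a `NamedTuple` of equal-length vectors. The helper `features_as_rows` is hypothetical. It stands in for the `MMI.matrix(X, transpose=true)` call used in the removed example, so that the sketch does not depend on how table conversion is provided. Every `reformat` and `selectrows` method returns a tuple whose length matches the number of data arguments, including the one-element case.

```julia
# Minimal sketch of a `reformat`/`selectrows` data front-end.
# `SomeSupervised` and `features_as_rows` are hypothetical names.
import MLJModelInterface
const MMI = MLJModelInterface

# Hypothetical model type; its core algorithm is assumed to want a
# matrix with one row per feature and one column per observation.
mutable struct SomeSupervised <: MMI.Deterministic end

# Hypothetical helper: converts a column table given as a NamedTuple of
# equal-length vectors into a (features × observations) matrix.
features_as_rows(X::NamedTuple) = permutedims(reduce(hcat, collect(values(X))))

# Front-end for `fit` (user data bound as `machine(model, X, y)`):
MMI.reformat(::SomeSupervised, X, y) = (features_as_rows(X), y)
MMI.selectrows(::SomeSupervised, I, Xmatrix, y) =
    (view(Xmatrix, :, I), view(y, I))

# Single-argument front-end for operations such as `predict`;
# both methods return one-element tuples.
MMI.reformat(::SomeSupervised, X) = (features_as_rows(X),)
MMI.selectrows(::SomeSupervised, I, Xmatrix) = (view(Xmatrix, :, I),)

# Usage: rows are subsampled on the reformatted data, so the
# table-to-matrix conversion need not be repeated for each subsample.
model = SomeSupervised()
X = (x1 = [1.0, 2.0, 3.0], x2 = [4.0, 5.0, 6.0])
y = [10.0, 20.0, 30.0]
data = MMI.reformat(model, X, y)                 # (2×3 matrix, y)
train = MMI.selectrows(model, [1, 3], data...)   # columns 1 and 3 only
```

Using `view` keeps subsampling free of copies. Under this front-end, `fit` and `predict` would be written to accept the features-as-rows matrix directly.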