2 files changed, +18 -18 lines changed
```diff
@@ -137,21 +137,21 @@ In MLJ or MLJBase, bind an instance `model` to data with

     mach = machine(model, X)

-$DOC_IDF
+$DOC_TRANSFORMER_INPUTS

 Train the machine using `fit!(mach, rows=...)`.

 # Hyper-parameters

-- `max_doc_freq=1.0`: Restricts the vocabulary that the transformer will consider.
-  Terms that occur in `> max_doc_freq` documents will not be considered by the
-  transformer. For example, if `max_doc_freq` is set to 0.9, terms that are in more than
-  90% of the documents will be removed.
+- `max_doc_freq=1.0`: Restricts the vocabulary that the transformer will consider. Terms
+  that occur in `> max_doc_freq` documents will not be considered by the transformer. For
+  example, if `max_doc_freq` is set to 0.9, terms that are in more than 90% of the
+  documents will be removed.

-- `min_doc_freq=0.0`: Restricts the vocabulary that the transformer will consider.
-  Terms that occur in `< max_doc_freq` documents will not be considered by the
-  transformer. A value of 0.01 means that only terms that are at least in 1% of the
-  documents will be included.
+- `min_doc_freq=0.0`: Restricts the vocabulary that the transformer will consider. Terms
+  that occur in `< min_doc_freq` documents will not be considered by the transformer. A
+  value of 0.01 means that only terms that are at least in 1% of the documents will be
+  included.

 - `κ=2`: The term frequency saturation characteristic. Higher values represent slower
   saturation. What we mean by saturation is the degree to which a term occurring extra
```
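The `κ` entry describes BM25-style term-frequency saturation, but the hunk cuts the explanation off. A minimal Julia sketch, assuming the standard Okapi BM25 term-frequency component with document-length normalization switched off (the section does not show how the transformer handles document length); `bm25_tf` is a hypothetical name, not part of MLJText or TextAnalysis:

```julia
# Hypothetical helper: Okapi BM25 term-frequency component with
# document-length normalization disabled (assumption for illustration).
# The score grows with the raw count `tf` but is capped at κ + 1.
bm25_tf(tf::Real, κ::Real) = tf * (κ + 1) / (tf + κ)

# Compare how quickly the score flattens out for several κ values.
for κ in (0.5, 2.0, 10.0)
    scores = [round(bm25_tf(tf, κ); digits = 2) for tf in 1:5]
    println("κ = $κ: ", scores)
end
```

Every score starts at 1 for a single occurrence and is bounded by `κ + 1`. With `κ = 0.5`, five occurrences already reach about 1.36 of a 1.5 cap, while with `κ = 2` they score 15/7 ≈ 2.14 of a 3.0 cap; that is the sense in which a larger `κ` saturates more slowly.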
```diff
@@ -94,15 +94,15 @@ Train the machine using `fit!(mach, rows=...)`.

 # Hyper-parameters

-- `max_doc_freq=1.0`: Restricts the vocabulary that the transformer will consider.
-  Terms that occur in `> max_doc_freq` documents will not be considered by the
-  transformer. For example, if `max_doc_freq` is set to 0.9, terms that are in more than
-  90% of the documents will be removed.
-
-- `min_doc_freq=0.0`: Restricts the vocabulary that the transformer will consider.
-  Terms that occur in `< max_doc_freq` documents will not be considered by the
-  transformer. A value of 0.01 means that only terms that are at least in 1% of the
-  documents will be included.
+- `max_doc_freq=1.0`: Restricts the vocabulary that the transformer will consider. Terms
+  that occur in `> max_doc_freq` documents will not be considered by the transformer. For
+  example, if `max_doc_freq` is set to 0.9, terms that are in more than 90% of the
+  documents will be removed.
+
+- `min_doc_freq=0.0`: Restricts the vocabulary that the transformer will consider. Terms
+  that occur in `< min_doc_freq` documents will not be considered by the transformer. A
+  value of 0.01 means that only terms that are at least in 1% of the documents will be
+  included.

 # Operations

```
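Taken together, the `max_doc_freq` and `min_doc_freq` descriptions amount to a document-frequency filter on the vocabulary. The Julia sketch below illustrates that rule on pre-tokenized documents; `filter_vocabulary` is a hypothetical helper written for illustration, not the MLJText implementation, and it assumes inclusive bounds, matching the `>`/`<` wording of the docstrings:

```julia
# Hypothetical helper (not the MLJText API): keep only terms whose document
# frequency, i.e. the fraction of documents containing the term, lies in
# [min_doc_freq, max_doc_freq]. Terms in > max_doc_freq or < min_doc_freq
# of the documents are dropped.
function filter_vocabulary(docs::Vector{Vector{String}};
                           max_doc_freq::Real = 1.0,
                           min_doc_freq::Real = 0.0)
    n = length(docs)
    counts = Dict{String,Int}()
    for doc in docs
        for term in unique(doc)      # count each term at most once per document
            counts[term] = get(counts, term, 0) + 1
        end
    end
    keep = String[]
    for (term, c) in counts
        frac = c / n                 # document frequency as a fraction
        if min_doc_freq <= frac <= max_doc_freq
            push!(keep, term)
        end
    end
    return sort(keep)                # sorted for a deterministic result
end

docs = [["the", "cat", "sat"],
        ["the", "dog", "sat"],
        ["the", "cat", "ran"],
        ["a", "bird", "flew"]]

filter_vocabulary(docs; max_doc_freq = 0.7)
filter_vocabulary(docs; min_doc_freq = 0.3)
```

Here "the" occurs in 3 of the 4 documents (0.75), so `max_doc_freq = 0.7` removes it, while `min_doc_freq = 0.3` keeps only "cat", "sat" and "the", each of which appears in at least half the documents.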